Building an Agentic Workflow with n8n and Local LLM (Ollama): Air-Gapped Enterprise Automation

A persistent barrier to enterprise AI adoption is Data Privacy. Organizations cannot safely route customer complaint emails, signed NDAs, or proprietary CRM records through public APIs hosted by OpenAI or Anthropic. The compliance risk is real and specific.

The operational cost problem compounds this: “token burn” for 24/7 automated workflows (ticket classification, scheduled processing) scales with load and has no ceiling.

The solution is an Air-Gapped AI stack: self-hosting both the automation platform (n8n) and the language model (Ollama) entirely within the organization’s own infrastructure.

This guide covers the full setup: Docker Compose, Agent configuration, Memory, Tools, and RAG integration.

1. System Architecture

n8n is not an Autonomous Agent framework like LangGraph or AutoGen. It is an Agentic Workflow platform: the LLM decides which Tool to call from a pre-defined set, rather than following a hardcoded sequence. That is a meaningful step beyond standard automation, but it does not include autonomous planning or sub-agent spawning.

Example workflow:

Ingests customer complaint Emails via IMAP protocols.
The AI Agent processes the Email context, autonomously deciding to invoke the Search CRM Tool to verify the sender’s identity and contract status.
The AI Agent subsequently invokes the Query Vector DB Tool to extract the most recent relevant warranty policies.
The AI Agent drafts an appropriate response and executes the Send Email Tool to reply, or autonomously generates a Jira Escalation Ticket.

graph TD
    subgraph s1 ["Activation Layer (Triggers)"]
        A[Email/Webhook Input]
        Schedule[Cron Job Schedules]
    end

    subgraph s2 ["n8n Workflow Engine (Air-Gapped)"]
        B{AI Agent Node}
        
        subgraph s3 ["Execution Capabilities (Tools)"]
            T1[Search CRM Tool]
            T2[Query Vector DB Tool]
            T3[Send Email Tool]
        end
        
        B -->|Invokes Tool 1| T1
        B -->|Invokes Tool 2| T2
        B -->|Invokes Tool 3| T3
    end

    subgraph s4 ["LLM Infrastructure (Local Component)"]
        D[Ollama - Qwen/Llama 3]
    end

    A -->|Trigger Signal| B
    Schedule -->|Trigger Signal| B
    B <-->|ReAct Prompting & Cognitive Processing| D
    
    style B fill:#3b82f6,stroke:#1d4ed8,stroke-width:2px,color:#fff
    style D fill:#10b981,stroke:#047857,stroke-width:2px,color:#fff

Note: With Local LLMs (7B/8B), the Tool-Calling loop is less reliable than this diagram suggests. Smaller models frequently misidentify the correct Tool or skip reasoning steps. For production workloads where reliability matters, use a model with at least 13B parameters (e.g., Mixtral 8x7B, Llama 3 70B).

2. System Configuration & Docker Compose

Docker Compose is the recommended approach to containerize n8n, PostgreSQL, and Ollama within a unified, isolated internal network – ensuring all data stays off the public internet and the stack is reproducible across environments.

Server requirements: Linux VPS (Ubuntu 24.04), minimum 16GB RAM. GPU (Nvidia T4/A10G) is required to run larger models effectively.

Initialize the docker-compose.yml configuration:

# Requires Docker Compose V2 (Docker 23+). The 'version' field is deprecated and not needed.
services:
  postgres:
    image: postgres:16
    restart: unless-stopped
    environment:
      - POSTGRES_USER=n8n
      - POSTGRES_PASSWORD=n8n_secure_password
      - POSTGRES_DB=n8n
    volumes:
      - db_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U n8n"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - ai-network

  n8n:
    image: docker.n8n.io/n8nio/n8n
    restart: unless-stopped
    ports:
      - "5678:5678"
    environment:
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_PORT=5432
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=n8n
      - DB_POSTGRESDB_PASSWORD=n8n_secure_password
      - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
    volumes:
      - n8n_data:/home/node/.n8n
    depends_on:
      postgres:
        condition: service_healthy
    networks:
      - ai-network

  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    ports:
      - "11434:11434"
    environment:
      # Required: allows other containers on the network to connect to Ollama
      - OLLAMA_HOST=0.0.0.0
    volumes:
      - ollama_data:/root/.ollama
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost:11434/api/tags || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    # Uncomment the block below if the VPS instance includes Nvidia GPU provisioning
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]
    networks:
      - ai-network

volumes:
  db_data:
  n8n_data:
  ollama_data:

networks:
  ai-network:
    driver: bridge

Execute the Docker cluster deployment (requires Docker Compose V2):

docker compose up -d

Once all containers reach an Up state, pull the target AI model into the Ollama container:

# Use pull (not run) to download the model without opening an interactive shell
docker exec -it ollama ollama pull llama3:8b
# Alternatively, deploy qwen2.5:7b for enhanced multi-lingual/code parsing contexts.

3. Configuring the AI Agent in n8n

Access the n8n Web UI at http://<VPS_IP>:5678 and follow these steps.

3.1 Connecting Ollama as the Language Model

Within the AI Agent node, locate the Chat Model section, click “Add”, and select the Ollama Chat Model node. Create a new Credential and set the Base URL to: http://ollama:11434

Because n8n and Ollama share the ai-network Docker bridge, all traffic stays internal – no data transits the public internet. This is the core security advantage of this architecture.

Note: If Ollama and n8n run on separate hosts (not the same Docker Compose stack), use http://host.docker.internal:11434 instead.

3.2 Memory Node & System Prompt Engineering

Drag the AI Agent node onto the Workflow canvas. Under the Memory section, select the type appropriate for your use case:

Simple Memory: Stores context in the current execution’s RAM. Suitable for chatbot interactions with multiple consecutive turns within a single session. Not suitable for event-driven workflows where each trigger is a discrete event (e.g., each incoming email is a new execution – context is fully reset between triggers).
Postgres Memory (recommended for production): Persists conversation history to the database, maintaining context across separate trigger executions. Uses the existing postgres service already defined in the Docker Compose stack in Section 2 – no additional infrastructure required.

Select the model (e.g., llama3:8b).

Compact Local models (7B/8B) are more susceptible to Hallucinations than cloud models when parsing Tool schemas. Apply these three techniques together to constrain the behavior:

Strict System Prompt – define the operational boundary explicitly
Low Temperature – set temperature: 0 in the node config to reduce randomness
Output Validation – add an If or Code node downstream to validate the Agent’s output format before triggering any write operations

You are an Internal Customer Support Assistant operating for Nick Nguyen Co.
Your explicit objective is to analyze requests and UTILIZE DESIGNATED TOOLS TO EXTRACT FACTUAL INFORMATION.
Under ABSOLUTELY NO CIRCUMSTANCES should you fabricate corporate policies, pricing structures, or client context if the data does not actively return from Tool execution.
If a Tool execution fails, report the root error parameters immediately.
Consistently formulate your response utilizing the primary language detected in the User Input block.

3.3 Provisioning Execution Tools

An Agent without Tools is a text generator. Connect Tool nodes to the AI Agent via the “AI Tool” connection points:

HTTP Request Tool: Allows the Agent to fire API requests targeting Hubspot/Jira.
Calculator Tool: Enables the Agent to compute financial quotes from extracted variables.
Code Tool (JavaScript): Allows the Agent to execute micro-scripts for string formatting or RegExp operations.

Critical Note: For compact LLMs to invoke Tools accurately, the Tool Name and Description fields must be highly specific. For instance: do not use Search_CRM; use Fetch_Client_Profile_Using_Email_Parameter. The LLM reads the Description to decide which Tool to call – vague descriptions lead to wrong Tool selection or missed calls.

3.4 Adding a Vector Store (RAG)

Instead of injecting entire policy documents into the Prompt (which overflows the Context Window on Local Models), connect the AI Agent node to a Vector Store (e.g., Qdrant or Pinecone). When a user queries “What is the standard refund period?”, the Agent autonomously queries the Vector Database, extracts the most semantically relevant paragraphs, processes the technical context, and formulates the precise email reply.

4. Limitations & When to Go Further

n8n is the right tool when the problem has clear structure with LLM reasoning: a defined trigger, a fixed tool pool (under 15-20 tools), and tasks that do not require deep multi-step reasoning chains. Use cases like support email automation, internal RAG Q&A, and ticket triage all fall within this scope.

Consider a code-first agent framework (LangGraph, CrewAI, AutoGen) when:

The task requires multi-agent coordination: a manager AI orchestrating specialized sub-agents
The workflow needs complex recursive loops – the agent must evaluate its own output and retry with a different strategy
Long-horizon stateful planning is needed beyond simple conversation history
The tool pool is large and requires dynamic tool discovery rather than pre-wired nodes

A common hybrid architecture in production: use LangGraph (Python) as the reasoning engine, and n8n as the orchestration layer – handling event triggers, SaaS integrations, human-in-the-loop approvals, and audit logging.

5. Cost Scalability & Conclusion

With this stack deployed on a 16GB RAM VPS, infrastructure cost is fixed by compute tier – not per-token billing. Reference points: Hetzner CPX41 (8 vCPU, 16GB RAM) runs ~$23/month; Vultr 16GB runs ~$80/month. Scaling to handle higher load requires upgrading the compute tier, but there is no incremental API cost per request.

By contrast, routing the same workload through OpenAI gpt-4o or Anthropic Claude 3.5 introduces variable billing exposure – a traffic spike or large data ingestion event can quickly accumulate hundreds of dollars in a single day, without accounting for the compliance risk of transmitting customer data to third-party cloud infrastructure.

Maintaining a self-hosted AI ecosystem (Air-Gapped AI) establishes control over the organization’s core data pathways while capitalizing on the productivity gains of autonomous automation. Transitioning to Cloud AI APIs should only occur after your engineering team has fully evaluated the cost and Data Compliance trade-offs.

For a broader look at how Agentic AI is changing software development workflows, see: Agentic AI in Software Development.

#n8n#automation#llm#agentic-ai#ollama#self-hosted#security