A persistent barrier to enterprise AI adoption is Data Privacy. Organizations cannot safely route customer complaint emails, signed NDAs, or proprietary CRM records through public APIs hosted by OpenAI or Anthropic. The compliance risk is real and specific.
The operational cost problem compounds this: “token burn” for 24/7 automated workflows (ticket classification, scheduled processing) scales with load and has no ceiling.
The solution is an Air-Gapped AI stack: self-hosting both the automation platform (n8n) and the language model (Ollama) entirely within the organization’s own infrastructure.
This guide covers the full setup: Docker Compose, Agent configuration, Memory, Tools, and RAG integration.
1. System Architecture
n8n is not an Autonomous Agent framework like LangGraph or AutoGen. It is an Agentic Workflow platform: the LLM decides which Tool to call from a pre-defined set, rather than following a hardcoded sequence. That is a meaningful step beyond standard automation, but it does not include autonomous planning or sub-agent spawning.
Example workflow:
- Ingests customer complaint Emails via IMAP protocols.
- The AI Agent processes the Email context, autonomously deciding to invoke the Search CRM Tool to verify the sender’s identity and contract status.
- The AI Agent subsequently invokes the Query Vector DB Tool to extract the most recent relevant warranty policies.
- The AI Agent drafts an appropriate response and executes the Send Email Tool to reply, or autonomously generates a Jira Escalation Ticket.
graph TD
subgraph s1 ["Activation Layer (Triggers)"]
A[Email/Webhook Input]
Schedule[Cron Job Schedules]
end
subgraph s2 ["n8n Workflow Engine (Air-Gapped)"]
B{AI Agent Node}
subgraph s3 ["Execution Capabilities (Tools)"]
T1[Search CRM Tool]
T2[Query Vector DB Tool]
T3[Send Email Tool]
end
B -->|Invokes Tool 1| T1
B -->|Invokes Tool 2| T2
B -->|Invokes Tool 3| T3
end
subgraph s4 ["LLM Infrastructure (Local Component)"]
D[Ollama - Qwen/Llama 3]
end
A -->|Trigger Signal| B
Schedule -->|Trigger Signal| B
B <-->|ReAct Prompting & Cognitive Processing| D
style B fill:#3b82f6,stroke:#1d4ed8,stroke-width:2px,color:#fff
style D fill:#10b981,stroke:#047857,stroke-width:2px,color:#fff
Note: With Local LLMs (7B/8B), the Tool-Calling loop is less reliable than this diagram suggests. Smaller models frequently misidentify the correct Tool or skip reasoning steps. For production workloads where reliability matters, use a model with at least 13B parameters (e.g., Mixtral 8x7B, Llama 3 70B).
2. System Configuration & Docker Compose
Docker Compose is the recommended approach to containerize n8n, PostgreSQL, and Ollama within a unified, isolated internal network – ensuring all data stays off the public internet and the stack is reproducible across environments.
Server requirements: Linux VPS (Ubuntu 24.04), minimum 16GB RAM. GPU (Nvidia T4/A10G) is required to run larger models effectively.
Initialize the docker-compose.yml configuration:
# Requires Docker Compose V2 (Docker 23+). The 'version' field is deprecated and not needed.
services:
postgres:
image: postgres:16
restart: unless-stopped
environment:
- POSTGRES_USER=n8n
- POSTGRES_PASSWORD=n8n_secure_password
- POSTGRES_DB=n8n
volumes:
- db_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U n8n"]
interval: 10s
timeout: 5s
retries: 5
networks:
- ai-network
n8n:
image: docker.n8n.io/n8nio/n8n
restart: unless-stopped
ports:
- "5678:5678"
environment:
- DB_TYPE=postgresdb
- DB_POSTGRESDB_HOST=postgres
- DB_POSTGRESDB_PORT=5432
- DB_POSTGRESDB_DATABASE=n8n
- DB_POSTGRESDB_USER=n8n
- DB_POSTGRESDB_PASSWORD=n8n_secure_password
- N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
volumes:
- n8n_data:/home/node/.n8n
depends_on:
postgres:
condition: service_healthy
networks:
- ai-network
ollama:
image: ollama/ollama:latest
restart: unless-stopped
ports:
- "11434:11434"
environment:
# Required: allows other containers on the network to connect to Ollama
- OLLAMA_HOST=0.0.0.0
volumes:
- ollama_data:/root/.ollama
healthcheck:
test: ["CMD-SHELL", "curl -sf http://localhost:11434/api/tags || exit 1"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
# Uncomment the block below if the VPS instance includes Nvidia GPU provisioning
# deploy:
# resources:
# reservations:
# devices:
# - driver: nvidia
# count: 1
# capabilities: [gpu]
networks:
- ai-network
volumes:
db_data:
n8n_data:
ollama_data:
networks:
ai-network:
driver: bridge
Execute the Docker cluster deployment (requires Docker Compose V2):
docker compose up -d
Once all containers reach an Up state, pull the target AI model into the Ollama container:
# Use pull (not run) to download the model without opening an interactive shell
docker exec -it ollama ollama pull llama3:8b
# Alternatively, deploy qwen2.5:7b for enhanced multi-lingual/code parsing contexts.
3. Configuring the AI Agent in n8n
Access the n8n Web UI at http://<VPS_IP>:5678 and follow these steps.
3.1 Connecting Ollama as the Language Model
Within the AI Agent node, locate the Chat Model section, click “Add”, and select the Ollama Chat Model node.
Create a new Credential and set the Base URL to:
http://ollama:11434
Because n8n and Ollama share the ai-network Docker bridge, all traffic stays internal – no data transits the public internet. This is the core security advantage of this architecture.
Note: If Ollama and n8n run on separate hosts (not the same Docker Compose stack), use http://host.docker.internal:11434 instead.
3.2 Memory Node & System Prompt Engineering
Drag the AI Agent node onto the Workflow canvas. Under the Memory section, select the type appropriate for your use case:
- Simple Memory: Stores context in the current execution’s RAM. Suitable for chatbot interactions with multiple consecutive turns within a single session. Not suitable for event-driven workflows where each trigger is a discrete event (e.g., each incoming email is a new execution – context is fully reset between triggers).
- Postgres Memory (recommended for production): Persists conversation history to the database, maintaining context across separate trigger executions. Uses the existing
postgresservice already defined in the Docker Compose stack in Section 2 – no additional infrastructure required.
Select the model (e.g., llama3:8b).
Compact Local models (7B/8B) are more susceptible to Hallucinations than cloud models when parsing Tool schemas. Apply these three techniques together to constrain the behavior:
- Strict System Prompt – define the operational boundary explicitly
- Low Temperature – set
temperature: 0in the node config to reduce randomness - Output Validation – add an
IforCodenode downstream to validate the Agent’s output format before triggering any write operations
You are an Internal Customer Support Assistant operating for Nick Nguyen Co.
Your explicit objective is to analyze requests and UTILIZE DESIGNATED TOOLS TO EXTRACT FACTUAL INFORMATION.
Under ABSOLUTELY NO CIRCUMSTANCES should you fabricate corporate policies, pricing structures, or client context if the data does not actively return from Tool execution.
If a Tool execution fails, report the root error parameters immediately.
Consistently formulate your response utilizing the primary language detected in the User Input block.
3.3 Provisioning Execution Tools
An Agent without Tools is a text generator. Connect Tool nodes to the AI Agent via the “AI Tool” connection points:
- HTTP Request Tool: Allows the Agent to fire API requests targeting Hubspot/Jira.
- Calculator Tool: Enables the Agent to compute financial quotes from extracted variables.
- Code Tool (JavaScript): Allows the Agent to execute micro-scripts for string formatting or RegExp operations.
Critical Note: For compact LLMs to invoke Tools accurately, the Tool Name and Description fields must be highly specific. For instance: do not use Search_CRM; use Fetch_Client_Profile_Using_Email_Parameter. The LLM reads the Description to decide which Tool to call – vague descriptions lead to wrong Tool selection or missed calls.
3.4 Adding a Vector Store (RAG)
Instead of injecting entire policy documents into the Prompt (which overflows the Context Window on Local Models), connect the AI Agent node to a Vector Store (e.g., Qdrant or Pinecone). When a user queries “What is the standard refund period?”, the Agent autonomously queries the Vector Database, extracts the most semantically relevant paragraphs, processes the technical context, and formulates the precise email reply.
4. Limitations & When to Go Further
n8n is the right tool when the problem has clear structure with LLM reasoning: a defined trigger, a fixed tool pool (under 15-20 tools), and tasks that do not require deep multi-step reasoning chains. Use cases like support email automation, internal RAG Q&A, and ticket triage all fall within this scope.
Consider a code-first agent framework (LangGraph, CrewAI, AutoGen) when:
- The task requires multi-agent coordination: a manager AI orchestrating specialized sub-agents
- The workflow needs complex recursive loops – the agent must evaluate its own output and retry with a different strategy
- Long-horizon stateful planning is needed beyond simple conversation history
- The tool pool is large and requires dynamic tool discovery rather than pre-wired nodes
A common hybrid architecture in production: use LangGraph (Python) as the reasoning engine, and n8n as the orchestration layer – handling event triggers, SaaS integrations, human-in-the-loop approvals, and audit logging.
5. Cost Scalability & Conclusion
With this stack deployed on a 16GB RAM VPS, infrastructure cost is fixed by compute tier – not per-token billing. Reference points: Hetzner CPX41 (8 vCPU, 16GB RAM) runs ~$23/month; Vultr 16GB runs ~$80/month. Scaling to handle higher load requires upgrading the compute tier, but there is no incremental API cost per request.
By contrast, routing the same workload through OpenAI gpt-4o or Anthropic Claude 3.5 introduces variable billing exposure – a traffic spike or large data ingestion event can quickly accumulate hundreds of dollars in a single day, without accounting for the compliance risk of transmitting customer data to third-party cloud infrastructure.
Maintaining a self-hosted AI ecosystem (Air-Gapped AI) establishes control over the organization’s core data pathways while capitalizing on the productivity gains of autonomous automation. Transitioning to Cloud AI APIs should only occur after your engineering team has fully evaluated the cost and Data Compliance trade-offs.
For a broader look at how Agentic AI is changing software development workflows, see: Agentic AI in Software Development.