Run large language models locally inside your nSelf stack. One command installs the Ollama container, wires it to the AI plugin, and lets you pull any Ollama-compatible model.
```bash
# Install the Ollama container and wire it up
nself ollama install

# Pull a model (downloads from ollama.com/library)
nself ollama pull llama3.2

# Check that everything is running
nself ollama status
```

```
nself ollama <SUBCOMMAND> [FLAGS]
```

`nself ollama` integrates Ollama into your nSelf stack as a managed optional service. After installation the CLI sets `OLLAMA_BASE_URL` in your backend environment so that `nself ai local` and the AI plugin can route inference requests to your local models instead of a remote provider.
Ollama runs as a Docker container alongside your existing stack. GPU passthrough is automatic on Linux hosts with an NVIDIA or AMD GPU and the appropriate drivers installed. On macOS with Apple Silicon, Ollama uses the Metal framework for acceleration.
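Because the integration is just an HTTP endpoint behind `OLLAMA_BASE_URL`, you can probe it directly. A minimal smoke test, assuming port 11434 is reachable from your shell (inside the Docker network the URL is `http://ollama:11434`; from the host, use whatever port your stack publishes):

```bash
# Ask the Ollama server for its version (GET /api/version is a standard Ollama endpoint)
curl -s "${OLLAMA_BASE_URL:-http://localhost:11434}/api/version"
# → {"version":"0.3.12"}   (example output)
```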
`nself ollama install` pulls the Ollama Docker image, adds the container definition to your stack, sets `OLLAMA_BASE_URL=http://ollama:11434` in the backend env, and rebuilds Nginx routing. The command is idempotent, so it is safe to run again after updates.
```bash
nself ollama install
# Pulling ollama/ollama:latest...
# Adding ollama service to docker-compose...
# Setting OLLAMA_BASE_URL=http://ollama:11434
# Rebuilding Nginx config...
# ✓ Ollama installed. Run: nself ollama pull <model>
```
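To see exactly what was wired up, inspect the rendered Compose config and the backend env. A quick check, assuming the service is named `ollama` and the env file lives in the project root (both are assumptions about the generated layout):

```bash
# Show the rendered service definition for the ollama container
docker compose config | grep -A 8 'ollama:'

# Confirm the base URL landed in the backend environment
grep OLLAMA_BASE_URL .env* 2>/dev/null
```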
Pin a specific Ollama version:

```bash
nself ollama install --version 0.3.12
```

Enable GPU passthrough explicitly:
```bash
nself ollama install --gpu nvidia   # or: --gpu amd
```
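To confirm the GPU is actually visible inside the container after a `--gpu nvidia` install, run the vendor tool in the container. A sketch, assuming the container is named `ollama` (the generated name may differ):

```bash
# Lists the GPU if passthrough is working; errors out if it is not
docker exec ollama nvidia-smi
```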
Show whether the Ollama container is running, the current `OLLAMA_BASE_URL`, and a list of locally available models with their sizes.

```bash
nself ollama status
# Container: running (ollama/ollama:latest)
# OLLAMA_BASE_URL: http://ollama:11434
# GPU: nvidia (detected)
#
# MODELS
# NAME              SIZE    MODIFIED
# llama3.2:latest   2.0 GB  2 hours ago
# mistral:latest    4.1 GB  3 days ago
```

Download a model from ollama.com/library into the Ollama container. Supports the same tag syntax as the Ollama CLI (`model:tag`).

```bash
nself ollama pull llama3.2
nself ollama pull mistral:7b
nself ollama pull codellama:13b-instruct
nself ollama pull nomic-embed-text   # embedding model for vector search
```

The model is immediately available for inference after the pull completes. No restart required.
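To double-check what is on disk and that inference works end to end, you can call the Ollama HTTP API directly. A sketch, assuming port 11434 is reachable from your shell; `/api/tags` and `/api/generate` are standard Ollama endpoints:

```bash
# List locally available models
curl -s http://localhost:11434/api/tags

# One-off, non-streaming completion against the pulled model
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Reply with one word: ready?", "stream": false}'
```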
Delete a model from local storage to reclaim disk space.
```bash
nself ollama remove mistral:latest
# ✓ mistral:latest removed (freed 4.1 GB)
```

| Flag | Applies to | Type | Description |
|---|---|---|---|
| `--version` | install | string | Pin a specific Ollama image tag (default: `latest`) |
| `--gpu` | install | string | GPU backend: `nvidia`, `amd`, or `none` (default: auto-detect) |
| `--json` | status | bool | Output status as JSON |
| `--quiet` | pull | bool | Suppress progress bar during download |
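If you script against `nself ollama status --json`, `jq` is the natural companion. The field path below is an assumption about the JSON shape, not a documented schema:

```bash
# Hypothetical shape: extract model names from the JSON status output
nself ollama status --json | jq -r '.models[].name'
```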
Once Ollama is installed and a model is pulled, switch the AI plugin to local inference:
```bash
# Set the default model for local inference
nself ai local --model llama3.2

# Test a completion request
nself ai complete "Summarize the nSelf architecture in two sentences."

# Switch back to a remote provider
nself ai provider set openai
```

The integration reads the following environment variables:

- `OLLAMA_BASE_URL`: base URL for the Ollama API (set by `nself ollama install`; default: `http://ollama:11434`)
- `OLLAMA_MODELS`: Docker volume mount path for model storage (default: `/root/.ollama`)
- `OLLAMA_NUM_GPU`: number of GPU layers to offload (set automatically when `--gpu` is specified)

A full local setup with a chat model and an embedding model:

```bash
nself ollama install
nself ollama pull llama3.2
nself ollama pull nomic-embed-text
nself ai local --model llama3.2 --embedding-model nomic-embed-text
nself ollama status
```

Run a smaller quantized build when memory or VRAM is tight:

```bash
nself ollama pull llama3.2:3b-q4_0
nself ai local --model llama3.2:3b-q4_0
```

Install with explicit NVIDIA GPU passthrough and verify the offload:

```bash
nself ollama install --gpu nvidia
nself ollama pull llama3.2
nself ollama status
# GPU: nvidia (detected, 4 layers offloaded)
```
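If `status` does not report offloaded layers, the container logs are the most direct signal of what the runtime detected. A sketch, again assuming the container is named `ollama`:

```bash
# Look for GPU detection and layer-offload messages in the server logs
docker logs ollama 2>&1 | grep -iE 'gpu|offload' | tail -n 20
```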