Run large language models locally inside your nSelf stack. One command installs the Ollama container, wires it to the AI plugin, and lets you pull any Ollama-compatible model.
```bash
# Install the Ollama container and wire it up
nself ollama install

# Pull a model (downloads from ollama.com/library)
nself ollama pull llama3.2

# Check that everything is running
nself ollama status
```

```
nself ollama <SUBCOMMAND> [FLAGS]
```

`nself ollama` integrates Ollama into your nSelf stack as a managed optional service. After installation the CLI sets `OLLAMA_BASE_URL` in your backend environment so that `nself ai local` and the AI plugin can route inference requests to your local models instead of a remote provider.
Ollama runs as a Docker container alongside your existing stack. GPU passthrough is automatic on Linux hosts with an NVIDIA or AMD GPU and the appropriate drivers installed. On macOS with Apple Silicon, Ollama uses the Metal framework for acceleration.
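Because the integration is just an HTTP endpoint behind `OLLAMA_BASE_URL`, you can probe it directly. A minimal smoke test, assuming port 11434 is reachable from your shell (inside the Docker network the URL is `http://ollama:11434`; from the host, use whatever port your stack publishes):

```bash
# Ask the Ollama server for its version (GET /api/version is a standard Ollama endpoint)
curl -s "${OLLAMA_BASE_URL:-http://localhost:11434}/api/version"
# → {"version":"0.3.12"}   (example output)
```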
`nself ollama install` pulls the Ollama Docker image, adds the container definition to your stack, sets `OLLAMA_BASE_URL=http://ollama:11434` in the backend env, and rebuilds Nginx routing. The command is idempotent, so it is safe to run again after updates.
```bash
nself ollama install
# Pulling ollama/ollama:latest...
# Adding ollama service to docker-compose...
# Setting OLLAMA_BASE_URL=http://ollama:11434
# Rebuilding Nginx config...
# ✓ Ollama installed. Run: nself ollama pull <model>
```
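To see exactly what was wired up, inspect the rendered Compose config and the backend env. A quick check, assuming the service is named `ollama` and the env file lives in the project root (both are assumptions about the generated layout):

```bash
# Show the rendered service definition for the ollama container
docker compose config | grep -A 8 'ollama:'

# Confirm the base URL landed in the backend environment
grep OLLAMA_BASE_URL .env* 2>/dev/null
```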
Pin a specific Ollama version:

```bash
nself ollama install --version 0.3.12
```

Enable GPU passthrough explicitly:
```bash
nself ollama install --gpu nvidia   # or: --gpu amd
```
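To confirm the GPU is actually visible inside the container after a `--gpu nvidia` install, run the vendor tool in the container. A sketch, assuming the container is named `ollama` (the generated name may differ):

```bash
# Lists the GPU if passthrough is working; errors out if it is not
docker exec ollama nvidia-smi
```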
Show whether the Ollama container is running, the current `OLLAMA_BASE_URL`, and a list of locally available models with their sizes.

```bash
nself ollama status
# Container: running (ollama/ollama:latest)
# OLLAMA_BASE_URL: http://ollama:11434
# GPU: nvidia (detected)
#
# MODELS
# NAME              SIZE    MODIFIED
# llama3.2:latest   2.0 GB  2 hours ago
# mistral:latest    4.1 GB  3 days ago
```

Download a model from ollama.com/library into the Ollama container. Supports the same tag syntax as the Ollama CLI (`model:tag`).

```bash
nself ollama pull llama3.2
nself ollama pull mistral:7b
nself ollama pull codellama:13b-instruct
nself ollama pull nomic-embed-text   # embedding model for vector search
```

The model is immediately available for inference after the pull completes. No restart required.
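To double-check what is on disk and that inference works end to end, you can call the Ollama HTTP API directly. A sketch, assuming port 11434 is reachable from your shell; `/api/tags` and `/api/generate` are standard Ollama endpoints:

```bash
# List locally available models
curl -s http://localhost:11434/api/tags

# One-off, non-streaming completion against the pulled model
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Reply with one word: ready?", "stream": false}'
```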
Delete a model from local storage to reclaim disk space.
```bash
nself ollama remove mistral:latest
# ✓ mistral:latest removed (freed 4.1 GB)
```

| Flag | Applies to | Type | Description |
|---|---|---|---|
| `--version` | install | string | Pin a specific Ollama image tag (default: `latest`) |
| `--gpu` | install | string | GPU backend: `nvidia`, `amd`, or `none` (default: auto-detect) |
| `--json` | status | bool | Output status as JSON |
| `--quiet` | pull | bool | Suppress progress bar during download |
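If you script against `nself ollama status --json`, `jq` is the natural companion. The field path below is an assumption about the JSON shape, not a documented schema:

```bash
# Hypothetical shape: extract model names from the JSON status output
nself ollama status --json | jq -r '.models[].name'
```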
Once Ollama is installed and a model is pulled, switch the AI plugin to local inference:
```bash
# Set the default model for local inference
nself ai local --model llama3.2

# Test a completion request
nself ai complete "Summarize the nSelf architecture in two sentences."

# Switch back to a remote provider
nself ai provider set openai
```

The integration reads the following environment variables:

- `OLLAMA_BASE_URL`: base URL for the Ollama API (set by `nself ollama install`; default: `http://ollama:11434`)
- `OLLAMA_MODELS`: Docker volume mount path for model storage (default: `/root/.ollama`)
- `OLLAMA_NUM_GPU`: number of GPU layers to offload (set automatically when `--gpu` is specified)

A full local setup with a chat model and an embedding model:

```bash
nself ollama install
nself ollama pull llama3.2
nself ollama pull nomic-embed-text
nself ai local --model llama3.2 --embedding-model nomic-embed-text
nself ollama status
```

Run a smaller quantized build when memory or VRAM is tight:

```bash
nself ollama pull llama3.2:3b-q4_0
nself ai local --model llama3.2:3b-q4_0
```

Install with explicit NVIDIA GPU passthrough and verify the offload:

```bash
nself ollama install --gpu nvidia
nself ollama pull llama3.2
nself ollama status
# GPU: nvidia (detected, 4 layers offloaded)
```
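If `status` does not report offloaded layers, the container logs are the most direct signal of what the runtime detected. A sketch, again assuming the container is named `ollama`:

```bash
# Look for GPU detection and layer-offload messages in the server logs
docker logs ollama 2>&1 | grep -iE 'gpu|offload' | tail -n 20
```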