Run AI models on your own hardware. Completely private, no API keys, no rate limits, no cost. Ollama manages the model lifecycle, and nself handles installation, benchmarking, and task routing.
# Automatic (recommended)
nself doctor --ai
# Manual
nself ai local install

The installer detects your OS, installs Ollama as a system service, configures firewall rules, and pulls a model matched to your available RAM.
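To confirm the install worked, you can check the service and the local API directly. A quick sanity check, assuming a systemd-based Linux host and Ollama's default port 11434 (on macOS, `launchctl list | grep ollama` replaces the systemctl call):

```bash
# Is the Ollama system service running?
systemctl status ollama --no-pager

# Is the local API answering? Lists installed models; no API key required.
curl http://localhost:11434/api/tags
```

Which model the installer pulls depends on how much RAM the machine has, per the table below.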
| Available RAM | Recommended Model | Size | Expected Speed |
|---|---|---|---|
| 4 GB | gemma2:2b | 1.6 GB | 8-12 tok/s |
| 8 GB | llama3.2:3b | 2.0 GB | 12-18 tok/s |
| 16 GB | llama3.1:8b | 4.7 GB | 15-25 tok/s |
| 32 GB+ | llama3.1:70b-q4 | 26 GB | 8-15 tok/s |
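The table is roughly the logic the installer applies when it picks a model for you. As an illustration only (not the actual nself installer code), a Linux shell sketch of the same RAM-to-model mapping:

```bash
#!/usr/bin/env bash
# Illustrative sketch of the RAM-to-model mapping above (Linux; on macOS read
# total memory with `sysctl -n hw.memsize` instead of /proc/meminfo).
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
mem_gb=$(( mem_kb / 1024 / 1024 ))

if   (( mem_gb >= 32 )); then model="llama3.1:70b-q4"
elif (( mem_gb >= 16 )); then model="llama3.1:8b"
elif (( mem_gb >= 8 ));  then model="llama3.2:3b"
else                          model="gemma2:2b"
fi

echo "Suggested model for ${mem_gb} GB RAM: ${model}"
# ollama pull "${model}"   # uncomment to pull it immediately
```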
# List installed models
nself ai local models
# Pull a new model
nself ai local pull gemma2:2b
# Remove a model
nself ai local remove gemma2:2b
# Benchmark a model
nself ai local benchmark gemma2:2b
# Check status
nself ai local status

Each AI task class can be assigned to a specific local model. By default, the system picks the best available model based on benchmark results.
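Conceptually, the default routing just ranks installed models by their benchmarked throughput and picks the fastest. A purely hypothetical sketch of that idea (the file path, JSON shape, and jq-based selection are illustrative assumptions, not nself internals):

```bash
# Hypothetical benchmark results: model -> measured tokens/sec
cat > /tmp/bench.json <<'EOF'
{ "gemma2:2b": 11.4, "llama3.2:3b": 16.8, "llama3.1:8b": 21.2 }
EOF

# Pick the highest-throughput model as the default
best=$(jq -r 'to_entries | max_by(.value) | .key' /tmp/bench.json)
echo "Default model: ${best}"    # -> llama3.1:8b
```

Per-task overrides use the assignment commands below.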
# Set a model for a specific task
nself ai local assign --task embeddings --model nomic-embed-text
# Test a task assignment
nself ai local test --task chat
# View assignments
nself ai local assignments

# Route only background tasks to local models
nself ai local config set background_only_local true
# Automatically swap to a smaller model on out-of-memory errors
nself ai local config set oom_auto_swap true

Troubleshooting:

- Ollama not running: check systemctl status ollama (Linux) or launchctl list | grep ollama (macOS), then run nself ai local restart.
- Slow inference: run nself ai local benchmark and compare with the expected speeds above; consider a smaller model.
- Out-of-memory errors: enable oom_auto_swap or switch to a smaller model manually.
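Both the expected-speed column and `nself ai local benchmark` report tokens per second. If you want to sanity-check a model against Ollama directly, its generate endpoint returns `eval_count` (generated tokens) and `eval_duration` (nanoseconds), from which throughput follows; a rough sketch, with the model name and prompt as examples:

```bash
# Rough tokens-per-second measurement straight from the Ollama API (requires jq).
resp=$(curl -s http://localhost:11434/api/generate -d '{
  "model": "gemma2:2b",
  "prompt": "Write a short paragraph about the ocean.",
  "stream": false
}')

echo "$resp" | jq '{tokens: .eval_count,
                    seconds: (.eval_duration / 1e9),
                    tok_per_sec: (.eval_count / (.eval_duration / 1e9))}'
```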