Run AI models on your own hardware. Completely private, no API keys, no rate limits, no cost. Ollama manages the model lifecycle, and nself handles installation, benchmarking, and task routing.
# Automatic (recommended)
nself doctor --ai
# Manual
nself ai local install

The installer detects your OS, installs Ollama as a system service, configures firewall rules, and pulls a model matched to your available RAM.
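To confirm the install worked, you can check the service and the local API directly. A quick sanity check, assuming a systemd-based Linux host and Ollama's default port 11434 (on macOS, `launchctl list | grep ollama` replaces the systemctl call):

```bash
# Is the Ollama system service running?
systemctl status ollama --no-pager

# Is the local API answering? Lists installed models; no API key required.
curl http://localhost:11434/api/tags
```

Which model the installer pulls depends on how much RAM the machine has, per the table below.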
| Available RAM | Recommended Model | Size | Expected Speed |
|---|---|---|---|
| 4 GB | gemma2:2b | 1.6 GB | 8-12 tok/s |
| 8 GB | llama3.2:3b | 2.0 GB | 12-18 tok/s |
| 16 GB | llama3.1:8b | 4.7 GB | 15-25 tok/s |
| 32 GB+ | llama3.1:70b-q4 | 26 GB | 8-15 tok/s |
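The table is roughly the logic the installer applies when it picks a model for you. As an illustration only (not the actual nself installer code), a Linux shell sketch of the same RAM-to-model mapping:

```bash
#!/usr/bin/env bash
# Illustrative sketch of the RAM-to-model mapping above (Linux; on macOS read
# total memory with `sysctl -n hw.memsize` instead of /proc/meminfo).
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
mem_gb=$(( mem_kb / 1024 / 1024 ))

if   (( mem_gb >= 32 )); then model="llama3.1:70b-q4"
elif (( mem_gb >= 16 )); then model="llama3.1:8b"
elif (( mem_gb >= 8 ));  then model="llama3.2:3b"
else                          model="gemma2:2b"
fi

echo "Suggested model for ${mem_gb} GB RAM: ${model}"
# ollama pull "${model}"   # uncomment to pull it immediately
```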
# List installed models
nself ai local models
# Pull a new model
nself ai local pull gemma2:2b
# Remove a model
nself ai local remove gemma2:2b
# Benchmark a model
nself ai local benchmark gemma2:2b
# Check status
nself ai local status

Each AI task class can be assigned to a specific local model. By default, the system picks the best available model based on benchmark results.
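Conceptually, the default routing just ranks installed models by their benchmarked throughput and picks the fastest. A purely hypothetical sketch of that idea (the file path, JSON shape, and jq-based selection are illustrative assumptions, not nself internals):

```bash
# Hypothetical benchmark results: model -> measured tokens/sec
cat > /tmp/bench.json <<'EOF'
{ "gemma2:2b": 11.4, "llama3.2:3b": 16.8, "llama3.1:8b": 21.2 }
EOF

# Pick the highest-throughput model as the default
best=$(jq -r 'to_entries | max_by(.value) | .key' /tmp/bench.json)
echo "Default model: ${best}"    # -> llama3.1:8b
```

Per-task overrides use the assignment commands below.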
# Set a model for a specific task
nself ai local assign --task embeddings --model nomic-embed-text
# Test a task assignment
nself ai local test --task chat
# View assignments
nself ai local assignments

# Route only background tasks to local models
nself ai local config set background_only_local true
# Automatically swap to a smaller model on out-of-memory errors
nself ai local config set oom_auto_swap true

Troubleshooting:

- Ollama not running: check systemctl status ollama (Linux) or launchctl list | grep ollama (macOS), then run nself ai local restart.
- Slow inference: run nself ai local benchmark and compare with the expected speeds above; consider a smaller model.
- Out-of-memory errors: enable oom_auto_swap or switch to a smaller model manually.
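Both the expected-speed column and `nself ai local benchmark` report tokens per second. If you want to sanity-check a model against Ollama directly, its generate endpoint returns `eval_count` (generated tokens) and `eval_duration` (nanoseconds), from which throughput follows; a rough sketch, with the model name and prompt as examples:

```bash
# Rough tokens-per-second measurement straight from the Ollama API (requires jq).
resp=$(curl -s http://localhost:11434/api/generate -d '{
  "model": "gemma2:2b",
  "prompt": "Write a short paragraph about the ocean.",
  "stream": false
}')

echo "$resp" | jq '{tokens: .eval_count,
                    seconds: (.eval_duration / 1e9),
                    tok_per_sec: (.eval_count / (.eval_duration / 1e9))}'
```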