The routing engine picks the best AI provider for each request based on task type, available capacity, latency, and cost. It runs automatically but every decision is configurable and logged.
Each task class has an ordered list of tiers. The router tries the first tier; if it is unavailable (rate limited, quota exhausted, or not configured), it falls through to the next.
| Tier | Cost | Speed | Quality |
|---|---|---|---|
| Local (Ollama) | $0 | Depends on hardware | Good for most tasks |
| Gemini Free | $0 | Fast | High (Gemini 2.0 Flash) |
| Gemini Paid | Pay-per-use | Fast | Highest |
| OpenAI | Pay-per-use | Fast | High |
| Anthropic | Pay-per-use | Fast | Highest |
| Fallback | Varies | Slow | Minimum viable |
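The fallback walk described above can be sketched roughly as follows. This is a minimal illustration, not nself's implementation; the `Tier` class and `route` function are hypothetical names invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    """Hypothetical stand-in for one configured provider tier."""
    name: str
    available: bool = True  # False if rate limited, exhausted, or unconfigured

def route(task_class: str, chains: dict[str, list[Tier]]) -> Tier:
    """Walk the task class's tier chain and return the first usable tier."""
    for tier in chains[task_class]:
        if tier.available:
            return tier
    raise RuntimeError(f"no tier available for task class {task_class!r}")

# Example: gemini_free is rate limited, so a chat request falls through to local.
chains = {
    "chat": [Tier("gemini_free", available=False), Tier("local"), Tier("anthropic")],
}
print(route("chat", chains).name)  # → local
```

The key property is that the chain is ordered per task class, so a cheap or free tier can sit in front of a paid one without any per-request configuration.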
nself defines 11 task classes, each with its own default tier chain:
```
nself ai routing list

TASK CLASS       TIER CHAIN                        BG-LOCAL   TIMEOUT
chat             gemini_free → local → anthropic   no         30s
embeddings       local → gemini_free               yes        60s
classify         local → gemini_free               yes        10s
extract          gemini_free → local               no         30s
summarize        gemini_free → local               no         30s
code             anthropic → gemini_free → local   no         60s
search_rerank    local → gemini_free               yes        10s
voice_tts        local                             no         5s
voice_stt        local → gemini_free               no         10s
image_describe   gemini_free → local               no         15s
translate        gemini_free → local               no         15s
```

```
# Edit a task class
nself ai routing edit chat

# Set tier chain directly
nself ai routing set chat --tiers "local,gemini_free,anthropic"

# Toggle background-only local
nself ai routing set embeddings --bg-local true

# Set timeout
nself ai routing set code --timeout 120s
```

Changes take effect immediately via PostgreSQL NOTIFY. All connected services reload within 2 seconds.
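A service-side reload handler for those NOTIFY events might look roughly like this. This is a hedged sketch: the payload shape and the `RoutingTable` class are assumptions for illustration, not nself's actual schema, and the real services listen on a PostgreSQL channel rather than calling the handler directly:

```python
import json

class RoutingTable:
    """In-memory routing config, refreshed when a NOTIFY payload arrives."""

    def __init__(self) -> None:
        self.chains: dict[str, list[str]] = {}

    def apply_notify(self, payload: str) -> None:
        # Assumed payload shape: {"task_class": "...", "tiers": [...]}
        change = json.loads(payload)
        self.chains[change["task_class"]] = change["tiers"]

table = RoutingTable()
table.apply_notify('{"task_class": "chat", "tiers": ["local", "gemini_free", "anthropic"]}')
print(table.chains["chat"])  # → ['local', 'gemini_free', 'anthropic']
```

Because the update replaces one entry in memory rather than restarting anything, the sub-2-second propagation window is plausible: it is bounded by NOTIFY delivery plus a dictionary write.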
The routing engine tracks quality metrics per task class: tier distribution, p95 latency, quality scores, and cost per day. View them in the admin UI at Settings > AI > Routing, or via the CLI:

```
nself ai routing quality
```
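As an illustration of the p95 latency metric, the 95th percentile can be computed from raw samples with the nearest-rank method. nself's exact aggregation isn't documented here, so treat this as one reasonable definition rather than what the CLI reports:

```python
import math

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# One slow outlier dominates the p95 even when the median is healthy.
latencies_ms = [120.0, 95.0, 110.0, 480.0, 105.0, 99.0, 130.0, 101.0, 115.0, 98.0]
print(p95(latencies_ms))  # → 480.0
```

This is why p95 is tracked alongside tier distribution: a tier that is usually fast but occasionally stalls shows up in the p95 long before it moves the average.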