The routing engine picks the best AI provider for each request based on task type, available capacity, latency, and cost. It runs automatically but every decision is configurable and logged.
Each task class has an ordered list of tiers. The router tries the first tier; if it is unavailable (rate limited, quota exhausted, or not configured), it falls through to the next.
| Tier | Cost | Speed | Quality |
|---|---|---|---|
| Local (Ollama) | $0 | Depends on hardware | Good for most tasks |
| Gemini Free | $0 | Fast | High (Gemini 2.0 Flash) |
| Gemini Paid | Pay-per-use | Fast | Highest |
| OpenAI | Pay-per-use | Fast | High |
| Anthropic | Pay-per-use | Fast | Highest |
| Fallback | Varies | Slow | Minimum viable |
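The fallback walk described above can be sketched roughly as follows. This is a minimal illustration, not nself's implementation; the `Tier` class and `route` function are hypothetical names invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    """Hypothetical stand-in for one configured provider tier."""
    name: str
    available: bool = True  # False if rate limited, exhausted, or unconfigured

def route(task_class: str, chains: dict[str, list[Tier]]) -> Tier:
    """Walk the task class's tier chain and return the first usable tier."""
    for tier in chains[task_class]:
        if tier.available:
            return tier
    raise RuntimeError(f"no tier available for task class {task_class!r}")

# Example: gemini_free is rate limited, so a chat request falls through to local.
chains = {
    "chat": [Tier("gemini_free", available=False), Tier("local"), Tier("anthropic")],
}
print(route("chat", chains).name)  # → local
```

The key property is that the chain is ordered per task class, so a cheap or free tier can sit in front of a paid one without any per-request configuration.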
nself defines 11 task classes, each with its own default tier chain:
```
nself ai routing list

TASK CLASS       TIER CHAIN                        BG-LOCAL   TIMEOUT
chat             gemini_free → local → anthropic   no         30s
embeddings       local → gemini_free               yes        60s
classify         local → gemini_free               yes        10s
extract          gemini_free → local               no         30s
summarize        gemini_free → local               no         30s
code             anthropic → gemini_free → local   no         60s
search_rerank    local → gemini_free               yes        10s
voice_tts        local                             no         5s
voice_stt        local → gemini_free               no         10s
image_describe   gemini_free → local               no         15s
translate        gemini_free → local               no         15s
```

```
# Edit a task class
nself ai routing edit chat

# Set tier chain directly
nself ai routing set chat --tiers "local,gemini_free,anthropic"

# Toggle background-only local
nself ai routing set embeddings --bg-local true

# Set timeout
nself ai routing set code --timeout 120s
```

Changes take effect immediately via PostgreSQL NOTIFY. All connected services reload within 2 seconds.
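A service-side reload handler for those NOTIFY events might look roughly like this. This is a hedged sketch: the payload shape and the `RoutingTable` class are assumptions for illustration, not nself's actual schema, and the real services listen on a PostgreSQL channel rather than calling the handler directly:

```python
import json

class RoutingTable:
    """In-memory routing config, refreshed when a NOTIFY payload arrives."""

    def __init__(self) -> None:
        self.chains: dict[str, list[str]] = {}

    def apply_notify(self, payload: str) -> None:
        # Assumed payload shape: {"task_class": "...", "tiers": [...]}
        change = json.loads(payload)
        self.chains[change["task_class"]] = change["tiers"]

table = RoutingTable()
table.apply_notify('{"task_class": "chat", "tiers": ["local", "gemini_free", "anthropic"]}')
print(table.chains["chat"])  # → ['local', 'gemini_free', 'anthropic']
```

Because the update replaces one entry in memory rather than restarting anything, the sub-2-second propagation window is plausible: it is bounded by NOTIFY delivery plus a dictionary write.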
The routing engine tracks quality metrics per task class: tier distribution, p95 latency, quality scores, and cost per day. View them in the admin UI at Settings > AI > Routing, or via the CLI:

```
nself ai routing quality
```
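As an illustration of the p95 latency metric, the 95th percentile can be computed from raw samples with the nearest-rank method. nself's exact aggregation isn't documented here, so treat this as one reasonable definition rather than what the CLI reports:

```python
import math

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# One slow outlier dominates the p95 even when the median is healthy.
latencies_ms = [120.0, 95.0, 110.0, 480.0, 105.0, 99.0, 130.0, 101.0, 115.0, 98.0]
print(p95(latencies_ms))  # → 480.0
```

This is why p95 is tracked alongside tier distribution: a tier that is usually fast but occasionally stalls shows up in the p95 long before it moves the average.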