Route AI completions through OpenAI, Anthropic, Google Gemini, or a local Ollama instance from a single OpenAI-compatible endpoint — with streaming, multi-provider fallback, usage tracking, and optional spend caps, all running on your own infrastructure.
nself-ai is part of the ɳClaw bundle and is the foundation for nself-claw, nself-mux AI features, and nself-voice. Set your key with nself license set nself_pro_... before installing.
nself license set nself_pro_...
nself plugin install ai
nself build
nself start

| Variable | Required | Default | Description |
|---|---|---|---|
| PLUGIN_AI_DEFAULT_PROVIDER | Yes | none | Default provider: openai, anthropic, gemini, or ollama |
| PLUGIN_AI_MONTHLY_BUDGET_USD | No | unlimited | Hard monthly spend cap in USD. Requests fail with 429 when exceeded. |
| PLUGIN_OPENAI_API_KEY | If using OpenAI | none | OpenAI API key (starts with sk-) |
| PLUGIN_ANTHROPIC_API_KEY | If using Anthropic | none | Anthropic API key (starts with sk-ant-) |
| PLUGIN_GEMINI_API_KEY | If using Gemini | none | Google AI Studio or Cloud Console key |
| PLUGIN_OLLAMA_URL | If using Ollama | http://127.0.0.1:11434 | Local Ollama server URL. No API key required. |
| PLUGIN_AI_FALLBACK_PROVIDER | No | none | Secondary provider to try on timeout or rate-limit from the default provider |
| PLUGIN_AI_REQUEST_TIMEOUT_MS | No | 60000 | Per-request timeout in milliseconds before triggering fallback |

Only configure the provider(s) you plan to use. Setting PLUGIN_AI_DEFAULT_PROVIDER=ollama with a local Ollama instance requires no paid API key.
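
The requirements in the table can be checked before running nself build. The snippet below is a minimal sketch, not part of nself-ai: checkAiEnv and PROVIDER_KEYS are hypothetical names, and the rules are only those stated in the table (a default provider is required, and each provider in use needs its key, except Ollama).

```javascript
// Hypothetical pre-flight check; it mirrors the variable table and is not an nself API.
const PROVIDER_KEYS = new Map([
  ['openai', 'PLUGIN_OPENAI_API_KEY'],
  ['anthropic', 'PLUGIN_ANTHROPIC_API_KEY'],
  ['gemini', 'PLUGIN_GEMINI_API_KEY'],
  ['ollama', null], // Ollama requires no API key
]);

function checkAiEnv(env) {
  const problems = [];
  if (!env.PLUGIN_AI_DEFAULT_PROVIDER) {
    problems.push('PLUGIN_AI_DEFAULT_PROVIDER is required');
  }
  // Only the default and (optional) fallback providers need credentials.
  const inUse = [env.PLUGIN_AI_DEFAULT_PROVIDER, env.PLUGIN_AI_FALLBACK_PROVIDER];
  for (const provider of inUse) {
    if (!provider) continue;
    if (!PROVIDER_KEYS.has(provider)) {
      problems.push(`Unknown provider: ${provider}`);
      continue;
    }
    const keyVar = PROVIDER_KEYS.get(provider);
    if (keyVar && !env[keyVar]) {
      problems.push(`${keyVar} is required when using ${provider}`);
    }
  }
  return problems;
}

// An Ollama-only setup passes without any paid API key.
console.log(checkAiEnv({ PLUGIN_AI_DEFAULT_PROVIDER: 'ollama' })); // []
// A fallback provider without its key is reported.
console.log(checkAiEnv({
  PLUGIN_AI_DEFAULT_PROVIDER: 'ollama',
  PLUGIN_AI_FALLBACK_PROVIDER: 'openai',
}));
```

In a Node script this could be called as checkAiEnv(process.env) after loading your env files.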
# Non-streaming completion (OpenAI-compatible format)
curl -X POST http://127.0.0.1:8010/ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "user", "content": "What is nself?" }
    ]
  }'

The endpoint is SSE-compatible and works with the OpenAI JavaScript SDK:
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'http://127.0.0.1:8010/ai/v1',
  apiKey: 'not-used', // auth handled by nself
})

const stream = await client.chat.completions.create({
  model: 'claude-3-5-sonnet-20241022',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
})

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
}

| Provider | Model IDs | Notes |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, o1, o1-mini | Auto-routed by gpt- or o1 prefix |
| Anthropic | claude-3-opus-20240229, claude-3-5-sonnet-20241022, claude-3-haiku-20240307 | Auto-routed by claude- prefix |
| Google Gemini | gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash | Auto-routed by gemini- prefix |
| Ollama | Any locally installed model: llama3.2, mistral, phi4, etc. | Model IDs that match no prefix above are routed to the default provider |
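
The prefix rules in the table can be expressed as a small lookup. This is an illustrative sketch of the documented behavior, not the plugin's actual routing code; resolveProvider and PREFIX_RULES are hypothetical names.

```javascript
// Illustrative only: prefix rules copied from the provider table.
const PREFIX_RULES = [
  ['gpt-', 'openai'],
  ['o1', 'openai'],
  ['claude-', 'anthropic'],
  ['gemini-', 'gemini'],
];

function resolveProvider(modelId, defaultProvider) {
  for (const [prefix, provider] of PREFIX_RULES) {
    if (modelId.startsWith(prefix)) return provider;
  }
  // No prefix matched: use PLUGIN_AI_DEFAULT_PROVIDER.
  return defaultProvider;
}

console.log(resolveProvider('claude-3-haiku-20240307', 'ollama')); // anthropic
console.log(resolveProvider('o1-mini', 'ollama'));                 // openai
console.log(resolveProvider('llama3.2', 'ollama'));                // ollama
```

Under this reading, a local model such as llama3.2 reaches Ollama only when ollama is the default provider.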

| Endpoint | Method | Description |
|---|---|---|
| /ai/v1/chat/completions | POST | OpenAI-compatible chat completion; supports stream: true |
| /ai/v1/models | GET | List available models across all configured providers |
| /admin/usage | GET | Token usage and cost breakdown by provider, model, and day |
| /admin/providers | GET | Live health and latency stats for each configured provider |
| /health | GET | Plugin liveness check |

nself-ai creates tables in the np_ai schema. Hasura auto-tracks these for instant GraphQL access.

| Table | Key Columns | Purpose |
|---|---|---|
| np_ai.requests | id, provider, model, prompt_tokens, completion_tokens, cost_usd, latency_ms, created_at | Audit log of every inference request |
| np_ai.monthly_usage | provider, model, month, total_tokens, total_cost_usd | Aggregated monthly spend per provider/model |
| np_ai.provider_config | provider, enabled, priority, monthly_budget_usd | Runtime provider configuration and budget limits |

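
To show how the per-request audit rows relate to the monthly aggregate, here is a rough sketch that rolls np_ai.requests rows up into the shape of np_ai.monthly_usage. It rests on assumptions the documentation does not confirm: that total_tokens is prompt_tokens plus completion_tokens, that created_at arrives as an ISO 8601 string, and that month is stored as YYYY-MM. aggregateMonthly is a hypothetical helper, and the sample rows are made up for illustration.

```javascript
// Hypothetical roll-up; the plugin's own aggregation may differ.
function aggregateMonthly(requests) {
  const totals = new Map();
  for (const r of requests) {
    const month = r.created_at.slice(0, 7); // assumes ISO 8601, e.g. "2026-05-02T10:00:00Z"
    const key = `${r.provider}|${r.model}|${month}`;
    const row = totals.get(key) ?? {
      provider: r.provider,
      model: r.model,
      month,
      total_tokens: 0,
      total_cost_usd: 0,
    };
    row.total_tokens += r.prompt_tokens + r.completion_tokens; // assumed definition
    row.total_cost_usd += r.cost_usd;
    totals.set(key, row);
  }
  return [...totals.values()];
}

// Made-up sample rows in the shape of np_ai.requests.
const sampleRequests = [
  { provider: 'openai', model: 'gpt-4o', prompt_tokens: 100, completion_tokens: 50, cost_usd: 0.25, created_at: '2026-05-02T10:00:00Z' },
  { provider: 'openai', model: 'gpt-4o', prompt_tokens: 200, completion_tokens: 150, cost_usd: 0.5, created_at: '2026-05-20T08:30:00Z' },
  { provider: 'ollama', model: 'llama3.2', prompt_tokens: 40, completion_tokens: 10, cost_usd: 0, created_at: '2026-05-21T09:00:00Z' },
];

console.log(aggregateMonthly(sampleRequests));
```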
query AiUsageThisMonth {
  np_ai_monthly_usage(
    where: { month: { _eq: "2026-05" } }
    order_by: { total_cost_usd: desc }
  ) {
    provider
    model
    total_tokens
    total_cost_usd
  }
}

| Event | Trigger | Payload Includes |
|---|---|---|
| ai.completion.success | After a successful inference | provider, model, tokens, latency_ms |
| ai.completion.error | Provider returned an error or timed out | provider, error_code, message |
| ai.budget.warning | Monthly spend reaches 80% of cap | provider, spent_usd, budget_usd |
| ai.budget.exceeded | Monthly spend cap hit; requests now return 429 | provider, spent_usd, budget_usd |
| ai.provider.fallback | Fallback provider activated after primary timeout | primary_provider, fallback_provider, reason |

Events are forwarded to nself-mux if installed, enabling rules like "notify on budget warning" or "log all AI errors to Slack".
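
The two budget events correspond to simple thresholds on monthly spend. The sketch below classifies a spend level using the 80% and 100% marks from the table. budgetEvent is a hypothetical helper, and treating "cap hit" as spend greater than or equal to the cap is an assumption. How often the plugin actually emits each event is not specified here.

```javascript
// Hypothetical classifier for the budget thresholds in the events table.
function budgetEvent(spentUsd, budgetUsd) {
  if (budgetUsd == null) return null; // no cap configured: spend is unlimited
  if (spentUsd >= budgetUsd) return 'ai.budget.exceeded'; // assumed: >= counts as "hit"
  if (spentUsd >= 0.8 * budgetUsd) return 'ai.budget.warning';
  return null;
}

console.log(budgetEvent(50, 100));   // null
console.log(budgetEvent(85, 100));   // ai.budget.warning
console.log(budgetEvent(100, 100));  // ai.budget.exceeded
console.log(budgetEvent(500, null)); // null
```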
| Option | Cost | Data control | Multi-provider | Usage tracking |
|---|---|---|---|---|
| nself-ai | $0.99/mo + pass-through API costs | Full — your server, your DB | Yes — 4 providers + fallback | Yes — per-request in Postgres |
| OpenAI API directly | Pay-per-token only | OpenAI stores prompts by default | No — OpenAI only | Dashboard only, no custom queries |
| Azure OpenAI | Pay-per-token + Azure fees | Azure-managed — not your infra | No — Azure models only | Azure Monitor only |
| LiteLLM / OpenRouter | Free OSS or hosted SaaS pricing | Depends on hosting | Yes | Basic — no Postgres integration |
If requests fail with 429, check /admin/usage to see whether the monthly budget cap has been hit. Increase PLUGIN_AI_MONTHLY_BUDGET_USD or wait for the calendar month to reset.
If a provider rejects requests with an authentication error, the API key for that provider is missing or invalid. Verify the key in your .env.secrets file, then run nself build && nself start to reload environment variables.
If responses are cut off before they complete, the cause is usually a proxy timeout. Set PLUGIN_AI_REQUEST_TIMEOUT_MS to a higher value (e.g., 120000) and ensure your Nginx config has proxy_read_timeout 120s.
If requests to an Ollama model return 503, run ollama pull llama3.2 on the host before starting nself. The plugin checks Ollama at PLUGIN_OLLAMA_URL and returns a 503 if the model is not loaded.
If the np_ai tables do not appear in the GraphQL API, run nself db hasura reload-metadata to force Hasura to re-track the np_ai schema tables.
nself plugin remove ai

nself-claw and the AI features of nself-mux depend on nself-ai. Remove those plugins first, or use nself plugin remove ai --force.
Port: 8010 | Bundle: ɳClaw ($0.99/mo) or ɳSelf+ ($3.99/mo) | Last Updated: May 2026 | Plugin Version 1.1.0