Free Plugin — MIT licensed

Ollama Plugin

Run local LLMs via Ollama inside your nSelf stack. Pull any model Ollama supports — Llama, Mistral, Gemma, Phi — and expose it to your apps via the nSelf AI plugin interface. No API keys, no usage costs.

Free Forever — no license key required. For premium hosted models (Claude, GPT-4, Gemini), see the paid AI plugin in the ɳClaw bundle.

Overview

Local Models — run LLMs entirely on your own hardware; no data leaves your server
Model Management — pull, list, and delete models via nself plugin run ollama
OpenAI-Compatible API — drop-in replacement for apps expecting /v1/chat/completions
GPU Acceleration — uses CUDA or Metal automatically when available
Streaming — supports SSE streaming responses

Installation

# Install the plugin
nself plugin install ollama

# Pull a model
nself plugin run ollama pull llama3

# Check the plugin's status
nself plugin status ollama

Configuration

# Ollama API port (default: 11434)
OLLAMA_PORT=11434

# Default model for /api/generate calls
OLLAMA_DEFAULT_MODEL=llama3

# GPU device (default: auto-detect)
# Set to "cpu" to force CPU-only
OLLAMA_DEVICE=auto
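
A client script can reuse the port setting above to work out where the API is listening. The following is a minimal sketch, not part of the plugin: `ollama_base_url` and `NSELF_HOST` are hypothetical names introduced for illustration, and only `OLLAMA_PORT` comes from the configuration above.

```shell
# Hypothetical helper: derive the Ollama base URL from the environment.
# OLLAMA_PORT is the documented setting; NSELF_HOST is an assumed variable
# for the host name and is NOT part of the plugin's configuration.
ollama_base_url() {
  printf 'http://%s:%s\n' "${NSELF_HOST:-localhost}" "${OLLAMA_PORT:-11434}"
}

ollama_base_url
```

With neither variable set, the helper falls back to localhost and the default port 11434.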

Usage

# Chat completion (OpenAI-compatible)
curl http://your-nself-host:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# List available models
nself plugin run ollama list
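
Streaming responses go through the same chat endpoint. The sketch below assumes the plugin follows the OpenAI streaming convention used by Ollama's compatibility layer: `"stream": true` in the request body, one `data: {...}` line per chunk, and a closing `data: [DONE]` line. The function names and the `NSELF_HOST` variable are hypothetical and exist only for this example.

```shell
# Build a streaming chat request body.
# $1 = model name, $2 = user message.
# Sketch only: the message is not JSON-escaped, so avoid quotes and backslashes.
chat_payload() {
  printf '{"model":"%s","stream":true,"messages":[{"role":"user","content":"%s"}]}' "$1" "$2"
}

# Reduce an SSE stream to its JSON chunks: stop at the "[DONE]" sentinel,
# strip the "data: " prefix, and drop blank separator lines.
sse_data() {
  sed -n -e '/^data: \[DONE\]/q' -e 's/^data: //p'
}

# Send the request and print one JSON chunk per line.
# NSELF_HOST is an assumed variable, not a plugin setting.
stream_chat() {
  curl -sN "http://${NSELF_HOST:-localhost}:${OLLAMA_PORT:-11434}/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "$2")" | sse_data
}
```

Against a running instance, `stream_chat llama3 "Hello!"` should print chunks as they arrive. The `-N` flag turns off curl's output buffering so chunks are not held back until the response completes.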