Convert text to natural-sounding audio from your own server. Use ElevenLabs for high-quality cloud voices with custom cloning, or Piper for fully offline synthesis at zero cost per character.
nself-voice is part of the ɳClaw bundle. Set your key with nself license set nself_pro_... before installing.
nself license set nself_pro_...
nself plugin install voice
nself build
nself start

| Variable | Required | Default | Description |
|---|---|---|---|
| PLUGIN_VOICE_PROVIDER | Yes | - | Voice provider: elevenlabs or piper |
| PLUGIN_ELEVENLABS_API_KEY | ElevenLabs only | - | ElevenLabs API key. Not needed when using Piper. |
| PLUGIN_PIPER_MODEL_PATH | Piper only | /models/piper-en-us.onnx | Path to Piper ONNX model file on your server |
| PLUGIN_VOICE_DEFAULT_SPEED | No | 1.0 | Default playback speed multiplier (0.5–2.0) |
| PLUGIN_VOICE_DEFAULT_FORMAT | No | mp3 | Default audio output format: mp3 or wav |
| PLUGIN_VOICE_CACHE_TTL_SECONDS | No | 3600 | How long to cache synthesized audio in np_voice.cache before re-generating |
| PLUGIN_VOICE_MAX_TEXT_CHARS | No | 5000 | Maximum text length per synthesis request |
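The rules in the table can be checked before the plugin is started. The sketch below is one way to do that in TypeScript. `validateVoiceEnv` and the `VoiceConfig` shape are hypothetical names used for illustration and are not part of nself-voice. The code assumes only what the table states: the provider must be elevenlabs or piper, the API key is needed only for ElevenLabs, and the remaining variables fall back to their listed defaults.

```typescript
// Hypothetical pre-flight check, not part of nself-voice.
type Env = Record<string, string | undefined>;

interface VoiceConfig {
  provider: "elevenlabs" | "piper";
  apiKey: string | undefined;    // used by ElevenLabs only
  modelPath: string | undefined; // used by Piper only
  defaultSpeed: number;
  defaultFormat: "mp3" | "wav";
  cacheTtlSeconds: number;
  maxTextChars: number;
}

function validateVoiceEnv(env: Env): VoiceConfig {
  const provider = env.PLUGIN_VOICE_PROVIDER;
  if (provider !== "elevenlabs" && provider !== "piper") {
    throw new Error("PLUGIN_VOICE_PROVIDER must be elevenlabs or piper");
  }
  if (provider === "elevenlabs" && !env.PLUGIN_ELEVENLABS_API_KEY) {
    throw new Error("PLUGIN_ELEVENLABS_API_KEY is required for elevenlabs");
  }

  // Optional variables fall back to the defaults listed in the table.
  const defaultSpeed = Number(env.PLUGIN_VOICE_DEFAULT_SPEED ?? "1.0");
  if (!(defaultSpeed >= 0.5 && defaultSpeed <= 2.0)) {
    throw new Error("PLUGIN_VOICE_DEFAULT_SPEED must be between 0.5 and 2.0");
  }
  const defaultFormat = env.PLUGIN_VOICE_DEFAULT_FORMAT ?? "mp3";
  if (defaultFormat !== "mp3" && defaultFormat !== "wav") {
    throw new Error("PLUGIN_VOICE_DEFAULT_FORMAT must be mp3 or wav");
  }
  const cacheTtlSeconds = Number(env.PLUGIN_VOICE_CACHE_TTL_SECONDS ?? "3600");
  if (Math.floor(cacheTtlSeconds) !== cacheTtlSeconds || cacheTtlSeconds < 0) {
    throw new Error("PLUGIN_VOICE_CACHE_TTL_SECONDS must be a non-negative integer");
  }
  const maxTextChars = Number(env.PLUGIN_VOICE_MAX_TEXT_CHARS ?? "5000");
  if (Math.floor(maxTextChars) !== maxTextChars || maxTextChars < 1) {
    throw new Error("PLUGIN_VOICE_MAX_TEXT_CHARS must be a positive integer");
  }

  return {
    provider,
    apiKey: provider === "elevenlabs" ? env.PLUGIN_ELEVENLABS_API_KEY : undefined,
    modelPath:
      provider === "piper"
        ? (env.PLUGIN_PIPER_MODEL_PATH ?? "/models/piper-en-us.onnx")
        : undefined,
    defaultSpeed,
    defaultFormat,
    cacheTtlSeconds,
    maxTextChars,
  };
}
```

In a Node process this could be called as `validateVoiceEnv(process.env)` from a deploy script, so a missing key or an out-of-range speed fails early with a readable message.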
PLUGIN_VOICE_PROVIDER=elevenlabs
PLUGIN_ELEVENLABS_API_KEY=sk_xxxxx

Piper runs entirely on your server, with no internet required and no per-character cost:

PLUGIN_VOICE_PROVIDER=piper
PLUGIN_PIPER_MODEL_PATH=/models/piper-en-us.onnx
# Download a model: https://github.com/rhasspy/piper/releases

# Returns audio/mpeg binary
curl -X POST http://127.0.0.1:3714/voice/synthesize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to nself. Your backend is ready.",
    "voice": "Rachel",
    "speed": 1.0
  }' --output speech.mp3

async function speak(text: string) {
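  const res = await fetch('/voice/synthesize', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, speed: 1.0 }),
  })
  // Surface HTTP errors instead of trying to play an error body as audio
  if (!res.ok) throw new Error(`Synthesis failed: HTTP ${res.status}`)
  const blob = await res.blob()
  const url = URL.createObjectURL(blob)
  const audio = new Audio(url)
  // Release the blob URL once playback finishes so the audio data can be freed
  audio.addEventListener('ended', () => URL.revokeObjectURL(url), { once: true })
  // play() returns a promise that rejects if the browser blocks autoplay
  await audio.play()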
}

| Endpoint | Method | Description |
|---|---|---|
| /voice/synthesize | POST | Convert text to speech. Returns audio/mpeg or audio/wav. |
| /voice/voices | GET | List available voices (ElevenLabs: all voices including cloned; Piper: model name) |
| /health | GET | Plugin health check and provider connectivity status |
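Before calling POST /voice/synthesize, a client can check its payload against the limits documented on this page: text up to 5000 characters by default, speed between 0.5 and 2.0, and format mp3 or wav. The following is a minimal sketch. `buildSynthesizeBody` and `expectedContentType` are hypothetical helper names, rejecting empty text is an assumption, and the format-to-type mapping assumes mp3 yields audio/mpeg and wav yields audio/wav.

```typescript
// Hypothetical client-side helpers; not part of the plugin.
interface SynthesizeRequest {
  text: string;
  voice?: string;         // ElevenLabs only; ignored for Piper
  speed?: number;         // 0.5 to 2.0, default 1.0
  format?: "mp3" | "wav"; // default mp3
}

// Returns the JSON body for POST /voice/synthesize, or throws on invalid input.
// Length is measured in UTF-16 code units; the server may count differently.
function buildSynthesizeBody(req: SynthesizeRequest, maxTextChars = 5000): string {
  if (req.text.length === 0) {
    throw new Error("text must not be empty"); // assumption: empty text is not useful
  }
  if (req.text.length > maxTextChars) {
    throw new Error(`text exceeds ${maxTextChars} characters`);
  }
  const speed = req.speed ?? 1.0;
  if (!(speed >= 0.5 && speed <= 2.0)) {
    throw new Error("speed must be between 0.5 and 2.0");
  }
  return JSON.stringify({ ...req, speed, format: req.format ?? "mp3" });
}

// Assumed mapping from the requested format to the response Content-Type.
function expectedContentType(format: "mp3" | "wav" = "mp3"): string {
  return format === "wav" ? "audio/wav" : "audio/mpeg";
}
```

Validating on the client does not replace the server-side limit. It only avoids a round trip for requests that would be rejected anyway.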
{
  "text": "string",   // text to synthesize (max 5000 chars by default)
  "voice": "Rachel",  // voice name or ID (ElevenLabs only; ignored for Piper)
  "speed": 1.0,       // playback speed: 0.5–2.0 (default: 1.0)
  "format": "mp3"     // output format: mp3 | wav (default: mp3)
}

nself-voice stores synthesis history and a cache in the np_voice schema.
| Table | Key Columns | Purpose |
|---|---|---|
| np_voice.requests | id, provider, voice, text_hash, text_length, format, duration_ms, cached, created_at | Audit log of every synthesis request |
| np_voice.cache | text_hash, provider, voice, format, audio_bytes, expires_at | Cached audio for repeated identical requests (avoids re-billing ElevenLabs) |
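Because np_voice.requests records whether each request was served from the cache, rows read from it can be summarized to see how much synthesis the cache is saving. The sketch below works on rows that have already been loaded into memory. The `VoiceRequestRow` field types are assumptions inferred from the column names, and `summarizeRequests` is a hypothetical helper.

```typescript
// Subset of np_voice.requests columns; the types are assumed, not documented.
interface VoiceRequestRow {
  provider: string;
  text_length: number;
  cached: boolean;
}

interface ProviderSummary {
  requests: number;
  cacheHits: number;
  uncachedChars: number; // characters that went to the provider (cache misses)
}

// Groups rows by provider and counts requests, cache hits, and uncached characters.
function summarizeRequests(rows: VoiceRequestRow[]): Record<string, ProviderSummary> {
  const out: Record<string, ProviderSummary> = {};
  for (const row of rows) {
    const current = out[row.provider] ?? { requests: 0, cacheHits: 0, uncachedChars: 0 };
    current.requests += 1;
    if (row.cached) {
      current.cacheHits += 1;
    } else {
      current.uncachedChars += row.text_length;
    }
    out[row.provider] = current;
  }
  return out;
}
```

For ElevenLabs, `uncachedChars` gives a rough view of the characters that were actually sent for synthesis, assuming cache hits are never forwarded to the provider.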
| Event | Trigger | Payload Includes |
|---|---|---|
| voice.synthesize.success | Synthesis completes successfully | provider, voice, text_length, cached, duration_ms |
| voice.synthesize.error | Provider returns an error or is unreachable | provider, error_code, message |
| voice.provider.rate_limited | ElevenLabs character quota reached | provider, quota_used, quota_limit |
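The three events carry different payload fields, which maps naturally onto a discriminated union. The sketch below assumes each event arrives as an object with a `type` field holding the event name and a `payload` field holding the listed values. That envelope shape, the field types, and the `describeVoiceEvent` helper are all assumptions, since the table specifies only the event names and payload fields.

```typescript
// Assumed envelope: { type, payload }. Only the names and fields come from the table.
type VoiceEvent =
  | {
      type: "voice.synthesize.success";
      payload: { provider: string; voice: string; text_length: number; cached: boolean; duration_ms: number };
    }
  | {
      type: "voice.synthesize.error";
      payload: { provider: string; error_code: string; message: string };
    }
  | {
      type: "voice.provider.rate_limited";
      payload: { provider: string; quota_used: number; quota_limit: number };
    };

// Turns an event into a one-line log message; the switch is checked for exhaustiveness.
function describeVoiceEvent(event: VoiceEvent): string {
  switch (event.type) {
    case "voice.synthesize.success": {
      const p = event.payload;
      const source = p.cached ? "cache" : p.provider;
      return `synthesized ${p.text_length} chars via ${source} in ${p.duration_ms}ms`;
    }
    case "voice.synthesize.error": {
      const p = event.payload;
      return `synthesis failed on ${p.provider}: [${p.error_code}] ${p.message}`;
    }
    case "voice.provider.rate_limited": {
      const p = event.payload;
      return `${p.provider} quota reached: ${p.quota_used}/${p.quota_limit} characters used`;
    }
  }
}
```

Typing the events this way lets the compiler flag a handler that reads `quota_used` from an error event or forgets one of the three cases.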
| Feature | ElevenLabs | Piper |
|---|---|---|
| Voice quality | Excellent — natural, expressive | Good — clear and natural-sounding |
| Cost | Per-character API cost | Free — runs on your server |
| Internet required | Yes | No — fully offline |
| Custom voices | Yes — voice cloning available | No — pre-trained models only |
| Latency | ~300ms | ~50ms on modern hardware |
| Multi-language | 29+ languages | Depends on downloaded model |
| Option | Cost | Data privacy | Offline | nself integration |
|---|---|---|---|---|
| nself-voice (ElevenLabs) | $0.99/mo + ElevenLabs API costs | Text sent to ElevenLabs | No | Yes — events, Hasura, caching |
| nself-voice (Piper) | $0.99/mo only | Full — never leaves your server | Yes | Yes |
| ElevenLabs directly | API costs only | Text sent to ElevenLabs | No | No — manual integration |
| AWS Polly | Per-character pricing | AWS-managed | No | No — manual integration |
| Google Cloud TTS | Per-character pricing | Google-managed | No | No — manual integration |
Verify PLUGIN_ELEVENLABS_API_KEY in your .env.secrets. Run nself build && nself start after updating secrets to reload the environment.
Check that the ONNX model file exists at the path set in PLUGIN_PIPER_MODEL_PATH. Download models from the Piper releases page. Each language needs its own model file.
For ElevenLabs, latency is network-bound (~300ms typical). If requests are consistently slow, check the ElevenLabs service status. For Piper, slow synthesis usually means insufficient RAM: Piper requires ~200MB free per concurrent request.
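The ~200MB figure gives a rough upper bound on how many Piper requests a server can handle at once. A back-of-the-envelope sketch follows. `maxPiperConcurrency` is a hypothetical helper, and 200 MB per request is the approximate figure quoted above, not a measured value for any particular model.

```typescript
// Rough capacity estimate: free memory divided by the approximate per-request cost.
function maxPiperConcurrency(freeMemoryMb: number, perRequestMb = 200): number {
  if (freeMemoryMb < 0 || perRequestMb <= 0) {
    throw new Error("memory values must be positive");
  }
  return Math.floor(freeMemoryMb / perRequestMb);
}

console.log(maxPiperConcurrency(1024)); // 5
```

With 1024 MB free, the estimate is five concurrent requests. Anything beyond that is likely to compete for memory and slow down.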
The cache is keyed on sha256(text + provider + voice + format). Cache hits are logged in np_voice.requests with cached=true. If every request misses, verify the TTL hasn't been set too low via PLUGIN_VOICE_CACHE_TTL_SECONDS.
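Because the key covers all four fields, changing any one of them produces a different cache entry. The sketch below illustrates this with Node's built-in `crypto` module. The exact concatenation the plugin uses (field order, separators, encoding) is not documented here, so `cacheKey` is an illustrative assumption and its output will not necessarily match the `text_hash` values the plugin stores.

```typescript
import { createHash } from "node:crypto";

// Illustrative only: plain concatenation in the order named in the text.
// A production scheme would normally add separators so that field
// boundaries cannot shift ("ab" + "c" versus "a" + "bc").
function cacheKey(text: string, provider: string, voice: string, format: string): string {
  return createHash("sha256")
    .update(text + provider + voice + format, "utf8")
    .digest("hex");
}

const keyMp3 = cacheKey("Hello", "elevenlabs", "Rachel", "mp3");
const keyWav = cacheKey("Hello", "elevenlabs", "Rachel", "wav");    // different format
const keySpace = cacheKey("Hello ", "elevenlabs", "Rachel", "mp3"); // trailing space in text

// Identical requests share a key.
console.log(keyMp3 === cacheKey("Hello", "elevenlabs", "Rachel", "mp3")); // true
// Any differing field yields a new key, and therefore a cache miss.
console.log(keyMp3 === keyWav, keyMp3 === keySpace); // false false
```

If the plugin hashes the text exactly as sent, requests that differ only in whitespace or punctuation would miss the cache, so normalizing text on the client before sending may improve the hit rate.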
nself plugin remove voice

Port: 3714 | Bundle: ɳClaw ($0.99/mo) or ɳSelf+ ($3.99/mo) | Last Updated: May 2026 | Plugin Version 1.1.0