Screen messages, images, and user-generated content before they reach your audience. Runs locally using a fine-tuned classifier, with optional escalation to the OpenAI Moderation API for edge cases.
Local classification works without any API key. Add an OpenAI key for high-confidence edge-case escalation.
```shell
nself license set nself_pro_...
nself plugin install moderation
nself build
nself start
```

nself-moderation intercepts content at the nself-chat message-delivery hook, before messages are stored or fanned out to subscribers. The local classifier runs every message through a multi-label model covering hate speech, explicit content, spam, and harassment in under 5 ms. Flagged content is held for human review or auto-rejected, depending on your threshold config.
Image attachments are scanned using a local NSFW classifier when storage is enabled. For high-stakes use cases, set MODERATION_ESCALATE_TO_OPENAI=true to send low-confidence predictions to the OpenAI Moderation endpoint for a second opinion.
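The escalation decision described above can be sketched as follows. This is an illustrative model of the behavior, not the plugin's actual internals: the function names and the "uncertain prediction" heuristic (top score falling between the review and reject thresholds) are assumptions.

```python
# Sketch of the escalation decision; names and heuristic are illustrative.
import os

REVIEW = float(os.environ.get("MODERATION_REVIEW_THRESHOLD", "0.75"))
REJECT = float(os.environ.get("MODERATION_REJECT_THRESHOLD", "0.95"))

def is_low_confidence(scores: dict) -> bool:
    """Treat a prediction as uncertain when its top category score
    sits between the review and reject thresholds (assumed heuristic)."""
    top = max(scores.values())
    return REVIEW <= top < REJECT

def should_escalate(scores: dict) -> bool:
    """Escalate to the OpenAI Moderation API only when the flag is on
    and the local prediction is uncertain."""
    escalate = os.environ.get("MODERATION_ESCALATE_TO_OPENAI", "false") == "true"
    return escalate and is_low_confidence(scores)
```

High-confidence predictions are handled locally either way; only the ambiguous middle band ever leaves your server.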
Every moderation decision is logged to Postgres with the full content, prediction scores, and outcome. You can review the queue in the nself-admin UI or query it directly through Hasura. Users can be auto-banned, warned, or shadow-banned based on their moderation history.
| Variable | Required | Description |
|---|---|---|
| DATABASE_URL | Yes | Postgres connection string (auto-set by nself) |
| MODERATION_REJECT_THRESHOLD | No | Score (0–1) above which content is auto-rejected. Default: 0.95 |
| MODERATION_REVIEW_THRESHOLD | No | Score (0–1) above which content is queued for human review. Default: 0.75 |
| MODERATION_ESCALATE_TO_OPENAI | No | Send uncertain predictions to the OpenAI Moderation API. Default: false |
| OPENAI_API_KEY | Escalation only | OpenAI API key for escalation (not used when escalation is off) |
| MODERATION_SCAN_IMAGES | No | Enable NSFW image scanning. Default: true |
| MODERATION_AUTO_BAN_COUNT | No | Auto-ban after N rejected messages. Default: 5. Set to 0 to disable. |
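The two thresholds partition classifier scores into three outcomes. A minimal sketch of that routing, using the default values from the table above (the function name and outcome labels are illustrative, not the plugin's API):

```python
# Illustrative routing implied by the two thresholds; the plugin's real
# implementation may differ.
def route(score: float,
          reject_threshold: float = 0.95,
          review_threshold: float = 0.75) -> str:
    """Map a classifier score to a moderation outcome."""
    if score >= reject_threshold:
        return "reject"    # auto-rejected, never delivered
    if score >= review_threshold:
        return "review"    # held in the human-review queue
    return "deliver"       # passes moderation
```

For example, a borderline score of 0.82 lands in the review queue rather than being auto-rejected.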
| Endpoint | Method | Description |
|---|---|---|
| /moderation/check | POST | Check arbitrary text or an image URL and return scores |
| /moderation/queue | GET | List items pending human review |
| /moderation/queue/:id/approve | POST | Approve a held item and deliver it |
| /moderation/queue/:id/reject | POST | Reject a held item with an optional reason |
| /moderation/users/:id/history | GET | Get a user's moderation history and current status |
| /health | GET | Plugin health and model load status |
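A hypothetical client for the /moderation/check endpoint. The request-body field names ("text", "image_url") and the response shape are assumptions based on the endpoint description above, not a documented schema; adjust to the actual contract.

```python
# Hypothetical /moderation/check client; payload field names are assumed.
import json
import urllib.request
from typing import Optional

def build_check_payload(text: Optional[str] = None,
                        image_url: Optional[str] = None) -> dict:
    """Assemble the request body from whichever inputs are given."""
    body = {}
    if text is not None:
        body["text"] = text
    if image_url is not None:
        body["image_url"] = image_url
    return body

def check_content(base_url: str = "http://localhost:3208", **kwargs) -> dict:
    """POST the payload to the plugin and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/moderation/check",
        data=json.dumps(build_check_payload(**kwargs)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The default port matches the plugin's listed port (3208); override `base_url` if you route through a gateway.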
| Table | Purpose |
|---|---|
| np_moderation_decisions | All decisions: content hash, scores, outcome, reviewer |
| np_moderation_queue | Items pending human review |
| np_moderation_user_strikes | Strike count per user and ban status |
| Event | Payload |
|---|---|
| moderation.content.rejected | Content hash, user ID, category scores |
| moderation.content.queued | Queue item ID, user ID, prediction scores |
| moderation.user.banned | User ID, ban reason, strike count |
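A sketch of a consumer for the events above. The event envelope (a JSON object with "event" and "payload" keys) and the snake_case payload field names are assumptions; map them to however your event bus actually delivers these events.

```python
# Hedged event-consumer sketch; envelope and field names are assumed.
import json

def handle_event(raw: str) -> str:
    """Dispatch a raw moderation event to a human-readable summary."""
    evt = json.loads(raw)
    name = evt["event"]
    payload = evt["payload"]
    if name == "moderation.user.banned":
        return f"user {payload['user_id']} banned after {payload['strike_count']} strikes"
    if name == "moderation.content.rejected":
        return f"rejected content from user {payload['user_id']}"
    if name == "moderation.content.queued":
        return f"queued item {payload['queue_item_id']} for review"
    return f"ignored {name}"
```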
Install nself-moderation alongside nself-chat. The moderation hook is registered automatically — no code changes needed in your app. Set your threshold, run nself build, and every message is screened before storage.
| Feature | nself-moderation | OpenAI Moderation API | Perspective API |
|---|---|---|---|
| Data leaves your server | No (local classifier) | Yes | Yes |
| Latency | <5 ms (local) | ~200 ms (API round-trip) | ~150 ms (API round-trip) |
| Cost | Included in ɳChat bundle | Per-request (free tier limited) | Free with quota limits |
Model not loading: Run nself plugin logs moderation. The classifier model is bundled with the plugin image; if the container fails to start, check available disk space (model is ~200 MB).
Too many false positives: Raise MODERATION_REJECT_THRESHOLD toward 0.99 and lower MODERATION_REVIEW_THRESHOLD to route borderline content to human review instead of auto-reject.
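For instance, a false-positive-averse configuration might look like the following .env fragment. The exact review threshold value here (0.70) is illustrative; tune it against your own review-queue volume.

```
# .env — route borderline content to human review instead of auto-rejecting
MODERATION_REJECT_THRESHOLD=0.99
MODERATION_REVIEW_THRESHOLD=0.70
```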
Port: 3208 | Bundle: ɳChat ($0.99/mo) or ɳSelf+ ($3.99/mo) | Last Updated: May 2026 | Plugin Version 1.0.13