Screen messages, images, and user-generated content before they reach your audience. Runs locally using a fine-tuned classifier, with optional escalation to the OpenAI Moderation API for edge cases.
Local classification works without any API key. Add an OpenAI key for high-confidence edge-case escalation.
```shell
nself license set nself_pro_...
nself plugin install moderation
nself build
nself start
```

nself-moderation intercepts content at the nself-chat message-delivery hook, before messages are stored or fanned out to subscribers. The local classifier runs every message through a multi-label model covering hate speech, explicit content, spam, and harassment in under 5 ms. Flagged content is held for human review or auto-rejected, depending on your threshold config.
Image attachments are scanned using a local NSFW classifier when storage is enabled. For high-stakes use cases, set MODERATION_ESCALATE_TO_OPENAI=true to send low-confidence predictions to the OpenAI Moderation endpoint for a second opinion.
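The escalation decision described above can be sketched as follows. This is an illustrative model of the behavior, not the plugin's actual internals: the function names and the "uncertain prediction" heuristic (top score falling between the review and reject thresholds) are assumptions.

```python
# Sketch of the escalation decision; names and heuristic are illustrative.
import os

REVIEW = float(os.environ.get("MODERATION_REVIEW_THRESHOLD", "0.75"))
REJECT = float(os.environ.get("MODERATION_REJECT_THRESHOLD", "0.95"))

def is_low_confidence(scores: dict) -> bool:
    """Treat a prediction as uncertain when its top category score
    sits between the review and reject thresholds (assumed heuristic)."""
    top = max(scores.values())
    return REVIEW <= top < REJECT

def should_escalate(scores: dict) -> bool:
    """Escalate to the OpenAI Moderation API only when the flag is on
    and the local prediction is uncertain."""
    escalate = os.environ.get("MODERATION_ESCALATE_TO_OPENAI", "false") == "true"
    return escalate and is_low_confidence(scores)
```

High-confidence predictions are handled locally either way; only the ambiguous middle band ever leaves your server.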
Every moderation decision is logged to Postgres with the full content, prediction scores, and outcome. You can review the queue in the nself-admin UI or query it directly through Hasura. Users can be auto-banned, warned, or shadow-banned based on their moderation history.
| Variable | Required | Description |
|---|---|---|
| DATABASE_URL | Yes | Postgres connection string (auto-set by nself) |
| MODERATION_REJECT_THRESHOLD | No | Score (0–1) above which content is auto-rejected. Default: 0.95 |
| MODERATION_REVIEW_THRESHOLD | No | Score (0–1) above which content is queued for human review. Default: 0.75 |
| MODERATION_ESCALATE_TO_OPENAI | No | Send uncertain predictions to the OpenAI Moderation API. Default: false |
| OPENAI_API_KEY | Escalation only | OpenAI API key for escalation (not used when escalation is off) |
| MODERATION_SCAN_IMAGES | No | Enable NSFW image scanning. Default: true |
| MODERATION_AUTO_BAN_COUNT | No | Auto-ban after N rejected messages. Default: 5. Set to 0 to disable. |
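The two thresholds partition classifier scores into three outcomes. A minimal sketch of that routing, using the default values from the table above (the function name and outcome labels are illustrative, not the plugin's API):

```python
# Illustrative routing implied by the two thresholds; the plugin's real
# implementation may differ.
def route(score: float,
          reject_threshold: float = 0.95,
          review_threshold: float = 0.75) -> str:
    """Map a classifier score to a moderation outcome."""
    if score >= reject_threshold:
        return "reject"    # auto-rejected, never delivered
    if score >= review_threshold:
        return "review"    # held in the human-review queue
    return "deliver"       # passes moderation
```

For example, a borderline score of 0.82 lands in the review queue rather than being auto-rejected.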
| Endpoint | Method | Description |
|---|---|---|
| /moderation/check | POST | Check arbitrary text or an image URL and return scores |
| /moderation/queue | GET | List items pending human review |
| /moderation/queue/:id/approve | POST | Approve a held item and deliver it |
| /moderation/queue/:id/reject | POST | Reject a held item with an optional reason |
| /moderation/users/:id/history | GET | Get a user's moderation history and current status |
| /health | GET | Plugin health and model load status |
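A hypothetical client for the /moderation/check endpoint. The request-body field names ("text", "image_url") and the response shape are assumptions based on the endpoint description above, not a documented schema; adjust to the actual contract.

```python
# Hypothetical /moderation/check client; payload field names are assumed.
import json
import urllib.request
from typing import Optional

def build_check_payload(text: Optional[str] = None,
                        image_url: Optional[str] = None) -> dict:
    """Assemble the request body from whichever inputs are given."""
    body = {}
    if text is not None:
        body["text"] = text
    if image_url is not None:
        body["image_url"] = image_url
    return body

def check_content(base_url: str = "http://localhost:3208", **kwargs) -> dict:
    """POST the payload to the plugin and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/moderation/check",
        data=json.dumps(build_check_payload(**kwargs)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The default port matches the plugin's listed port (3208); override `base_url` if you route through a gateway.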
| Table | Purpose |
|---|---|
| np_moderation_decisions | All decisions: content hash, scores, outcome, reviewer |
| np_moderation_queue | Items pending human review |
| np_moderation_user_strikes | Strike count per user and ban status |
| Event | Payload |
|---|---|
| moderation.content.rejected | Content hash, user ID, category scores |
| moderation.content.queued | Queue item ID, user ID, prediction scores |
| moderation.user.banned | User ID, ban reason, strike count |
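A sketch of a consumer for the events above. The event envelope (a JSON object with "event" and "payload" keys) and the snake_case payload field names are assumptions; map them to however your event bus actually delivers these events.

```python
# Hedged event-consumer sketch; envelope and field names are assumed.
import json

def handle_event(raw: str) -> str:
    """Dispatch a raw moderation event to a human-readable summary."""
    evt = json.loads(raw)
    name = evt["event"]
    payload = evt["payload"]
    if name == "moderation.user.banned":
        return f"user {payload['user_id']} banned after {payload['strike_count']} strikes"
    if name == "moderation.content.rejected":
        return f"rejected content from user {payload['user_id']}"
    if name == "moderation.content.queued":
        return f"queued item {payload['queue_item_id']} for review"
    return f"ignored {name}"
```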
Install nself-moderation alongside nself-chat. The moderation hook is registered automatically — no code changes needed in your app. Set your threshold, run nself build, and every message is screened before storage.
| Feature | nself-moderation | OpenAI Moderation API | Perspective API |
|---|---|---|---|
| Data leaves your server | No (local classifier) | Yes | Yes |
| Latency | <5 ms (local) | ~200 ms (API round-trip) | ~150 ms (API round-trip) |
| Cost | Included in ɳChat bundle | Per-request (free tier limited) | Free with quota limits |
Model not loading: Run nself plugin logs moderation. The classifier model is bundled with the plugin image; if the container fails to start, check available disk space (model is ~200 MB).
Too many false positives: Raise MODERATION_REJECT_THRESHOLD toward 0.99 and lower MODERATION_REVIEW_THRESHOLD to route borderline content to human review instead of auto-reject.
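For instance, a false-positive-averse configuration might look like the following .env fragment. The exact review threshold value here (0.70) is illustrative; tune it against your own review-queue volume.

```
# .env — route borderline content to human review instead of auto-rejecting
MODERATION_REJECT_THRESHOLD=0.99
MODERATION_REVIEW_THRESHOLD=0.70
```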
Port: 3208 | Bundle: ɳChat ($0.99/mo) or ɳSelf+ ($3.99/mo) | Last Updated: May 2026 | Plugin Version 1.0.13