Monitor and control the nSelf self-healing watchdog — check circuit-breaker state, reset tripped breakers, review recovery history, and verify alert delivery.
```bash
# Check watchdog and circuit-breaker status
nself watchdog status

# Reset all tripped circuit breakers (after you have fixed the root cause)
nself watchdog reset-breakers

# View the last 20 watchdog recovery events
nself watchdog history

# Fire a test alert to verify your alert routing is working
nself watchdog test-alert
```

Usage:

```text
nself watchdog <SUBCOMMAND> [FLAGS]
```

The nSelf watchdog is a background process that continuously monitors all stack services and enforces self-healing policies. It polls each service's health endpoint, tracks failures, and trips a service's circuit breaker when that service crosses its configured failure threshold (`NSELF_WATCHDOG_FAILURE_THRESHOLD`).
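When a circuit breaker trips, the watchdog stops routing requests to the affected service and schedules a restart using the configured restart back-off. Each trip and restart is recorded in the recovery history (`nself watchdog history`).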
Once the root cause is resolved, run `nself watchdog reset-breakers` to close the open breakers and resume normal routing. The watchdog can also close a breaker on its own. After the cool-down period (`NSELF_WATCHDOG_COOLDOWN`), it moves the breaker to half-open and probes the service. If the service is healthy again, the breaker closes.
Watchdog is part of the Security-Always-Free tier — it runs without a license key on every nSelf installation. Advanced alert routing and SLO integration are available through the ɳSentry plugin bundle.
`nself watchdog status` shows the watchdog process state and a summary of every circuit breaker.
```bash
nself watchdog status
# Watchdog: running (PID 18432, uptime 6d 14h)
# Breakers: 5 total — 4 closed, 1 open
#
# SERVICE          STATE   TRIPS  LAST TRIP          LAST RESTART
# api (hasura)     closed  0      —                  —
# auth             closed  0      —                  —
# storage (minio)  closed  2      2026-05-06 09:14Z  2026-05-06 09:16Z
# search           open    1      2026-05-07 08:03Z  in progress
# mail             closed  0      —                  —
```

`nself watchdog reset-breakers` closes open circuit breakers and resumes routing immediately. Run it only after you have fixed the root cause of the failure. Pass `--service` to reset a single service. Without it, every open breaker is reset at once.
```bash
nself watchdog reset-breakers                    # reset all open breakers
nself watchdog reset-breakers --service search   # reset one specific service
```

`nself watchdog history` displays the watchdog recovery event log: restarts, breaker trips, and resolved incidents.
```bash
nself watchdog history               # last 20 events (default)
nself watchdog history --limit 50    # last 50 events
nself watchdog history --since 7d    # events in the last 7 days
nself watchdog history --json        # machine-readable output
```

`nself watchdog test-alert` sends a synthetic test alert through your configured alert routing to verify end-to-end delivery. Run it after adding a new alert channel or changing routing rules.
```bash
nself watchdog test-alert
# Sending test alert...
# ✓ Email delivered to ops@example.com (250ms)
# ✓ Slack #alerts channel (310ms)
# Test complete — all 2 channels confirmed.
```

| Flag | Applies to | Default | Description |
|---|---|---|---|
| `--service` | `reset-breakers` | — | Target a single service instead of all open breakers |
| `--limit` | `history` | `20` | Maximum number of events to display |
| `--since` | `history` | — | Show events after this time, as a relative duration (`7d`, `2h`) or a date (`2026-05-01`) |
| `--json` | `status`, `history` | `false` | Emit structured JSON output |
| `--env` | all | current | Target environment: `local`, `staging`, or `prod` |
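
Every subcommand accepts `--env`, so you can check the watchdog in each environment with a short loop. This is a minimal sketch that assumes your CLI can reach `local`, `staging`, and `prod` from the current machine. The `NSELF` variable and the `status_all_envs` function are hypothetical helpers, not part of nSelf. Setting `NSELF=echo` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print watchdog status for each environment that
# the --env flag accepts. Not part of nSelf itself.

# Command to invoke; override with NSELF=echo to preview the commands.
NSELF="${NSELF:-nself}"

status_all_envs() {
  local target
  for target in local staging prod; do
    echo "== $target =="
    "$NSELF" watchdog status --env "$target"
  done
}
```

Source the file, then run `status_all_envs`. The loop does not stop on a failed command, so one unreachable environment does not hide the others.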

Circuit breakers move through three states:

| State | Meaning | Action |
|---|---|---|
| `closed` | Service healthy; requests flow normally | None required |
| `open` | Service unhealthy; requests blocked, restart scheduled | Fix the root cause, then run `reset-breakers` |
| `half-open` | Watchdog probing recovery; limited requests allowed through | Wait for automatic close (or force it with `reset-breakers`) |
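
The three states form a small state machine. The sketch below is a simplified model for reasoning about transitions. It is not nSelf's implementation, and it makes three assumptions:

- Each health check counts as one `fail` or `ok` event.
- An open breaker changes state only when the cool-down elapses or you run `reset-breakers`.
- The trip point comes from `NSELF_WATCHDOG_FAILURE_THRESHOLD`, with the documented default of `3`.

The `next_state` function and the event names are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified model of the breaker states in the table above.
# Illustrative only; this is not nSelf's implementation.

# Consecutive failures before a closed breaker opens (documented default: 3).
FAILURE_THRESHOLD="${NSELF_WATCHDOG_FAILURE_THRESHOLD:-3}"

# next_state STATE FAILURES EVENT  (hypothetical helper)
#   STATE     closed | open | half-open
#   FAILURES  consecutive failed health checks so far
#   EVENT     fail | ok (health-check result), cooldown (cool-down elapsed),
#             or reset (you ran reset-breakers)
# Prints "<new state> <new failure count>".
next_state() {
  local state=$1 failures=$2 event=$3
  case "$state:$event" in
    *:reset)
      state=closed failures=0 ;;         # a manual reset always closes
    closed:fail)
      failures=$((failures + 1))
      if [ "$failures" -ge "$FAILURE_THRESHOLD" ]; then
        state=open                       # threshold reached: trip
      fi ;;
    closed:ok)
      failures=0 ;;                      # a passing check clears the streak
    open:cooldown)
      state=half-open ;;                 # cool-down over: start probing
    half-open:ok)
      state=closed failures=0 ;;         # probe passed: resume routing
    half-open:fail)
      state=open ;;                      # probe failed: back to open
    *)
      ;;                                 # anything else: no change
  esac
  printf '%s %s\n' "$state" "$failures"
}

# Walk one breaker through three failures, a cool-down, and a passing probe.
cur_state=closed cur_failures=0
for event in fail fail fail cooldown ok; do
  out=$(next_state "$cur_state" "$cur_failures" "$event")
  cur_state=${out% *} cur_failures=${out#* }
  echo "$event -> $cur_state"
done
```

With the default threshold of 3, the walk prints these transitions in order:

- `fail -> closed`
- `fail -> closed`
- `fail -> open`
- `cooldown -> half-open`
- `ok -> closed`

The third consecutive failure trips the breaker, and the passing probe closes it again.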
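
A typical recovery workflow after a breaker trips:

```bash
# 1. See which service is tripped
nself watchdog status

# 2. Read recent logs to understand the failure
nself logs --service search --tail 100

# 3. Fix the root cause (e.g., disk full, bad config)
# ...

# 4. Reset the breaker
nself watchdog reset-breakers --service search

# 5. Confirm it closed
nself watchdog status
```

Other common tasks:

```bash
# Export the last 7 days of recovery history as JSON
nself watchdog history --since 7d --json > recovery-history.json

# Verify alert delivery (e.g., after changing alert channels or routing)
nself watchdog test-alert

# Watch breaker state live, refreshing every 10 seconds
watch -n 10 nself watchdog status
```

Restart back-off and breaker thresholds are configured per service in `.env` (or in `nself.yaml` if you use the config file). Key variables: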

| Variable | Default | Description |
|---|---|---|
| `NSELF_WATCHDOG_FAILURE_THRESHOLD` | `3` | Consecutive failures before the breaker opens |
| `NSELF_WATCHDOG_COOLDOWN` | `60s` | Cool-down before the automatic close attempt (half-open probe) |
| `NSELF_WATCHDOG_POLL_INTERVAL` | `30s` | How often the watchdog polls each service |
| `NSELF_WATCHDOG_ALERT_CHANNELS` | — | Comma-separated alert destinations (email, Slack, webhook URL) |
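
Below is a minimal `.env` fragment that overrides the defaults. The values are illustrative, not recommendations.

Assume one health check per poll. Then the worst-case time from an outage to a tripped breaker is roughly threshold × poll interval:

- With the defaults, 3 × 30 s = 90 s.
- With the values below, 5 × 15 s = 75 s.

The alert-channel line is commented out because the exact value format for each destination is an assumption here.

```bash
# .env: watchdog tuning (illustrative values, not recommendations)

# Open a breaker after 5 consecutive failed health checks (default: 3)
NSELF_WATCHDOG_FAILURE_THRESHOLD=5

# Wait 120 s before the half-open recovery probe (default: 60s)
NSELF_WATCHDOG_COOLDOWN=120s

# Poll each service every 15 s (default: 30s)
NSELF_WATCHDOG_POLL_INTERVAL=15s

# Comma-separated alert destinations. The per-destination format is an
# assumption, so check it before enabling this line.
# NSELF_WATCHDOG_ALERT_CHANNELS=email,slack
```

A lower threshold or a shorter poll interval trips breakers sooner, at the cost of more sensitivity to brief blips. After changing these values, run `nself watchdog test-alert` to confirm alert delivery.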