Disaster Recovery

A tested runbook for recovering an nself stack from common failures. Each scenario lists the commands to run, in order.


Scenario 1: Database Corruption

Postgres data is corrupted or the database won't start.

# 1. Stop the stack
nself stop

# 2. List available backups
nself backup list

# 3. Restore the most recent backup
nself backup restore latest

# 4. Start the stack
nself start

# 5. Verify data integrity
nself health
nself db query "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public'"
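
During an incident it is easy to keep going after a step has already failed. The sketch below chains the Scenario 1 commands and stops at the first failure. It assumes that each nself command exits non-zero on error, which this runbook does not confirm. `run_step`, `restore_latest_backup`, and the `DRY_RUN` variable are hypothetical names used for illustration, not part of nself.

```shell
#!/usr/bin/env bash
# Sketch: run the Scenario 1 steps in order and stop at the first failure.
# Assumption: each nself command exits non-zero when it fails.
# Set DRY_RUN=1 to print the commands without executing them.

# run_step COMMAND...: echo the command, then run it unless DRY_RUN=1.
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    return 0
  fi
  "$@"
}

# restore_latest_backup: the five Scenario 1 steps; returns 1 at the
# first step that fails, so later steps never run on a broken state.
restore_latest_backup() {
  run_step nself stop || return 1
  run_step nself backup list || return 1
  run_step nself backup restore latest || return 1
  run_step nself start || return 1
  run_step nself health || return 1
}

# Rehearsal (prints the plan only):  DRY_RUN=1 restore_latest_backup
# Real run:                          restore_latest_backup
```

A dry run is useful during the monthly restore test: it shows the exact command sequence without touching the stack.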

Scenario 2: Server Disk Full

# 1. Check disk usage
df -h
du -sh /var/lib/docker/*

# 2. Clean Docker resources
docker system prune -f

# 3. Remove old backups (keep last 7)
nself backup prune --keep 7

# 4. Clean Loki logs if monitoring is enabled
docker exec nself-loki /usr/bin/loki -target compactor

# 5. Restart services
nself restart
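
To decide whether cleanup is needed, or to confirm afterwards that it freed space, the usage percentage can be read from `df` in a script. This is a minimal sketch: `disk_usage_pct` and `disk_over_threshold` are hypothetical helper names, and the 80% figure in the example comes from the prevention checklist on this page.

```shell
# disk_usage_pct PATH: print the used-space percentage (integer) of the
# filesystem that holds PATH, taken from POSIX-format df output.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 {
    # Pick the first field that looks like "NN%" so that filesystem
    # names containing spaces do not shift the column.
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^[0-9]+%$/) { sub(/%/, "", $i); print $i; exit }
    }
  }'
}

# disk_over_threshold PATH LIMIT: succeed when usage is at or above LIMIT.
disk_over_threshold() {
  [ "$(disk_usage_pct "$1")" -ge "$2" ]
}

# Example: report when the Docker data directory is at or above 80% full.
# if disk_over_threshold /var/lib/docker 80; then
#   echo "disk usage is $(disk_usage_pct /var/lib/docker)%, cleanup needed"
# fi
```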

Scenario 3: Complete Server Loss

The server is gone. You have backups stored off-site.

# 1. Provision a new server (same OS, same or larger spec)
# 2. Install nself
curl -fsSL https://install.nself.org | sh

# 3. Initialize with the same project name
nself init my-project
cd my-project

# 4. Copy your .env files from backup
scp backup-server:/backups/my-project/.env.* .

# 5. Start the stack (creates fresh containers)
nself start

# 6. Restore database from off-site backup
nself backup restore /path/to/backup.sql.gz

# 7. Restore MinIO data if using storage
nself storage restore /path/to/minio-backup.tar.gz

# 8. Verify everything
nself health
nself status
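
Off-site copies can be truncated or corrupted in transit, so it may be worth checking the archives before step 6. The sketch below relies only on `gzip -t`, which tests the integrity of a gzip stream and works for both `.sql.gz` and `.tar.gz` files. `verify_backup` is a hypothetical helper name. A passing check shows that the compressed stream is intact, not that the SQL inside it is complete.

```shell
#!/usr/bin/env bash
# verify_backup FILE: succeed only if FILE exists, is non-empty, and
# holds a gzip stream that decompresses without errors.
verify_backup() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  if ! gzip -t "$file" 2>/dev/null; then
    echo "corrupt gzip stream: $file" >&2
    return 1
  fi
  echo "ok: $file"
}

# Example: restore only when the archive passes the check.
# verify_backup /path/to/backup.sql.gz && nself backup restore /path/to/backup.sql.gz
```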

Scenario 4: Hasura Metadata Lost

# 1. Check if metadata is in the database (it usually is)
nself hasura metadata export

# 2. If metadata is gone, reapply it from the metadata files in your project
nself hasura metadata apply

# 3. If the metadata files are also gone, track all tables manually
nself hasura console
# In the console: Data > Track All

Scenario 5: Auth Service Down

# 1. Check auth logs
nself logs auth --tail 100

# 2. Common fix: restart just the auth container
nself restart auth

# 3. If auth DB is corrupted, restore from backup
nself stop auth
nself backup restore latest --service auth
nself start auth
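
A restarted container can take a few seconds to accept requests, so a single health check right after the restart may fail spuriously. The sketch below polls a command until it succeeds. `retry` is a hypothetical helper, and using `nself health` as the probe assumes that it exits non-zero while a service is unhealthy, which this page does not state.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND...: run COMMAND until it succeeds,
# waiting DELAY_SECONDS between tries. Returns 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: after `nself restart auth`, poll for up to about 30 seconds.
# retry 10 3 nself health || nself logs auth --tail 100
```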

Prevention Checklist

  • Daily automated backups: nself backup schedule --daily --retain 30
  • Off-site backup copy: Sync backups to a different server or S3 bucket daily
  • Test restore monthly: Spin up a test server, restore, verify data
  • Monitor disk usage: Alert at 80% capacity via Grafana
  • Monitor backup age: Alert if the newest backup is older than 36 hours
  • Document your env files: Keep a secure copy of all .env.* files outside the server
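
The backup-age check from the list above can be approximated with `find -mmin`, for example from a cron job that feeds an alert. This is a sketch under assumptions: `backup_is_stale` is a hypothetical helper, and the backup directory in the example is a placeholder, since this page does not say where nself stores backups.

```shell
#!/usr/bin/env bash
# backup_is_stale FILE MAX_HOURS: succeed when FILE is missing or was
# last modified more than MAX_HOURS ago.
backup_is_stale() {
  local file="$1" max_hours="$2"
  # A missing backup counts as stale.
  [ -e "$file" ] || return 0
  # find prints the path only if its mtime is older than the limit.
  [ -n "$(find "$file" -mmin +"$((max_hours * 60))" -print)" ]
}

# Example (hypothetical backup directory; adjust to your setup):
# newest="$(ls -t /path/to/backups/*.sql.gz 2>/dev/null | head -n 1)"
# if backup_is_stale "$newest" 36; then
#   echo "ALERT: newest backup is older than 36 hours" >&2
# fi
```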

Recovery Time Objectives

Scenario                           Expected RTO     Data Loss (RPO)
Service restart                    < 1 minute       None
Database restore (local backup)    5-15 minutes     Since last backup
Full server rebuild                30-60 minutes    Since last off-site backup
Disk full recovery                 5-10 minutes     None