Recovery escalation: - Level 0-1: LaunchAgent KeepAlive + Watchdog - Level 2: Automated "doctor --fix" (config validation, port checks) - Level 3: Claude Code spawns in tmux, reads logs, attempts repairs - Level 4: Discord alert if all automation fails
Production-tested in my homelab over 3 months: 99% recovery rate, recovery time reduced from 45min → 3min avg. Handled 17 consecutive crashes, config corruption, port conflicts.
Built for macOS (stable) with Linux systemd support (beta). MIT licensed.
Curious what others think about AI-powered infrastructure self-healing.