The Problem with Dumb Watchdogs
Most hardware watchdogs are simple ping-checkers that power-cycle a smart plug when a port stops responding. The problem? If Home Assistant is doing a massive Core update, restoring a backup, or rebuilding a database, port 8123 goes down. A standard watchdog will blindly pull the plug right in the middle of a critical write, which can corrupt the host OS or the database.
I wanted something better, so I built a completely independent, out-of-band recovery guardian hosted on a separate Raspberry Pi.
What is it? It monitors your HA host (like a NUC) from the outside and controls its power via a local Tuya smart plug.
Instead of just checking if the UI is up, it uses SSH to inspect the actual supervisor and job states before it ever touches the power relay. It knows the difference between a hard crash and a legitimate update.
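As a rough illustration of that decision (a minimal Python sketch, not the code from the repository), the check can be thought of as two port probes plus whatever state the SSH inspection reported. The names `port_open` and `diagnose`, the verdict labels, and the idea of passing the inspected state in as a plain string are assumptions made for this example.

```python
import socket
from typing import Optional

# Assumed state labels: the guardian backs off while the system reports
# that it is starting, rebuilding, or updating.
BUSY_STATES = {"starting", "rebuilding", "updating"}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(core_up: bool, observer_up: bool, state: Optional[str]) -> str:
    """Classify the host from two port probes and an optional inspected state.

    `state` is whatever the SSH inspection reported, or None if SSH failed.
    """
    if core_up:
        return "healthy"
    if state in BUSY_STATES:
        return "busy"        # legitimate maintenance: do not count a failure
    if observer_up:
        return "core_crash"  # host is alive, only Core is unreachable
    return "host_down"       # nothing answers: candidate for a power cycle
```

In use, the probes would be something like `port_open(host, 8123)` for Core and `port_open(host, 4357)` for the Observer, with the result fed into `diagnose` before any power action is considered.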
Key Features:
- Multi-Signal Diagnosis: It checks both Core (8123) and Observer (4357) to distinguish a localized HA Core crash from a total host freeze.
- Deep State Inspection: Before rebooting, it connects via SSH to read `ha jobs info`. It safely pauses its failure counters if the system is `starting`, `rebuilding`, or `updating`.
- Two-Strike Recovery Policy: It operates conservatively. It tries a soft timeout, escalates to a hard power cycle, waits through a boot grace period, and only initiates a destructive restore from an encrypted backup over SSH as a true last resort.
- Intelligent Backup Selection: If forced to restore, it parses HA backup metadata, prefers backups stored on `Local_NAS`, and skips partial/add-on-only backups.
- Air-Gapped Dashboard: Includes a lightweight, concurrent status dashboard served directly from the Pi. It caches local UI assets so the dashboard remains fully functional even if your WAN goes down.
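The escalation and backup-selection features above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the repository's implementation: the `Guardian` class, the one-rung-per-failed-check ladder, the verdict strings, and the metadata field names (`type`, `date`, `locations`) are all hypothetical, and dates are assumed to be ISO 8601 strings in a single timezone so that they sort correctly as text.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed escalation ladder, in the order described in the feature list.
LADDER = ["soft_timeout", "power_cycle", "wait_for_boot", "restore_backup"]
PREFERRED_LOCATION = "Local_NAS"


@dataclass
class Guardian:
    failures: int = 0

    def step(self, verdict: str) -> str:
        """Process one monitoring tick and return the action to take."""
        if verdict == "healthy":
            self.failures = 0
            return "none"
        if verdict == "busy":
            return "none"  # counter is paused, neither reset nor advanced
        # Each consecutive failure climbs one rung and stays on the last one.
        action = LADDER[min(self.failures, len(LADDER) - 1)]
        self.failures += 1
        return action


def pick_backup(backups: list) -> Optional[dict]:
    """Pick the newest full backup, preferring ones stored on Local_NAS."""
    full = [b for b in backups if b.get("type") == "full"]
    if not full:
        return None  # only partial backups exist: nothing safe to restore
    return max(
        full,
        key=lambda b: (PREFERRED_LOCATION in b.get("locations", []), b["date"]),
    )
```

Sorting on a `(preferred, date)` tuple means an older backup on the preferred location wins over a newer one elsewhere. That ordering is a design choice in this sketch, and the opposite priority would be just as easy to express.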
Hardware Requirements:
- A dedicated Raspberry Pi (keeps it out-of-band)
- A Tuya smart plug (for power cycling the HA host)
- SSH access enabled to the HA host
Repository & Code: I’ve heavily documented the deployment and configuration process in the repository. It uses a clean .env architecture so secrets never touch source control.
GitHub Repo: kiranvenom1209/voerynth-sentinel
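For anyone unfamiliar with the `.env` pattern, here is a minimal Python sketch of the idea, using only the standard library. It is an illustration rather than the loader used in the repository, and the helper names `load_env` and `require` as well as the key name in the usage note are hypothetical.

```python
import os


def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Strip optional surrounding quotes from the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require(values: dict, key: str) -> str:
    """Look up a setting; a real environment variable overrides the file."""
    value = os.environ.get(key, values.get(key))
    if not value:
        raise KeyError(f"missing required setting: {key}")
    return value
```

A call such as `require(load_env(), "TUYA_DEVICE_ID")` would then fail loudly at startup if the secret is missing. The pattern only keeps secrets out of source control if `.env` itself is listed in `.gitignore`.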
I’ve been running this to protect my own infrastructure and would love to hear feedback or edge-case ideas from other users managing high-availability setups!