Home Assistant Add-on: Hardware watchdog service

About

The add-on activates /dev/watchdog, the hardware watchdog device, to restart the server when it stops responding. For details about the watchdog, see https://www.kernel.org/doc/Documentation/watchdog/watchdog-api.txt.
I tested it on my Raspberry Pi 4, which has a Broadcom BCM2835 watchdog timer enabled by default.
The service sends a keepalive to the watchdog timer every 5 seconds. If the system hangs or the software otherwise stops responding, the hardware restarts the system after 15 seconds.
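
To illustrate the keepalive mechanism, here is a minimal Python sketch. It is not the add-on's actual code. It relies on the behaviour described in the kernel watchdog API document linked above: opening the device arms the timer, any write counts as a keepalive, and writing the character V just before closing asks the driver to disarm the timer (only on drivers that support this "magic close"). The function name feed_watchdog and its parameters are made up for this example, and it does not set the 15-second timeout itself.

```python
import time


def feed_watchdog(device_path="/dev/watchdog", interval=5.0, beats=None, sleep=time.sleep):
    """Write a keepalive byte every `interval` seconds and return how many were sent.

    `beats=None` means run forever; a number limits the loop (useful for testing).
    """
    sent = 0
    # Opening the device arms the timer. Unbuffered mode makes every write
    # reach the driver immediately instead of sitting in a Python buffer.
    with open(device_path, "wb", buffering=0) as wd:
        while beats is None or sent < beats:
            wd.write(b"\0")  # any write resets the countdown
            sent += 1
            sleep(interval)
        # Magic close: only reached on a normal exit from the loop.
        # If the process hangs or is killed, nothing is written and the
        # hardware reboots the machine when the timeout expires.
        wd.write(b"V")
    return sent
```

Calling feed_watchdog() with the defaults would feed the real device every 5 seconds until the process dies. Passing a regular file path and a small beats value lets you try the loop without arming any hardware.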

Repository on GitHub

Cool. I’ve been having occasional issues with my Home Assistant becoming completely unresponsive, so I’m happy to see this exists and I’m giving it a try.

Seems like this should really just be a default part of Home Assistant OS.

One thing that would be useful is to track how frequently this is triggered.

I do have an uptime sensor configured (Uptime - Home Assistant) but that won’t let me distinguish restarts due to software updates or config changes from watchdog resets.

I have also set up a notification when HA restarts (similar to the example in “Home Assistant restart notification”), so at least I can note and track restarts manually.

Personally, I added notifications for HA shutdown and startup, and I think that is enough, because a reboot triggered by the watchdog timer is not a frequent occurrence.
I think the main problem with counting these events is that watchdog restarts happen outside the scope of the software: it’s a hardware restart triggered when the software becomes unresponsive. You can’t write down any information because you (as a script or program) may be, and probably are, dead at that moment.
Another way is to count “incorrect” OS startups: create a flag on proper shutdown, and if the flag is missing on startup, interpret that as a bad/watchdog shutdown.
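
The flag idea could look roughly like the Python sketch below. This is an untested outline, not something the add-on currently does. The function names and the flag location are made up for the example; in an add-on the flag would have to live somewhere that survives reboots, and a path such as /data/clean_shutdown is only an assumption.

```python
from pathlib import Path


def mark_clean_shutdown(flag: Path) -> None:
    """Call from the normal shutdown path (e.g. a SIGTERM handler)."""
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.write_text("clean\n")


def previous_shutdown_was_clean(flag: Path) -> bool:
    """Call once at startup. Consumes the flag so the next boot starts fresh."""
    clean = flag.exists()
    if clean:
        flag.unlink()
    return clean
```

At startup, a False result means the last stop was not a proper shutdown. That covers watchdog resets but also power cuts, and the very first run will report False as well, so the count is an upper bound rather than an exact number of watchdog resets.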
If anyone has an idea how to count this, don’t hold back :)
Also, if you know how to increment a counter in HA from a Supervisor container, give me an example and I’ll add this feature.
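
One possible approach to the counter question, as an untested Python sketch: an add-on can call Home Assistant services through the Supervisor's proxy at http://supervisor/core/api, authenticating with the SUPERVISOR_TOKEN environment variable. This assumes the add-on config sets homeassistant_api: true and that a counter helper already exists. The entity name counter.watchdog_resets and the function name are made up for the example.

```python
import json
import os
import urllib.request


def build_increment_request(entity_id, token, base_url="http://supervisor/core/api"):
    """Build a POST request that calls the counter.increment service."""
    body = json.dumps({"entity_id": entity_id}).encode()
    return urllib.request.Request(
        f"{base_url}/services/counter/increment",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def increment_counter(entity_id="counter.watchdog_resets"):
    """Send the request; only works inside an add-on container."""
    req = build_increment_request(entity_id, os.environ["SUPERVISOR_TOKEN"])
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Calling increment_counter() at startup, when the previous shutdown looks unclean, would bump the counter. Home Assistant may not be ready when the add-on starts, so a real version would probably need to retry.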