Home Assistant Add-on: Hardware watchdog service

About

The add-on activates /dev/watchdog, the hardware watchdog device, to restart the server when it stops responding. For details about the watchdog API, see https://www.kernel.org/doc/Documentation/watchdog/watchdog-api.txt.
I tested it on my Raspberry Pi 4, which has a Broadcom BCM2835 watchdog timer, enabled by default.
The service sends a keepalive to the watchdog timer every 5 seconds. If the system hangs or another software problem stops the keepalives, the watchdog does a hardware restart after 15 seconds.
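
For readers who want to see what such a keepalive loop looks like, here is a minimal Python sketch. It is not the add-on's actual code. It relies only on two behaviours described in the kernel watchdog API document linked above: any write to the device counts as a ping, and writing the character V just before closing ("Magic Close") asks the driver to disarm the timer. The function name `feed_watchdog` and its parameters are made up for this illustration.

```python
import time

KEEPALIVE_INTERVAL = 5.0  # seconds between pings, as described in the post


def feed_watchdog(device, cycles, interval=KEEPALIVE_INTERVAL, sleep=time.sleep):
    """Ping an already-open watchdog device `cycles` times, then ask it to stop.

    `device` is any binary file-like object; on real hardware it would be
    /dev/watchdog opened for writing. The name and signature are hypothetical.
    """
    for _ in range(cycles):
        device.write(b"\0")  # any write to the device counts as a keepalive
        device.flush()
        sleep(interval)
    device.write(b"V")       # "Magic Close": request disarm when the file closes
    device.flush()


# Usage on real hardware (not run here: opening the device arms the timer):
#   with open("/dev/watchdog", "wb", buffering=0) as wd:
#       feed_watchdog(wd, cycles=12)   # roughly one minute of keepalives
```

If the process dies before the final write, the pings simply stop and the timer expires, which is the whole point of the device.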

Repository on GitHub

Cool, I’ve been having occasional issues with my Home Assistant becoming completely unresponsive, so I’m happy to see this exists and I’m giving it a try.

Seems like this should really just be a default part of Home Assistant OS.

One thing that would be useful is to track how frequently this is triggered.

I do have an uptime sensor configured (Uptime - Home Assistant) but that won’t let me distinguish restarts due to software updates or config changes from watchdog resets.

I also have set up a notification when HA restarts (similar to the example here Home Assistant restart notification) so at least I can note & track manually.

Personally, I added notifications for HA shutting down and starting, and I think that is enough, because watchdog resets are not a frequent occurrence.
I think the main problem with counting these events is that watchdog restarts are outside the scope of the software: it is a hardware restart that happens when the software becomes unresponsive. You can’t write anything down, because you (as a script, as a program) are probably dead at that moment.
Another way is to count “incorrect” OS startups: create a flag on proper shutdown, and if the flag is missing on startup, treat that as a bad/watchdog shutdown.
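
The flag idea could look roughly like this in Python. This is only a sketch: the file names, the state directory and both function names are hypothetical, and it cannot tell a watchdog reset from a power cut, since both skip the clean shutdown path. The very first run is not counted, because there is no previous shutdown to judge.

```python
from pathlib import Path

FLAG_NAME = "clean_shutdown"    # hypothetical marker written on proper shutdown
COUNTER_NAME = "unclean_count"  # hypothetical file holding the running total


def record_startup(state_dir):
    """Call once at startup; returns the number of unclean shutdowns so far."""
    state = Path(state_dir)
    flag = state / FLAG_NAME
    counter = state / COUNTER_NAME

    first_run = not counter.exists()
    count = 0 if first_run else int(counter.read_text())

    clean = flag.exists()
    flag.unlink(missing_ok=True)  # consume the flag (needs Python 3.8+)

    if not clean and not first_run:
        count += 1                # previous session ended without the flag
    counter.write_text(str(count))
    return count


def record_clean_shutdown(state_dir):
    """Call from the normal shutdown path, just before exiting."""
    (Path(state_dir) / FLAG_NAME).touch()
```

The state directory has to survive reboots, so it must live on persistent storage rather than in a temporary filesystem.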
If anyone has an idea of how to count this, do not hold back :)
Also, if you know how to increment a counter in HA from a Supervisor container, give me an example and I’ll add this function.
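
One possible way to do the increment, sketched in Python and untested against a live system: add-ons that set `homeassistant_api: true` in their config can reach the Home Assistant Core API through the Supervisor proxy at `http://supervisor/core/api`, authenticating with the `SUPERVISOR_TOKEN` environment variable, and call the `counter.increment` service. The entity `counter.watchdog_resets` and the helper name `build_increment_request` are assumptions for this example; the counter helper would have to be created in HA first.

```python
import json
import os
import urllib.request

CORE_API = "http://supervisor/core/api"  # Supervisor proxy to the Core API


def build_increment_request(entity_id, token):
    """Build (but do not send) a POST that calls the counter.increment service."""
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        f"{CORE_API}/services/counter/increment",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Inside the add-on container (assumed entity name, not run here):
#   req = build_increment_request("counter.watchdog_resets",
#                                 os.environ["SUPERVISOR_TOKEN"])
#   urllib.request.urlopen(req, timeout=10).close()
```

Building the request separately from sending it keeps the part that needs a running Supervisor down to a single line.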

Please add a how-to-install section to your README. I’ve tried adding your repo to the add-on store, to no avail.

Done. The automation with HA start/stop notifications is now in the README too.

@alex107 Is the “/dev/watchdog - hardware watchdog device” only active while the add-on is started and running?
Is the watchdog still active when I stop the add-on, or does it get deactivated?

The add-on activates (starts) the watchdog timer when it starts and keeps refreshing it while running. On a correct shutdown, the add-on deactivates (stops) the watchdog timer to prevent a hardware reboot.

Is there a way to simulate an unresponsive system to test that the watchdog is working correctly?

You can try to kill -9 the add-on process so that it cannot stop the watchdog timer correctly. Don’t forget to disable auto restart of the add-on. If it works, you will get a hardware restart by the watchdog, just as with a real software problem. It’s not a fully accurate test, but it will show whether the hardware watchdog is working.
P.S.: kill -9 is not the same as docker stop or stopping the add-on: on stop, the add-on disables the hardware watchdog timer.

I’m not sure the add-on is working correctly. It is shown in red, but the log seems to indicate that it started correctly.
The rest is my system’s software and hardware information.

Log

2023-01-18 16:46:01 INFO Opening watchdog device
2023-01-18 16:46:01 INFO Watchdog identity: Broadcom BCM2835 Watchdog timer
2023-01-18 16:46:01 INFO Watchdog firmware version: 0
2023-01-18 16:46:01 INFO Watchdog options: 33152
2023-01-18 16:46:01 INFO Watchdog timeout: 15
2023-01-18 16:46:01 INFO Starting main cycle with sleep time 5 sec…
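
The "Watchdog options: 33152" line in the log is a bitmask of the WDIOF_* capability flags from the kernel header linux/watchdog.h. A small Python sketch to decode it (the function name `decode_watchdog_options` is just for this example):

```python
# WDIOF_* option bits as defined in the kernel header linux/watchdog.h
WDIOF_FLAGS = {
    0x0001: "WDIOF_OVERHEAT",
    0x0002: "WDIOF_FANFAULT",
    0x0004: "WDIOF_EXTERN1",
    0x0008: "WDIOF_EXTERN2",
    0x0010: "WDIOF_POWERUNDER",
    0x0020: "WDIOF_CARDRESET",
    0x0040: "WDIOF_POWEROVER",
    0x0080: "WDIOF_SETTIMEOUT",
    0x0100: "WDIOF_MAGICCLOSE",
    0x0200: "WDIOF_PRETIMEOUT",
    0x0400: "WDIOF_ALARMONLY",
    0x8000: "WDIOF_KEEPALIVEPING",
}


def decode_watchdog_options(options):
    """Return the names of the flags set in `options`, lowest bit first."""
    return [name for bit, name in sorted(WDIOF_FLAGS.items()) if options & bit]


print(decode_watchdog_options(33152))
# ['WDIOF_SETTIMEOUT', 'WDIOF_MAGICCLOSE', 'WDIOF_KEEPALIVEPING']
```

So this driver reports that the timeout can be set, that it supports Magic Close, and that it accepts keepalive pings.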

System

Home Assistant 2023.1.2
Supervisor 2022.12.1
Operating System 9.3
Frontend 20230104.0 - latest

Hardware RPI 3 B+

watchdog
Subsystem:
misc
Device path:
/dev/watchdog
Attributes:
DEVNAME: /dev/watchdog
DEVPATH: /devices/platform/soc/3f100000.watchdog/bcm2835-wdt/misc/watchdog
MAJOR: ‘10’
MINOR: ‘130’
SUBSYSTEM: misc

All OK

It works perfectly.

Home Assistant 2023.1.5
Supervisor 2022.12.1
Operating System 9.4