The plugin activates /dev/watchdog, the hardware watchdog device, to restart the server when it stops responding. For details about the watchdog, see https://www.kernel.org/doc/Documentation/watchdog/watchdog-api.txt.
I checked it with my Raspberry Pi 4 - it has a Broadcom BCM2835 watchdog timer, enabled by default.
The service sends a keepalive to the watchdog timer every 5 seconds. On a hang or other software problem, the keepalives stop and the system does a hardware restart once the 15-second timeout expires.
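For anyone curious what such a keepalive loop looks like, here is a minimal sketch in Python. It is not the add-on's actual code. The names `feed_watchdog`, `KEEPALIVE_INTERVAL` and `WATCHDOG_TIMEOUT` are made up for illustration, and the device is passed in as a file-like object so the loop can be tried against an in-memory buffer instead of the real /dev/watchdog.

```python
import io
import time

KEEPALIVE_INTERVAL = 5   # seconds between keepalives (value from the post)
WATCHDOG_TIMEOUT = 15    # seconds until the hardware resets (value from the post)


def feed_watchdog(device, cycles, interval=KEEPALIVE_INTERVAL, sleep=time.sleep):
    """Write one keepalive per cycle and return how many were sent.

    Per the kernel watchdog API, any write to the device counts as a ping.
    A zero byte is used here on purpose: the character 'V' is reserved
    for "magic close" and should not be sent as a routine keepalive.
    """
    if interval >= WATCHDOG_TIMEOUT:
        raise ValueError("keepalive interval must be shorter than the timeout")
    sent = 0
    for _ in range(cycles):
        device.write(b"\0")
        device.flush()
        sent += 1
        sleep(interval)
    return sent


if __name__ == "__main__":
    # Dry run: an in-memory buffer stands in for /dev/watchdog,
    # and the sleep is replaced by a no-op so it finishes instantly.
    fake_device = io.BytesIO()
    print(feed_watchdog(fake_device, cycles=3, sleep=lambda seconds: None))
```

If the process running this loop hangs, the writes stop and the hardware timer runs out on its own, which is the whole point.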
Cool, I've been having occasional issues with my home assistant becoming completely unresponsive so I'm happy to see this exists and I'm giving it a try.
Seems like this should really just be a default part of Home Assistant OS.
One thing that would be useful is to track how frequently this is triggered.
I do have an uptime sensor configured (Uptime - Home Assistant) but that won't let me distinguish restarts due to software updates or config changes from watchdog resets.
I also have set up a notification when HA restarts (similar to the example here Home Assistant restart notification) so at least I can note & track manually.
Personally, I added notifications about shutting down and starting HA and I think this is enough, because a reboot triggered by the watchdog timer is not a frequent occurrence.
I think the main problem with counting these events is that watchdog restarts are outside the scope of the software - it's a hardware restart when the software becomes unresponsive. You can't write down any information because you (as a script, as a program) may be (and probably are) dead at that moment.
Another way is to count "incorrect" OS startups - create some flag on proper shutdown, and if there is no such flag on startup, interpret this situation as a bad/watchdog shutdown.
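The flag idea could be sketched roughly like this in Python. The function names and the flag path are hypothetical, and the flag has to live somewhere that survives a reboot (for an add-on, presumably its persistent data directory, which is an assumption on my side).

```python
import os


def mark_clean_shutdown(flag_path):
    """Call this on a proper shutdown: leave a flag behind for the next start."""
    with open(flag_path, "w") as flag:
        flag.write("clean\n")


def previous_shutdown_was_clean(flag_path):
    """Call this on startup.

    Returns True if the flag exists, and removes it so that the next start
    is judged on its own. A missing flag means the last shutdown did not
    finish properly (watchdog reset, power loss, crash and so on).
    """
    try:
        os.remove(flag_path)
        return True
    except FileNotFoundError:
        return False
```

Two caveats: the very first start has no flag and would be counted as a bad shutdown, and a missing flag cannot tell a watchdog reset apart from a power cut.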
If anyone has an idea how to count this - do not hold back :)
Also if you know how to increment any counter in HA from supervisor container - give me an example and I'll add this function.
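One possible way, as far as I know: an add-on can reach the Home Assistant Core REST API through the Supervisor proxy at http://supervisor/core/api, using the SUPERVISOR_TOKEN environment variable as a bearer token, provided the add-on config enables `homeassistant_api`. A hedged Python sketch follows. The counter entity `counter.watchdog_resets` is hypothetical and would have to be created as a counter helper first, and the function names are mine.

```python
import json
import os
import urllib.request

# Supervisor proxy to the Core API (assumes homeassistant_api is enabled
# in the add-on configuration).
CORE_API = "http://supervisor/core/api"


def build_counter_increment(entity_id, token):
    """Build the POST request that calls the counter.increment service."""
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        f"{CORE_API}/services/counter/increment",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def increment_counter(entity_id):
    """Send the request from inside the add-on container; returns the HTTP status."""
    token = os.environ["SUPERVISOR_TOKEN"]
    request = build_counter_increment(entity_id, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example call, only meaningful inside an add-on container:
# increment_counter("counter.watchdog_resets")  # hypothetical entity
```

Combined with a "was the last shutdown clean" check on startup, this would give a counter that only goes up after unclean restarts.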
@alex107 Is the "/dev/watchdog - hardware watchdog device" only active when the add-on is started and running?
Is the watchdog still active when I stop the add-on or does the watchdog get deactivated when I stop the add-on?
The add-on activates (starts) the watchdog timer when it starts and keeps resetting it while running. On a correct shutdown the add-on deactivates (stops) the watchdog timer to prevent a hardware reboot.
You can try to kill -9 the add-on process so that it cannot stop the watchdog timer correctly. Don't forget to disable auto restart of the add-on. On success you will get a hardware restart by the watchdog, just as with a software problem. It's not a completely accurate test, but it will show whether the hardware watchdog is working correctly.
P.S.: kill -9 is not the same as docker stop or stopping the add-on - on a normal stop the add-on disables the hardware watchdog timer.
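To illustrate the difference, here is a small Python sketch of a clean stop. It assumes the driver supports "magic close" as described in the kernel watchdog API (writing the character 'V' right before closing disarms the timer, unless the kernel was built with the nowayout option). Whether the add-on does exactly this is my assumption, and `WatchdogSession` and `install_stop_handler` are made-up names.

```python
import signal


class WatchdogSession:
    """Wraps an already opened watchdog device (any object with write/flush/close)."""

    def __init__(self, device):
        self.device = device
        self.running = True

    def request_stop(self, signum=None, frame=None):
        # Signal handler: only set a flag, the main loop does the real work.
        self.running = False

    def close(self):
        # Magic close: 'V' written just before close tells the driver
        # that the close is intentional and the timer should be stopped.
        self.device.write(b"V")
        self.device.flush()
        self.device.close()


def install_stop_handler(session):
    """Route SIGTERM (what docker stop sends first) to a graceful stop.

    SIGKILL (kill -9) cannot be caught, so in that case close() never
    runs, the timer stays armed and the hardware reset follows.
    """
    signal.signal(signal.SIGTERM, session.request_stop)
```

The main loop would check `session.running` between keepalives and call `session.close()` once it turns False.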
2023-01-18 16:46:01 INFO Opening watchdog device
2023-01-18 16:46:01 INFO Watchdog identity: Broadcom BCM2835 Watchdog timer
2023-01-18 16:46:01 INFO Watchdog firmware version: 0
2023-01-18 16:46:01 INFO Watchdog options: 33152
2023-01-18 16:46:01 INFO Watchdog timeout: 15
2023-01-18 16:46:01 INFO Starting main cycle with sleep time 5 sec…
System
Home Assistant 2023.1.2
Supervisor 2022.12.1
Operating System 9.3
Frontend 20230104.0 - latest
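Side note on the "Watchdog options: 33152" line in the log: that number is a bitmask (0x8180) of the WDIOF_* capability flags from the kernel header linux/watchdog.h. A short Python sketch to decode it (the function name `decode_options` is mine):

```python
# Capability flags as defined in linux/watchdog.h
WDIOF_FLAGS = {
    0x0001: "WDIOF_OVERHEAT",
    0x0002: "WDIOF_FANFAULT",
    0x0004: "WDIOF_EXTERN1",
    0x0008: "WDIOF_EXTERN2",
    0x0010: "WDIOF_POWERUNDER",
    0x0020: "WDIOF_CARDRESET",
    0x0040: "WDIOF_POWEROVER",
    0x0080: "WDIOF_SETTIMEOUT",
    0x0100: "WDIOF_MAGICCLOSE",
    0x0200: "WDIOF_PRETIMEOUT",
    0x0400: "WDIOF_ALARMONLY",
    0x8000: "WDIOF_KEEPALIVEPING",
}


def decode_options(options):
    """Return the names of all flags set in the options bitmask, lowest bit first."""
    return [name for bit, name in sorted(WDIOF_FLAGS.items()) if options & bit]


print(decode_options(33152))
# → ['WDIOF_SETTIMEOUT', 'WDIOF_MAGICCLOSE', 'WDIOF_KEEPALIVEPING']
```

So this device reports that its timeout can be set, that it supports magic close, and that it accepts keepalive pings.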
When I first started dreaming about a watchdog for my Pi-based Home Assistant, I envisaged a watchdog keep alive trigger coming from my Node Red logic (so if Node Red stopped, the system would reset).
I understand that this add-on is a service (which I'm not familiar with in a HA-context).
What are the possibilities that something could stop Node Red functioning, but the watchdog service would still keep plugging away happily?
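To make the question concrete: one way to tie the watchdog to Node Red would be to send the keepalive only while a health check passes. The add-on as described does not do this; the following is just a Python sketch of the idea. The URL is an assumption (1880 is the usual Node Red port, but the host name depends on the setup), and both function names are hypothetical.

```python
import urllib.error
import urllib.request


def node_red_alive(url="http://localhost:1880/", timeout=3):
    """Rough liveness probe: does the Node Red HTTP endpoint answer at all?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server answered, even if with 401/404, so the process is up.
        return err.code < 500
    except OSError:
        # Connection refused, timeout, DNS failure and similar.
        return False


def feed_if_healthy(device, is_healthy):
    """Send a keepalive only if the health check passes; report what was done."""
    if is_healthy():
        device.write(b"\0")
        device.flush()
        return True
    return False
```

With this in the loop, a dead Node Red means no keepalives, and the hardware reset follows after the timeout. The obvious downside is that a Node Red restart during an update would also have to finish within the timeout.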
This works great and saved me more than once when away from home and needed to rely on VPN access. HA broke, but came actually back w/o me onsite!
The only thing I wondered: it takes about 30-40 minutes until HA is actually restarted after it became unresponsive (and e.g. doesn't record sensors any longer…). Is there any way to make it react quicker?
Thanks, habitoti
Hi!
Can't find anything about it in changelogs.
Can you show the results of "ls -la /dev/watchdog" and "lsof /dev/watchdog"? Maybe it can give us some clue.
Thanks!
UPD: looks like hassio's lsof works differently, so for me "lsof | grep watchdog" works fine
looks like the watchdog device exists and is not used by anyone :-/
internet tells us that error 16 can also appear if the kernel can't communicate with the device, but in that case /dev/watchdog should not exist.
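For reference, error 16 on Linux is EBUSY ("Device or resource busy"). Here is a tiny Python sketch that turns the error from a failed open into something readable; the function name and the wording of the messages are mine. Note that I deliberately do not open /dev/watchdog here just to probe it, because a successful open arms the timer.

```python
import errno
import os


def explain_open_error(err):
    """Map the OSError from a failed open of the watchdog device to a hint."""
    if err.errno == errno.EBUSY:
        return "busy: the device is already held open by something else"
    if err.errno == errno.ENOENT:
        return "missing: the device node does not exist"
    if err.errno in (errno.EACCES, errno.EPERM):
        return "denied: not enough permissions to open the device"
    if err.errno is None:
        return "other: " + str(err)
    return "other: " + os.strerror(err.errno)


print(explain_open_error(OSError(errno.EBUSY, os.strerror(errno.EBUSY))))
```

A watchdog device can normally be opened by only one holder at a time, so EBUSY usually means something got there first, even if lsof inside a container does not show it.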
you can also check whether the watchdog is enabled in systemd, maybe that's the thing that was changed in hassio: "cat /etc/systemd/system.conf | grep -i watch"
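The grep will also match commented-out defaults (lines starting with #), which do not count as enabled. A small Python sketch that keeps only the active watchdog settings; the function name is hypothetical and the sample text is made up for illustration, not taken from a real HAOS system:

```python
def active_watchdog_settings(conf_text):
    """Return uncommented key=value pairs whose key mentions 'watchdog'."""
    settings = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments and section headers such as [Manager]
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if "watchdog" in key.lower():
            settings[key.strip()] = value.strip()
    return settings


# Made-up sample in the style of /etc/systemd/system.conf
sample = """[Manager]
#RuntimeWatchdogSec=off
RuntimeWatchdogSec=10s
#RebootWatchdogSec=10min
"""
print(active_watchdog_settings(sample))
```

If RuntimeWatchdogSec is set to a real value, systemd itself opens the hardware watchdog, which would explain a "busy" device for everyone else.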
Hmm, to check that I should probably log in through SSH to the host system, right? Right now I'm logging in through the SSH add-on to HA and there is no /etc/systemd directory.
Right, you need access to the core system. I use port 22222 and user root to connect to the core system, but I can't recall if I did smth special for it before.