Home Assistant Add-on: Hardware watchdog service

When I first started dreaming about a watchdog for my Pi-based Home Assistant, I envisaged the watchdog keep-alive trigger coming from my Node-RED logic (so if Node-RED stopped, the system would reset).
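The keep-alive idea can be sketched in Python, the language the add-on's watchdog.py uses: a loop that only pings the hardware watchdog while a heartbeat from Node-RED is fresh. This is a minimal sketch, not the add-on's code. It assumes a Node-RED flow periodically touches a hypothetical heartbeat file (/share/nodered_heartbeat here), and that no other process already holds /dev/watchdog. All names and timing values are illustrative.

```python
import os
import time

WATCHDOG_DEVICE = "/dev/watchdog"            # assumption: the board's hardware watchdog
HEARTBEAT_FILE = "/share/nodered_heartbeat"  # hypothetical file a Node-RED flow touches
MAX_HEARTBEAT_AGE = 60.0                     # seconds; hypothetical staleness limit
PING_INTERVAL = 5.0                          # seconds; must stay below the hardware timeout


def heartbeat_is_fresh(path: str, max_age: float, now: float) -> bool:
    """Return True if `path` was modified no more than `max_age` seconds before `now`."""
    try:
        return now - os.path.getmtime(path) <= max_age
    except FileNotFoundError:
        return False  # no heartbeat yet counts as stale


def run(device: str = WATCHDOG_DEVICE) -> None:
    """Ping the hardware watchdog only while Node-RED keeps the heartbeat fresh."""
    fd = os.open(device, os.O_WRONLY)  # opening the device arms the watchdog
    try:
        while True:
            if heartbeat_is_fresh(HEARTBEAT_FILE, MAX_HEARTBEAT_AGE, time.time()):
                os.write(fd, b"\0")  # any write counts as a keep-alive ping
            # When Node-RED stops, the pings stop and the hardware resets the
            # board once its timeout expires.
            time.sleep(PING_INTERVAL)
    finally:
        # "Magic close": writing "V" before closing asks drivers that support it
        # to disarm, so a deliberate shutdown of this script does not reboot.
        os.write(fd, b"V")
        os.close(fd)
```

Because the Linux watchdog device can only be held open by one process at a time, run() cannot coexist with any other process that already holds it.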

I understand that this add-on is a service (something I’m not familiar with in an HA context).

What are the chances that something could stop Node-RED from working while the watchdog service still keeps plugging away happily?

This works great and has saved me more than once when I was away from home and had to rely on VPN access. HA broke, but it actually came back without me on site!
The only thing I wondered about: it takes about 30-40 minutes until HA is actually restarted after it becomes unresponsive (and, e.g., stops recording sensors…). Is there any way to make it react more quickly?
Thanks, habitoti

Cool, but unfortunately it does not work with recent HA. I’m getting OSError: [Errno 16] Resource busy: '/dev/watchdog'.

Hi!
I can’t find anything about it in the changelogs.
Can you show the results of “ls -la /dev/watchdog” and “lsof /dev/watchdog”? Maybe they will give us a clue.
Thanks!

UPDATE: it looks like lsof in hassio works differently, so for me “lsof | grep watchdog” works fine.
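If lsof is missing or behaves differently, the same check as “lsof | grep watchdog” can be done by reading the /proc/<pid>/fd symlinks directly. A minimal Python sketch, assuming a standard Linux /proc layout; the function name is hypothetical, and processes whose descriptors cannot be read (other users’ processes when not running as root) are silently skipped.

```python
import os


def holders_of(name_fragment: str, proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, target) for open descriptors whose target contains name_fragment.

    Roughly equivalent to `lsof | grep <name_fragment>`.
    """
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except (PermissionError, FileNotFoundError):
            continue  # not allowed to look, or the process already exited
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were scanning
            if name_fragment in target:
                matches.append((int(entry), target))
    return sorted(matches)
```

Run on the host, holders_of("watchdog") lists every process with a descriptor whose path contains “watchdog”, the same way grep filters lsof’s output.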

Sure…

“ls -la /dev/watchdog” returns: 0 crw------- 1 root root 10, 130 Apr 4 2023 /dev/watchdog, and “lsof | grep watchdog” returns nothing.

It looks like the watchdog device exists and isn’t used by anyone :-/
The internet says error 16 can also appear if the kernel can’t communicate with the device, but in that case /dev/watchdog should not exist.
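Errno 16 is EBUSY (“Device or resource busy”). On Linux the watchdog device can only be open in one process at a time, so EBUSY is also exactly what a second process gets when another process already holds it. A minimal sketch of how an add-on could report that case clearly instead of crashing with a bare traceback; open_watchdog is a hypothetical helper, not the add-on’s actual code.

```python
import errno
import os


def open_watchdog(path: str = "/dev/watchdog") -> int:
    """Open the watchdog device, turning EBUSY into an explanatory error."""
    try:
        return os.open(path, os.O_WRONLY)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            # Single-open device: someone else (for example systemd) has it.
            raise RuntimeError(
                f"{path} is already open by another process; "
                "only one process may hold a watchdog device at a time"
            ) from exc
        raise  # any other error (missing device, permissions) is passed through
```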
You can also check whether the watchdog is enabled in systemd; maybe that’s what changed in hassio: “cat /etc/systemd/system.conf | grep -i watch”

Hmm, to check that I should probably log in to the host system through SSH, right? Right now I’m logging in to HA through the SSH add-on, and there is no /etc/systemd directory.

Right, you need access to the core system. I connect to it on port 22222 as user root, but I can’t recall whether I did something special to set that up.

I’m there. It seems that the watchdog is probably disabled.

# cat /etc/systemd/system.conf | grep -i watch
#RuntimeWatchdogSec=off
#RuntimeWatchdogPreSec=off
#RuntimeWatchdogPreGovernor=
#RebootWatchdogSec=10min
#KExecWatchdogSec=off
#WatchdogDevice=
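In systemd’s config files, lines starting with “#” or “;” are ignored, so none of the lines above sets anything. Overrides can also live in drop-in files such as /etc/systemd/system.conf.d/*.conf, which a grep of system.conf alone does not see. A small Python sketch that lists only the active (uncommented) watchdog settings from such a file; the function name is hypothetical.

```python
def active_watchdog_settings(conf_text: str) -> dict[str, str]:
    """Return uncommented *Watchdog* settings from systemd system.conf text."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank lines and comments do not set anything
        key, sep, value = line.partition("=")
        if sep and "watchdog" in key.lower():  # section headers have no "="
            settings[key.strip()] = value.strip()
    return settings


sample = """[Manager]
#RuntimeWatchdogSec=off
#RebootWatchdogSec=10min
"""
print(active_watchdog_settings(sample))  # {}
```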

This may also be interesting:

# systemctl show | grep -i watchdog
WatchdogDevice=/dev/watchdog0
WatchdogLastPingTimestamp=Thu 2023-11-30 09:16:17 UTC
WatchdogLastPingTimestampMonotonic=82726617867
RuntimeWatchdogUSec=infinity
RuntimeWatchdogPreUSec=0
RebootWatchdogUSec=10min
KExecWatchdogUSec=0
ServiceWatchdogs=yes
# lsof | grep watchdog
1	/usr/lib/systemd/systemd	9	/dev/watchdog0
2753	/package/admin/s6-2.11.3.2/command/s6-supervise	3	/run/s6/legacy-services/watchdog/supervise/lock
2753	/package/admin/s6-2.11.3.2/command/s6-supervise	4	/run/s6/legacy-services/watchdog/supervise/control
2753	/package/admin/s6-2.11.3.2/command/s6-supervise	5	/run/s6/legacy-services/watchdog/supervise/control
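Unlike the commented-out config file, “systemctl show” reports the values the running manager actually uses, and lsof shows PID 1 (systemd) holding /dev/watchdog0. A short Python sketch that runs the same command and keeps only the watchdog-related properties; it assumes it runs on the host (for example via the port 22222 SSH access), where systemctl is available, and the function names are hypothetical.

```python
import subprocess


def parse_properties(text: str) -> dict[str, str]:
    """Parse `systemctl show` output (one Key=Value per line) into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def watchdog_properties() -> dict[str, str]:
    """Return the manager's watchdog-related properties (host only)."""
    # Raises FileNotFoundError where systemctl is not installed (e.g. add-on containers).
    out = subprocess.run(
        ["systemctl", "show"], capture_output=True, text=True, check=True
    ).stdout
    return {k: v for k, v in parse_properties(out).items() if "watchdog" in k.lower()}
```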

Maybe the add-on should use the /sys/class/watchdog/watchdog0/dev file instead?
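For reference, the dev attribute under a sysfs device directory such as /sys/class/watchdog/watchdog0/ is a plain text file holding the device’s “major:minor” numbers. It identifies the character device, but it is not itself a device that can be written to in order to ping the watchdog. A small Python sketch that reads those numbers and, for comparison, the numbers of a device node; the function names are hypothetical and the standard sysfs layout is assumed.

```python
import os


def sysfs_dev_numbers(sysfs_dir: str = "/sys/class/watchdog/watchdog0") -> tuple[int, int]:
    """Read "major:minor" from a sysfs device directory's dev attribute."""
    with open(os.path.join(sysfs_dir, "dev")) as f:
        major, minor = f.read().strip().split(":")
    return int(major), int(minor)


def node_dev_numbers(path: str) -> tuple[int, int]:
    """Return (major, minor) of a device node such as /dev/watchdog0."""
    rdev = os.stat(path).st_rdev
    return os.major(rdev), os.minor(rdev)
```

On the host, sysfs_dev_numbers() and node_dev_numbers("/dev/watchdog0") should agree, since the /dev/watchdog0 node is created from that sysfs entry; pinging still has to go through the device node.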

Thanks for all your checks! It looks like the hardware watchdog is enabled in your systemd. It’s disabled by default, so maybe there is some other place where it gets enabled, or maybe it’s enabled by default in the new hassio. Good news: there is no reason for you to use the extra add-on.

I’m not sure the watchdog is actually being used. I’m suffering from random crashes (HA itself somehow works, but the Supervisor and all add-ons are dead, and the Observer can’t be reached), and I always need to power-cycle to get out of it.

This pull request, “Enable watchdog control in systemd” by sbyx (home-assistant/operating-system #2628 on GitHub), is where the watchdog was enabled in systemd.


As this documentation says:

RebootWatchdogSec= may be used to configure the hardware watchdog when the system is asked to reboot. It works as a safety net to ensure that the reboot takes place even if a clean reboot attempt times out.

RuntimeWatchdogSec … will be programmed to automatically reboot the system if it is not contacted within the specified timeout interval.

So, according to your systemd configuration (and the commit you mentioned), it looks like the hardware watchdog is only used during reboots, not at runtime (a strange decision IMHO, I mean this commit).

I also only just realized that the Watchdog add-on, which I have been using for over a year, is now disabled… and can’t start.

I don’t know exactly when it became disabled, but it must be related to some recent HA updates.

Using a Raspberry Pi 3A+

This is the log output:

2023-12-03 19:13:44 INFO Opening watchdog device
Traceback (most recent call last):
  File "/watchdog.py", line 42, in <module>
    app.run()
  File "/watchdog.py", line 20, in run
    self.wdt = watchdog('/dev/watchdog')
OSError: [Errno 16] Resource busy: '/dev/watchdog'

I’ve had the same issue with my HA getting stuck over the last few days, and I haven’t been able to find the cause yet, but I definitely like the idea of the system rebooting by itself when needed.
When I try to start the add-on, I get the same “reply” saying the watchdog is already in use. Did you manage to disable the use of the watchdog, and are you now able to use the add-on?
The way the watchdog is used now doesn’t restart the RPi 4 when needed…

Thanks

Currently, after the update to 2023.12, my RPi is totally unresponsive (but the Observer reports no problem) and the watchdog is NOT restarting my RPi.

Now the situation is totally different after the very latest update (versions listed below):
HA is restarting spontaneously multiple times per day…
So the watchdog is indeed working, and it must be the built-in one, since I removed the add-on I had used for quite a long time.

  • Core 2023.12.1
  • Supervisor 2023.11.6
  • Operating System 11.2
  • Frontend 20231030.2

I noticed that every restart is caused by the following:

23-12-11 15:08:00 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/core/state:
23-12-11 15:08:04 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/core/state:
23-12-11 15:08:04 ERROR (MainThread) [supervisor.misc.tasks] Watchdog found a problem with Home Assistant API!
23-12-11 15:08:12 INFO (SyncWorker_0) [supervisor.docker.manager] Restarting homeassistant
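To check whether every restart really follows a Supervisor watchdog failure, the log can be scanned for the two messages quoted above. A minimal Python sketch; it assumes the Supervisor log format shown here (a “YY-MM-DD HH:MM:SS” timestamp at the start of each line) and matches only these two exact message texts. The function names are hypothetical.

```python
from datetime import datetime

PROBLEM = "Watchdog found a problem with Home Assistant API!"
RESTART = "[supervisor.docker.manager] Restarting homeassistant"


def _timestamp(line: str) -> datetime:
    # Supervisor lines start with a 17-character "YY-MM-DD HH:MM:SS" stamp.
    return datetime.strptime(line[:17], "%y-%m-%d %H:%M:%S")


def watchdog_restarts(log_lines):
    """Yield (problem_time, restart_time) for each restart that follows a watchdog failure."""
    last_problem = None
    for line in log_lines:
        if PROBLEM in line:
            last_problem = _timestamp(line)
        elif RESTART in line and last_problem is not None:
            yield last_problem, _timestamp(line)
            last_problem = None  # each failure is paired with one restart
```

Counting the yielded pairs over a day of saved Supervisor log lines gives the number of watchdog-driven restarts; a restart with no preceding failure line is not reported, which helps separate these from restarts triggered for other reasons.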

Again, the situation has changed…

See the details in my latest post here:
“Home Assistant automatic restart for API call error?”, post #30 by DarthJacks, Home Assistant OS category, Home Assistant Community (home-assistant.io)