Thanks for all your checks! It looks like the hardware watchdog is enabled in your systemd configuration. By default it’s disabled, so maybe it gets enabled somewhere else, or maybe it’s enabled by default in newer versions of Hass.io. Good news: there is no reason for you to use the extra add-on.
I’m not sure the watchdog is actually being used. I’m suffering from random crashes (HA itself somehow keeps working, but the Supervisor and all add-ons are dead and the Observer can’t be reached), and I always have to power cycle the device to recover.
RebootWatchdogSec= may be used to configure the hardware watchdog when the system is asked to reboot. It works as a safety net to ensure that the reboot takes place even if a clean reboot attempt times out.
RuntimeWatchdogSec … will be programmed to automatically reboot the system if it is not contacted within the specified timeout interval.
So, according to your systemd configuration (and the commit you mentioned), it looks like the hardware watchdog is only used during reboots, not at runtime (a strange decision IMHO; I mean the commit).
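To check which of the two settings a given system.conf actually sets, its [Manager] section can be parsed with Python’s standard configparser. This is a minimal sketch under assumptions: the helper names `watchdog_settings` and `runtime_watchdog_enabled` are hypothetical, it only inspects the text you pass in (not drop-in files under system.conf.d), and it treats "off" or a zero value as disabled, following the systemd-system.conf(5) description where an unset RuntimeWatchdogSec= defaults to 0 (off).

```python
import configparser

# Values that turn the watchdog logic off per systemd-system.conf(5):
# "off", or a zero time span. Only the common spellings are listed here.
DISABLED_VALUES = {"off", "0", "0s"}


def watchdog_settings(conf_text: str) -> dict[str, str]:
    """Return every *WatchdogSec= key explicitly set in the [Manager] section.

    Commented-out defaults such as '#RuntimeWatchdogSec=off' are skipped
    by configparser, so only values that are really set are reported.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. 'RuntimeWatchdogSec'
    parser.read_string(conf_text)
    if not parser.has_section("Manager"):
        return {}
    return {
        key: value.strip()
        for key, value in parser.items("Manager")
        if key.endswith("WatchdogSec")
    }


def runtime_watchdog_enabled(conf_text: str) -> bool:
    """True if the hardware watchdog is programmed while the system runs."""
    # An unset RuntimeWatchdogSec= falls back to its default of 0 (disabled).
    value = watchdog_settings(conf_text).get("RuntimeWatchdogSec", "0")
    return value.lower() not in DISABLED_VALUES


if __name__ == "__main__":
    sample = "[Manager]\n#RuntimeWatchdogSec=off\nRebootWatchdogSec=10min\n"
    print(watchdog_settings(sample))         # {'RebootWatchdogSec': '10min'}
    print(runtime_watchdog_enabled(sample))  # False
```

A configuration like the sample above would match what is described here: the watchdog only acts as a safety net during reboots, and nothing pings it at runtime.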
I’ve been having the same issue with my HA getting stuck over the last few days. I haven’t been able to find the cause yet, but I would definitely like the system to reboot by itself when needed.
When I try to start the add-on, I get the same “reply” saying the watchdog is already in use. Did you manage to stop the current use of the watchdog, and are you now able to use the add-on?
The way the watchdog is currently used doesn’t restart the RPi 4 when needed…
Now the situation is totally different after the very latest update (see the versions below).
HA is restarting spontaneously multiple times per day…
So the watchdog is indeed working, and it must be the built-in one, since I removed the add-on I had been using for quite a long time.
Core 2023.12.1
Supervisor 2023.11.6
Operating System 11.2
Frontend 20231030.2
I noticed that every time a restart happens, it is because of the following:
23-12-11 15:08:00 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/core/state:
23-12-11 15:08:04 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/core/state:
23-12-11 15:08:04 ERROR (MainThread) [supervisor.misc.tasks] Watchdog found a problem with Home Assistant API!
23-12-11 15:08:12 INFO (SyncWorker_0) [supervisor.docker.manager] Restarting homeassistant
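To check whether every restart really follows this pattern, the Supervisor log can be scanned for API errors followed by a watchdog restart. This is a hypothetical helper (`restart_events` is not part of the Supervisor); it assumes the line layout shown above (`YY-MM-DD HH:MM:SS LEVEL (thread) [logger] message`) and simply counts the `Error on call` lines seen since the previous `Restarting homeassistant` line.

```python
import re
from datetime import datetime

# Matches Supervisor log lines such as:
# 23-12-11 15:08:04 ERROR (MainThread) [supervisor.misc.tasks] Watchdog found a problem ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>\w+) "
    r"\((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def restart_events(log_text: str) -> list[tuple[datetime, int]]:
    """Return (restart time, API errors since the previous restart) per restart."""
    events = []
    api_errors = 0
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip tracebacks and other lines that don't fit the layout
        logger, msg = m["logger"], m["msg"]
        if logger == "supervisor.homeassistant.api" and msg.startswith("Error on call"):
            api_errors += 1
        elif logger == "supervisor.docker.manager" and msg == "Restarting homeassistant":
            ts = datetime.strptime(m["ts"], "%y-%m-%d %H:%M:%S")
            events.append((ts, api_errors))
            api_errors = 0  # count afresh for the next restart
    return events


if __name__ == "__main__":
    sample = """\
23-12-11 15:08:00 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/core/state:
23-12-11 15:08:04 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/core/state:
23-12-11 15:08:04 ERROR (MainThread) [supervisor.misc.tasks] Watchdog found a problem with Home Assistant API!
23-12-11 15:08:12 INFO (SyncWorker_0) [supervisor.docker.manager] Restarting homeassistant
"""
    for when, errors in restart_events(sample):
        print(when, errors)  # 2023-12-11 15:08:12 2
```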
UPDATE (28 June 2024):
The problem is still there, at least on my “poor” Raspberry Pi 3A+…
I noticed that after two timeout errors on the call to api/core/state, the watchdog restarts Home Assistant…
Maybe it would be appropriate to change either the timeout (maybe it’s too short) or the maximum number of attempts (in the Supervisor code) to give HA “more time” to respond, and so avoid all these HA restarts, which may not be necessary…
I understand that from a developer’s point of view everything should respond as it does in theory (on powerful enough hardware), but given that many people run HA on small hardware that may be much slower, an option to “accept” slower responses and avoid needless restarts could be a good idea…
Maybe these values could be made configurable in the UI (so that people with slower hardware can tune them, accepting that the system will respond more slowly).
“TimeoutError” in supervisor/supervisor/homeassistant/api.py
“ASS_WATCHDOG_MAX_API_ATTEMPTS” (currently = 2) in supervisor/supervisor/misc/tasks.py
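The trade-off between the timeout and the maximum number of attempts can be illustrated with a small model. This is not the Supervisor’s actual code, only a simplified sketch under stated assumptions: the watchdog checks the API state periodically, a check fails when it takes longer than the timeout, consecutive failures are counted, any successful check resets the counter, and HA is restarted once the count reaches the maximum. `first_restart`, `response_times`, `timeout` and `max_attempts` are hypothetical names, and the response times are made up for illustration.

```python
from typing import Optional


def first_restart(response_times: list[float], timeout: float, max_attempts: int) -> Optional[int]:
    """Return the index of the check that would trigger a restart, or None.

    Each entry in response_times is how long (in seconds) one API state
    check took; a check counts as failed when it takes longer than timeout.
    """
    failures = 0
    for i, elapsed in enumerate(response_times):
        if elapsed > timeout:
            failures += 1
            if failures >= max_attempts:
                return i  # consecutive failures reached the limit: restart
        else:
            failures = 0  # assumed: any successful check resets the counter
    return None


if __name__ == "__main__":
    # Made-up response times for a slow board with occasional bursts of lag.
    checks = [3.0, 12.0, 12.0, 4.0, 12.0, 12.0, 12.0, 5.0]
    print(first_restart(checks, timeout=10, max_attempts=2))  # 2
    print(first_restart(checks, timeout=10, max_attempts=4))  # None
    print(first_restart(checks, timeout=15, max_attempts=2))  # None
```

In this model, two slow answers in a row are enough to trigger a restart when the limit is 2; raising either the timeout or the attempt limit lets the same slow board ride out these bursts, at the cost of reacting later when HA is really hung.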