Home Assistant automatic restart for API call error?

bgoncal · February 28, 2024, 2:21pm

I recently had the same issue, and the root cause was the CalDAV integration connected to an iCloud email account. Removing it fixed everything.

spanzetta · March 1, 2024, 11:32am

In my case… the frequent reboots are happening again :-( and it seems they are caused by core_mosquitto…
I already had this feeling in the past (that core_mosquitto was the cause of the reboots), but then with the latest updates it did not create any problems…
Now it has started again…

24-03-01 10:54:37 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call https://172.30.32.1:8123/api/core/state.
24-03-01 10:54:42 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call https://172.30.32.1:8123/api/core/state.
24-03-01 10:54:43 WARNING (MainThread) [supervisor.misc.tasks] Watchdog missed an Home Assistant Core API response.
24-03-01 10:55:33 WARNING (MainThread) [supervisor.misc.tasks] Watchdog found a problem with core_mosquitto application!
24-03-01 10:55:45 INFO (SyncWorker_7) [supervisor.docker.manager] Stopping addon_core_mosquitto application
24-03-01 10:56:24 INFO (SyncWorker_7) [supervisor.docker.manager] Cleaning addon_core_mosquitto application
24-03-01 10:56:47 ERROR (MainThread) [supervisor.misc.tasks] Watchdog missed 2 Home Assistant Core API responses in a row. Restarting Home Assistant Core API!

Salsalove · March 9, 2024, 2:49pm

@spanzetta Did you solve it? I haven’t been able to get to the bottom of it for weeks now.
My problems start when I launch Node-RED (which uses core-mosquitto). But if I use only core-mosquitto (for another add-on that relies on it), I don’t get these errors.

spanzetta · March 9, 2024, 3:21pm

Unfortunately the situation is extremely variable… and I don’t think it will ever have a definitive solution.

With every update the situation can get better or worse… and when it seems stable (e.g. after updating Core to 2024.3), at some point, for no apparent reason, it stops responding and starts rebooting over and over… then maybe it becomes stable again… (it happened to me just yesterday, and it wasn’t the first time).

From the logs, the only thing I have noticed is that the two add-ons that seem to cause the hangs more than others (and therefore the reboots triggered by the watchdog) are Core Mosquitto (which I need, so it is ON) and File Editor (which I keep off and only turn on when needed)…

But that may also be misleading… the problems may be caused by something else… hard to find out…

Up to one restart every one or two days I consider normal (for an RPi with 512 MB of RAM)… when it restarts 10 times in two hours, not so much…

I don’t have Node-RED…

spanzetta · March 9, 2024, 5:53pm

Something new from a few days ago (and it happened again today): after a restart I see that memory usage is around 75% instead of the usual 85/88%…
When that happens it is even more unstable… and it restarts… it is not clear why it uses less memory… yet is more unstable…
Only after further restarts… does the situation become “normal”, with memory usage at 85/88%…
Mysteries!!

NathanCu · March 9, 2024, 6:23pm

@spanzetta Please remember that the HA community is English only.

(Recuerde que la comunidad HA solo está disponible en inglés.)

spanzetta · March 9, 2024, 6:45pm

Yes… sorry…

NathanCu · March 9, 2024, 6:53pm

No worries, we just want everyone to be able to understand the answer.

spanzetta · June 28, 2024, 6:30am

The problem is still there… at least on my “poor” Raspberry Pi 3A+…

I noticed that after 2 timeout errors on the call (api/core/state), the watchdog restarts Home Assistant…

Maybe it would be appropriate to change the value of either the timeout (maybe it’s too short) or the max attempts (in the Supervisor code)… in order to give it “more time” to react and possibly avoid all of these HA restarts, which may not be necessary…

I understand that, from a developer’s point of view, everything should react as it does in theory (on sufficiently powerful hardware), but given that there is a lot of “small hardware” that may be much slower… giving the option to “accept” somewhat slower reactions in order to avoid useless restarts could be a good idea…

Maybe these values could be made configurable in the UI (so those with slower hardware can tune them, accepting that the system will react more slowly).

“TimeoutError” in supervisor/supervisor/homeassistant/api.py
“HASS_WATCHDOG_MAX_API_ATTEMPTS” (currently = 2) in supervisor/supervisor/misc/tasks.py
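
To make the idea concrete, here is a minimal Python sketch of a watchdog whose timeout and attempt limit are both configurable. It is not the actual Supervisor code: `WatchdogConfig`, `ApiWatchdog`, `api_timeout`, `max_api_attempts` and the two callbacks are hypothetical names used only for illustration, and the default timeout is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class WatchdogConfig:
    """Hypothetical user-tunable values (not real Supervisor options)."""

    api_timeout: float = 10.0  # seconds to wait for one API check (assumed default)
    max_api_attempts: int = 2  # consecutive misses before restarting Core


class ApiWatchdog:
    """Illustrative watchdog: restart only after N consecutive missed checks."""

    def __init__(
        self,
        check_api: Callable[[], Awaitable[None]],
        restart_core: Callable[[], Awaitable[None]],
        config: WatchdogConfig,
    ) -> None:
        self._check_api = check_api
        self._restart_core = restart_core
        self._config = config
        self._failures = 0

    async def run_once(self) -> str:
        """Run one check and return 'ok', 'missed' or 'restarted'."""
        try:
            await asyncio.wait_for(self._check_api(), self._config.api_timeout)
        except asyncio.TimeoutError:
            self._failures += 1
            if self._failures < self._config.max_api_attempts:
                return "missed"
            # Limit reached: reset the counter and trigger the restart.
            self._failures = 0
            await self._restart_core()
            return "restarted"
        # A successful check clears any earlier misses.
        self._failures = 0
        return "ok"


async def demo() -> list[str]:
    """Simulate a slow Core API with a more tolerant attempt limit."""

    async def slow_api() -> None:
        await asyncio.sleep(0.05)  # always slower than the timeout below

    async def restart_core() -> None:
        print("restarting Core (simulated)")

    watchdog = ApiWatchdog(
        slow_api, restart_core, WatchdogConfig(api_timeout=0.01, max_api_attempts=3)
    )
    return [await watchdog.run_once() for _ in range(3)]
```

With `max_api_attempts=3`, `asyncio.run(demo())` returns `['missed', 'missed', 'restarted']`, so a slow system gets one more chance before the restart than it would with a limit of 2.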

What do you think?

Salsalove · June 30, 2024, 3:38pm

After several months, I fixed it by replacing my RPi 4 with an Intel mini PC and restoring a backup of my HAOS instance. No more freezes or errors in the Supervisor logs.

spanzetta · June 30, 2024, 5:17pm

That is the proof that the problems we are talking about affect only the Raspberry Pi… so the developers should try to fix them, since there are tons of HA instances running on Raspberry Pi (all variants).

Salsalove · June 30, 2024, 8:01pm

That’s right, I’ve been saying for months that the problem was in recent updates and some incompatibility with the Raspberry Pi.

Olivier974 · July 6, 2024, 7:11am

Nope… I am in the same boat, and I have an Intel NUC8 with a Core i5 (x86 system). I get several reboots when I am on the ESPHome dashboard.

And the bad thing is that I now can’t open 2 different node configs in 2 different browsers (Firefox + Chrome), because I can no longer save my nodes after making changes…

If I close the Chrome browser, “file save” appears immediately in Firefox… really strange… it seems something is blocked in one browser when another is open… and I have a lot of:

2024-07-06 11:01:58.054 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:01:58.055 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:02:28.038 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:02:28.041 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:02:58.049 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:02:58.050 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:03:28.047 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:03:28.048 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:03:58.053 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:03:58.057 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:04:28.050 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:04:28.052 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:04:58.058 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:04:58.059 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:05:28.053 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
2024-07-06 11:05:28.059 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None

and after that… a restart of HA because of a watchdog error:

2024-07-06 11:29:28.845 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.

I tried ha core rebuild; it doesn’t work any better.

spanzetta · July 6, 2024, 7:41am

OK, but this looks like a different situation…

What we were describing here is a problem caused by

ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/core/state

And then the watchdog restarts HA…

Olivier974 · July 6, 2024, 7:42am

Yes, I read all the posts, and I have this same error too, and then HA restarts.

The error is in my previous post just above, last line.

I tried ha supervisor repair and it seems to be better… I will see over the next couple of hours…

spanzetta · July 6, 2024, 8:10am

If it also happens on x86/NUC, it is even worse…

I wonder how the developers haven’t found a solution after so long… (well… some builds in the past apparently fixed it, but then it came back)

Olivier974 · July 7, 2024, 2:53pm

Hello,
some news on my side,

I took a look at home-assistant.log and, just before the restart, there is a problem with HASS-AGENT.

I don’t know if you use it in your config, but I am using it.

In the GitHub repo issues, I saw a bunch of people who encountered the same problem. It’s related to the old version: you need to delete the integrations in HA, then the HASS-AGENT custom component in HACS, then add the new repo to HACS, install it, and set up the integration again (for my 4 PCs, in my case).

repo: https://github.com/hass-agent/HASS.Agent-Integration

more info : https://github.com/LAB02-Research/HASS.Agent-Integration?tab=readme-ov-file

The new version is HASS-Agent2; I didn’t even know there was a new version… my bad.

The errors and restarts from the watchdog complaining are gone… for now.

In case it helps others…

EDIT: perhaps it’s another custom_component for others, but I’m now sure it’s not related to the hardware side.

remimikalsen · July 26, 2024, 10:25am

I’ve had the same problem for a few months now, but currently only when I update the Z-Wave JS UI add-on (it happens every time!). I’ve got a Raspberry Pi 4B with 4 GB RAM, running on an SSD. CPU utilization is 12% during normal operation, but it jumped to 43% the last time I updated the add-on. Memory utilization is normally around 56% and increased to 58% during the upgrade. After the upgrade, HA restarts a couple of times before stabilizing. During normal operation, this error doesn’t appear and I don’t experience these issues. It doesn’t happen when I upgrade Supervisor, Core or any other add-on. Before (I don’t know when it changed), this wasn’t an issue: I could upgrade the Z-Wave JS UI add-on without these problems.

See a snippet from my supervisor logs:

2024-07-26 11:31:58.674 INFO (MainThread) [supervisor.docker.addon] Starting Docker add-on ghcr.io/hassio-addons/zwave-js-ui/aarch64 with version 3.9.2
2024-07-26 11:32:10.519 INFO (MainThread) [supervisor.hardware.monitor] Detecting remove hardware /dev/ttyACM0 - /dev/serial/by-id/usb-0658_0200-if00
2024-07-26 11:32:10.529 INFO (MainThread) [supervisor.hardware.monitor] Detecting remove hardware /dev/bus/usb/001/015 - None
2024-07-26 11:32:12.877 INFO (MainThread) [supervisor.hardware.monitor] Detecting add hardware /dev/bus/usb/001/016 - None
2024-07-26 11:32:12.881 INFO (MainThread) [supervisor.hardware.monitor] Detecting add hardware /dev/ttyACM0 - /dev/serial/by-id/usb-0658_0200-if00
2024-07-26 11:32:20.044 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:32:52.044 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:32:58.045 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:33:10.044 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:33:10.045 WARNING (MainThread) [supervisor.misc.tasks] Watchdog missed an Home Assistant Core API response.
2024-07-26 11:33:24.043 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:33:56.043 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:34:10.962 INFO (MainThread) [supervisor.api.proxy] Home Assistant WebSocket API request initialize
2024-07-26 11:34:10.995 INFO (MainThread) [supervisor.api.proxy] WebSocket access from a0d7b954_vscode
2024-07-26 11:34:11.701 INFO (MainThread) [supervisor.api.proxy] Home Assistant WebSocket API request running
2024-07-26 11:34:12.236 INFO (MainThread) [supervisor.auth] Auth request from 'core_mosquitto' for 'homeassistant'
2024-07-26 11:34:16.841 INFO (MainThread) [supervisor.auth] Successful login for 'homeassistant'
2024-07-26 11:34:58.045 INFO (MainThread) [supervisor.api.proxy] Home Assistant WebSocket API error: Cannot proxy websocket message of unsupported type: 257
2024-07-26 11:34:58.046 INFO (MainThread) [supervisor.api.proxy] Home Assistant WebSocket API for a0d7b954_vscode closed
2024-07-26 11:35:29.047 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:35:41.044 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:35:41.044 ERROR (MainThread) [supervisor.misc.tasks] Watchdog missed 2 Home Assistant Core API responses in a row. Restarting Home Assistant Core!
2024-07-26 11:35:41.057 INFO (SyncWorker_3) [supervisor.docker.manager] Restarting homeassistant
2024-07-26 11:36:01.045 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:36:33.044 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:37:05.044 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:37:07.323 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
2024-07-26 11:37:07.748 INFO (MainThread) [supervisor.api.proxy] Home Assistant WebSocket API request initialize
2024-07-26 11:37:07.760 INFO (MainThread) [supervisor.api.proxy] WebSocket access from a0d7b954_vscode
2024-07-26 11:37:10.188 INFO (MainThread) [supervisor.api.proxy] Home Assistant WebSocket API request running
2024-07-26 11:39:07.394 INFO (MainThread) [supervisor.homeassistant.secrets] Request secret ssh_user
2024-07-26 11:39:07.395 INFO (MainThread) [supervisor.homeassistant.secrets] Request secret ssh_pass
2024-07-26 11:39:07.410 INFO (MainThread) [supervisor.homeassistant.secrets] Request secret db_pass
2024-07-26 11:39:41.565 INFO (MainThread) [supervisor.api.proxy] Home Assistant WebSocket API for a0d7b954_vscode closed
2024-07-26 11:39:45.464 INFO (MainThread) [supervisor.homeassistant.core] Wait until Home Assistant is ready
2024-07-26 11:39:55.297 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state running
2024-07-26 11:39:55.927 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
2024-07-26 11:39:55.939 INFO (MainThread) [supervisor.homeassistant.core] Home Assistant Core state changed to NOT_RUNNING
2024-07-26 11:39:56.122 INFO (MainThread) [supervisor.homeassistant.secrets] Request secret ssh_user
2024-07-26 11:39:56.123 INFO (MainThread) [supervisor.homeassistant.secrets] Request secret ssh_pass
2024-07-26 11:39:56.143 INFO (MainThread) [supervisor.homeassistant.secrets] Request secret db_pass
2024-07-26 11:40:27.169 INFO (MainThread) [supervisor.auth] Auth request from 'core_mosquitto' for 'homeassistant'
2024-07-26 11:40:27.859 INFO (MainThread) [supervisor.auth] Home Assistant not running, checking cache
2024-07-26 11:40:36.809 INFO (MainThread) [supervisor.homeassistant.core] Home Assistant Core state changed to STARTING
2024-07-26 11:40:55.257 INFO (MainThread) [supervisor.homeassistant.core] Home Assistant Core state changed to RUNNING
2024-07-26 11:40:55.257 INFO (MainThread) [supervisor.homeassistant.core] Detect a running Home Assistant instance

The Add-on logs don’t display any signs of problems.
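
To get a quick overview of a log like the one above, the relevant lines can be counted with a few lines of Python. This is only a sketch: the regular expression is an assumption based on the line format shown in this thread, `summarize` is a hypothetical helper (not part of Supervisor), and the sample lines are copied from the snippet above.

```python
import re
from collections import Counter

# Assumed line layout: "<date> <time> <LEVEL> (<thread>) [<logger>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) \((?P<thread>[^)]+)\) "
    r"\[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)
RESTART_RE = re.compile(
    r"Watchdog missed \d+ Home Assistant Core API responses in a row"
)


def summarize(lines):
    """Count API timeouts, single watchdog misses and watchdog restarts."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a log line
        msg = match["msg"]
        if msg.startswith("Timeout on call"):
            counts["api_timeouts"] += 1
        elif RESTART_RE.match(msg):
            counts["watchdog_restarts"] += 1
        elif msg.startswith("Watchdog missed an Home Assistant Core API response"):
            counts["watchdog_misses"] += 1
    return counts


SAMPLE = """\
2024-07-26 11:33:10.044 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:33:10.045 WARNING (MainThread) [supervisor.misc.tasks] Watchdog missed an Home Assistant Core API response.
2024-07-26 11:35:41.044 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
2024-07-26 11:35:41.044 ERROR (MainThread) [supervisor.misc.tasks] Watchdog missed 2 Home Assistant Core API responses in a row. Restarting Home Assistant Core!
2024-07-26 11:35:41.057 INFO (SyncWorker_3) [supervisor.docker.manager] Restarting homeassistant
"""

if __name__ == "__main__":
    print(dict(summarize(SAMPLE.splitlines())))
```

Running the same function over a full log file (for example `summarize(open(path))`, with `path` pointing at a saved copy of the Supervisor log) shows how many timeouts precede each restart.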

I’ve had a similar problem once before, much more serious though, when my system had been offline for some hours. At the time, I believed it was related to MQTT and too many messages being queued for delivery. Also, my HA dashboard would break down due to too many client requests or something like that. I disabled add-ons and brought them back one by one, and in the end things stabilized (after some hours).

All this leads me to believe there may be some kind of queueing issue. I don’t have that many devices: around 55 Z-Wave devices (not using MQTT), around 80+ Zigbee devices (using MQTT) and maybe 60 more devices that are either ESPHome, Bluetooth, 433 MHz or other kinds of WiFi and wired devices. Most of my Z-Wave devices are mains-powered, so they may generate a fair amount of traffic. Maybe when I take the add-on down for maintenance, messages queue up and things get congested when all the waiting messages are processed?

I don’t think it’s directly a hardware issue, because resource-wise my raspberry seems to have a lot of headroom; but it may be a supervisor-issue with limitations on open file handles or something like that (wildly guessing now).

spanzetta · September 5, 2024, 7:50am

I ran a test for over a month…

I completely disabled MQTT… and my Raspberry ran without restarting for up to 7 days!!

After a month I re-enabled MQTT… and again I have several reboots every day (anywhere from 1 to 5 or 10 a day… it varies from day to day).

An obvious conclusion is that… MQTT is the root cause… or is it just a coincidence?

NathanCu · September 5, 2024, 7:55am

It could be just a contributing factor.

For instance, maybe you have a race condition in an automation. (That absolutely can crash your server. Ask me how I know.)

…But the automation deals with a lamp provided by Z2M through MQTT. When MQTT is down the automation is broken… So ‘technically’ the race conditions don’t exist anymore. But the important part is no crash.

Is MQTT at fault here? No. The automation is, but MQTT is absolutely an unwilling participant in the crash.

So cause. Not. Proven
Contributing factors? Absolutely and it could help inform the exploration further…