HA is randomly restarting

sh00t2kill · July 20, 2021, 10:40pm

I haven’t had a restart since upgrading to 7.3

33 hours and counting.

sh00t2kill · July 21, 2021, 2:18pm

Spoke too soon – 3 restarts in the last 2 hours.
I have a feeling it’s to do with available memory, but I can’t be 100% sure.

These logs seem to correspond with the restart.

21-07-21 23:31:55 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:31:58 INFO (MainThread) [supervisor.api.proxy] Home Assistant WebSocket API connection is closed
21-07-21 23:32:34 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:33:10 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:33:15 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:33:15 WARNING (MainThread) [supervisor.misc.tasks] Watchdog miss API response from Home Assistant
21-07-21 23:33:46 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:34:22 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:

21-07-21 23:34:58 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:35:34 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:35:46 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:35:46 ERROR (MainThread) [supervisor.misc.tasks] Watchdog found a problem with Home Assistant API!
21-07-21 23:35:46 INFO (SyncWorker_3) [supervisor.docker.interface] Restarting ghcr.io/home-assistant/qemux86-64-homeassistant
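
For anyone wanting to check their own supervisor log for the same pattern, here is a minimal Python sketch (not an official tool). It assumes the line layout shown in the excerpt above; the function name `summarize_watchdog_restarts` is made up for illustration. It counts the "Error on call" lines that precede each "Watchdog found a problem" line.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
# 21-07-21 23:35:46 ERROR (MainThread) [supervisor.misc.tasks] Watchdog found ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]+)\) \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def summarize_watchdog_restarts(log_text):
    """Return one (restart_time, api_errors_before) tuple per watchdog hit."""
    restarts = []
    api_errors = 0
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines or lines in another format
        msg = m["msg"]
        if m["logger"] == "supervisor.homeassistant.api" and msg.startswith("Error on call"):
            api_errors += 1
        elif msg.startswith("Watchdog found a problem"):
            ts = datetime.strptime(m["ts"], "%y-%m-%d %H:%M:%S")
            restarts.append((ts, api_errors))
            api_errors = 0  # start counting again for the next restart
    return restarts
```

Feeding it the text of the supervisor log gives one entry per watchdog-triggered restart, which makes it easier to see whether the API errors build up for minutes beforehand (as here) or the restart comes out of nowhere.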

ariel · July 29, 2021, 2:51pm

I’m still on 2021.7.2. I noticed that it can go a few days without a restart and then 2-3 restarts happen on the same day. It’s quite annoying, but I’ll wait until .8.x is released; I’m also waiting for the latest zwave-js stack to stabilize.

sh00t2kill · July 30, 2021, 5:39am

I think I may have narrowed it down to unauthenticated browser sessions.

I have a desktop and a work laptop at home, with a kvm. My desktop had a bunch of ha tabs open. I closed them all, and it hasn’t restarted since.

153 hours and counting. Of course it will restart this evening and prove me wrong.

ariel · July 30, 2021, 2:20pm

I also have a desktop browser tab permanently open on HA, in addition to the android app. But I notice restarts happen randomly; the last one was at 3am, when the laptop is off. And I’ve been using this setup for years now. Over the next few weeks I will be migrating HA to an RPi4, newer kernel, 64bits, etc. Hopefully that’s going to do it. It doesn’t look like many of us are having this problem.

sh00t2kill · August 11, 2021, 12:32pm

It’s only gotten worse since .8

21-08-11 22:14:47 ERROR (MainThread) [supervisor.misc.tasks] Watchdog found a problem with Home Assistant API!
21-08-11 22:14:47 INFO (SyncWorker_1) [supervisor.docker.interface] Restarting ghcr.io/home-assistant/qemux86-64-homeassistant

The supervisor is detecting an issue and restarting HA. I’ve been in the middle of using it when this happens, so I’m not sure if the supervisor is misdiagnosing an issue, or if something actually IS going awry.

ariel · August 11, 2021, 1:59pm

In my case, touch wood, I think I may have found the problem. I activated debug logs (normally I only track “fatal” logs to spare the sdcard in the RPi) and found periodic issues (every few minutes) from the recorder integration, something like “id field not existing in database” or schema. I use an external mysql server for the HA DB (again to spare the sdcard) and typically prior to a major update I remove all records (to avoid having to wait for a conversion), but for a couple of years I had not deleted the database completely so that HA could fully recreate it. So I did just that, restarted, and the errors were gone. I’ve been optimizing other things and have been restarting manually, but in between HA has not restarted by itself in the last couple of days. Before, it would restart itself after anywhere from 2 hours to 8 hours.

I am guessing the hard crashes were due to some bug in python’s mysql driver that eventually gets triggered after a massive number of failures if the mysql db has the wrong schema. There was probably some schema change somewhere over the last couple of releases that in my case didn’t go well for whatever reason.
I’ll report back if self-restarts reoccur.

p.s. I run HA on docker, RPi4, no supervisor, zwavejs2mqtt.

sh00t2kill · August 11, 2021, 2:11pm

I also use mysql, so that’s something for me to look at!
It may also explain why it seems to be better for a while if I reboot the host. I will enable debug logs and see what happens!

sh00t2kill · August 13, 2021, 4:08am

The other thing I’ve noticed since it started happening is this:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3249272 root      20   0  650760 415636  40836 R  71.6  10.5  17:52.46 python3

The CPU usage by python is … high!

The process in question is python3 -m homeassistant --config /config
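
As a rough aid for reading that top row, here is a small Python sketch. It assumes top is showing memory in its default KiB units with plain numbers (no "g" or "m" suffixes); `parse_top_row` is a made-up helper name. It pairs the column names with the values and estimates the host's total RAM from RES and %MEM.

```python
def parse_top_row(header, row):
    """Pair each column name from top's header with the value in a process row."""
    names = header.split()
    # Split at most len(names) - 1 times so a COMMAND containing spaces stays whole.
    values = row.split(None, len(names) - 1)
    return dict(zip(names, values))


header = "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND"
row = "3249272 root 20 0 650760 415636 40836 R 71.6 10.5 17:52.46 python3"
proc = parse_top_row(header, row)

# Assumption: RES is in KiB (top's default for task memory).
res_mib = int(proc["RES"]) / 1024

# If RES is %MEM percent of physical memory, the total is roughly RES / (%MEM / 100).
total_gib = int(proc["RES"]) / (float(proc["%MEM"]) / 100) / 1024 / 1024

print(f"resident: {res_mib:.0f} MiB, estimated total RAM: {total_gib:.1f} GiB")
```

Under those assumptions the python3 process is holding roughly 400 MiB on a host with a little under 4 GiB, so this single sample does not by itself show memory exhaustion; the CPU figure is the more striking number.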

ariel · August 14, 2021, 9:14pm

Bad news, restarts continue albeit at a much lower rate (once every 36-48h).

Continuing with the debug log analysis, I am observing this as the last message prior to the “restart”:

2021-08-14 15:07:38 CRITICAL (stream_worker) [libav.generic] Assertion next_dts <= 0x7fffffff failed at libavformat/movenc.c:1026

So the next step is to tweak the stream integration or remove it. Hopefully we won’t need to remove it, because it’s really useful.

sh00t2kill · August 15, 2021, 12:10pm

Have you got any custom components integrated?
I’ve got a few. For the last couple of days, restarts have been happening at least a couple of times a day. I went through all the custom components I have installed and updated them all, and haven’t had a restart since.

sh00t2kill · August 16, 2021, 2:41am

Spoke too soon, and when I had debug logging turned on, I couldn’t find any critical errors anywhere!

ariel · August 16, 2021, 1:11pm

Yes, I do have a few custom components. In my list of things to try I included disabling them one by one. Right now I am testing removing some “ffmpeg optional arguments” that I was using to rotate the image in one of my camera integrations (a custom component for tplink/tapo). When I tried to correlate the start of the problem with config changes, that was the one that came closest in time. I noticed that people have been complaining about the stability of both the ffmpeg and stream components for a while, often when they have non-default or unusual configurations. 30 hours and counting since the last restart.

sh00t2kill · August 17, 2021, 4:28am

I thought I was going well, got to 24 hours.
I have, however, found an exception in the logs around the time of the reboot.

Exception ignored when trying to write to the signal wakeup fd:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/unix_events.py", line 42, in _sighandler_noop
    def _sighandler_noop(signum, frame):
BlockingIOError: [Errno 11] Resource temporarily unavailable
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.

And this is the only other error I can see around that time:
2021-08-17 13:52:54 ERROR (MainThread) [snitun.multiplexer.core] Ping fails, no response from peer

sh00t2kill · August 22, 2021, 6:15am

I seem to have solved the problem, but I don’t have a smoking gun.

Using a combination of logs and the commit history in my config file, I put it down to a combination of 2 different things:

1. The samsung tizen custom component. While I can’t be 100% certain, I believe this is the main cause of my issue. Once I disabled this, the restarts stopped.
2. I have an iotawatt, and had a REST sensor set up against its JSON endpoint. I moved all this into emoncms and used the emoncms integration instead. This alone didn’t stop the restarts, but it did make them less frequent. I had a large number of warnings about the REST sensor not updating within its refresh interval.

I’m at 24 hours and counting using a different samsung tv custom component.

ariel · August 22, 2021, 5:38pm

In my case it was the stream component. After checking forums, I found that “stream” has a reputation for instability for many people. I removed the component 4 days ago: no more crashes and no more entries in the log. I didn’t lose any functionality; I really almost never open the live streams on HA, I just check for packages in the porch or baby status on the still images.
I get less CPU usage overall unless I open the live streams, but the RPi4 can handle it and again, I almost never use live streams. I also added rtsp hardlinks on the picture-glance card that open directly in VLC when clicked, in case I need to leave live streams open for a while (works on android, mac and linux), so the CPU load on HA is zero, even less than with “stream” activated.

Now, looking back, the crashes may have started after playing with the “preload” feature in the picture card of the Hikvision cam, or maybe when I added that second cam. Anyway, since the system was so brittle, I just prefer to remove stream and be sure that HA is rock solid no matter what settings are changed in the UI.

flobidan · March 31, 2022, 10:26am

For the last two days I have also had the issue of random reboots.
Power should be fine; it was not an issue in the months before.
I already tried a new SD card with a fresh image and my backup, same behaviour.

If I look at the log for the last restart, I see

2022-03-31 12:11:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/15d21743_samba_backup/stats request
2022-03-31 12:11:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_configurator/stats request
2022-03-31 12:11:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-03-31 12:11:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_bitwarden/stats request
2022-03-31 12:11:00 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data: 
2022-03-31 12:16:11 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-03-31 12:16:11 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mariadb/stats request
2022-03-31 12:16:11 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-03-31 12:16:11 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_sonweb/stats request
2022-03-31 12:16:11 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_adguard/stats request
2022-03-31 12:16:11 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_configurator/stats request
2022-03-31 12:16:11 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nginxproxymanager/stats request
2022-03-31 12:16:11 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_bitwarden/stats request
2022-03-31 12:16:11 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/15d21743_samba_backup/stats request
2022-03-31 12:16:11 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_samba/stats request
2022-03-31 12:16:11 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_ssh/stats request
2022-03-31 12:16:11 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data:

So it looks like the supervisor somehow died, and with it all add-ons, leading to a reboot?!
It is also strange that at first just a few add-ons timed out, and five minutes later (without any other log entry in between, although entries usually occur more frequently) basically all of them…
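
To make the "a few add-ons first, then nearly all" pattern easier to see, here is a minimal Python sketch that groups the timeout lines by timestamp. It assumes the exact message layout from the excerpt above; `timeouts_by_timestamp` is a made-up name, not something Home Assistant provides.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the log excerpt above.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ERROR \(MainThread\) "
    r"\[homeassistant\.components\.hassio\.handler\] "
    r"Timeout on /addons/(?P<slug>[^/]+)/stats request"
)


def timeouts_by_timestamp(log_text):
    """Map each timestamp to the sorted add-on slugs whose stats request timed out."""
    groups = defaultdict(set)
    for line in log_text.splitlines():
        m = TIMEOUT_RE.match(line.strip())
        if m:
            groups[m["ts"]].add(m["slug"])
    return {ts: sorted(slugs) for ts, slugs in groups.items()}
```

Run over the excerpt above, this gives 4 add-ons at 12:11:00 and 11 at 12:16:11, which fits the idea that the supervisor stopped answering altogether rather than individual add-ons failing.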

skynet01 · March 4, 2023, 12:03am

Is there a guide somewhere on how someone can troubleshoot this? My HA randomly restarts as well (all other addons run fine) and there is nothing in the logs. It randomly started happening a few months ago.

sh00t2kill · March 4, 2023, 12:38am

It’s happened to me twice.

Both times I’ve started disabling custom components and one of them has been the cause.

Luckily for me they were things I could do without.
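
If you have a lot of custom components, disabling them one at a time can take weeks when each test needs a day or two of uptime. One option is to halve the list each round instead. Below is a minimal Python sketch of that idea, under the big assumption that exactly one component is responsible. `find_culprit` and `restarts_with` are hypothetical names: `restarts_with(subset)` stands for you enabling only that subset of the suspects, waiting, and reporting whether HA restarted.

```python
def find_culprit(components, restarts_with):
    """Bisect a list of components, assuming exactly one of them causes the restarts.

    restarts_with(subset) must return True if restarts still occur when only
    `subset` (out of the remaining suspects) is enabled.
    """
    candidates = list(components)
    if not candidates:
        raise ValueError("need at least one component to test")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if restarts_with(half):
            candidates = half  # culprit is in the half that was enabled
        else:
            candidates = candidates[len(half):]  # culprit is in the disabled half
    return candidates[0]
```

With 8 custom components this needs 3 rounds instead of up to 8. It does not help when two things combine to cause the problem, in which case one-by-one testing is still the fallback.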

skynet01 · March 6, 2023, 7:19pm

Tracked it. Looks like I suffered the fate of the famous Stream component like @ariel did. You’d think there would be some safeguards there so it wouldn’t crash the whole HA if it’s having issues with a camera stream.