HA is randomly restarting

sh00t2kill · July 13, 2021, 11:44pm

Hi all.
Bit of a weird one here.

I’m running supervised on Docker, and since I upgraded past 2021.6.3, HA has been randomly restarting. It’s normally late in the evening, but this morning it happened at 9am.

The only thing I can find that might be relevant is this in the supervisor log:

21-07-14 08:58:18 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state CoreState.RUNNING
21-07-14 08:58:18 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.SECURITY/ContextType.CORE
21-07-14 08:58:18 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.FREE_SPACE/ContextType.SYSTEM
21-07-14 08:58:18 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.PWNED/ContextType.ADDON
21-07-14 08:58:18 INFO (MainThread) [supervisor.resolution.check] System checks complete
21-07-14 08:58:18 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state CoreState.RUNNING
21-07-14 08:58:19 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
21-07-14 08:58:19 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state CoreState.RUNNING
21-07-14 08:58:19 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete

Any ideas?

jaaem · July 13, 2021, 11:58pm

About a year ago, mine rebooted like crazy after updating when running on a Raspberry Pi 4 with only 1 GB of RAM, even with HassOS. I moved to an old laptop running Debian supervised, and it has been great ever since.

ariel · July 14, 2021, 11:44pm

@sh00t2kill I noticed the same, since 2021.6.x. Now running 2021.7.2 and same thing.

Running directly on Docker (no supervisor). More details posted here: Home Assistant randomly restarting by itself

I did notice some errors in the log related to the Stream component but could not yet correlate those to the random restarts. I have two cameras, one is ONVIF. I use zwavejs2mqtt (in another docker container). Any similarity with your setup?

I wouldn’t have noticed the restarts were it not for a Telegram notification that I have configured to be sent during HA startup.

sh00t2kill · July 16, 2021, 11:47am

I also have stream set up with some RTSP cameras from my Dahua NVR.

I haven’t noticed any errors but I’ll go looking for any!

sh00t2kill · July 19, 2021, 8:00am

I have an HA app notification on HA start too – which is how I discovered this.

I’ve added a bunch of template sensors around uptime, so I can see if there’s any kind of pattern to it.

sh00t2kill · July 20, 2021, 10:40pm

I haven’t had a restart since upgrading to 7.3

33 hours and counting.

sh00t2kill · July 21, 2021, 2:18pm

Spoke too soon – 3 restarts in the last 2 hours.
I have a feeling it’s to do with available memory, but I can’t be 100% sure.

These logs seem to correspond with the restart:

21-07-21 23:31:55 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:31:58 INFO (MainThread) [supervisor.api.proxy] Home Assistant WebSocket API connection is closed
21-07-21 23:32:34 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:33:10 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:33:15 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:33:15 WARNING (MainThread) [supervisor.misc.tasks] Watchdog miss API response from Home Assistant
21-07-21 23:33:46 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:34:22 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:34:58 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:35:34 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:35:46 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:35:46 ERROR (MainThread) [supervisor.misc.tasks] Watchdog found a problem with Home Assistant API!
21-07-21 23:35:46 INFO (SyncWorker_3) [supervisor.docker.interface] Restarting ghcr.io/home-assistant/qemux86-64-homeassistant
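
To look for a pattern in these restarts, the watchdog lines can be pulled out of a saved copy of the supervisor log. This is only a rough sketch: the line layout is copied from the excerpt above, and the function names and the embedded sample are made up for illustration.

```python
import re
from datetime import datetime

# Line layout copied from the supervisor log excerpt in this thread:
# "yy-mm-dd HH:MM:SS LEVEL (thread) [logger] message"
LINE_RE = re.compile(
    r"^(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) \(([^)]+)\) \[([^\]]+)\] ?(.*)$"
)


def parse_line(line):
    """Split one log line into (timestamp, level, logger, message), or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, level, _thread, logger, message = match.groups()
    return datetime.strptime(stamp, "%y-%m-%d %H:%M:%S"), level, logger, message


def watchdog_restarts(lines):
    """Timestamps at which the watchdog reported a problem with the HA API."""
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, _level, logger, message = parsed
        if logger == "supervisor.misc.tasks" and message.startswith(
            "Watchdog found a problem"
        ):
            hits.append(stamp)
    return hits


def hours_between(stamps):
    """Gaps, in hours, between consecutive restart timestamps."""
    return [(b - a).total_seconds() / 3600 for a, b in zip(stamps, stamps[1:])]


# A few lines from the excerpt above, used here as sample input.
SAMPLE = """\
21-07-21 23:31:55 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:
21-07-21 23:33:15 WARNING (MainThread) [supervisor.misc.tasks] Watchdog miss API response from Home Assistant
21-07-21 23:35:46 ERROR (MainThread) [supervisor.misc.tasks] Watchdog found a problem with Home Assistant API!
21-07-21 23:35:46 INFO (SyncWorker_3) [supervisor.docker.interface] Restarting ghcr.io/home-assistant/qemux86-64-homeassistant
"""

restarts = watchdog_restarts(SAMPLE.splitlines())
for stamp in restarts:
    print(stamp.isoformat(sep=" "))
```

Fed a longer log, `hours_between(restarts)` gives the gaps between restarts, which makes it easier to see whether they cluster at a time of day or after a fixed uptime.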

ariel · July 29, 2021, 2:51pm

I’m still on 2021.7.2. I’ve noticed that it can go a few days without a restart and then 2-3 restarts happen on the same day. It’s quite annoying, but I’ll wait until .8.x is released; I’m also waiting for the latest zwave-js stack to stabilize.

sh00t2kill · July 30, 2021, 5:39am

I think I may have narrowed it down to unauthenticated browser sessions.

I have a desktop and a work laptop at home, with a KVM. My desktop had a bunch of HA tabs open. I closed them all, and it hasn’t restarted since.

153 hours and counting. Of course it will restart this evening and prove me wrong.

ariel · July 30, 2021, 2:20pm

I also have a desktop browser tab permanently open on HA, in addition to the Android app. But I notice restarts happen randomly; the last one was at 3am, when the laptop is off. And I’ve been using this setup for years now. Over the next few weeks I will be migrating HA to an RPi4, newer kernel, 64-bit, etc. Hopefully that’s going to do it. It doesn’t look like many of us are having this problem.

sh00t2kill · August 11, 2021, 12:32pm

It’s only gotten worse since .8:

21-08-11 22:14:47 ERROR (MainThread) [supervisor.misc.tasks] Watchdog found a problem with Home Assistant API!
21-08-11 22:14:47 INFO (SyncWorker_1) [supervisor.docker.interface] Restarting ghcr.io/home-assistant/qemux86-64-homeassistant

The supervisor is detecting an issue and restarting HA. I’ve been in the middle of using it when this happens, so I’m not sure if the supervisor is misdiagnosing an issue, or if something actually IS going awry.

ariel · August 11, 2021, 1:59pm

In my case, touch wood, I think I may have found the problem. I activated debug logs (normally I only log “fatal” to spare the SD card in the RPi) and found periodic errors, every few minutes, from the recorder integration, something like “id field not existing in database” or schema. I use an external MySQL server for the HA DB (again to spare the SD card). Before a major update I typically remove all records to avoid waiting for a conversion, but for a couple of years I had not dropped the database completely so that HA could recreate it from scratch. So I did just that, restarted, and the errors were gone. I’ve been optimizing other things and restarting manually in between, but HA has not restarted by itself in the last couple of days. Before, it would restart itself after anywhere from 2 to 8 hours.

I am guessing the hard crashes were due to some bug in Python’s MySQL driver that eventually gets triggered after a massive number of failures when the MySQL DB has the wrong schema. There was probably a schema change somewhere over the last couple of releases that, in my case, didn’t go well for whatever reason.
I’ll report back if the self-restarts recur.

P.S. I run HA on Docker, RPi4, no supervisor, zwavejs2mqtt.

sh00t2kill · August 11, 2021, 2:11pm

I also use MySQL, so that’s something for me to look at!
It may also explain why it seems to be better for a while if I reboot the host. I will enable debug logs and see what happens!

sh00t2kill · August 13, 2021, 4:08am

The other thing I’ve noticed since it started happening is this:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3249272 root      20   0  650760 415636  40836 R  71.6  10.5  17:52.46 python3

The CPU usage by Python is … high!

The process in question is python3 -m homeassistant --config /config
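
Since available memory was one of the earlier hunches, the same top row also gives a rough idea of how much RAM HA is holding. A small sketch, assuming top’s default of reporting RES in KiB; the field names and helper functions are made up for illustration, and %MEM is rounded, so the host total is only an estimate.

```python
# Row copied from the top output above; field names follow the top header.
TOP_ROW = "3249272 root 20 0 650760 415636 40836 R 71.6 10.5 17:52.46 python3"
FIELDS = ["pid", "user", "pr", "ni", "virt", "res", "shr",
          "s", "cpu", "mem", "time", "command"]


def parse_top_row(row):
    """Map one whitespace-separated top row onto the header field names."""
    return dict(zip(FIELDS, row.split()))


def estimate_total_mib(res_kib, mem_percent):
    """Back out total RAM (MiB) from a process's RES (KiB) and its %MEM share."""
    return res_kib / (mem_percent / 100) / 1024


row = parse_top_row(TOP_ROW)
res_mib = int(row["res"]) / 1024
total_mib = estimate_total_mib(int(row["res"]), float(row["mem"]))

print(f"CPU share:          {row['cpu']}%")
print(f"HA resident memory: {res_mib:.0f} MiB")
print(f"Estimated host RAM: {total_mib:.0f} MiB")
```

This only works for rows where the command has no spaces, as in the truncated `python3` shown by top.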

ariel · August 14, 2021, 9:14pm

Bad news: restarts continue, albeit at a much lower rate (once every 36-48h).

Continuing with the debug log analysis, I am observing this as the last message prior to the “restart”:

2021-08-14 15:07:38 CRITICAL (stream_worker) [libav.generic] Assertion next_dts <= 0x7fffffff failed at libavformat/movenc.c:1026

So the next step is to tweak the stream integration or remove it. Hopefully we won’t need to remove it, because it’s really useful.
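
For anyone else hitting this assertion: 0x7fffffff is the largest signed 32-bit integer, so it reads like a timestamp counter running past that limit. A rough sketch of how long that would take, assuming next_dts counts ticks of the stream timescale and that the timescale is 90 kHz (the usual RTP video clock). Neither assumption is confirmed in this thread, and the function name is made up.

```python
MAX_DTS = 0x7FFFFFFF           # limit from the libav assertion quoted above
ASSUMED_TIMESCALE_HZ = 90_000  # assumption: typical RTP video clock rate


def hours_until_limit(timescale_hz, max_dts=MAX_DTS):
    """Hours of continuous stream time before a tick counter reaches max_dts."""
    return max_dts / timescale_hz / 3600


print(f"{hours_until_limit(ASSUMED_TIMESCALE_HZ):.2f} hours")
```

Under those assumptions the limit falls at roughly 6.6 hours of continuous streaming, which would be in the same ballpark as the 2 to 8 hour gaps mentioned earlier in the thread. A different timescale would move that number.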

sh00t2kill · August 15, 2021, 12:10pm

Have you got any custom components integrated?
I’ve got a few. For the last couple of days, restarts have been happening at least a couple of times a day. I went through all the custom components I have installed and updated them all, and haven’t had a restart since.

sh00t2kill · August 16, 2021, 2:41am

Spoke too soon – and when I had debug logging turned on, I couldn’t find any critical errors anywhere!

ariel · August 16, 2021, 1:11pm

Yes, I do have a few custom components. Disabling them one by one is on my list of things to try. Right now I am testing the removal of some “ffmpeg optional arguments” that I was using to rotate the image in one of my camera integrations (a custom component for tplink/tapo). When I tried to correlate the start of the problem with config changes, that was the one that came closest in time. I’ve noticed that people have been complaining about the stability of both the ffmpeg and stream components for a while, often when they have non-default or unusual configurations. 30 hours and counting since the last restart.

sh00t2kill · August 17, 2021, 4:28am

I thought I was going well, got to 24 hours.
I have, however, found an exception in the logs around the time of the reboot:

Exception ignored when trying to write to the signal wakeup fd:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/unix_events.py", line 42, in _sighandler_noop
    def _sighandler_noop(signum, frame):
BlockingIOError: [Errno 11] Resource temporarily unavailable
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.

And this is the only other error I can see around that time:
2021-08-17 13:52:54 ERROR (MainThread) [snitun.multiplexer.core] Ping fails, no response from peer

sh00t2kill · August 22, 2021, 6:15am

I seem to have solved the problem, but I don’t have a smoking gun.

Using a combination of logs and the commit history for my config files, I put it down to a combination of 2 different things:

1. The Samsung Tizen custom component. While I can’t be 100% certain, I believe this is the main cause of my issue. Once I disabled it, the restarts stopped.
2. I have an IoTaWatt, and had a REST sensor set up against its JSON endpoint. I moved all this into emoncms and used the emoncms integration instead. This alone didn’t stop the restarts, but it did make them less frequent. I had a large number of warnings about the REST sensor not updating within its refresh interval.

I’m at 24 hours and counting using a different Samsung TV custom component.