I'm running Supervised on Docker, and since I upgraded past 2021.6.3, HA has been randomly restarting. It's normally late in the evening, but this morning it happened at 9am.
The only thing I can find that might be relevant is this in the supervisor log:
21-07-14 08:58:18 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state CoreState.RUNNING
21-07-14 08:58:18 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.SECURITY/ContextType.CORE
21-07-14 08:58:18 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.FREE_SPACE/ContextType.SYSTEM
21-07-14 08:58:18 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.PWNED/ContextType.ADDON
21-07-14 08:58:18 INFO (MainThread) [supervisor.resolution.check] System checks complete
21-07-14 08:58:18 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state CoreState.RUNNING
21-07-14 08:58:19 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
21-07-14 08:58:19 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state CoreState.RUNNING
21-07-14 08:58:19 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
About a year ago, mine rebooted like crazy after updating, back when it was running on a Raspberry Pi 4 with only 1 GB of RAM, even with HassOS. I upgraded to an old laptop running Debian Supervised, and it has been great ever since.
I did notice some errors in the log related to the Stream component, but I haven't yet been able to correlate those with the random restarts. I have two cameras, one of which is ONVIF. I use zwavejs2mqtt (in another Docker container). Any similarity with your setup?
I wouldn't have noticed the restarts were it not for a Telegram notification that I have configured to be sent during HA startup.
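For anyone who wants the same, a minimal sketch of that automation; the notify service name is just a placeholder for whatever your Telegram notifier is called:

automation:
  - alias: "Notify on Home Assistant start"
    trigger:
      - platform: homeassistant
        event: start
    action:
      - service: notify.telegram_me   # placeholder, use your own Telegram notify service
        data:
          message: "Home Assistant has just (re)started"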
I'm still on 2021.7.2. I've noticed it can go a few days without a restart and then have 2-3 restarts on the same day. It's quite annoying, but I'll wait until .8.x is released; I'm also waiting for the latest zwave-js stack to stabilize.
I also have a desktop browser tab permanently open on HA, in addition to the Android app. But I notice the restarts happen randomly; the last one was at 3am, when the laptop is off. And I've been using this setup for years now. Over the next few weeks I will be migrating HA to an RPi4 with a newer kernel, 64-bit OS, etc. Hopefully that's going to do it. It doesn't look like many of us are having this problem.
21-08-11 22:14:47 ERROR (MainThread) [supervisor.misc.tasks] Watchdog found a problem with Home Assistant API!
21-08-11 22:14:47 INFO (SyncWorker_1) [supervisor.docker.interface] Restarting ghcr.io/home-assistant/qemux86-64-homeassistant
The Supervisor is detecting an issue and restarting HA. I've been in the middle of using it when this happens, so I'm not sure if the Supervisor is misdiagnosing an issue or if something actually IS going awry.
In my case, touch wood, I think I may have found the problem. I activated debug logs (normally I only keep "fatal" logs to spare the SD card in the RPi) and found periodic issues (every few minutes) from the recorder integration, something like an "id field not existing in database" or schema error. I use an external MySQL server for the HA DB (again to spare the SD card), and typically prior to a major update I remove all records (to avoid having to wait for a conversion), but for a couple of years I had not deleted the database completely so that HA could fully recreate it. So I did just that, restarted, and the errors were gone. I've been optimizing other things and restarting manually in between, but HA has not restarted by itself again in the last couple of days. Before, it would restart itself after anywhere from 2 to 8 hours.
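For reference, the recorder setup is just the standard db_url pointing at the external server; the host, user, and password here are placeholders:

recorder:
  db_url: mysql://homeassistant:PASSWORD@192.168.1.10/homeassistant?charset=utf8mb4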
I am guessing the hard crashes were due to some bug in Python's MySQL driver that eventually gets triggered after a massive number of failures if the MySQL DB has the wrong schema. There was probably a schema change somewhere over the last couple of releases that in my case didn't go well for whatever reason.
I'll report back if the self-restarts reoccur.
P.S. I run HA on Docker, RPi4, no Supervisor, zwavejs2mqtt.
I also use MySQL, so that's something for me to look at!
It may also explain why things seem better for a while if I reboot the host. I will enable debug logs and see what happens!
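In case it helps anyone, the logger section I'm planning to use is roughly this; the two integrations listed are just the suspects I want verbose, adjust to taste:

logger:
  default: warning
  logs:
    homeassistant.components.recorder: debug
    homeassistant.components.stream: debug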
Have you got any custom components integrated?
I've got a few. For the last couple of days, the restarts have been happening at least a couple of times a day. I went through all the custom components I have installed and updated them all, and I haven't had a restart since.
Yes, I do have a few custom components. On my list of things to try I included disabling them one by one. Right now I am testing the removal of some "ffmpeg optional arguments" I was using to rotate the image in one of my camera integrations (a custom component for TP-Link/Tapo). Trying to correlate the start of the problem with config changes, that was the change that came closest in time. I've noticed people have been complaining about the stability of both the ffmpeg and Stream components for a while, often when they have non-default or unusual configurations. 30 hours and counting since the last restart.
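To give an idea of the kind of thing I mean: the exact option name differs per integration, but with the core ffmpeg camera it would look roughly like this (the stream URL is a placeholder):

camera:
  - platform: ffmpeg
    name: rotated_camera
    input: rtsp://user:pass@192.168.1.20:554/stream1   # placeholder stream URL
    extra_arguments: -vf "transpose=1"                  # rotate 90 degrees clockwise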
I thought I was doing well, having got to 24 hours.
I have, however, found an exception in the logs around the time of the reboot.
Exception ignored when trying to write to the signal wakeup fd:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/asyncio/unix_events.py", line 42, in _sighandler_noop
def _sighandler_noop(signum, frame):
BlockingIOError: [Errno 11] Resource temporarily unavailable
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
And this is the only other error I can see around that time:
2021-08-17 13:52:54 ERROR (MainThread) [snitun.multiplexer.core] Ping fails, no response from peer
I seem to have solved the problem, but I don’t have a smoking gun.
Using a combination of the logs and the commit history of my config, I put it down to a combination of two different things:
1. The Samsung Tizen custom component. While I can't be 100% certain, I believe this was the main cause of my issue. Once I disabled it, the restarts stopped.
2. I have an IoTaWatt and had a REST sensor set up against its JSON endpoint (roughly like the sketch below). I moved all of this into Emoncms and used the Emoncms integration instead. This alone didn't stop the restarts, but it did make them less frequent. I had a large number of warnings about the REST sensor not updating within its scan interval.
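For context, the sensor was along these lines; the endpoint and the JSON path are placeholders, as I no longer have the exact config:

sensor:
  - platform: rest
    name: iotawatt_main_power
    resource: http://iotawatt.local/status?inputs=yes    # placeholder endpoint
    value_template: "{{ value_json.inputs[0].Watts }}"   # placeholder JSON path
    unit_of_measurement: W
    scan_interval: 10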
I'm at 24 hours and counting using a different Samsung TV custom component.