HassIO stops responding every so often

benzo · September 30, 2020, 12:30pm

There are too many SD card classifications… SDHC II U3 C10 V60 A2

benzo · September 30, 2020, 12:30pm

Thanks again (:

baizinger · October 1, 2020, 7:36am

This is the one I used. But I think starting over from scratch with my .yaml files is what solved the problem for me.

r33b · October 5, 2020, 9:37am

After the troubles I mentioned above, my system has now been stable for 17 days.

For me it was the deCONZ add-on. I moved the ConBee/deCONZ to a separate system, and since then the behavior hasn’t come back. Everything is running fine.

So for anyone with similar problems, I would suggest narrowing the problem down by disabling add-ons.
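Disabling add-ons one at a time works, but when a freeze can take a day or more to show up, halving the set each round gets there in fewer test periods. A minimal Python sketch of that idea, assuming exactly one misbehaving add-on; `is_stable` is a hypothetical stand-in for "enable only these add-ons and wait long enough to see whether the system freezes", and the add-on names are just examples taken from this thread.

```python
from __future__ import annotations

from typing import Callable, Sequence


def find_culprit(addons: Sequence[str], is_stable: Callable[[set[str]], bool]) -> str | None:
    """Binary-search for the single add-on whose presence causes freezes.

    Assumes exactly one culprit and that is_stable(enabled) reliably tells
    whether the system survives a test period with only `enabled` running.
    """
    candidates = list(addons)
    if is_stable(set(candidates)):
        return None  # everything together is stable: nothing to find
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if is_stable(set(first_half)):
            candidates = candidates[mid:]  # culprit must be in the other half
        else:
            candidates = first_half  # freeze reproduced: culprit is in this half
    return candidates[0]


# Hypothetical example: pretend deCONZ is the one causing freezes.
addons = ["deconz", "mqtt", "esphome", "motioneye"]
print(find_culprit(addons, lambda enabled: "deconz" not in enabled))
```

With n add-ons this needs about log2(n) test periods after the first full run, instead of up to n when removing them one by one. If more than one add-on contributes, the halving can mislead, so confirm the result by disabling only that add-on.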

benzo · October 5, 2020, 12:34pm

No freezes for 4 days now, without switching to an SSD or an A2 SD card. What I did was delete the Lovelace “picture-elements” cards for the two Foscam cameras; the card example was taken from the official docs (https://www.home-assistant.io/integrations/foscam/). With your IO wait sensor I monitored the situation constantly: only once so far did the spike reach 76, sometimes it hits 50, but the average is 25. Is an average of 25 “normal”?
Thank you again!

PS: The CPU load never went over 25%.

Recte · October 6, 2020, 2:18pm

I am no Linux expert, but as far as I know, the IOWait percentage tells you how much of the time the CPU sits idle while disk reads/writes are still pending. So what you are basically saying is that your CPU spends 25% of its time waiting for your IO to finish.

As a result, it most likely shows a high load (not a percentage, but a number) and a relatively low CPU usage percentage. The reason is that tasks keep coming in, but they are stuck waiting for the IO to finish before the CPU can process them, and on Linux tasks blocked on disk IO are counted in the load average.

Put simply, a request like “show lovelace” or “process login” sits in the load queue because the IO isn’t ready and the CPU is waiting.
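For anyone who wants to check the IOWait figure without a dedicated sensor, here is a minimal Python sketch (not the sensor used in this thread) of the usual way it is computed on Linux: sample the aggregate `cpu` line of /proc/stat twice and divide the growth in the iowait counter by the growth in total CPU time. The function names are hypothetical; the field order is the one documented for /proc/stat.

```python
import time


def cpu_times(stat_line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(x) for x in fields[1:]]


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two samples.

    Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    guest/guest_nice are already counted inside user/nice, so only the first
    eight fields are summed to avoid double counting.
    """
    delta = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(delta[:8])
    return 100.0 * delta[4] / total if total else 0.0


def sample_iowait(interval: float = 1.0) -> float:
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = f.readline()  # first line is the aggregate 'cpu' line
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.readline()
    return iowait_percent(before, after)
```

On a multi-core Pi the aggregate line averages over all cores, so 25% can mean one core completely stuck on IO while the others are idle.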

FYI: since the switch to the SSD, my average IOWait is about 0%. I see spikes once or twice an hour, and they hit 2% at most. CPU usage is 3-6%, and my 15-minute load average over a two-hour period stays between 0.64 and 0.89.

Back when I was on the SD card, IOWait was hitting 100%, as you can see in the image, even though I had already reduced the number of writes.

I hope this helps.

benzo · October 6, 2020, 6:40pm

Thank you very much

benzo · October 6, 2020, 6:41pm

Hello again, did you test the IO wait (with @Recte’s sensor) and the CPU load before and after changing the SD card?

baizinger · October 8, 2020, 7:19am

Hey. I tracked the CPU load and memory.

Original SD card: CPU climbs to 75% within a day - crash
New SD card restored from a snapshot: CPU climbs to 75% within a day / deCONZ devices become unavailable - crash
New SD card with a completely new configuration of add-ons and YAML, and NO snapshots: CPU peaks at about 25% - no crash since then.

My system has been running for 2 weeks without a single reboot of the Pi.

In my opinion, something went wrong in an update a while ago (probably deCONZ) and my system never recovered from it.

BamaBlueCollar · October 17, 2020, 5:39pm

I ran into a similar problem with a Pi 3B+ running the system from a USB stick. I was able to eliminate it simply by pointing the built-in recorder at a MySQL server running on another computer instead of the default local database.

jaruba · November 5, 2020, 7:46am

I just read through this whole thread because I’ve had the same issues as most of you for some months now.

What I’ve learned from this thread:

most use HA on an RPI
many use an SSD
many (if not all) seem to be using the deCONZ add-on

I also run HA on an RPI 3B+ with an SSD and the deCONZ add-on. In my case the system can lock up around 1-2 days after a restart. I disabled the deCONZ add-on and the RPI stopped locking up.

Finding any useful logs for the cause seems to be an issue for me too. I tend to think this could somehow be an HA recorder or logbook issue…

I’m thinking of modifying the default config to disable the recorder as a test…

jaruba · November 8, 2020, 12:40am

This was clearly a hardware limitation of the RPI…
I (finally) managed to fix my issues by optimising the more intensive integrations.
I set memory_init: 256 in the UniFi Controller’s config, disabled the query log in AdGuard Home, and tweaked the recorder with these settings:

recorder:
  purge_keep_days: 1    # keep only 1 day of history in the database
  commit_interval: 10   # seconds between database commits

Not only does it not hang anymore, all the automations seem a lot faster too.
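Why commit_interval matters on SD cards and other slow storage: the recorder collects events and state changes and commits them to the database once per interval, so the interval caps how many commits hit the disk. A back-of-the-envelope Python sketch, assuming there is something to write in every interval, comparing one commit per second with 10 (used above) and 30 (tried later in this thread); `commits_per_day` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def commits_per_day(commit_interval_s: float) -> int:
    """Upper bound on recorder commits per day for a given commit_interval.

    Assumes at least one state change arrives in every interval; a quieter
    system commits less often than this.
    """
    return int(SECONDS_PER_DAY // commit_interval_s)


for interval in (1, 10, 30):
    print(f"commit_interval: {interval:>2} -> at most {commits_per_day(interval)} commits/day")
```

Going from 1 to 10 seconds cuts the worst case from 86400 to 8640 commits a day; the trade-off is that up to one interval of history is lost if the system crashes before the next commit.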

skynet35 · December 19, 2020, 8:52am

Hi everybody, for a week I had the same problem as you: HA crashed every day, roughly every 24 hours (setup: a NUC running Proxmox with Home Assistant OS). None of the other VMs crashed, so the problem came from HA.
I removed the plugins I had installed most recently one by one, and bingo: after removing motionEye, HA stopped crashing. Maybe this helps someone.

chris4 · December 20, 2020, 9:54pm

I have the same problem as the first post. I have an RPi4 2GB on an SSD.
HassIO on the RPi4 goes into sleep mode after a while and doesn’t wake up again; I have to hard reset it to get it running again.

skynet35 · December 21, 2020, 12:30pm

After 2 days without a crash, HA crashed again this morning. The problem doesn’t come from Proxmox, because my Win10 VM works fine, as do one Debian VM and one Pi-hole.
I don’t know why this damn Home Assistant crashes every day, on average every 24 hours. I didn’t have this problem on a Raspberry Pi (but the Pi is too slow compared to the NUC).

shax · January 1, 2021, 8:29am

I have the same problem as everyone else, but in a somewhat different scenario.
I’m seeing exactly the same log as @pdkwork:

ERROR (MainThread) [hassio.store.git] Can't update https://github.com/home-assistant/hassio-addons repo: Cmd('git') failed due to: exit code(128)

but this was an hour and a half before it went down, so it’s possibly misleading.
Now, my HA has been installed and running for less than 24 hours, in a VM.
The only add-ons I have are: ESPHome, File Editor, MQTT, Met.no Forecast, UPNP/IGD Broadcom and Mobile App.

So it’s hard to believe this is linked to a single plugin, or to the HA database getting too big or “too old”. And it certainly has nothing to do with updates, since this is a clean install.
Once it became unresponsive in both the browser and the mobile app, I tried a Supervisor repair, reload and restart, all without success. Memory usage was at about 5%, and the underlying disk is an SSD with plenty of free space.
HA only came back to life after I rebooted the VM.

Foosman · January 11, 2021, 11:11pm

I’m dealing with the same non-communication issues with Hassio, and I’m running the latest OS and HASS software as of today (it still stops communicating within 24 hours).

I have pretty much every integration you can conceive of, so a good method of diagnosing this would help me a lot.

@jaruba, thanks specifically for your work digesting such a huge thread. As a follow-up: are you happy with your recorder fix? Or have you discovered anything new? Thanks!

jaruba · January 12, 2021, 10:27am

@Foosman Actually, I’m not happy with that fix. It worked OK for some time, then I added a few more devices to deCONZ and it started happening again (maybe it would have started happening again anyway).

So I increased commit_interval to 30, but it still froze up once in a while. I then moved the HA instance from an RPI 3B+ to an Intel NUC (NUC10I5FNH2) with an M.2 NVMe SSD, running HA Supervised in Docker, and updated to the latest HA version. HA never froze after that, but the funny thing is that all my Zigbee devices would go offline and stay offline, at the same frequency as the previous HA freezes (rebooting HA fixed it). I updated the ConBee II firmware and the deCONZ add-on to the latest versions and haven’t had any issues since.

So the NUC certainly helped, and updating HA, the ConBee II and deCONZ was the fix IMO. (TBH, after this experience I would say it was a ConBee II firmware issue to begin with in my case, but the limited RPI 3B+ resources probably had a hand in it too.)

gafain · February 22, 2021, 8:19am

Hello,
I have the same problem. Hassio stops responding about once a day, and sometimes twice a day.
I run Hassio in Docker on a Pi4 4GB.
Yesterday I replaced the SD card, but it didn’t help: today at 3:30 it stopped again.
When I reboot, all the logs are empty, so I cannot check anything.
Over the past days I noticed that when my internet connection has problems, Home Assistant also stops answering pings on the LAN interface.
I don’t know whether the problems are related or not.
I built a nice setup with Home Assistant, but it is completely unusable because of these constant lockups.

When it locks up, I attach a monitor and keyboard, but it doesn’t respond to any keyboard input.

How can I check all the logs?

JeromeO · February 22, 2021, 6:19pm

In my specific case, I wiped the SD card, restored a working backup from a few days earlier, and deleted the DB file to start fresh and get rid of the log errors about a ‘malformed’ DB.
My HA has been stable so far, and I no longer see any DB-related error messages in the logs.
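If the logs complain about a ‘malformed’ DB, SQLite can check the file itself before you decide to delete it. A minimal Python sketch using the standard-library sqlite3 module and SQLite’s `PRAGMA integrity_check`, which returns a single `ok` row for a healthy file and a list of problems otherwise. The `check_db` name is hypothetical, and the path in the usage line below assumes the default `home-assistant_v2.db` in the `/config` directory; ideally run it with HA stopped so the recorder isn’t writing to the file at the same time.

```python
import sqlite3


def check_db(path: str) -> list[str]:
    """Run SQLite's integrity check on a database file.

    Returns ['ok'] for a healthy database, otherwise the problems SQLite reports.
    """
    # Open read-only through a URI so the check can never modify the file.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()
```

Usage, assuming the default location: `check_db("/config/home-assistant_v2.db")`. Anything other than `['ok']` supports starting with a fresh DB file as described above.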