HASS OS becomes unresponsive / then almost unusable / and is finally dead - starting every ~ 10 hours after last HA start

e-raser · December 30, 2020, 10:45pm

Strange: I removed the trakt integration using the integrations section and have even rebooted twice since. Anyway, I found this warning in the HA log again just now:

Logger: homeassistant.loader
Source: loader.py:465
First occurred: 23:36:55 (1 occurrences)
Last logged: 23:36:55
You are using a custom integration for trakt which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.

Where does this log entry come from? The integration is not even shown on the http://homeassistant:8123/config/info page anymore, so I’m really wondering.

Must have been a “ghost message”, because the HA log does NOT contain the [homeassistant.loader] You are using a custom integration for trakt... entry anymore on HA start.

hughhallhh56 · December 31, 2020, 3:10pm

Me too! I have been experiencing the same issue for about two months or so. For the most part I had left HA alone for over 10 months, and well, winter is here, so it’s time for indoor things to do. I upgraded to HassOS 4.16 and Core 0.117.6, plus updated the add-ons. So what is causing the issue isn’t going to be so straightforward to figure out. Of course I didn’t experience the issue right away, so I merrily added more integrations.

What I can say is that at first it sometimes took a week before I had the issue, but lately it often happens in less than 24 hours. This has led me to think it is a memory leak. I am using HA more for sure, but the CPU load in general is quite low, with a typical load of 0.5. So last night I removed some memory-heavy integrations to leave the system with lots of unused memory. I had about 250K free after reboot. There was 180K when I went to bed and this morning I am down to 95K. We will see how it goes.

e-raser · December 31, 2020, 4:08pm

I also suspect either memory or “CPU_IOWAIT” related issues. I have therefore ordered new hardware: a Pi 4 with 8 GB of memory should be more than enough for the current ~850 MB of RAM consumption. That way I can rule out other possible root causes. If I experience the same issues, I only have two options left:

switch SD card (which is brand new by the way)
switch the whole platform (test with the HASSIO image for VMware, I can run it temporarily on a Windows 10 machine)

The system is practically unusable by now; it only takes a few hours to reach the ‘situation of death’ where only pulling the power plug “resolves” the issue. The last reboot was at 2 pm, and now, 3 hours later, it is starting to freak out again.

Again I see this message in home-assistant.log even though I removed that integration!

Logger: homeassistant.loader
Source: loader.py:465
First occurred: 16:55:05 (1 occurrences)
Last logged: 16:55:05

You are using a custom integration for trakt which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.

I have no idea what else to check. None of the debugging options (Profiler or Glances) will run once it’s “too late”, because the system is already unresponsive. And anyway, there are no experts here telling me what to do, unfortunately only other users with the same or similar issues.

This really is a Home Assistant blocker. I have already wasted more than 3 days of holiday, 3 days in which I could have built great things in my smart home. Damn, it’s so frustrating.

e-raser · December 31, 2020, 4:10pm

I think that might be a good indication to follow. What is the CPU_IOWAIT and why is it that high?
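For reference: on Linux, iowait is the share of CPU time spent idle while waiting for outstanding disk I/O, so a high value usually points at slow storage (or heavy swapping) rather than at a busy CPU. A minimal sketch of how the figure can be derived from two snapshots of /proc/stat follows; the function names are made up for illustration, and whether a shell with access to /proc/stat is available on a given HASS OS install is an assumption.

```python
def parse_cpu_line(stat_text):
    """Return the counters of the aggregate 'cpu' line from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat snapshots."""
    old = parse_cpu_line(before)
    new = parse_cpu_line(after)
    deltas = [n - o for n, o in zip(new, old)]
    # Field order: user nice system idle iowait irq softirq steal (guest, guest_nice).
    # Guest time is already counted in user/nice, so only the first eight are summed.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total
```

To use it, read /proc/stat, wait a few seconds, read it again, and pass both texts to iowait_percent.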

e-raser · December 31, 2020, 5:31pm

See https://github.com/home-assistant/operating-system/issues/1119. Probably related to HASS OS 5.8/5.9.

bschatzow · December 31, 2020, 5:40pm

What version of the OS are you on? On my Pi 4 I am currently using 5.2. On anything above this it freezes if I boot from the SSD drive.

e-raser · December 31, 2020, 5:49pm

HASS OS 5.8, as shown in the system status paste in the OP. I will try 5.10, or otherwise downgrade to 5.3 or even 5.2 too. Meanwhile I have even ordered new hardware as an act of pure desperation.

petro · December 31, 2020, 5:59pm

Did you delete the files from custom_components? Did you delete any YAML configuration for it?

e-raser · December 31, 2020, 6:01pm

If you’re referring to my trakt integration removal: yes, I deleted the custom_components folder (see How to fully remove an integration - can´t get rid of one - #2 by e-raser).

That’s certainly only a side note and the wrong track, judging by HASS unstable · Issue #1119 · home-assistant/operating-system · GitHub.

petro · December 31, 2020, 6:03pm

Ok, did you remove the integration from the UI? Also, did you remove any references to it in configuration.yaml?

Edit: To clarify, that message appears if it thinks you have it integrated. So there’s 1 of 2 options: It’s still integrated in configuration -> integrations (in the ui), or it’s integrated via a config line in configuration.yaml.

Edit2: This won’t alleviate your issues, just remove that warning.

e-raser · December 31, 2020, 6:08pm

Yeah, that trakt integration has been removed from the integrations section using the UI, the folder was removed, and there’s nothing (and never was) in configuration.yaml.

Enough about the integration removal, that’s already sorted out; the core issue is surely a different one.

gcb2018 · January 3, 2021, 9:36am

I am probably not adding anything useful but I have the same issue:

behavior: after a full manual reboot it takes 5-7 hrs before it crashes (I have Uptime Robot to monitor if it’s up/down). Yesterday it crashed at 9pm and when I woke up this morning it was still down…
checked all integrations and removed/cleaned up all errors I saw (still crashes)
turned off all automations (still crashes)
turned off all auto-adds & discoveries (still crashes)
moved from SD card to SSD (still crashes)
SWAP full (99.4%) / RAM high (81.8%)
I don’t have many integrations / add-ons / devices actually
Raspberry Pi 3b+
the logs show this one sometimes:

2021-01-03 08:43:31 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: 400, message="Bad status line 'invalid HTTP method'"

Read somewhere the crashing can be related to:

snapshot creation (and maybe the google drive back up add-on) and I have disabled that one for now. Will report back.
the hardware being a Raspberry Pi 3b+: upgrading to a Pi 4 would do the trick, as RAM use would max out at 1220, which would be too much for the 3b+

Btw: I stopped all add-ons that are not crucial for me at the moment so only have the following running:

Node-RED
Mosquitto Broker
zigbee2mqtt

tom_l · January 3, 2021, 2:53pm

Yeah, there’s definitely something wrong with my installation too.

cpu_use

No errors in any of the logs, nothing occurring when the CPU use starts climbing. Happens every day or two.

hdehaseleer · January 5, 2021, 5:53pm

I’m relatively new to HA.
I’m running HA on a Raspberry Pi 4.
At the moment I’m only using the integrations Node-RED, Denon HEOS, ONVIF, Z-Wave, KNX.
I still have plenty of RAM and SSD space available. CPU usage is always below 10%.

The Core on my Raspberry Pi hangs nearly every week and stops running automations or recording measurements.
A ping to the Pi is still OK. A connection to the UI times out. Until now I have always done a hard reboot (power off), but this also empties the log file.

Is there a way to keep an archive of the log files? So that at least I can try to find the problem in the older log file from before the hard reboot.

Does a watchdog mechanism exist for the Core on my Pi that reboots it automatically?
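One possible approach, as a rough sketch only: copy the log to a timestamped file at regular intervals, so that a copy from shortly before the hang survives the hard reboot. The function name archive_log and both paths are assumptions for illustration, and how to schedule the script on a given install is left open.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_log(log_path, archive_dir, now=None):
    """Copy the log file into archive_dir under a timestamped name and return the new path."""
    log_path = Path(log_path)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    # e.g. home-assistant.log -> home-assistant-20210105-175300.log
    target = archive_dir / f"{log_path.stem}-{stamp}{log_path.suffix}"
    shutil.copy2(log_path, target)
    return target
```

A call could look like archive_log("/config/home-assistant.log", "/config/log_archive"), where both paths are guesses that depend on the installation. Old copies would need to be pruned now and then so the archive does not fill the disk.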

e-raser · January 6, 2021, 7:06pm

Switched from Pi 3 B+ to Pi 4 B (8 GB) yesterday. Immediate effect on the CPU and of course RAM, now no swapping etc.

1 GB of RAM definitely is an issue on a Pi device.

e-raser · January 17, 2021, 2:13am

Just a quick update (I hate unresolved topics…):
To everyone arriving in this topic and having similar issues I’d strongly recommend to have a look at this GitHub issue: https://github.com/home-assistant/operating-system/issues/1119#issuecomment-761696480

For me it has been “fixed” (or “worked around”) since switching hardware, although the issue itself still exists.

ilmec · July 5, 2021, 8:35am

Hi!
I was having the same problems, which froze my Raspberry Pi (Raspberry Pi 3b, 1 GB RAM, with SSD).
After reading the forums I tried increasing the size of the swap file. I followed the instructions given at the link

I also created an automation to restart the Home Assistant host at night.
This solved my problems (I am convinced that increasing the swap file is sufficient; rebooting the host empties the swap file, but 2 GB is a lot…).

The idea came to me after creating a system monitor and noticing that the system froze when the RAM ran out and the swap file got saturated.
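For anyone without a system monitor, the same check can be sketched in a few lines of Python that parse /proc/meminfo. This is an illustration only: the function names and the 90% thresholds are made up, and it assumes a kernel that reports MemAvailable.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_pressure(text, ram_limit=90.0, swap_limit=90.0):
    """Return (RAM used %, swap used %, True if both are at or above their limits)."""
    info = parse_meminfo(text)
    total = info["MemTotal"]
    ram_used = 100.0 * (total - info["MemAvailable"]) / total
    swap_total = info.get("SwapTotal", 0)
    if swap_total:
        swap_used = 100.0 * (swap_total - info.get("SwapFree", 0)) / swap_total
    else:
        swap_used = 0.0  # no swap configured
    return ram_used, swap_used, ram_used >= ram_limit and swap_used >= swap_limit
```

Feeding it the contents of /proc/meminfo every few minutes and logging the result shows whether RAM and swap fill up together before a freeze.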

e-raser · July 5, 2021, 9:06am

While I “fixed” this issue for myself by switching to more powerful hardware (Pi 4 with 8 GB), that is more of a workaround, and I fully agree with you: when I ran another service on the same hardware some time before Home Assistant, I had pretty similar issues (for months) that could be prevented by tweaking the swap file settings. A larger swap file that is reset regularly is probably the best way to buy some time on hardware with limited RAM.

Aside from switching hardware, this is the way to work around this “low RAM” issue. I’ll mark this one as the solution.

maxshcherbina · November 11, 2021, 8:05pm

I may be off the mark here, but when I had this issue, it was caused by my database becoming extremely bloated. I had a motion detector in a very busy part of the house that was writing its state 1000s of times an hour. This would very quickly inflate the database size. Turning off the logging of the sensor stopped this issue.
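To find such a chatty sensor, one option is to count rows per entity in the recorder database. The sketch below assumes the default SQLite recorder with a states table that has an entity_id column, which matched the schema I was looking at; other versions may lay the table out differently. The function name is made up for illustration.

```python
import sqlite3


def noisiest_entities(conn, limit=10):
    """Return (entity_id, row count) pairs for the entities with the most state rows."""
    query = (
        "SELECT entity_id, COUNT(*) AS n FROM states "
        "GROUP BY entity_id ORDER BY n DESC, entity_id LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()
```

It could be run against a copy of the database file, for example with sqlite3.connect("home-assistant_v2.db"); the file name and location are assumptions and depend on the install.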

e-raser · November 12, 2021, 11:55pm

It’s not one sensor and not the database; more likely it’s the storage under the database that can’t handle the I/O load.

Upgrading hardware helped.