HASS OS becomes unresponsive / then almost unusable / and is finally dead - starting every ~ 10 hours after last HA start

A few hours after each system/HA start, CPU load climbs for no obvious reason and massively slows down the whole system.
It started a few days ago, but as I was still building up HA and its integrations and therefore restarted HA every now and then, I did not notice it until now.

Approximately 9 to 10 hours after latest HA restart CPU-load goes totally crazy:


(that was taken ~ 16 hours after last HA start / 6.5 hours after 1st wave / 3.5 hours after start of 2nd wave and ~ 30 minutes before complete system death)

  • Last Host start: 16:40 the day before
  • Last HA restart: 00:39
  • Start of CPU load going up: first wave 11:00, second wave 14:00
  • Time now: 17:30

Usually CPU usage is quite OK (roughly 20 to 25 %), but I also see it going absolutely crazy (ignore RAM + SWAP, that's normal for the hardware):

Because of the strong limitations of HASS OS I have absolutely no tools to closely monitor what's really going on:

  1. Using 'top' / 'ps' on the CLI (SSH) shows nothing helpful:
  2. There's no 'iotop', no 'htop', nothing to deep dive into what's stressing the system
  3. No useful external tools; I think having a closer look into the system will be the only way of getting fundamental evidence for the root cause

+++ Therefore I kindly ask for any help to debug. Building my smart home on such an unreliable system is of course not an option for me. +++

Disabling all addons, removing all integrations etc. for debugging reasons is not an option by the way; it would take me at least 10 hours from last restart to see if changes have effect.

Meanwhile HA is down, the SSH connection has been closed, neither the SAMBA share nor anything else is responding, and the hardware is hot as hell. A hard reset is needed because HASS OS of course also does not provide a system watchdog like the one I'd normally use on a full Linux (if load X is over YY for ZZ minutes, perform a system reset).
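A rule like the one in parentheses could be sketched in Python roughly as follows. This is only an illustration for a general Linux host, not something HASS OS ships. The function names, the threshold of 8.0 and the 10-minute window are assumptions, and the actual reset action is left to a callback.

```python
import os
import time
from collections import deque


def should_reset(samples, threshold, min_samples):
    """Return True when the last `min_samples` load readings all exceed `threshold`."""
    if len(samples) < min_samples:
        return False
    recent = list(samples)[-min_samples:]
    return all(load > threshold for load in recent)


def watch(threshold=8.0, minutes=10, interval_s=60, on_trigger=None):
    """Poll the 1-minute load average and call `on_trigger` once the rule fires.

    All default values are illustrative assumptions, not recommendations.
    """
    min_samples = max(1, (minutes * 60) // interval_s)
    window = deque(maxlen=min_samples)
    while True:
        window.append(os.getloadavg()[0])  # 1-minute load average
        if should_reset(window, threshold, min_samples):
            if on_trigger is not None:
                on_trigger()  # e.g. log the event, then request a reboot
            return
        time.sleep(interval_s)
```

Requiring every sample in the window to exceed the threshold keeps a single short spike from triggering a reset.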

I'll try to recover and provide some information/screenshots I created yesterday when the same happened; there was some more information in home-assistant.log and the Supervisor log.

All system information:

System Health

version: 2020.12.1
installation_type: Home Assistant OS
dev: false
hassio: true
docker: true
virtualenv: false
python_version: 3.8.6
os_name: Linux
os_version: 5.4.79-v8
arch: aarch64
timezone: Europe/Berlin

Home Assistant Community Store 

GitHub API: ok
Github API Calls Remaining: 4878
Installed Version: 1.9.0
Stage: running
Available Repositories: 702
Installed Repositories: 20

Home Assistant Cloud 

logged_in: false
can_reach_cert_server: ok
can_reach_cloud_auth: ok
can_reach_cloud: ok

Hass.io 

host_os: Home Assistant OS 5.8
update_channel: stable
supervisor_version: 2020.12.7
docker_version: 19.03.13
disk_total: 57.8 GB
disk_used: 10.1 GB
healthy: true
supported: true
board: rpi3-64
supervisor_api: ok
version_api: ok
installed_addons: Samba share (9.3.0), deCONZ (6.6.1), Node-RED (7.2.11), CEC Scanner (2.4), AppDaemon 4 (0.3.2), Check Home Assistant configuration (3.6.0), Home Panel (1.8.3), motionEye (0.10.2), SQLite Web (2.3.2), Let's Encrypt (4.11.0), Portainer (1.3.0), Hey Ada! (1.1.1), Terminal & SSH (8.10.0), Almond (1.0.1), File editor (5.2.0), Grafana (5.3.6), Log Viewer (0.9.1), MariaDB (2.2.1), Samba Backup (4.3), InfluxDB (3.7.9), Visual Studio Code (2.9.1)

Lovelace

dashboards: 3
mode: storage
views: 17
resources: 13

There are two.

  1. Profiler
  2. Py-spy

Have a read of this topic: Python3 high CPU Usage

I feel your pain. I have a similar unidentifiable issue with resource use. Fortunately in my case I have enough resources that I can catch it happening every day or two and restart. Unfortunately I am away from home for another 4 months and can't diagnose this remotely without pre-shared keys.

I could recover some log output from home-assistant.log (the Supervisor log would be much more interesting but there's no way to access it):

Remember these are UTC times, so 23:39 in the log + 1 hour is 00:39 local time (the time when HA was started):

************************************************** HA START **************************************************
2020-12-29 23:37:26 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for rki_covid which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
2020-12-29 23:37:26 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for fritzbox_tools which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
2020-12-29 23:37:26 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for hacs which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
2020-12-29 23:37:26 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for shelly which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
2020-12-29 23:37:26 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for trakt which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
2020-12-29 23:37:38 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: 400, message="Bad status line 'invalid HTTP method'"
2020-12-29 23:37:38 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: 400, message="Bad status line 'invalid HTTP method'"
2020-12-29 23:38:29 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-29 23:38:33 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: hacs, trakt, kodi
2020-12-29 23:38:33 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: 400, message="Bad status line 'invalid HTTP method'"
2020-12-29 23:38:33 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: 400, message="Bad status line 'invalid HTTP method'"
2020-12-29 23:38:34 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-29 23:38:34 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-29 23:39:33 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: ios, upnp, kodi
2020-12-29 23:39:35 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-29 23:39:37 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-29 23:39:37 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-29 23:40:33 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: kodi
2020-12-29 23:41:39 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: 400, message="Bad status line 'invalid HTTP method'"
2020-12-29 23:41:39 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: 400, message="Bad status line 'invalid HTTP method'"
2020-12-29 23:59:13 ERROR (MainThread) [aiohttp.web] Forbidden
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/http/static.py", line 26, in _handle
    raise HTTPForbidden()
aiohttp.web_exceptions.HTTPForbidden: Forbidden
2020-12-29 23:59:16 ERROR (MainThread) [aiohttp.web] Forbidden
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/http/static.py", line 26, in _handle
    raise HTTPForbidden()
aiohttp.web_exceptions.HTTPForbidden: Forbidden
2020-12-29 23:59:21 ERROR (MainThread) [aiohttp.web] Forbidden
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/http/static.py", line 26, in _handle
    raise HTTPForbidden()
aiohttp.web_exceptions.HTTPForbidden: Forbidden
2020-12-30 00:02:14 ERROR (MainThread) [aiohttp.web] Forbidden
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/http/static.py", line 26, in _handle
    raise HTTPForbidden()
aiohttp.web_exceptions.HTTPForbidden: Forbidden
2020-12-30 00:02:17 ERROR (MainThread) [aiohttp.web] Forbidden
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/http/static.py", line 26, in _handle
    raise HTTPForbidden()
aiohttp.web_exceptions.HTTPForbidden: Forbidden
2020-12-30 00:02:19 ERROR (MainThread) [aiohttp.web] Forbidden
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/http/static.py", line 26, in _handle
    raise HTTPForbidden()
aiohttp.web_exceptions.HTTPForbidden: Forbidden
2020-12-30 00:02:34 ERROR (MainThread) [aiohttp.web] Forbidden
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/http/static.py", line 26, in _handle
    raise HTTPForbidden()
aiohttp.web_exceptions.HTTPForbidden: Forbidden
2020-12-30 00:08:00 WARNING (MainThread) [homeassistant.components.http.ban] Login attempt or request with invalid authentication from xxxxxxxxxxxxxx (192.168.xxx.xxx) (Mozilla/5.0 (iPhone; CPU iPhone OS 14_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Home Assistant/2020.7 (io.robbie.HomeAssistant; build:11; iOS 14.3.0) Mobile/HomeAssistant, like Safari)
2020-12-30 00:08:00 WARNING (MainThread) [homeassistant.components.http.ban] Login attempt or request with invalid authentication from xxxxxxxxxxxxxx (192.168.xxx.xxx) (Mozilla/5.0 (iPhone; CPU iPhone OS 14_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Home Assistant/2020.7 (io.robbie.HomeAssistant; build:11; iOS 14.3.0) Mobile/HomeAssistant, like Safari)
2020-12-30 00:08:02 ERROR (MainThread) [frontend.js.latest.202012120] http://homeassistant:8123/frontend_latest/chunk.95d66c74cae7e6ffd795.js:2:20907 TypeError: undefined is not an object (evaluating 't._leaflet_pos')
2020-12-30 00:08:02 ERROR (MainThread) [frontend.js.latest.202012120] http://homeassistant:8123/frontend_latest/chunk.95d66c74cae7e6ffd795.js:2:20907 TypeError: undefined is not an object (evaluating 't._leaflet_pos')
2020-12-30 00:35:37 ERROR (MainThread) [homeassistant.components.ipp] Error fetching ipp data: Invalid response from API: Error occurred while communicating with IPP server.
2020-12-30 00:39:41 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 00:39:42 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 00:39:43 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 01:39:46 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 01:39:47 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 01:39:48 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 02:00:20 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/jsonrpc_websocket/jsonrpc.py", line 105, in _ws_loop
    raise TransportError('Websocket error detected. Connection closed.')
jsonrpc_base.jsonrpc.TransportError: Websocket error detected. Connection closed.
2020-12-30 02:00:20 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/jsonrpc_websocket/jsonrpc.py", line 105, in _ws_loop
    raise TransportError('Websocket error detected. Connection closed.')
jsonrpc_base.jsonrpc.TransportError: Websocket error detected. Connection closed.
2020-12-30 02:00:20 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/jsonrpc_websocket/jsonrpc.py", line 105, in _ws_loop
    raise TransportError('Websocket error detected. Connection closed.')
jsonrpc_base.jsonrpc.TransportError: Websocket error detected. Connection closed.
2020-12-30 02:00:20 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/jsonrpc_websocket/jsonrpc.py", line 105, in _ws_loop
    raise TransportError('Websocket error detected. Connection closed.')
jsonrpc_base.jsonrpc.TransportError: Websocket error detected. Connection closed.
2020-12-30 02:00:20 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/jsonrpc_websocket/jsonrpc.py", line 105, in _ws_loop
    raise TransportError('Websocket error detected. Connection closed.')
jsonrpc_base.jsonrpc.TransportError: Websocket error detected. Connection closed.
2020-12-30 02:00:20 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/jsonrpc_websocket/jsonrpc.py", line 105, in _ws_loop
    raise TransportError('Websocket error detected. Connection closed.')
jsonrpc_base.jsonrpc.TransportError: Websocket error detected. Connection closed.
2020-12-30 02:23:25 WARNING (MainThread) [homeassistant.helpers.condition] Value cannot be processed as a number: <state sensor.my_fire_hd_10_akkustatus=full; friendly_name=My Fire HD 10 Akkustatus, icon=mdi:battery-charging @ 2020-12-30T03:23:25.735555+01:00> (Offending entity: full)
2020-12-30 02:23:25 WARNING (MainThread) [homeassistant.helpers.condition] Value cannot be processed as a number: <state sensor.my_fire_hd_10_akkustatus=full; friendly_name=My Fire HD 10 Akkustatus, icon=mdi:battery-charging @ 2020-12-30T03:23:25.735555+01:00> (Offending entity: full)
2020-12-30 02:36:46 WARNING (MainThread) [homeassistant.helpers.condition] Value cannot be processed as a number: <state sensor.my_fire_hd_10_akkustatus=discharging; friendly_name=My Fire HD 10 Akkustatus, icon=mdi:battery-minus @ 2020-12-30T03:36:46.674503+01:00> (Offending entity: discharging)
2020-12-30 02:36:46 WARNING (MainThread) [homeassistant.helpers.condition] Value cannot be processed as a number: <state sensor.my_fire_hd_10_akkustatus=discharging; friendly_name=My Fire HD 10 Akkustatus, icon=mdi:battery-minus @ 2020-12-30T03:36:46.674503+01:00> (Offending entity: discharging)
2020-12-30 02:39:52 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 02:39:54 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 02:39:55 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 03:03:48 ERROR (MainThread) [homeassistant.components.ipp] Error fetching ipp data: Invalid response from API: Timeout occurred while connecting to IPP server.
2020-12-30 03:24:10 WARNING (SyncWorker_7) [homeassistant.components.fritz.device_tracker] Host entry for 88:87:17:8D:C0:00 not found: 'UPnPError: \nerrorCode: 714\nerrorDescription: NoSuchEntryInArray'
2020-12-30 03:32:10 ERROR (MainThread) [hole] Can not load data from *hole: 192.168.0.99
2020-12-30 03:32:10 ERROR (MainThread) [homeassistant.components.pi_hole] Error fetching Pi-Hole data: Failed to communicating with API: Can not load data from *hole: 192.168.0.99
2020-12-30 03:39:59 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 03:40:01 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 03:40:01 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 04:40:05 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 04:40:06 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 04:40:06 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 05:40:10 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 05:40:12 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 05:40:12 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 06:18:38 WARNING (MainThread) [homeassistant.components.sensor] Updating command_line sensor took longer than the scheduled update interval 0:00:10
2020-12-30 06:18:38 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.pi_hole_status is taking over 10 seconds
2020-12-30 06:40:16 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 06:40:17 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 06:40:18 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 07:40:25 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 07:40:27 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 07:40:27 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 07:53:15 ERROR (MainThread) [homeassistant.components.ipp] Error fetching ipp data: Invalid response from API: Timeout occurred while connecting to IPP server.
2020-12-30 08:40:31 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 08:40:33 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 08:40:33 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 08:56:14 ERROR (MainThread) [homeassistant.components.ipp] Error fetching ipp data: Invalid response from API: Timeout occurred while connecting to IPP server.
2020-12-30 09:40:37 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 09:40:39 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 09:40:39 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
************************************************** ROUGH START OF 1ST WAVE OF CPU LOAD GOING CRAZY **************************************************
2020-12-30 09:58:02 WARNING (MainThread) [homeassistant.components.sensor] Updating command_line sensor took longer than the scheduled update interval 0:00:10
2020-12-30 09:58:02 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.pi_hole_status is taking over 10 seconds
2020-12-30 09:58:22 WARNING (MainThread) [homeassistant.components.sensor] Updating command_line sensor took longer than the scheduled update interval 0:00:10
2020-12-30 09:58:22 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.pi_hole_status is taking over 10 seconds
2020-12-30 10:40:44 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 10:40:46 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 10:40:47 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 11:40:51 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 11:40:53 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 11:40:53 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 12:40:57 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 12:40:59 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
2020-12-30 12:40:59 WARNING (MainThread) [custom_components.trakt] Error retrving information from api.themoviedb.org
************************************************** ROUGH START OF 2ND WAVE OF CPU LOAD GOING CRAZY **************************************************
*************************************** NOW EVERYTHING IS STUCK IN QUEUE, ALMOST NOTHING IS WORKING IN TIME  ****************************************
[ had to remove it because of post length restrictions ]
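To make a long paste like the one above easier to scan, the entries can be counted per level and logger, with the UTC timestamps shifted to local time. The following is a minimal sketch, assuming the line layout seen in the log above and a fixed UTC+1 offset (as stated for these December dates); `parse_line` and `summarize` are hypothetical helper names.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC+1 offset, as described for this log (no DST handling).
LOCAL = timezone(timedelta(hours=1))

# Matches lines such as:
# 2020-12-29 23:39:33 WARNING (MainThread) [homeassistant.bootstrap] message
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) \(([^)]+)\) \[([^\]]+)\] (.*)$"
)


def parse_line(line):
    """Return (local_time, level, logger, message) or None for non-entry lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # traceback lines, banners, blank lines
    stamp, level, _thread, logger, message = match.groups()
    utc = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return utc.astimezone(LOCAL), level, logger, message


def summarize(lines):
    """Count log entries per (level, logger) pair."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, level, logger, _ = parsed
            counts[(level, logger)] += 1
    return counts
```

Calling `summarize(open("home-assistant.log"))` and printing `counts.most_common(10)` would show which loggers produce the most entries; it does not by itself say which one causes the load.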

In your case I would start by removing:

Raspberry Pi 3B +
With the last 118.x version I had 8 of 12 addons running at the same time, no problems.
After updating to 2020.xxx the same symptoms as described above.
Could run a maximum of 2 addons. As soon as I started another, the supervisor slowed down, CPU load went up, SWAP to 100%.
At first the supervisor was no longer available, the add-ons could be addressed for a while, then everything was dead.
Same with OS 5.9.
I gave up at first, now run Debian supervised on an old laptop.

Unfortunately it doesn't help you, but maybe you don't feel so alone. :wink:

(Turning off the screen gives me less than 20 watts on the laptop,
does anyone know how much a Raspberry Pi 4 draws?)

That was my first thought too. Removed it and restarted HA, let's see…

Additionally I

  • will try to set up Glances (it only runs with protection mode disabled and also puts some load on the whole system, but I'll accept that for temporary debugging) in the hope of getting more information on what exactly is causing the CPU load
  • will have a look at 'Profiler' and 'Py-spy' as suggested
  • searched the whole forum: there are plenty of similar reports, sometimes with causes that make sense (an SQLite recorder DB of 30 GB could of course cause performance issues) or basic faults (weak power supply, corrupt SD card, …), and sometimes there's no solution

I won't build anything with and on HA unless the system runs reliably stable. I hope to come back with (good) news.

Thank you. As I screened the whole forum I already got that I'm not alone, unfortunately/luckily, depending on the point of view. But yeah, there's important information in your post which made me think and check my log:

The current behaviour has not been there from the beginning. What changed:

  • on Dec 19 I switched from Pi 2 B 32 bit to Pi 3 B+ 64 bit
  • not sure if before or after that date there have been HA updates (Core, Supervisor, OS, plenty; I don't remember all of them)

But I cannot point the finger at an update as long as I don't know what exactly is causing the issues (what the root cause is). I'll invest a few more hours in that.

While I find it difficult to get Py-spy running on my HASS OS setup I integrated Profiler.
What is the best way to use Profiler?

I thought of (plan A): as soon as the system starts to slow down/has CPU load issues, I call the profiler.start and profiler.memory services and take a look at Glances, which I have set up meanwhile.

But what then, what do I do with the Profiler output? Does it contain sensitive information, is it safe to provide/upload/post? I really need assistance on this, otherwise I'm totally lost :frowning: :frowning: :frowning:
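One way to look at such output yourself before posting anything is Python's standard pstats module. The sketch below assumes that profiler.start leaves a cProfile-compatible .cprof file in the configuration directory (the exact file name is not given here, so none is hard-coded); `top_functions` and `write_demo_profile` are hypothetical helper names, and the second one only creates a stand-in file so the sketch can be tried without Home Assistant.

```python
import cProfile
import io
import pstats


def top_functions(cprof_path, limit=15):
    """Return a text report of the functions with the highest cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(cprof_path, stream=buffer)
    # strip_dirs() shortens file paths to bare file names in the report
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def write_demo_profile(path):
    """Create a small stand-in profile file (not real Home Assistant data)."""
    profiler = cProfile.Profile()
    profiler.enable()
    sum(i * i for i in range(50_000))
    profiler.disable()
    profiler.dump_stats(path)
```

The report lists function names, source file names, call counts and timings, so reading it through once is a reasonable way to judge whether anything in it should not be shared.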

Or any other thoughts? Already spent 1/2 of the day on this, what a pain.

Here's some Glances output (from a freshly started system without any issues). I don't know if it's really critical, what do you think?

Warning or critical alerts (last 9 entries)
2020-12-30 23:07:23 (ongoing) - CPU_IOWAIT (23.0)
2020-12-30 23:06:46 (00:00:19) - WARNING on CPU_IOWAIT (23.6)
2020-12-30 23:06:10 (00:00:14) - CRITICAL on CPU_IOWAIT (26.3)
2020-12-30 23:03:36 (00:01:30) - CRITICAL on CPU_IOWAIT (32.7)
2020-12-30 23:03:01 (00:00:17) - WARNING on CPU_IOWAIT (22.7)
2020-12-30 22:37:40 (ongoing) - MEMSWAP (100.0)
2020-12-30 22:36:51 (ongoing) - MEM (84.0)
2020-12-30 22:35:59 (00:00:14) - WARNING on MEM (70.7)
2020-12-30 22:31:53 (00:04:02) - WARNING on CPU_USER (77.6)

Strange: I removed the trakt integration via the integrations section and have even rebooted twice since. Still, I found this warning in the HA log again just now:

Logger: homeassistant.loader
Source: loader.py:465
First occurred: 23:36:55 (1 occurrences)
Last logged: 23:36:55
You are using a custom integration for trakt which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant. 

Where does this log entry come from? The integration is not even shown on the http://homeassistant:8123/config/info page anymore, so I'm really wondering.

Must have been a "ghost message", because the HA log does NOT contain the [homeassistant.loader] You are using a custom integration for trakt... entry on HA start anymore.

Me too! I have been experiencing the same issue for about two months or so. For the most part I had left HA alone for over 10 months and, well, winter is here, so it's time for indoor things to do. I upgraded to HassOS 4.16 and Core 0.117.6, plus updated the addons. So, what is causing the issue isn't going to be so straightforward to figure out. Of course I didn't experience the issue right away, so I merrily added more integrations.

What I can say is that at first it sometimes took a week before I had the issue, but lately it often happens in less than 24 hours. This has led me to think it is a memory leak. I am using HA more for sure, but the CPU load in general is quite low, with a typical load of .5. So last night I removed some memory-heavy integrations to leave the system with lots of unused memory. I had about 250K free after reboot. There was 180K when I went to bed and this morning I am down to 95K. We will see how it goes.
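A steady decline like that can be checked with a few timestamped readings of MemAvailable from /proc/meminfo. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the trend is simply the average change between the first and the last sample, which is only meaningful if the samples were taken under comparable conditions.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def read_available_kb(path="/proc/meminfo"):
    """Read the current MemAvailable value from a Linux system."""
    with open(path) as handle:
        return parse_meminfo(handle.read())["MemAvailable"]


def trend_kb_per_hour(samples):
    """Average change per hour between the first and last (hours, kB) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 == t0:
        raise ValueError("samples must span a non-zero time range")
    return (v1 - v0) / (t1 - t0)
```

A clearly negative trend over many hours at otherwise constant usage would support the leak suspicion; a value that drops and then levels off would rather point to caches filling up.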

I also suspect either memory or "CPU_IOWAIT" related issues. I therefore ordered other hardware: a Pi 4 with 8 GB of memory should be more than enough for the current ~ 850 MB RAM consumption. That way I can rule out other possible root causes. If I experience the same issues there, I only have two options left:

  • switch the SD card (which is brand new, by the way)
  • switch the whole platform (test with the HASSIO image for VMware, which I can run temporarily on a Windows 10 machine)

The system is now so unusable that it only takes a few hours to reach the 'situation of death' where only pulling the power plug "resolves" the issue. The last reboot was at 2 pm; now, 3 hours later, it starts to freak out again.

Again I see this message in home-assistant.log even though I removed that integration!

Logger: homeassistant.loader
Source: loader.py:465
First occurred: 16:55:05 (1 occurrences)
Last logged: 16:55:05

You are using a custom integration for trakt which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.

I have no idea what else to check. None of the debugging options (Profiler or Glances) will run once it's "too late" because the system is already unresponsive. And anyway, there are no experts here telling me what to do, unfortunately only other users with the same or similar issues.

This really is a Home Assistant blocker. I already wasted more than 3 days of holiday, 3 days in which I could have built great things in my smart home. Damn, it's so frustrating.

I think that might be a good indication to follow. What is CPU_IOWAIT and why is it that high?
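For reference: on Linux, iowait is the time the CPU sat idle while waiting for outstanding disk I/O, and it is the fifth counter on the aggregate "cpu" line of /proc/stat. The percentage over an interval can be computed from two snapshots, as in this minimal sketch (the function names are hypothetical; Glances may calculate its figure slightly differently).

```python
def parse_cpu_line(stat_text):
    """Return the aggregate CPU counters from /proc/stat content as a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two /proc/stat snapshots."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Columns: user nice system idle iowait irq softirq steal (guest is part of user)
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total
```

A high value together with a swap that is 100 % full would be consistent with heavy swapping to slow storage such as an SD card, but that is only a guess until it has been measured while the problem is occurring.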

See https://github.com/home-assistant/operating-system/issues/1119. Probably related to HASS OS 5.8/5.9.

What version of the OS are you on? On my Pi 4 I am currently using 5.2. Above this it freezes if I boot from the SSD drive.

HASS OS 5.8, as shown in the OP system status paste. Will try 5.10 or otherwise downgrade to 5.3 or even 5.2 too. Meanwhile I even ordered new hardware as an act of pure desperation :frowning:

Did you delete the files from custom_components? Did you delete any YAML configuration for it?

If you're referring to my trakt integration removal: yes, I deleted the custom_components folder (see How to fully remove an integration - can't get rid of one - #2 by e-raser).

That's for sure only a side note and the wrong track when looking at HASS unstable · Issue #1119 · home-assistant/operating-system · GitHub.

Ok, did you remove the integration from the UI? Also, did you remove any references to it in configuration.yaml?

Edit: To clarify, that message appears if it thinks you have it integrated. So there's 1 of 2 options: It's still integrated in configuration -> integrations (in the UI), or it's integrated via a config line in configuration.yaml.

Edit2: This won't alleviate your issues, just remove that warning.

Yeah, that trakt integration has been removed from the integrations section using the UI, the folder was removed and there's nothing (and never was) in configuration.yaml.

Enough talking about that integration removal, it is sorted out already; the core issue is surely another one.

I am probably not adding anything useful but I have the same issue:

  • behavior: after a full manual reboot it takes 5-7 hrs before it crashes (I have Uptime Robot to monitor if it's up/down). Yesterday it crashed at 9pm and when I woke up this morning it was still down…
  • checked all integrations and removed/cleaned up all errors I saw (still crashes)
  • turned off all automations (still crashes)
  • turned off all auto-adds & discoveries (still crashes)
  • moved from SD card to SSD (still crashes)
  • SWAP full (99.4%) / RAM high (81.8%)
  • I don't have many integrations / add-ons / devices actually
  • Raspberry Pi 3b+
  • logs shows this one sometimes:
2021-01-03 08:43:31 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: 400, message="Bad status line 'invalid HTTP method'"

I read somewhere that the crashing can be related to:

  • snapshot creation (and maybe the Google Drive backup add-on); I have disabled that one for now. Will report back.
  • the Raspberry Pi 3b+ itself; upgrading to a Pi 4 would supposedly do the trick, as RAM usage would max out at 1220, which would be too much for the 3b+

Btw: I stopped all add-ons that are not crucial for me at the moment so only have the following running:

  • Node-RED
  • Mosquitto Broker
  • zigbee2mqtt