Home Assistant goes bananas with 100% CPU

It's an issue I've been trying to deal with for a while now.
I have Home Assistant installed inside a Docker container on a Raspberry Pi 4B. Quite often, though not on any regular schedule, Home Assistant stops responding altogether. The log shows no new entries from the moment it locks up, and the last entries before that look like the following (this is from the latest occurrence; the contents vary, although the timeout at the end is quite consistent):

2024-11-27 06:21:39.427 WARNING (MainThread) [custom_components.localtuya.common] [561...ac7] Disconnected - waiting for discovery broadcast
2024-11-27 06:21:47.645 ERROR (MainThread) [custom_components.monitor_docker.helpers] [Docker] clalit-checker: Container not available anymore (3b) ()
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client.py", line 663, in _request
    conn = await self._connector.connect(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/connector.py", line 538, in connect
    proto = await self._create_connection(req, traces, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/connector.py", line 1562, in _create_connection
    _, proto = await self._loop.create_unix_connection(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/unix_events.py", line 275, in create_unix_connection
    transport, protocol = await self._create_connection_transport(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1182, in _create_connection_transport
    await waiter
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/config/custom_components/monitor_docker/helpers.py", line 784, in _run
    await self._run_container_info()
  File "/config/custom_components/monitor_docker/helpers.py", line 834, in _run_container_info
    raw: dict = await self._container.show()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiodocker/containers.py", line 242, in show
    data = await self.docker._query_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiodocker/docker.py", line 319, in _query_json
    async with self._query(
  File "/usr/local/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiodocker/docker.py", line 231, in _query
    yield await self._do_query(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiodocker/docker.py", line 267, in _do_query
    response = await self.session.request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client.py", line 578, in _request
    with timer:
  File "/usr/local/lib/python3.12/site-packages/aiohttp/helpers.py", line 749, in __exit__
    raise asyncio.TimeoutError from exc_val
TimeoutError
2024-11-27 06:21:50.017 ERROR (MainThread) [custom_components.monitor_docker.helpers] [Docker] home-assistant: Container not available anymore (3b) ()
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client_reqrep.py", line 1059, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/streams.py", line 644, in read
    await self._waiter
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/config/custom_components/monitor_docker/helpers.py", line 788, in _run
    await self._run_container_stats()
  File "/config/custom_components/monitor_docker/helpers.py", line 908, in _run_container_stats
    rawarr = await self._container.stats(stream=False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiodocker/containers.py", line 396, in _stats_list
    async with cm as response:
  File "/usr/local/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiodocker/docker.py", line 231, in _query
    yield await self._do_query(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiodocker/docker.py", line 267, in _do_query
    response = await self.session.request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client.py", line 690, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client_reqrep.py", line 1054, in start
    with self._timer:
  File "/usr/local/lib/python3.12/site-packages/aiohttp/helpers.py", line 749, in __exit__
    raise asyncio.TimeoutError from exc_val
TimeoutError
2024-11-27 06:21:50.320 ERROR (MainThread) [custom_components.monitor_docker.helpers] [Docker] grafana: Container not available anymore (3b) ()
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client_reqrep.py", line 1059, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/streams.py", line 644, in read
    await self._waiter
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/config/custom_components/monitor_docker/helpers.py", line 784, in _run
    await self._run_container_info()
  File "/config/custom_components/monitor_docker/helpers.py", line 834, in _run_container_info
    raw: dict = await self._container.show()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiodocker/containers.py", line 242, in show
    data = await self.docker._query_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiodocker/docker.py", line 319, in _query_json
    async with self._query(
  File "/usr/local/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiodocker/docker.py", line 231, in _query
    yield await self._do_query(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiodocker/docker.py", line 267, in _do_query
    response = await self.session.request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client.py", line 690, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client_reqrep.py", line 1054, in start
    with self._timer:
  File "/usr/local/lib/python3.12/site-packages/aiohttp/helpers.py", line 749, in __exit__
    raise asyncio.TimeoutError from exc_val
TimeoutError

I saw that the monitor_docker custom integration I'm using features heavily in those logs, but even after disabling and removing it the problem keeps coming back.
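
For anyone reading the tracebacks above: each one shows a CancelledError that is re-raised as a TimeoutError, which is what the `raise asyncio.TimeoutError from exc_val` line in aiohttp's helpers.py does. Here is a minimal sketch of that mechanism using only asyncio. The names `slow_request`, `request_with_timeout` and `demo` are made up for illustration, and this is not aiohttp's actual code.

```python
import asyncio


async def slow_request():
    # Stand-in for a request that takes longer than the allowed time.
    await asyncio.sleep(1.0)


async def request_with_timeout(timeout):
    # Schedule a cancellation of the current task once the time limit passes.
    task = asyncio.current_task()
    loop = asyncio.get_running_loop()
    handle = loop.call_later(timeout, task.cancel)
    try:
        await slow_request()
    except asyncio.CancelledError as exc:
        # Convert the cancellation into a timeout, keeping the
        # CancelledError as the "direct cause" seen in the log.
        raise asyncio.TimeoutError from exc
    finally:
        handle.cancel()


def demo():
    # Returns the class name of the exception chained under the timeout.
    try:
        asyncio.run(request_with_timeout(0.05))
    except asyncio.TimeoutError as err:
        return type(err.__cause__).__name__
    return None


if __name__ == "__main__":
    print(demo())
```

Read this way, the errors only say that requests to the Docker socket did not finish within their time limit. On their own they do not show what was keeping the CPU busy.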

The docker ps command on the host machine shows 100% CPU for the Home Assistant container.
After restarting the container everything goes back to normal, but then the problem comes back.

I looked online and saw this post:

But for various reasons the installation of py-spy inside the container failed.
Any suggestions on what I can do to troubleshoot and fix this issue?

Thanks a lot!

Restart in safe mode. This disables all custom components. Then enable them one by one.

Your monitor-docker complains about a missing container. Could one of the add-ons be the problem? Also the first line of your log says that localtuya has disconnected.

Do you use the ltss integration? It causes memory leaks leading to CPU overload and crashes.

The profiler-based methods described in the documentation did not help me; I wasn't able to extract meaningful information from them. Since HA was crashing after up to 32 hours, I opted for disabling all integrations and then enabling them one by one.
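
When each test run can take more than a day, enabling integrations in halves rather than one at a time can cut the number of runs. Below is a rough sketch of that bisection idea. It assumes exactly one integration is responsible and that the fault still shows up when only part of the set is enabled, which may not hold in practice. The function `find_culprit`, the `is_broken` callback and the integration names are all hypothetical; in real use `is_broken` would mean "enable only these, wait, and see whether HA locks up".

```python
from typing import Callable, List


def find_culprit(integrations: List[str],
                 is_broken: Callable[[List[str]], bool]) -> str:
    """Narrow a list down to the one integration that triggers the fault.

    Assumes exactly one culprit and that is_broken(subset) is True
    whenever the culprit is part of the enabled subset.
    """
    if not integrations:
        raise ValueError("need at least one integration to test")
    candidates = list(integrations)
    while len(candidates) > 1:
        # Enable only the first half and check whether the fault appears.
        half = candidates[: len(candidates) // 2]
        if is_broken(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0]


if __name__ == "__main__":
    # Simulated run with placeholder names; the "fault" is faked here.
    names = ["integration_%s" % c for c in "abcdefgh"]
    checks = []

    def fake_check(enabled):
        checks.append(list(enabled))
        return "integration_f" in enabled

    print(find_culprit(names, fake_check), "found in", len(checks), "runs")
```

With eight integrations this takes three test runs instead of up to eight, at the cost of relying on the single-culprit assumption.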