Best way to troubleshoot Home Assistant crashes

Hi all,

For the past couple of weeks my Home Assistant has been crashing in a recurring pattern. The graph shows 3 crashes. You can see memory usage climb to the maximum, Home Assistant becomes unresponsive, and then everything crashes. For crashes #1 and #2 the Docker memory limit was 2048MB; for #3 I had raised it to 4096MB. After a crash the home-assistant container restarts, but my flows stop working until I manually restart NodeRed.

I am able to retrieve the logs (home-assistant.log.1), but I can’t work out the actual cause from them.
This line shows up constantly:

"The above exception was the direct cause of the following exception:"

It is followed by a large number of Python exceptions.

Then I see the connection drop with NodeRed:

2023-12-21 17:40:10.619 INFO (MainThread) [custom_components.nodered.websocket] Device trigger removed: 20d8a839f137650f
2023-12-21 17:40:10.619 INFO (MainThread) [custom_components.nodered.websocket] Device trigger removed: f8c36160f0c3a8c7
2023-12-21 17:40:10.619 INFO (MainThread) [custom_components.nodered.websocket] Device trigger removed: ddbfda0a7982a74a

And then all the services seem to crash:
"Updating <random> sensor took longer than the scheduled update interval "

I also see a lot of errors from the ping integration, but those appear all the time (also when everything works fine):

2023-12-21 17:39:46.883 ERROR (MainThread) [] Error running command: `ping -n -q -c 1 -W1 5`, return code: 2
NoneType: None

What I have done:

  • Looked into the logs
  • Cleaned up configurations
  • Deleted unused integrations

What is the best way to troubleshoot this? Yes, I have quite a few integrations, including HACS integrations. But testing in safe mode is almost impossible, since it can take days for the problem to show up. Is there any debug logging that would be useful to enable in my configuration?

Home Assistant running in a Docker container, version: 2023.12.3.

Help is appreciated!

There’s a profiler you can run. Instructions and options will differ based on your installation method.

You can also go back one version at a time and try to narrow it down to a specific release. Onerous, but it would help.

Remove one integration at a time to see if a specific integration is the problem.
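
To decide which integration to remove first, it may help to count which loggers produce the most errors and warnings in home-assistant.log.1. Below is a minimal sketch in Python. It assumes every log entry starts with the same "timestamp LEVEL (thread) [logger] message" layout as the excerpts in the question; the function name `summarize` and the `<unnamed>` label are made up for illustration. Continuation lines such as tracebacks or "NoneType: None" are simply skipped.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpts in the question:
# "2023-12-21 17:40:10.619 INFO (MainThread) [logger.name] message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]*)\) "
    r"\[(?P<logger>[^\]]*)\] (?P<msg>.*)$"
)


def summarize(lines, levels=("ERROR", "WARNING")):
    """Count log entries per (level, logger) for the given levels.

    Lines that do not match the assumed layout (tracebacks, blank
    lines) are ignored. An empty logger name, as in the ping errors,
    is reported as "<unnamed>" (a hypothetical label).
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            logger = match.group("logger") or "<unnamed>"
            counts[(match.group("level"), logger)] += 1
    return counts


# Small demo using two lines quoted in the question.
SAMPLE = [
    "2023-12-21 17:40:10.619 INFO (MainThread) "
    "[custom_components.nodered.websocket] Device trigger removed: 20d8a839f137650f",
    "2023-12-21 17:39:46.883 ERROR (MainThread) [] "
    "Error running command: `ping -n -q -c 1 -W1 5`, return code: 2",
    "NoneType: None",
]

for (level, logger), n in summarize(SAMPLE).most_common():
    print(f"{n:6d}  {level:8s} {logger}")
```

For the real file, something like `summarize(open("home-assistant.log.1", encoding="utf-8", errors="replace"))` should work; adjust the path to wherever your config volume is mounted. The loggers at the top of the list in the minutes before a crash are reasonable first candidates to disable, though a noisy logger is not necessarily the one leaking memory.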