Home Assistant randomly crashes and I have no idea why

I’ve got HA running on a Pi 4 with an SSD. My HA instance had been running pretty much flawlessly for nearly a year, but all of a sudden I find that the UI says “an unknown error” has occurred, or it gives me a 500 error. When I try to access the Pi via SSH, I get “kex_exchange_identification: read: Connection reset by peer” followed by “Connection reset by XXX.XXX.X.XXX port 22”.

My only course of action seems to be power-cycling the Pi manually. When I check the logs, everything looks fine. For example, here’s a snippet from just before the last crash:

2022-10-17 20:23:30.526 INFO (SyncWorker_3) [WazeRouteCalculator.WazeRouteCalculator] Min	Max
0.07	0.07 minutes
0.03	0.03 km
2022-10-17 20:23:30.576 INFO (SyncWorker_5) [WazeRouteCalculator.WazeRouteCalculator] Min	Max
0.32	0.32 minutes
0.12	0.12 km
2022-10-17 20:23:31.083 INFO (SyncWorker_0) [WazeRouteCalculator.WazeRouteCalculator] Min	Max
27.63	39.63 minutes
27.90	33.83 km
2022-10-17 20:23:31.203 INFO (SyncWorker_9) [WazeRouteCalculator.WazeRouteCalculator] Min	Max
26.85	48.30 minutes
27.68	39.95 km

The last error message was:

2022-10-17 20:18:09.688 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [547347264272] Client unable to keep up with pending messages. Stayed over 512 for 5 seconds
2022-10-17 20:18:09.693 INFO (MainThread) [homeassistant.components.websocket_api.http.connection] [547347264272] Connection closed by client

This error showed up 5 minutes before the final entry in the log, and the previous occurrence of the same error was an hour earlier. I can’t figure out why HA suddenly crashes or how to stop it. I’m not sure whether this error is related, but is anyone else dealing with this?
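To see how the websocket error lines up with the last thing logged before a crash, it can help to pull the ERROR entries and the final timestamp out of a saved copy of the log (e.g. `home-assistant.log`). Below is a minimal Python sketch, assuming the line format shown in the snippets above (`YYYY-MM-DD HH:MM:SS.mmm LEVEL (thread) [logger] message`); the function name `summarize_log` is hypothetical, and continuation lines such as the Waze Min/Max rows are simply skipped.

```python
import re
from datetime import datetime

# Matches the start of a log entry in the format shown above, e.g.
# "2022-10-17 20:18:09.688 ERROR (MainThread) [logger.name] message".
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (\w+) \((\S+)\) \[([^\]]+)\] (.*)$"
)


def summarize_log(lines):
    """Return (errors, last_timestamp) for an iterable of log lines.

    errors is a list of (timestamp, logger, message) tuples for ERROR entries.
    Lines that don't start a new entry (like the Waze Min/Max rows) are ignored.
    """
    errors = []
    last = None
    for line in lines:
        m = ENTRY.match(line.rstrip("\n"))
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        last = ts
        if m.group(2) == "ERROR":
            errors.append((ts, m.group(4), m.group(5)))
    return errors, last


if __name__ == "__main__":
    # Sample lines copied from the snippets in this post.
    sample = [
        "2022-10-17 20:18:09.688 ERROR (MainThread) "
        "[homeassistant.components.websocket_api.http.connection] [547347264272] "
        "Client unable to keep up with pending messages. Stayed over 512 for 5 seconds",
        "2022-10-17 20:23:31.203 INFO (SyncWorker_9) "
        "[WazeRouteCalculator.WazeRouteCalculator] Min\tMax",
        "26.85\t48.30 minutes",
    ]
    errors, last = summarize_log(sample)
    # Gap between the last ERROR entry and the final line before the crash.
    print(len(errors), last - errors[-1][0])
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.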

When I ran HA on an RPi4, it used to crash every few days. My issue was insufficient power. That may not be your issue, though.

Honestly, I’ve never had HA crash. If you’re having crashes, it’s likely the hardware, as @templeton_nash suggested too.

Yeah, I used to have that issue, and later SD cards kept dying on me. I resolved the power issue, and since switching to an SSD the crashing had stopped for almost a year, until 3 days ago.

Please vote for your own topic :wink:.
But there are cases of SD cards wearing out. Could the same thing happen to an SSD?
You could try replacing it if you have one lying around.

The official power supply is what I have.

I’m using the official power supply right now, so I’m not 100% sure it’s the power. Maybe it’s a bit out of scope here, but do you have any recommendations for alternatives?

Mine crashes occasionally (maybe once per week), and I also have an RPi 4 running Supervisor, booting from a 1TB SSD. I have a watchdog running on it, so if it does crash (usually after maxing out the CPU for about 5 minutes), it reboots and I only find out later. So I’m curious how I can diagnose this too. Most of the time the CPU and temperature are fine; I even have fans that turn on above a certain temperature and turn off when it drops back down.
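Since a watchdog reboot wipes the live view, one way to see what happened in the minutes before it is to append temperature and load to a file every minute, so the last rows survive the restart. A minimal Python sketch, assuming a Linux host that exposes `/sys/class/thermal/thermal_zone0/temp` in millidegrees Celsius (as Raspberry Pi OS does); `LOG_PATH`, `sample_once`, and `run` are hypothetical names, and whether this can run with host access under a given HA install type is not covered here.

```python
import os
import time
from datetime import datetime

# Assumed sysfs path; on Raspberry Pi OS it reports millidegrees Celsius.
THERMAL_PATH = "/sys/class/thermal/thermal_zone0/temp"
# Hypothetical output file; put it on storage that survives a reboot.
LOG_PATH = "pi_health.csv"


def parse_millidegrees(raw):
    """Convert a sysfs reading such as '48312' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def format_sample(now, temp_c, load1):
    """Format one CSV line: timestamp, temperature (C), 1-minute load average."""
    return f"{now.isoformat(timespec='seconds')},{temp_c:.1f},{load1:.2f}\n"


def sample_once(thermal_path=THERMAL_PATH, log_path=LOG_PATH):
    """Read temperature and load once and append them to the log file."""
    with open(thermal_path) as f:
        temp_c = parse_millidegrees(f.read())
    load1, _, _ = os.getloadavg()
    line = format_sample(datetime.now(), temp_c, load1)
    with open(log_path, "a") as out:
        out.write(line)
        out.flush()
        # fsync so the sample is on disk even if the Pi locks up right after.
        os.fsync(out.fileno())


def run(interval_s=60):
    """Sample forever; the last rows before a reboot show the lead-up."""
    while True:
        sample_once()
        time.sleep(interval_s)
```

Calling `run()` from a long-running process leaves a CSV whose final rows show whether temperature or load climbed before the watchdog kicked in.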

IMHO random crashes on a Pi are almost always one of two things:

A failing SD card, or a weak power source.

Since you’re already on an SSD, I’d bet money it’s a weak power supply. I like the CanaKit 3.5A USB-C supply; it should be easy to find on Amazon. If you can’t find it, go for one with at least a 3A output, but the bigger the better.
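Before swapping supplies, the Pi firmware can report whether it has actually seen under-voltage: on Raspberry Pi OS, `vcgencmd get_throttled` prints a hex bitmask such as `throttled=0x50005`. A small Python decoder for that bitmask is sketched below; the bit meanings follow the Raspberry Pi documentation, while the function name is hypothetical and the availability of `vcgencmd` from an HA shell is an assumption.

```python
# Bit meanings from the Raspberry Pi documentation for `vcgencmd get_throttled`.
FLAGS = {
    0: "under-voltage detected",
    1: "arm frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "arm frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Turn output like 'throttled=0x50005' into a list of readable flags."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [name for bit, name in sorted(FLAGS.items()) if value & (1 << bit)]
```

For example, `decode_throttled(subprocess.run(["vcgencmd", "get_throttled"], capture_output=True, text=True, check=True).stdout)` returns an empty list when nothing has been flagged. The "has occurred" bits stay set until the next reboot, so checking during normal uptime shows whether under-voltage happened at any point since boot.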

WTH is not for support. Please use #configuration in the future. Thanks.

I’ll test this out, as I also have a 3.5A power supply that came with the Argon case I’m using. I stuck with the official power supply because it’s what is always recommended, so we’ll see whether it is in fact a hardware issue. It’s just strange that after all this time with no issues and almost no change in setup, it’s starting to act up now.

In any case, I’ll report back.

Update: Unfortunately after about a week of testing with the 3.5A power supply, I’m still having issues with HA crashing with nothing to work from in the logs. I’m now fairly certain this is not a power issue in my case.

My only suspect is the Tado integration. It’s the last integration I added to HA before the crashing started. I have no concrete evidence to prove or disprove it, but I’ll keep looking into it.

The only way my HA crashes is when it runs out of memory.
So how is your memory usage?
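On a Linux host, `/proc/meminfo` reports `MemTotal`, `MemAvailable`, `SwapTotal`, and `SwapFree` in kB, which is enough to answer the memory question. A minimal Python sketch that turns those fields into percentages; the function names are hypothetical, and logging these values periodically is what would show whether memory runs out just before a crash.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a {field: value_in_kB} dict."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # First token is the number; the optional "kB" unit is dropped.
            info[key.strip()] = int(parts[0])
    return info


def memory_pressure(info):
    """Return (percent of RAM still available, percent of swap in use)."""
    avail_pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    if swap_total == 0:
        swap_used_pct = 0.0
    else:
        swap_used_pct = 100.0 * (swap_total - info.get("SwapFree", 0)) / swap_total
    return avail_pct, swap_used_pct


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Usage: `memory_pressure(read_meminfo())`. Low available RAM combined with heavy swap use right before the log goes quiet would point toward memory rather than power.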

I don’t have Tado, but in my case this can happen while working in ESPHome. On an RPi3 it’s an issue by default; on an RPi4 it can still fail when compiling multiple configurations simultaneously.

If memory is the issue, maybe HAOS should have better memory management.

Just a quick update:

I think I figured out what was crashing HA. It was Tado or rather the climate integration.

I’m not sure whether this was changed at some point or I just implemented it wrong initially, but to bypass their subscription service I’ve been using Node-RED to automate the open-window shutdown and to set Tado to Away mode when we’re not at home.

I think I initially set it up with individual entities, but I couldn’t see them when I checked last week. Looking at the documentation, I realized this now needs to be specified elsewhere. My guess is that PyTado tried to send everything at once and ran out of memory. Ever since I fixed this, the crashing seems to have stopped; at least it has been going strong for the entire week so far, compared to crashing 2-3 times a day before that.

I take it back. I’ve made very few changes to HA since my last update; I’ve just been updating HA and everything else over time. Since the summer, I’ve also removed all Tado automations from HA, and we bit the bullet and paid for their service to try it out this cold season. HA ran fine for about 3-4 months without any issue. I use a watchdog add-on and started noticing that it was restarting occasionally, but for the last month it’s gone back to completely crashing, this time without Tado to blame, as I’m not using it via HA for now.

Trying to access it via the CLI returns kex_exchange_identification: read: Connection reset by peer Connection reset, so I have to resort to hard restarts again.

Looking at the logs, there’s nothing abnormal or a pattern I could find. I didn’t want to bump this thread again, but yet again, I have no idea why it’s doing this. Perhaps someone else might have an answer.

Same here!