Home Assistant on Raspberry Pi 4 issue - Connection Drops

HAdes1 · January 10, 2025, 12:45pm

Hi everyone,

after the awesome community already helped me perfectly with solving my first problem, I’m now facing a new challenge. I’ve been using Home Assistant for about two months now and have gradually expanded the integrated devices and services. It started with a few smart devices, but in the meantime I’ve fallen down the self-hosting rabbit hole and now host services like Vaultwarden, Wireguard, Linkwarden… alongside Home Assistant.
(I’m cross-posting this issue here and on Reddit.)

My Setup:

Raspberry Pi 4 Model B 4GB with Home Assistant OS (v25.1.1; with about 70 automations and 80 integrations)
1TB SSD (connected directly to the Pi via USB - it wouldn’t boot via the USB hub, but it doesn’t seem to cause a power problem)
Add-ons, including: Studio Code Server, Samba, InfluxDB/Grafana, Music Assistant, WireGuard, AdGuard Home, MQTT Broker & Explorer, Frigate (with one camera and a Coral TPU, connected to the USB hub, which has its own power supply), Vaultwarden (Bitwarden), Linkwarden
Average RAM usage: 60%, with peaks 10-20% higher (especially when using add-ons like Samba or Studio Code Server, which I therefore only start when needed)
Since the Pi tends to get warm under load, the CPU temperature is controlled by a GPIO fan (it switches on when the temperature is above 60°C for several minutes or briefly above 70°C; it then reliably cools the Pi down to 35°C and only then switches off again)
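The fan behaviour described above amounts to a hysteresis loop with two switch-on conditions and one switch-off threshold. A minimal sketch in Python, assuming one temperature sample per minute and treating "several minutes" as three consecutive samples; the class name, field names and the sample count are hypothetical, and this is not the actual automation running on the Pi:

```python
from dataclasses import dataclass


@dataclass
class FanController:
    sustained_on_c: float = 60.0   # on if above this for several samples
    immediate_on_c: float = 70.0   # on immediately if above this
    off_c: float = 35.0            # off only once cooled down to this
    sustained_samples: int = 3     # assumption: 3 one-minute samples
    fan_on: bool = False
    hot_streak: int = 0            # consecutive samples above sustained_on_c

    def update(self, temp_c: float) -> bool:
        """Feed one temperature sample and return the new fan state."""
        if self.fan_on:
            # Hysteresis: stay on until the low threshold is reached.
            if temp_c <= self.off_c:
                self.fan_on = False
                self.hot_streak = 0
            return self.fan_on

        if temp_c > self.immediate_on_c:
            self.fan_on = True
        elif temp_c > self.sustained_on_c:
            self.hot_streak += 1
            if self.hot_streak >= self.sustained_samples:
                self.fan_on = True
        else:
            self.hot_streak = 0
        return self.fan_on
```

The wide gap between the switch-on and switch-off thresholds is what keeps the fan from toggling rapidly around a single temperature.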

My Problem (Connection Drops): Recently, I’ve been experiencing repeated connection drops to Home Assistant. The user interface becomes unavailable, and I can’t SSH into the Pi. Sometimes it reconnects after a few minutes, sometimes it takes hours to recover on its own. A few times I had to hard reboot the Pi by disconnecting the power.

The problem has gotten worse since yesterday, when I tried to set up an old phone as a wall panel (all I did was create a new MQTT user in the broker and configure WallPanel on the phone, which established the MQTT connection). Immediately after the connection was established, the interface became unavailable again and did not recover on its own. Since then, Home Assistant has been completely unreachable, even after several hard reboots. When I open the URL (DuckDNS/Let’s Encrypt, so the internal URL is identical to the external one), I see the Home Assistant boot screen, but I never get to the login because a connection error with a countdown to the next attempt is always displayed. I therefore suspect that at least parts of the system are running.

Update: Last night I tried disconnecting the USB hub before rebooting, and lo and behold: Home Assistant boots without any problems and the interface is accessible again (only Frigate couldn’t start without the Coral, of course, but that was fixed by connecting the hub after booting)!

Everything worked fine until I drove to work this morning (HA restarts automatically at three o’clock in the morning, and that didn’t seem to cause any problems either). During the 30-minute drive it seems to have crashed again, and it has been unreachable since then. Since the hub is currently connected again, I suspect that it will be available again as soon as I disconnect the hub and hard reboot…

But I wonder why it always runs for a few hours before the problem occurs (and why it always happens when I’m away from home…).

Has anyone had similar experiences with connection drops and USB hubs (I use the SABRENT USB Hub Active 3)?

Could the hub be the cause of the problems (or is it somehow related to the Coral)? I’m unsure if it makes sense to buy a new hub and test it or if that’s just a symptom, not the cause…

Does anyone know what the best next step would be to ensure the stability of Home Assistant?

I am grateful for any help! Thank you!

NathanCu · January 10, 2025, 12:58pm

Do it all but my bet is you’re already at step 5.

For the record, your Pi 4 4GB is more heavily loaded from an add-on perspective than my Pi 4 8GB was before I made the decision to move to a NUC over a year ago, and I moved for basically the same reason. Mine was not nearly as severe as yours (it died about once a week), but it was dying because of memory exhaustion and just falling over. (That’s why it sometimes stays up: it depends on how fast you run out of resources.)
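One way to see how close the host is to memory exhaustion is to compare `MemTotal` and `MemAvailable` from `/proc/meminfo`. A rough sketch in Python, assuming shell access to a Linux host that exposes that file; the function names are made up for illustration and nothing here is specific to Home Assistant:

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def used_percent(meminfo: dict[str, int]) -> float:
    """Share of RAM that is not available for new allocations, in percent."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def read_used_percent(path: str = "/proc/meminfo") -> float:
    """Read the live value from the host (Linux only)."""
    with open(path, encoding="ascii") as f:
        return used_percent(parse_meminfo(f.read()))
```

Logging `read_used_percent()` every minute or so would show whether usage climbs steadily before a lockup or jumps when a particular add-on starts.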

HAdes1 · January 10, 2025, 1:22pm

Yes, I probably fell a little too deep into that rabbit hole (but all these self-hosting options are just too tempting).
I already had the feeling that I’d need to switch to a more powerful device in the not-too-distant future…

When I looked at the logs while it was up yesterday, I only saw the usual warnings (like Philips Hue taking more than 10s to load, or not being able to connect to services that were offline at that moment), but I’ll connect the SSD to my computer and look through the files manually as soon as I’m home.

I thought I had read an article saying that HAOS doesn’t crash under full or heavy load, but just gets slower. I hoped that was true, but I’m getting more and more the feeling that it really is overloaded and that replacing the hub won’t change anything. Although it is still weird that it boots with the hub disconnected but can’t when it is connected…
Maybe I’ll move the system to an old computer I have lying around sooner than expected, to make sure everything works while I’m away…

NathanCu · January 10, 2025, 1:29pm

And when it’s that loaded, if anything creates a race condition… boom, lockup. That was my experience.

Sounds like you have multiple issues, one of them being that hub. There were a bunch of threads in the middle of last year about Z2M or ZWJS locking up when a USB hub was installed, so you probably have some variation of that (sorry, I don’t remember the details) while ALSO being in a low-memory condition (which aggravates it).

If it were me, I’d move the whole install to a new NUC and see if the boot lockup persists. If not, take the win and move on, because I suspect that even if the boot lockup is a separate issue, you’ll still be fighting low memory after you solve it.

HAdes1 · January 10, 2025, 1:46pm

That sounds like describing the issue perfectly…
Okay then… I’ll really change both of these ASAP (getting a new host as well as a new hub).
Thank you for sharing your experience!

NathanCu · January 10, 2025, 1:48pm

I’d do the host first. I doubt anything is wrong with the hub.

HAdes1 · January 10, 2025, 3:41pm

I was afraid I’d get that answer.
Anyway, I got home about half an hour ago and found another externally powered hub in one of the first places I looked.
So the order has more or less been decided for me.
Surprisingly, everything booted (faster than any other boot after a lockup). But as you said, I won’t take it as a win, just as a temporary workaround until I get another host.

HAdes1 · January 10, 2025, 4:08pm

It looks like Linkwarden is causing the high resource usage by trying to fetch an image for each of the 1500+ imported links. Maybe I can delay replacing the host for a bit by uninstalling it again. But I realize I’ll need to upgrade soon.

NathanCu · January 10, 2025, 7:32pm

So it’s hitting something hard enough to tip you over the edge. Glad you were able to locate it!