WTH is HA becoming less and less reliable?

dantist · December 18, 2024, 1:37pm

Over the last couple of updates I've noticed an increasing number of issues, especially when (re)starting Home Assistant. It started with the Huawei Solar integration failing 50% of the time, and when it does, I need to unplug the power from my RPi 5 before booting it up again. Simple HA restarts won’t fix the issue, and triggering a reboot just freezes everything.

Then the same thing happened with Z-Wave and Zigbee, and it’s gotten to the point where I just don’t change stuff in HA anymore, because doing cold restarts over and over again until everything works is too time-consuming and frustrating.

teachingbirds · December 18, 2024, 1:55pm

Huawei Solar seems to be a custom integration? Have you followed the steps recommended when you are having problems, like disabling custom integrations and seeing if that resolves them?

jackjourneyman · December 18, 2024, 4:08pm

Impossible to generalise on the basis of mesh networks because every one is different. My (quite extensive) Zigbee has been rock solid for a couple of years.

peterxian · December 18, 2024, 5:04pm

I last booted my HA server 170 days ago. Before that it went almost 200 days without a reboot. I update HA at least once a month and restart it probably weekly with HACS updates or config changes. I have a 50+ node Z-Wave mesh with ZUI on the same server, along with about 8 other containers. It has been incredibly reliable.

If you suspect a 3rd-party integration is causing instability, try disabling it for a few weeks to see if reliability improves. If it does, give the developer of the integration feedback (e.g. open a GitHub issue) about the problems you’re seeing. If stability doesn’t improve, consider the possibility of a hardware issue. Try a bigger power supply. Be wary of failing/corrupt storage media. Unplug USB peripherals to rule out driver problems. Unfortunately it may take time to troubleshoot, but hopefully you get to the bottom of it.
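On the power supply point: since the original post mentions an RPi 5, one way to check for under-voltage is the Pi firmware's `vcgencmd get_throttled` command, which prints a hex bitmask such as `throttled=0x50005`. Below is a minimal sketch for decoding that value. It assumes you can run `vcgencmd` on the host (that may not be possible on every install type) and that you paste its output in as a string. The function names and the sample value are only illustrative.

```python
import re

# Bit meanings as documented for `vcgencmd get_throttled` on Raspberry Pi.
THROTTLE_FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    3: "soft temperature limit active now",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def parse_throttled(output: str) -> int:
    """Extract the integer bitmask from a line like 'throttled=0x50005'."""
    match = re.search(r"throttled=(0x[0-9a-fA-F]+)", output)
    if match is None:
        raise ValueError(f"unexpected get_throttled output: {output!r}")
    return int(match.group(1), 16)


def decode_throttled(value: int) -> list[str]:
    """Return a description for every flag bit set in the bitmask."""
    return [
        label
        for bit, label in sorted(THROTTLE_FLAGS.items())
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # Hypothetical sample; replace with the real output from your Pi.
    sample = "throttled=0x50005"
    for line in decode_throttled(parse_throttled(sample)):
        print(line)
```

If any of the under-voltage flags show up, the power supply (or cable) is a likely suspect. A result of `throttled=0x0` means none of these flags has been set since boot, which points away from power and towards storage or USB devices.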

Another personal anecdote: I used to run HA under VMware Fusion and started seeing almost daily incidents where my Mac was churning at 4x100% CPU, fans screaming, due to VM usage. Suspecting a runaway process, I spent weeks disabling integrations and containers and poring over logs, only to figure out it was caused by a bug in VMware. I ended up moving everything to a new machine (a thin client) that has been rock solid since.