Over the last couple of updates I noticed an increasing number of issues, especially when (re)starting Home Assistant. It started with the Huawei Solar integration failing 50% of the time, and when it does, I need to unplug the power from my RPi 5 before booting it up again. Simple HA restarts won’t fix the issue, and triggering a reboot just freezes everything.
Then the same thing happened with Z-Wave and Zigbee, and it’s gotten to the point where I just don’t change stuff in HA anymore, because doing cold restarts over and over again until everything works is too time-consuming and frustrating.
Huawei Solar seems to be a custom integration? Have you followed some of the steps recommended when you are having problems, like disabling custom integrations and seeing whether that helps?
It's impossible to generalise about mesh networks because every one is different. My (quite extensive) Zigbee network has been rock solid for a couple of years.
I last booted my HA server 170 days ago. Before that it went almost 200 days without a reboot. I update HA at least once a month and restart it probably weekly for HACS updates or config changes. I have a 50+ node Z-Wave mesh with ZUI on the same server, along with about 8 other containers. It has been incredibly reliable.
If you suspect a third-party integration is causing instability, try disabling it for a few weeks to see if reliability improves. If it does, give feedback to the developer of the integration (e.g. open a GitHub issue) about the problems you’re seeing. If stability doesn’t improve, consider the possibility of a hardware issue. Try a bigger power supply. Be wary of failing or corrupt storage media. Unplug USB peripherals to rule out driver problems. Unfortunately it may take time to troubleshoot, but hopefully you get to the bottom of it.
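On the power supply point: on a Raspberry Pi you can ask the firmware whether it has seen under-voltage, which is quicker than swapping supplies blindly. Below is a rough sketch that decodes the output of `vcgencmd get_throttled`. It assumes you have a shell where `vcgencmd` exists (it ships with Raspberry Pi OS; I'm not sure it is reachable from every HA install type), and the helper name `decode_throttled` is just something I made up for this post.

```python
import subprocess

# Bit positions reported by `vcgencmd get_throttled` on Raspberry Pi firmware.
THROTTLE_FLAGS = {
    0: "under-voltage detected (now)",
    1: "ARM frequency capped (now)",
    2: "currently throttled",
    3: "soft temperature limit active (now)",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output: str) -> list[str]:
    """Turn a line such as 'throttled=0x50005' into readable flag descriptions."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in THROTTLE_FLAGS.items() if value & (1 << bit)]


if __name__ == "__main__":
    # Assumes vcgencmd is on the PATH; this raises if it is missing or fails.
    raw = subprocess.run(
        ["vcgencmd", "get_throttled"],
        capture_output=True, text=True, check=True,
    ).stdout
    flags = decode_throttled(raw)
    print("\n".join(flags) if flags else "No throttling or under-voltage reported.")
```

If any of the under-voltage lines show up, the power supply (or cable) is the first thing I would replace before spending more time on integrations.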
Another personal anecdote: I used to run HA under VMware Fusion and started seeing almost daily incidents where my Mac was churning 4x100% CPU, fans screaming, due to VM usage. Suspecting a runaway process, I spent weeks disabling integrations and containers and poring through logs, only to figure out it was caused by a bug in VMware. I ended up moving everything to a new machine (a thin client) that has been rock solid since.