WTH is ZHA/Zigbee so unstable and hard to troubleshoot?

Yes, that is correct: they have the same problem regardless of which Zigbee gateway you use.

Note that since the issue is a bug in the device firmware, the problem is sometimes only present in a specific firmware version.

Most devices that are infamous for these types of bugs have never gotten official firmware upgrades (and many do not support OTA updates at all). However, some might ship a new revision with newer firmware from the factory without getting a new model number, so buying old stock from a store or buying used hardware can be a gamble if it is one of the models known to have buggy firmware.

FYI, I posted a feature request to the zigpy/ZHA developers for that idea of showing a warning with comments in the ZHA UI for specific Zigbee devices with known issues, so you could add to the list I started there of infamously problematic products that I have read about and/or used myself:

No, I mean these sensors. I’m not sure if it helps you, though

Are you sure those bulbs are not Zigbee routers?

From what I know, the official spec states that all mains-powered (110/230V) Zigbee devices should be routers.

I am only using Philips Hue lights and all types are routers. There are only 1-2 models with a built-in battery that are end devices only. All their buttons and sensors are end devices as well.

I ran a combination of Z2M and ZHA for over a year because power plugs with energy metering caused the whole Z2M network to slow down, so I had only my power plugs connected to ZHA.

I recently moved back from Z2M+ZHA to just ZHA, since I switched wall buttons from FriendsOfHue (ZGP), which is not supported by ZHA, to the Philips Hue Tap Dial Switch, which is just a battery-powered end device.

Support for that switch was added in the ZHA 2024.10.0 release. Before that, it constantly crashed ZHA, which is why, after trying it in the past, I moved back to the ZGP buttons with Z2M and kept all my lights there.

The thing about specifications is that manufacturers would actually need to follow them, and most don’t bother. There are many mains-powered devices that don’t route.

Here’s an example:
[image]

That said, I’m actually of the opinion that bulbs shouldn’t route, since they can be easily powered off.

I think I may have found a solution to the entire network going down, or at least something that should reduce the severity of the issue. Time will tell.

It was working mostly fine for a couple months. Then roughly around the time I updated HA to 2025.2.0, it started happening again almost daily without any changes to the network. I took another stab at looking for reports about the issue and found this zigpy bellows issue mentioning MAX_MESSAGE_LIMIT_REACHED and how a restart resolved it for a day or two.
That issue was closed in favor of this PR for silabs-firmware-builder to “Increase broadcast and unicast table sizes”.

I had SkyConnect firmware 7.1.1.0 and that PR was merged for 7.4.2.0. I updated to 7.4.4.0 after some difficulty.
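If you want to check whether a given firmware version is at or past the one with the bigger tables, a rough Python sketch like this would do. It assumes four-part dotted version strings like the ones above; the function names are made up for illustration and are not part of bellows or any flasher tool.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '7.4.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Version the post above says the table-size PR was merged for.
FIX_VERSION = parse_version("7.4.2.0")


def includes_fix(installed: str) -> bool:
    """True if the installed version is the same as or newer than FIX_VERSION."""
    return parse_version(installed) >= FIX_VERSION


print(includes_fix("7.1.1.0"))  # False
print(includes_fix("7.4.4.0"))  # True
```

Comparing tuples of integers avoids the trap of comparing the strings directly, where "7.10.0.0" would sort before "7.4.2.0".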

The official firmware update instructions were useless because they only mention an add-on, which isn’t supported by HA Container, and a web flasher, which only works for devices purchased after Oct 20, 2024 (I purchased mine in Oct 2023).

I found someone mentioning a third option to flash via shell using universal-silabs-flasher. After shutting down HA, I kept getting [Errno 13] Permission denied when trying to probe or flash the SkyConnect on the HA system, even after adding it to udev rules and reloading udev. I moved SkyConnect to my main system and instead got [Errno 16] Device or resource busy, which was apparently caused by the gpsd service being overzealous, so I stopped it with sudo systemctl stop gpsd and was then able to probe and flash.
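For anyone hitting the same two errors, here is a minimal Python sketch that tries to open the serial device and maps the error number to a likely cause. The hint text is based only on what happened in this post, the function names are hypothetical, and the device path you pass in is whatever your system assigned to the stick. It is not part of universal-silabs-flasher.

```python
import errno

# Hints drawn from the two failures described above; the wording is
# illustrative and not output from any real tool.
HINTS = {
    errno.EACCES: (
        "Permission denied: check the owner/group of the device node "
        "and your udev rules."
    ),
    errno.EBUSY: (
        "Device busy: another process (gpsd in the post above) has the "
        "port open; stop it and retry."
    ),
}


def explain_serial_error(exc: OSError) -> str:
    """Return a hint for a failed open, or a generic message if unknown."""
    return HINTS.get(exc.errno, f"Unrecognized error: {exc}")


def try_open(path: str) -> str:
    """Try to open the device node and report what happened."""
    try:
        with open(path, "rb", buffering=0):
            return "ok: port opened"
    except OSError as exc:
        return explain_serial_error(exc)
```

Calling try_open() with the stick's device path before running the flasher can at least tell you which of the two problems you are dealing with.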

I expected one benefit of a Home Assistant branded hub would be that HA could help keep the firmware updated much like it does for other Zigbee devices. This experience proved that’s not the case at all for HA Container. That’s another one for my feature wishlist.

I’m running HA OS and with the official add-on it’s literally one option to change (select adapter) and start the add-on. It automatically flashes the latest 7.4.4.0 firmware on the HA Connect ZBT-1.

Not sure if that add-on can be run as a separate container with pure docker.

If you want “full support”, the recommended way to run is HA OS. With the other installation types, you need to do certain things yourself.

I don’t understand: are you saying that you’re running HA Container on HA hardware instead of HAOS?

Something else that might help if you have a bigger Zigbee network or misbehaving routers in your network:

https://www.reddit.com/r/homeassistant/comments/1iq9dav/zha_users_are_you_missing_out_on_source_routing

Read what it does before you apply this to your configuration.yaml.

zha:
  zigpy_config:
    source_routing: true

I switched it on yesterday, and so far it works fine for me.


No. I’m running HA Container on an Intel NUC running Debian, which I also use for other things unrelated to HA. I prefer to use a standard distro that I’m familiar with for multi-purpose installations.

@danieldeni Thanks! That looks promising. I’ll try it out if I run into reliability issues again. That looks like a good candidate for an option available in the UI if it really does help.

So far so good for the most part. Several devices that were particularly troublesome, including one that I was unable to rediscover in pairing without moving it, immediately started working with instant response after the firmware update. I still have a relay in the wall that needs to be re-paired, since it has been unavailable for over a year; the physical switch controlling it still works and I don’t use it often anyway. Adding a router appeared to fix a group of 4 bulbs that were slow or unresponsive, but then I remembered I have another router right next to it inside a closet, so it may have been a coincidence. I am still noticing the occasional bulb responding slowly (~20 seconds), after which it responds instantly, but that’s a different issue I’ll worry about when/if it gets too annoying.

Ah, I understand now: by “hub” you mean the Zigbee dongle, not the device that runs HA.

As you have already found out, HA Container is considered to be an advanced installation method that lacks a lot of the features that HAOS has.


I had no end of issues with Zigbee when using Sonoff USB sticks, and really not much more success with the Zigbee radio in the Yellow.

Initially I moved over to a PoE-powered Zigbee router thing from China (HAMGeek HMG-01 Plus), which was much better but not perfect. I’ve moved to an Aeotec stick recently and it’s been solid. No problems at all.

I have a mix of Zigbee stuff, mostly Candeo dimmer modules and Sonoff switches which act as routers, and the rest are battery powered thermometers and contact switches, etc.


Well, for me source routing actually made it less stable. The lights furthest from the coordinator in particular started to respond more slowly, and commands sometimes timed out.

So I’ve switched back to the default mode of ZHA for now, which I didn’t really have problems with.

Thank you for this! I’ve had issues for months now and it felt like each update made it worse. I switched source_routing on yesterday and it seems like all my issues are gone!

Thank you, @danieldeni, for discovering and sharing this!