I have been using Home Assistant for a few years but have never managed to get Zigbee completely stable. My house has 127 devices: 81 are Namron light bulbs, a few are Aqara sensors, and the rest are a mix of Tuya-compatible plugs and sensors. The lights are downlights throughout the house, so the network is well spread out with routers. The walls are thick concrete. I have Google Nest throughout the house, so I cannot choose the wifi channel. I started with the SkyConnect USB adapter, then a Sonoff USB adapter, then changed to an Ethernet adapter I found online. I finally divided the network into two, one on each side of the house with an Ethernet adapter for each, and it has been relatively stable.
The system would mostly crash when, for instance, I turned off all the lights at the same time, or used an integration that dynamically changes the lights in a Hue-like manner. A crash means all devices are lost from HA at the same time, and all bulbs light up and blink. The WAP factor goes way down when this happens, especially at night. The fix is to turn off power to all bulbs, reload ZHA until it stops saying it is unable to initialize, and then turn on a few bulbs at a time until everything is back on. I then leave the system alone for some time before starting to use the lights again. After I divided the system into two networks, only one network would go down at a time, and it happened much less often.
I tried again a few days ago to consolidate into one big network using ZHA and a new SLZB-06 in Ethernet mode. Unfortunately it did not help: it was stable until about 60-70 devices were added, and then became even more unstable than with the other coordinators when I tried to add more. I was able to carefully add all devices, but just turning on the lights in a room (3-4 bulbs) would crash the system again. With the SLZB-06 it is even more difficult to get ZHA to reload, and I usually have to disconnect it from power before I reload.
Any suggestions on what I can do to avoid ZHA dropping the network and needing to reinitialize? Or is the best solution to divide the network into two or more to limit the number of devices? I have also used Z2M and had the exact same experience, so it does not seem to be ZHA specific, and since it hasn’t changed through four different coordinators it is unlikely to be caused by the type of coordinator.
A couple of questions:
- How are you turning on/off these lights? Dumb switch, or via software?
- If via software, are you using HA groups or ZHA groups?
- What does your zigbee map look like?
- I am using Home Assistant to turn these off in regular use. The actual switches have nice labels saying “not in use” and are only used when the system crashes and I need to reduce the number of active devices so the lights stop blinking and ZHA can reload. In normal use I control the lights through HA using Zigbee switches that trigger automations, Alexa and the web interface.
- I mostly use ZHA groups, but I have a few wifi bulbs as well so for instance for “All lights” I have a large ZHA group together with a few wifi lights in a HA group.
- The system has crashed and I am working to reset without luck, so I cannot access the map just now. When I checked earlier it was an interconnected mesh with the coordinator and lights in the middle and the sensors and a few devices in the garage as outliers. The lights were all showing good connections as far as I could tell, while the devices in the garage which never cause much trouble had poor connections.
Sounds to me like your mesh is getting overwhelmed with all the light commands occurring at once, though it really shouldn’t happen with 3-4 bulbs like you described in the initial post.
Anything in the ZHA logs when this happens?
With the house divided into two networks, it only happens when a large number of bulbs are turned on or off at the same time, for instance if someone pushes buttons that affect most of the house, and even then very rarely.
With all devices in the same mesh a few bulbs are enough, and for the last few hours I actually haven’t been able to get all devices connected again before the system crashes. I guess there are a lot of commands being sent back and forth when the devices get their power back, and that this is enough to overwhelm the mesh. Are 127 devices, most of which are routers, just too many devices on one mesh?
I have an error log from a previous time the system went down, but I am not sure what to look for.
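As a starting point for reading such a log, a small Python filter can pull out only the Zigbee-related warnings and errors. This is a rough sketch, not an official tool: it assumes the usual Home Assistant log line layout (timestamp, level, thread, logger name in square brackets), and the logger prefixes and function names below are illustrative choices that may need adjusting for your radio.

```python
import re
from collections import Counter

# Assumed Home Assistant log line layout:
# "<date> <time>.<ms> LEVEL (ThreadName) [logger.name] message"
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]*)\) "
    r"\[(?P<logger>[^\]]+)\] (?P<message>.*)$"
)

# Logger name prefixes assumed to be Zigbee related; adjust for your setup.
ZIGBEE_LOGGERS = ("homeassistant.components.zha", "zigpy", "bellows")
LEVELS = {"WARNING", "ERROR", "CRITICAL"}


def zigbee_problems(lines):
    """Return (time, level, logger, message) tuples for Zigbee warnings and errors."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # traceback continuation or a line in another format
        if match["level"] in LEVELS and match["logger"].startswith(ZIGBEE_LOGGERS):
            hits.append(
                (match["time"], match["level"], match["logger"], match["message"])
            )
    return hits


def summarize(hits):
    """Count hits per logger to show which layer complains most."""
    return Counter(logger for _, _, logger, _ in hits)
```

Passing an open log file to `zigbee_problems` and then looking at the entries just before the first "unable to initialize" style failure is one way to narrow down where the trouble starts; `summarize` gives a quick per-logger count.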
Potentially, yes. Assuming you have 3 (or more) Google Nest wifi routers in your house, that’s enough to interfere with the entire 2.4 GHz spectrum (channels 1-6-11) and not leave much leeway for your Zigbee network to operate comfortably.
Without the possibility to modify the Nest’s channels on 2.4 GHz, your only options are to either try to move your entire Zigbee network to channel 25 (still not guaranteed since it sits right at the edge of wifi channel 11), or to replace your Nest routers with something that offers some kind of basic user control.
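For reference, the channel arithmetic behind this can be sketched in a few lines of Python. It is a simplified model, not a measurement: it assumes the standard 2.4 GHz centre frequencies (Zigbee channel k at 2405 + 5(k - 11) MHz, wifi channel n at 2407 + 5n MHz), a nominal 2 MHz Zigbee channel width and a nominal 22 MHz wifi channel width. The function names are made up for illustration.

```python
# Nominal channel widths in MHz (assumed; real transmitters leak beyond these).
ZIGBEE_WIDTH_MHZ = 2.0
WIFI_WIDTH_MHZ = 22.0


def zigbee_center_mhz(channel: int) -> float:
    """Centre frequency of a 2.4 GHz Zigbee channel (11-26)."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels run from 11 to 26")
    return 2405.0 + 5.0 * (channel - 11)


def wifi_center_mhz(channel: int) -> float:
    """Centre frequency of a 2.4 GHz wifi channel (1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers wifi channels 1 to 13")
    return 2407.0 + 5.0 * channel


def overlaps(zigbee_channel: int, wifi_channel: int) -> bool:
    """True if the nominal Zigbee band intersects the nominal wifi band."""
    gap = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return gap < (ZIGBEE_WIDTH_MHZ + WIFI_WIDTH_MHZ) / 2


def clear_zigbee_channels(wifi_channels=(1, 6, 11)) -> list[int]:
    """Zigbee channels whose nominal band overlaps none of the given wifi channels."""
    return [
        z for z in range(11, 27)
        if not any(overlaps(z, w) for w in wifi_channels)
    ]


if __name__ == "__main__":
    print(clear_zigbee_channels())
```

Under these assumptions the only Zigbee channels clear of wifi 1, 6 and 11 are 15, 20, 25 and 26, and channel 25 starts only about 1 MHz above the nominal upper edge of wifi channel 11. Real-world leakage beyond the nominal width can still reach it, which is why channel 25 is no guarantee.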
I have 5 Nest wifi points… sounds like that is the problem then. So without interference this would not happen? The interference causes commands to be resent, which then overloads the mesh, and the bulbs blink because they are unable to ping the coordinator?
Do you know why this causes ZHA to go to Failed Setup and be unable to initialize though? The SLZB-06 web dashboard is accessible throughout, so the device should still be accessible.
I’ll go back to two nets with fewer devices then until I can replace the Nests.
I can’t tell you with absolute certainty that this would not happen, since I’m basing my responses on the info you have provided so far.
What I can guarantee is that in 5 years I’ve never seen anyone’s entire ZigBee network crash due to turning on 3-4 lights at once. The only thing different about your network compared to others is the locked-down nature of your routers. Hence, my best guess is massive interference.
Ok, thanks for the advice. I have returned to two nets again now and everything is quite stable. I’ll find out if the Nests were causing this at a later time when I can prioritise new wifi routers.