Zigbee Network Blew Up - ZHA Integration

I am new to HASS but not new to Home Automation so please bear with me as I learn HASS.

Last night my few-day-old Zigbee mesh network of about 90 devices collapsed and nothing was working. The ConBee II coordinator showed as offline, just like all the other devices.

I tried many different things before resorting to more destructive options like deleting the ZHA Integration and adding it back. When I did that, the count of devices was correct but I was only seeing a subset in the device list, and their names were all reset to brand/model. So I opted to restore HA to yesterday's state, hoping it would save me from having to exclude/include every device again. I had also taken a snapshot just before restoring so that I could recover at least some of what I had done since yesterday (I am building my home’s home automation back up after migrating from ST).

Most Zigbee devices came back alive except a few that were still offline after 12hrs. I have excluded, reset and included them back. Annoying but not as bad as doing all 90. Process was quick and easy!

I lost a few minor things, and a few Node Red flows I was working on. I restored NR from the most recent snapshot that should have included them but it did not bring them back (I likely restored the app and not the flows stored somewhere else… but where?). Luckily I had also exported them to a JSON file so I imported them back in NR.

Questions:

  1. Any idea what I may have done to cause the meltdown? I have read of other people having the same issue but none of the comments seemed to indicate the possible cause. I did power down the RPI4 abruptly a couple times while replacing a failing core switch that was powering the RPI4 via POE so I wonder whether I caused file corruption - but in that case I am alarmed at how easily it can happen. The RPI4 is running HASS on a 512GB M2 SATA SSD (no SD Card at all).

  2. Is what I did the right way to recover from a dead Zigbee network?

  3. How do I better protect myself in the future from this?

This is what the network looks like right now:

  4. I am concerned about those battery powered devices having a single route back. Will they gain more over time? If something happens to the router they connect to, do they seek out another or do I have to exclude/include again?

Sorry you’re having such trouble. It seems like you have a fairly complex setup, which makes it a little difficult to answer your questions.

  1. There are a few things you might want to check:
  • It could be how the Conbee II is connected to the RPI4. Are you using a USB 2.0 extension cable to keep it away from the RPI? There are lots of reports that suggest EM interference can cause network issues. I have kept mine 3 feet away using a dollar store cable and haven’t had an issue since. Some USB extension cables (Amazon basics) are trash and cause random disconnects and network issues.

  • Placement of the Conbee stick could also impact network performance. I keep mine about eye-level and within line-of-sight of at least one repeater or plug.

  • It could be corruption as you mentioned, or it could be incompatible devices (Ikea Tradfri repeaters in particular have given me meltdowns more than once).

  • It could be out of date firmware. Have you updated the Conbee II’s firmware?

  • It also could be the channel your zigbee network is broadcasting on. Have you checked for WiFi interference? You could try switching to channel 25 or 26, but this would require you to re-pair every device.

2 and 3. If I recall correctly, deleting the ZHA integration is one of the more destructive options because it holds the network configuration for your coordinator. It seems like you did everything “right” to try recovering it. Consider using the custom ZHA Network Lovelace card to monitor RSSI/LQI of your devices. This is how I caught device incompatibility, by finding devices going “unknown.” With a network as complex as yours, you may want to consider something beefier than a Raspberry Pi. I had a lot of issues on an old NUC I was using, but when I transitioned to an old Mac mini with 16 GB of RAM most of my issues went away. I have 94 devices on my ZHA network and other than range issues, my network has been solid.

3 and 4. I think battery powered devices tend to stick to one parent device for routing until they can’t connect, and then they find a new route. From what I’ve read on here, this is normal behavior and is meant to preserve battery life. I believe Zigbee devices “panic” after 20-30 minutes of being disconnected and attempt to reconnect through the closest neighbor. If you’re worried about the devices not finding a route, add repeaters or plugs to your network to strengthen your mesh. I would avoid micro-managing your network with includes/excludes because that can also negatively impact the mesh.

I really appreciate all the information you have shared!

I updated the firmware of the ConBee II yesterday, then reset and re-paired the few devices that were stuck offline. Due to issues with 3 zwave locks I had to move the RPI closer to the locks, which I think caused a few zigbee devices to go back offline. Now I have a small number of offline Zigbee devices and 20 to 30 Zwave devices offline. I wish I had a network map for zwave and also an easy reference to match node numbers to devices, given that the logs only mention node numbers.

I was worried the RPI is not powerful enough, but up to this point it appears that CPU and RAM usage are super low… yet the zwave repair I started 9hrs ago is apparently still going. I imagine the slowness is due to the many dead nodes and possibly to excessive traffic further slowing things down. If I adopt a more powerful host, I really want to find something that doesn’t require OS administration. This is one thing I like about the RPI image that uses the HASS OS.

I think I found some threads on how to find and change the zigbee channel but have not tried it yet. I unplugged the ST and Hubitat hubs to ensure there is no interference from other zigbee networks. I have 4 APs in the house that use wifi channels 1, 6 and 11. I keep their power low as they are only for IoT; everything else is on 5GHz channels. I’ll look into which zigbee channel is in use tomorrow, though.
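
For anyone wanting to sanity-check the channel question, here is a rough Python sketch of which Zigbee channels avoid 2.4GHz wifi channels 1, 6 and 11. It uses the standard center frequencies (Zigbee channel n is at 2405 + 5 × (n − 11) MHz, wifi channel k is at 2407 + 5 × k MHz). The 22 MHz wifi width and 2 MHz Zigbee width are simplifying assumptions, and the function names are made up for illustration. Real wifi signals leak a bit past their nominal width, so treat the result as a starting point, not a guarantee.

```python
# Rough overlap check between Zigbee (802.15.4) and 2.4 GHz wifi channels.
# Assumptions: wifi occupies 22 MHz, Zigbee occupies 2 MHz, no side lobes.

WIFI_WIDTH_MHZ = 22    # assumed nominal wifi channel width
ZIGBEE_WIDTH_MHZ = 2   # assumed nominal Zigbee channel width


def zigbee_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz Zigbee channel (11-26)."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz wifi channel (1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only handles wifi channels 1-13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel: int, wifi_channel: int) -> bool:
    """True if the two nominal bands overlap (touching edges do not count)."""
    gap = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return gap < (WIFI_WIDTH_MHZ + ZIGBEE_WIDTH_MHZ) / 2


def clear_zigbee_channels(wifi_channels):
    """Zigbee channels that overlap none of the given wifi channels."""
    return [
        z for z in range(11, 27)
        if not any(overlaps(z, w) for w in wifi_channels)
    ]


if __name__ == "__main__":
    print(clear_zigbee_channels([1, 6, 11]))  # [15, 20, 25, 26]
```

Under these assumptions, 15, 20, 25 and 26 come out clear with APs on 1, 6 and 11, which lines up with the suggestion above to try 25 or 26.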

The Conbee II stick is on a USB 3.0 extension cable with dock, but connected to a USB 2.0 port. It is sitting on top of my networking rack, about 50cm from the RPI, to avoid the USB3 interference (there is a link behind the RPI that connects the RPI to the M2 SATA SSD sitting in the bottom portion of the Argon One M2 Case). Unless it is shielded properly, I am guessing it is spewing interference.

I was careful to install wired zigbee devices throughout the house so that no battery zigbee node should be too far from multiple routers. I was actually shifting from a majority of zwave devices to zigbee due to incessant zwave issues (on ST platform), likely due to excessive traffic (Homeseer switches/dimmers were just locking up all the time).

I will definitely look into installing the card you mentioned. I wish there was one for zwave too! There are so many more useful tools for zigbee in HA compared to Zwave…

Thanks again for your help!