Zigbee (ZHA) lagging - not sure how to debug

Hello! I’m running into a strange issue, and I’m not sure what to do about it. I have about 80 Zigbee devices on a network controlled by Home Assistant via a ConBee II. Starting last week, the devices that perform some action became slow to respond to commands (1-10 minutes, it seems to vary), and sensors don’t reliably report back anymore.

Last week, before this started, Home Assistant was running on a VM with the ConBee II passed through to the VM. The USB passthrough has been an occasional source of problems (usually just fully disconnecting), so last week when this lag started I assumed it was at fault and decided to move to a Raspberry Pi on HAOS. This worked beautifully, and all the lag (seemingly) went away. Then last night, exactly one week after the initial issue, the lag reappeared.

Relevant logs:

2023-11-14 11:01:52.789 WARNING (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xC5EC:1:0x0008]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]
2023-11-14 11:01:53.093 WARNING (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xC5EC:1:0x0300]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]
2023-11-14 11:01:53.552 WARNING (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xA762:1:0x0300]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]
2023-11-14 11:01:53.963 WARNING (MainThread) [homeassistant.components.zha.core.cluster_handlers] [0xA762:1:0x0008]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]
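
For anyone hitting the same warnings, here is a minimal sketch for tallying them per device, which can help tell whether a handful of nodes or the whole mesh is affected. It assumes the `[0xNWK:endpoint:0xCLUSTER]` tag format seen in the lines above; the function name is made up for illustration, and the sample lines are abbreviated copies of the log entries.

```python
import re
from collections import Counter

# Matches the "[0xNWK:endpoint:0xCLUSTER]" tag in front of each ZHA cluster handler message.
# Assumption: the tag always looks like the ones in the log lines above.
TAG = re.compile(r"\[(0x[0-9A-Fa-f]{4}):(\d+):(0x[0-9A-Fa-f]{4})\]")


def count_route_failures(lines):
    """Count log lines mentioning NWK_ROUTE_DISCOVERY_FAILED, keyed by device network address."""
    per_device = Counter()
    for line in lines:
        if "NWK_ROUTE_DISCOVERY_FAILED" not in line:
            continue
        match = TAG.search(line)
        if match:
            per_device[match.group(1)] += 1
    return per_device


# Abbreviated versions of the four warnings above (one DeliveryError kept per line).
sample = [
    "2023-11-14 11:01:52.789 WARNING (MainThread) [homeassistant.components.zha.core.cluster_handlers] "
    "[0xC5EC:1:0x0008]: async_initialize: all attempts have failed: "
    "[DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]",
    "2023-11-14 11:01:53.093 WARNING (MainThread) [homeassistant.components.zha.core.cluster_handlers] "
    "[0xC5EC:1:0x0300]: async_initialize: all attempts have failed: "
    "[DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]",
    "2023-11-14 11:01:53.552 WARNING (MainThread) [homeassistant.components.zha.core.cluster_handlers] "
    "[0xA762:1:0x0300]: async_initialize: all attempts have failed: "
    "[DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]",
    "2023-11-14 11:01:53.963 WARNING (MainThread) [homeassistant.components.zha.core.cluster_handlers] "
    "[0xA762:1:0x0008]: async_initialize: all attempts have failed: "
    "[DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]",
]

for nwk, count in count_route_failures(sample).most_common():
    print(nwk, count)
```

To run it against a real log, pass an open file object instead of `sample`; where the log file lives depends on the install. The network addresses can then be matched against the devices listed in the ZHA integration.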

I tried searching for NWK_ROUTE_DISCOVERY_FAILED and I found a bunch of seemingly unrelated problems. In particular, most of the issues I found seemed to have started after a change (to the zigbee dongle, to the HA version…), whereas mine started before I made the change, and went away for a few days after making the change. I am interested in any guidance or suggestions to get this working again.

Could it be that some zigbee devices that are crucial in routing for other nodes are cut off from power at some point in time? How many router devices do you have?

Unlikely, if my understanding is correct that all wired nodes can act as routers. I have 15+ Ikea light bulbs all around the house, as well as a number of plugs and in-wall switches. Notably, the in-wall switch in the room that the raspberry pi and conbee are in is also affected by this.

edit: I just checked the zigbee integration’s “visualization” tab and can see that the coordinator is connected to 8 wired devices around the house. The issue seems to have gone away (I assume temporarily) so I can’t try to control one of the directly connected devices and verify. Next time this comes up I will check the network visualization again and make sure affected devices appear to be connected.

What channels is everything on in that part of the spectrum? What are your neighbours using?

It says channel 20. How do I find out what my neighbors are using?

Use a wifi analysis tool on your PC (like the following: Microsoft Store link for a Windows tool). If you need an iOS, Android, Mac, or Linux tool, I’m sure people can point you in the right direction.

Or you could ask them.

I live in a stand-alone house on its own lot, so I’m not super concerned about interference from neighbors, and my wifi is working fine. Are these symptoms consistent with wireless interference? It seems extreme and sudden. I will run a wifi scan on 2.4 GHz next time it starts happening (mysteriously, it seems to still be working).

It absolutely can be. If you want an example:

Last time I had to deal with it, my neighbor installed a new mesh network and set all of his access points to channels that conflicted with mine. My UniFi network adjusted its 2.4 GHz channel right on top of my zigbee network and Poof, no more zigbee.

When I figured out my wifi had switched and fixed THAT (including turning off auto optimization) it still took about an hour or so to recover as the mesh healed itself out of panic mode. But… It did basically force a Zigbee heal so when everything came back it WAS faster. :slight_smile:

Thanks for that video, great demonstration. I’m probably going to look for a USB extension cable to address that. What that seems to show is just fully dropped messages, though; what I was experiencing is laggy messages: press button in Home Assistant, wait 3+ minutes, light goes on.

Overall the RF environment seems fine. Here’s a scan from my UniFi controller off the AP in the same room as Home Assistant/the dongle/etc:

To me this seems like not a significant amount of RF interference, but maybe y’all have other opinions? For what it’s worth, the issue hasn’t reoccurred since about the time I initially posted this thread, although some devices have fully dropped off (I’ve been re-pairing them as I discover that they’re non-responsive).

It doesn’t take much at all. Relative strength comes to bear. Zigbee is a VERY low power transport on purpose. If it were a bicycle horn, typical wifi of standard power is like a cruise ship horn (wifi was designed for always-on power, and it doesn’t give a rat about power efficiency). If they’re both sounding at once, the bicycle horn doesn’t have a prayer. And interference degrades the signal (making things slow) before things fail outright.

You have…

This is how they overlap

Those top three in your diagram correspond to wifi 2.4 GHz channels 1, 6, and 11 (the three most popular non-overlapping channels, and what 90% of everything runs on). Your 1 and 11 are busy, 1 more than 11. 6 is showing traffic, but not nearly as bad as 1. The side lobes of 1 (aka channels 2-4) seem to have a little traffic, but not much. In this case I’m either picking a Zigbee channel that does not conflict with 6 and trying to get everything off 6 to clear the Zigbee in that pocket at ch 18-20 (20 would be the choice, because I try to stay on 5s and the higher-frequency lobe of your ch 1 is extending up towards the midrange).

Alternatively, move whatever AP is serving wifi channel 11 to wifi ch 6. Then pick a Zigbee channel as far away from 6 and 11 as you can get that’s also NOT channel 26. Some older gear doesn’t do well on Zigbee 26, and it’s best avoided. 25 works great.
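
To make the overlap reasoning concrete, here is a small sketch that lists which Zigbee channels sit outside the main lobe of a given set of busy wifi channels. It uses the standard 2.4 GHz center frequencies (Zigbee channel k at 2405 + 5 × (k − 11) MHz, wifi channel n at 2407 + 5 × n MHz). The 22 MHz wifi width and 2 MHz Zigbee width are nominal figures assumed for illustration, side lobes are ignored, and the function names are made up.

```python
WIFI_WIDTH_MHZ = 22    # assumption: nominal width of a 2.4 GHz wifi channel
ZIGBEE_WIDTH_MHZ = 2   # assumption: nominal width of a Zigbee (802.15.4) channel


def zigbee_center_mhz(channel):
    """Center frequency of Zigbee channel 11-26 in the 2.4 GHz band."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Center frequency of 2.4 GHz wifi channel 1-13."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers wifi channels 1 to 13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel, wifi_channel):
    """True if the two nominal channel bands intersect (main lobes only)."""
    distance = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return distance < (WIFI_WIDTH_MHZ + ZIGBEE_WIDTH_MHZ) / 2


def clear_zigbee_channels(busy_wifi_channels):
    """Zigbee channels that overlap none of the given wifi channels."""
    return [
        z for z in range(11, 27)
        if not any(overlaps(z, w) for w in busy_wifi_channels)
    ]


print(clear_zigbee_channels([1, 6, 11]))  # wifi on all three usual channels
print(clear_zigbee_channels([1, 6]))      # after moving the AP on 11 over to 6
```

Under these assumptions, only Zigbee 15, 20, 25, and 26 fall in the gaps when wifi 1, 6, and 11 are all in use, and vacating wifi 11 opens up everything from 20 upward. Real traffic spills past the nominal width, so treat the output as a starting point rather than a guarantee.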

Indeed, you need to understand that Zigbee is extremely sensitive to EMF/EMI/RFI interference, and Zigbee devices have very short range and poor coverage. So before you troubleshoot any deeper, you really need to start with proactive measures that follow best practices. Read and try to follow all the tips here → Zigbee networks: how to guide for avoiding interference and optimize for getting better range + coverage

Updating this thread because I think I have resolved all of my zigbee issues: a few weeks prior to this happening, I had installed some IKEA blinds that came with a USB repeater/router thing that the IKEA manual insisted was mandatory. I’m not sure if that device was poorly positioned or actually just sucks, but I unplugged it on a whim and all of the unreliability instantly disappeared. It’s been a few days and I haven’t had a single zigbee issue.

Thanks all for the suggestions!

Zigbee sabotages zigbee - I don’t think this is yet in @Hedda’s extensive (de)bug-ing guide!
