All Zigbee devices offline after HA restart - debug recommendations?

TL;DR: my entire previously working/reliable Zigbee network went offline after restarting HA. It hasn’t come back after 36hr, despite multiple restarts, reboots, and power cycling devices. I’m wondering if anyone more experienced with Zigbee/ZHA has any pointers on understanding/resolving the problem before I have to consider manually rebuilding my ~80 device Zigbee network.

Questions:

  • Would TXStatus.NWK_ROUTE_DISCOVERY_FAILED errors be a likely explanation for HA not being able to talk to my Zigbee devices?
  • What would be likely causes for getting lots of TXStatus.NWK_ROUTE_DISCOVERY_FAILED errors?
  • What does the “leave/join” option for the ConBeeII stick in the deConz app actually do? My understanding is that the coordinator (ConBeeII) initially distributes a network key to all the devices as they join the Zigbee mesh and that the mesh can survive the coordinator going offline (e.g. when you reboot your HA box). When the coordinator “leaves” what does that do? If you attempt to rejoin, does that change anything with the network key, or should everything return to as it was if everything is working properly?

Hardware/Software Setup Details:

  • HA version: v0.115.5
  • PC Host: Ubuntu 18.04
  • Zigbee Controller: ConBee II
  • HA Zigbee Integration: ZHA
  • HA Device Config: using the /dev/serial/by-id path; I’ve confirmed that the symbolic link is still present and points to /dev/ttyACM0, and there is no /dev/ttyACM1 device in the system
  • Zigbee Routers: ~40, all Sylvania (LEDVANCE) A19, BR30, and light strip products with current firmware
  • Zigbee Endpoints: ~40 with a mix of Philips motion sensors, Samsung multipurpose sensors, a few other sensors, and 2 door locks
  • Zigbee Network Status: before this incident, my mesh was well-connected with LQI values of 255 reported for all parent-child links in the zha-map neighbors files. Links between routers had varying LQI values, but there was at least one LQI=255 link between each router and another router. Obviously not the case now.
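The by-id symlink check mentioned in the list above can be scripted. This is only a minimal sketch: the function name is made up for illustration, and the path in the usage part is a placeholder, not the real by-id entry from my system.

```python
import os


def resolve_serial_link(link_path):
    """Return the device node a by-id symlink resolves to, or None.

    None means the path is not a symlink or its target no longer exists.
    """
    if not os.path.islink(link_path):
        return None
    target = os.path.realpath(link_path)
    return target if os.path.exists(target) else None


if __name__ == "__main__":
    # Hypothetical path: substitute the actual entry under /dev/serial/by-id/.
    link = "/dev/serial/by-id/EXAMPLE-ConBee_II-if00"
    print(resolve_serial_link(link))
```

If this prints something other than the expected /dev/ttyACM0 (or None), the integration is probably pointed at the wrong device node.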

Symptoms/Timeline:

Two nights ago, I restarted Home Assistant due to one of my BR30 bulbs being unresponsive - that has happened occasionally before, and either waiting a few hours or restarting HA has resolved it in the past. When HA restarted, ALL of my Zigbee devices were missing. My zha-network-card showed all devices with Online=false and LQI=N/A. After waiting 15min, it was still the same. In the Logs section of HA, I saw one error: [zigpy_deconz.zigbee.application] Unexpected transmit confirm for request id [XX], Status: TXStatus.NWK_ROUTE_DISCOVERY_FAILED. I’ve seen a similar error a few times in the past, but it never seemed to cause any visible issues.

I enabled the debug logging as recommended in the ZHA docs, restarted HA, and saw more error messages similar to the above one:

Error while sending 10 req id frame: TXStatus.NWK_ROUTE_DISCOVERY_FAILED

According to this post, that error code means “An attempt to discover a route has failed due to a reason other than a lack of routing capacity”. I’m not entirely sure what that means in practice, but one thing that sounds plausible is that something is sending lots of Zigbee traffic and starving out other traffic - similar to a DoS attack. I don’t know how to confirm/investigate that with my current toolset, though. I don’t think anything has changed in my RF environment or my WiFi traffic to cause this, but I don’t have a spectrum analyzer, either.

After waiting overnight, it seems the Zigbee network is not totally dead, as a few of my motion sensors appear to have connected and provided HA with some occasional data. One was working reasonably reliably last night, but the 2 others that showed up on the zha-network-card weren’t reliably checking in. After restarting HA this morning, only 2 are listed on the zha-network-card and neither is transmitting data correctly.

I haven’t tried resetting/re-pairing any devices yet, but I can do that if necessary.

Currently, when I open/close some of my windows that have battery-powered sensors, the sensors still blink the green light that I think means that the data was successfully transmitted to their parent. I believe that they show an orange/red light when they can’t talk to their parent or are in pairing mode. That makes me think that the network may still be intact/active, but my ConBee II just can’t talk to it.

Things I’ve tried:

  • Restart HA (multiple times): no change
  • Gracefully shut down Ubuntu host that runs HA and then restart: no change
  • Move ConBee II USB stick to PC and start up deConz: result was that the stick is detected and appears to be configured on ch15. No devices shown connected to ConBeeII. No change when ConBee II is reattached to HA host (and host is rebooted).
  • Power down, wait for 10 sec, and then power up each of my mains-powered Zigbee devices (routers), one or two at a time: no change
  • Leave HA running and all devices plugged in overnight to see if anything changes: A few endpoint devices show intermittent connections in zha-network-card
  • Use RF scan feature of UniFi wireless access point to scan for 2.4GHz interference: nothing higher than ~-90dBm across all 11 WiFi channels

Things remaining to try:

  • Shut down HA Ubuntu host, turn off main breaker to house, power on HA host (it’s on a UPS), and then turn main breaker on - force mesh to come back all at once with HA freshly rebooted. Is that likely to do anything different than rebooting/restarting devices individually like I’ve already done?
  • Move WAP channel to ch11 (since ConBeeII seems to be on Zigbee ch15) - but things were working fine for months with current channels, so not expecting a miracle there…
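For reasoning about which WiFi channels sit near Zigbee ch15 or ch25, the channel arithmetic is simple enough to sketch. The center-frequency formulas are the standard ones for the 2.4GHz band; the overlap test and its default widths (2MHz for Zigbee, 22MHz for WiFi) are simplifying assumptions on my part, and real-world interference also depends on signal strength and traffic.

```python
def zigbee_center_mhz(channel):
    """Center frequency of a 2.4 GHz Zigbee (IEEE 802.15.4) channel, 11..26."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Center frequency of a 2.4 GHz WiFi channel, 1..13."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch covers WiFi channels 1 to 13 only")
    return 2407 + 5 * channel


def overlaps(zigbee_channel, wifi_channel, zigbee_width=2.0, wifi_width=22.0):
    """Rough check: do the two occupied bands intersect?

    The widths are assumed nominal values, not measurements.
    """
    distance = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return distance < (zigbee_width + wifi_width) / 2


if __name__ == "__main__":
    for zb in (15, 25):
        for wifi in (1, 6, 11):
            print(zb, zigbee_center_mhz(zb), wifi, wifi_center_mhz(wifi), overlaps(zb, wifi))
```

Zigbee ch15 works out to 2425MHz and ch25 to 2475MHz, while WiFi ch11 is centered at 2462MHz, so under these assumptions moving the WAP to ch11 would put it further from ch15 but closer to ch25.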

One historical note that might be of significance: months ago when initially setting up my devices, I attempted to set up my Zigbee network on ch25. I successfully configured the ConBeeII stick to ch25 by plugging it into my PC and setting the channel via the deConz app. I connected a few devices to it with deConz and things were good. I then moved the stick to my HA host, thinking the channel setting would be retained once I set up the ZHA integration. Since I didn’t do anything with the ZHA YAML config, my understanding is that ZHA probably switched the channel back to the default of 15 when I set up my Zigbee devices via ZHA in HA. On the off-chance that didn’t happen, could my network have been running on ch25 all this time and then something happened when restarting HA that switched the controller back to 15 while the rest of the network is on ch25? If so, would testing that theory be as simple as using deConz to leave the network, switch the channel to 25, and then try to rejoin the network? Or does that process regenerate/reset network keys and such and would render my ConBee II stick unable to rejoin my existing network on any channel?

Home Assistant and Zigbee in general are still relatively new for me (~5mo), and I’ve learned a lot from the community so far - thanks for that! Hopefully documenting this issue will help someone else out in the future, too. Debugging this is getting beyond my current expertise and my internet sleuthing has not yielded an obvious solution. Thus, I’d really appreciate any insight or suggestions to understand the problem, and better yet, find a solution.

All my devices went dead after a reboot following an upgrade to 0.115.3, I think. Googling turned up a few others who had the same problem. My fix was to re-pair each device. It was a bit of a pain, but all went well. Then my Ikea two-button remotes both went unavailable after bumping to 0.115.5, I think. I can get them to pair, but they immediately go unavailable. I haven’t had the energy to report it yet.

Thanks @andynbaker - re-pairing the devices is what I ended up doing yesterday and things seem to be working ok now. Haven’t restarted/rebooted HA yet, though.

For anyone else who encounters this issue, here are some details on my experience:

  • I now remember a little more about what started this mess… I had restarted Home Assistant and saw a notification that zha_map had failed to start. The logs showed that it couldn’t access the ConBee II device and I believe there were a couple of zigpy-related messages about not being able to communicate. I’ve seen this once in a while before and in all instances, after a couple minutes, the Zigbee network was always up and devices were functioning properly. It required a HA restart to get zha_map to start working again though. I think what was different this time was that I restarted HA pretty much immediately; I didn’t wait a few minutes for the devices to show up in the HA UI. Upon restarting HA the second time, I remember (but didn’t save/memorize) seeing an error in the logs about a Zigbee unexpected response (completion?) - which I figured meant that a request was outstanding when HA was rebooted and then the response came back after the restart (and HA no longer had a record of the original request). I’m wondering if THAT was what provoked HA to stop seeing any router devices.
  • As for getting the devices back into HA, here’s what I did:
    • Put each router device into pairing mode (no “Remove Device” from HA) - e.g. for Sylvania lights, this is turning them off/on 5x quickly; for Hue motion sensors, it’s holding the reset button until the green light comes on
    • With the ConBee II device in HA, select ADD DEVICES VIA THIS DEVICE, and each router device should be detected
    • If any router devices are out of range from the ConBee II, then add the new router via an already-added router device that’s closer to the one you’re adding
      • As these devices were added back to HA, all their previous customizations and entity names were retained
    • As router devices were added back to HA, most of them pulled whatever endpoint devices were attached to them with them. e.g. if a door sensor had a particular lightbulb as its parent in the Zigbee mesh, then when the parent lightbulb was added back to HA, the sensor would be detected without any other steps required. Sometimes this would take a few minutes, though.
      • If an endpoint device was not redetected quickly on its own, pulling the battery and reinserting it did the trick in most cases. Samsung SmartThings multipurpose sensors were super-easy for this.
      • Most of my Hue indoor motion sensors were redetected automatically over a period of a few hours. The 2 that weren’t responded to being put into pairing mode and then added to HA as I did for the router devices.
      • One of the Hue motion sensors was missing its light level sensor in HA after being redetected automatically. I’m guessing something went wrong when detecting clusters while it was being re-added to HA when its parent was re-paired. A manual re-pair of that motion sensor fixed it.
      • The same thing happened to the temperature clusters on a few CentraLite (Sylvania Smart+) window sensors as well, and re-pairing them directly fixed them, too.
    • The one thing that’s not currently fixed: I have some CentraLite Contact Sensor-A door/window sensors (brand: Sylvania Smart+) that have never reported a numeric battery level (only unknown), but the entities exist in HA. They are using the zigpy.device.Device quirk, while my other CentraLite 3320-L sensors (Lowes Iris, which look to be almost identical) report their battery level and use the zhaquirks.centralite.ias.CentraLiteIASSensor quirk. After this episode and re-pairing the Contact Sensor-A sensors, the battery entities are showing up as “Unavailable”. Not yet sure what’s up with that. Re-re-pairing the sensors doesn’t change anything.

Some extra context on what the logs looked like when things were broken… in case this is useful in the future:

Logs from restarting HA

Zigbee starting up:

DEBUG (MainThread) [zigpy.appdb] Loading application state from /config/zigbee.db

Then there are lines like this for each device, with some devices only having one line with Attribute id 4:

DEBUG (MainThread) [zigpy.appdb] [0x4110:2:0x0000] Attribute id: 4 value: Philips
DEBUG (MainThread) [zigpy.appdb] [0x4110:2:0x0000] Attribute id: 5 value: SML001
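If you want a quick inventory of what zigpy loaded from the database, lines like these can be parsed. This is a sketch under a couple of assumptions: I’m reading the bracketed prefix as network address, endpoint, and cluster, and relying on attribute ids 4 and 5 of the Basic cluster (0x0000) being manufacturer name and model identifier per the Zigbee Cluster Library. The function and dictionary names are my own.

```python
import re
from collections import defaultdict

# Matches e.g. "[0x4110:2:0x0000] Attribute id: 4 value: Philips"
ATTR_LINE = re.compile(
    r"\[(0x[0-9a-fA-F]{4}):(\d+):(0x[0-9a-fA-F]{4})\] Attribute id: (\d+) value: (.*)$"
)
BASIC_CLUSTER = 0x0000
BASIC_ATTRS = {4: "manufacturer", 5: "model"}


def summarize_devices(lines):
    """Map each network address to the manufacturer/model found in the log."""
    devices = defaultdict(dict)
    for line in lines:
        match = ATTR_LINE.search(line)
        if not match:
            continue
        nwk, _endpoint, cluster, attr_id, value = match.groups()
        if int(cluster, 16) != BASIC_CLUSTER:
            continue  # other clusters reuse the same attribute ids
        name = BASIC_ATTRS.get(int(attr_id))
        if name:
            devices[nwk][name] = value.strip()
    return dict(devices)


if __name__ == "__main__":
    sample = [
        "DEBUG (MainThread) [zigpy.appdb] [0x4110:2:0x0000] Attribute id: 4 value: Philips",
        "DEBUG (MainThread) [zigpy.appdb] [0x4110:2:0x0000] Attribute id: 5 value: SML001",
    ]
    print(summarize_devices(sample))
```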

Then there’s a section where it looks like it’s trying to find quirks for each device:

DEBUG (MainThread) [zigpy.quirks.registry] Checking quirks for None None (*--REDACTED-CONBEEII-IEEE--*)

Some devices check a bunch of quirks and don’t show one found:

DEBUG (MainThread) [zigpy.quirks.registry] Considering <class 'bellows.zigbee.application.EZSPCoordinator'>
DEBUG (MainThread) [zigpy.quirks.registry] Fail because endpoint list mismatch: {1} {80, 1}

Some devices find a quirks entry:

DEBUG (MainThread) [zigpy.quirks.registry] Considering <class 'zhaquirks.samjin.multi2.SmartthingsMultiPurposeSensor2019'>
DEBUG (MainThread) [zigpy.quirks.registry] Found custom device replacement for *--REDACTED-BEDROOM-WINDOW-MOTION_SENSOR-IEEE--*: <class 'zhaquirks.samjin.multi2.SmartthingsMultiPurposeSensor2019'>

Then there’s another Attribute section like above:

DEBUG (MainThread) [zigpy.appdb] [0xd30e:1:0x0000] Attribute id: 4 value: LEDVANCE
DEBUG (MainThread) [zigpy.appdb] [0x3867:1:0x0500] Attribute id: 2 value: 33
DEBUG (MainThread) [zigpy.appdb] [0x3867:1:0x0500] Attribute id: 0 value: 1
DEBUG (MainThread) [zigpy.appdb] [0x3867:1:0x0000] Attribute id: 7 value: 3

Then this section where it looks like it’s initializing the network:

DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.protocol_version: 34>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter protocol_version response: [267]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.version (0,)
DEBUG (MainThread) [zigpy_deconz.api] Version response: 264a0700
DEBUG (MainThread) [zigpy_deconz.api] Command Command.device_state (0, 0, 0)
DEBUG (MainThread) [zigpy_deconz.api] Device state response: [<DeviceState.128|APSDE_DATA_REQUEST_SLOTS_AVAILABLE|2: 162>, 0, 0]
DEBUG (MainThread) [zigpy_deconz.api] Network state transition: OFFLINE -> CONNECTED
DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.mac_address: 1>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter mac_address response: [*--REDACTED-CONBEEII-IEEE--*]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.aps_designed_coordinator: 9>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter aps_designed_coordinator response: [1]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.write_parameter (5, <NetworkParameter.watchdog_ttl: 38>, b'X\x02\x00\x00')
DEBUG (MainThread) [zigpy_deconz.api] Write parameter watchdog_ttl: SUCCESS
DEBUG (MainThread) [zigpy_deconz.api] Command Command.device_state (0, 0, 0)
DEBUG (MainThread) [zigpy_deconz.api] Device state response: [<DeviceState.128|APSDE_DATA_REQUEST_SLOTS_AVAILABLE|2: 162>, 0, 0]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.nwk_panid: 5>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter nwk_panid response: [0x1d72]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.nwk_address: 7>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter nwk_address response: [0x0000]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.nwk_extended_panid: 8>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter nwk_extended_panid response: [*--REDACTED-CONBEEII-IEEE--*]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.channel_mask: 10>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter channel_mask response: [<Channels.CHANNEL_15: 32768>]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.aps_extended_panid: 11>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter aps_extended_panid response: [00:00:00:00:00:00:00:00]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.trust_center_address: 14>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter trust_center_address response: [*--REDACTED-CONBEEII-IEEE--*]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.security_mode: 16>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter security_mode response: [3]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.current_channel: 28>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter current_channel response: [15]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.protocol_version: 34>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter protocol_version response: [267]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.read_parameter (1, <NetworkParameter.nwk_update_id: 36>, b'')
DEBUG (MainThread) [zigpy_deconz.api] Read parameter nwk_update_id response: [0]
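The channel_mask value in the output above is a bitmask where bit N stands for channel N, which is why channel 15 shows up as 32768 (2 to the power 15). A small sketch for decoding a mask and comparing it against current_channel follows; the function names are made up for illustration.

```python
def channels_in_mask(mask):
    """List the 2.4 GHz Zigbee channels (11..26) whose bit is set in the mask."""
    return [channel for channel in range(11, 27) if mask & (1 << channel)]


def mask_allows(mask, current_channel):
    """True if the channel the stick reports is permitted by its channel mask."""
    return current_channel in channels_in_mask(mask)


if __name__ == "__main__":
    print(channels_in_mask(32768))  # prints [15]
    print(mask_allows(32768, 15))   # prints True
```

In my logs both the mask and current_channel agree on 15, so at least the stick itself was consistently configured for ch15 at that point.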

Then an entry like this for each sensor that did manage to connect:

DEBUG (MainThread) [zigpy_deconz.zigbee.application] device: 0x9c8f - Philips SML001, FFD=False, Rx_on_when_idle=False
DEBUG (MainThread) [zigpy_deconz.zigbee.application] Restoring *--REDACTED-MOTION_SENSOR-IEEE--*/0x9c8f device as direct child
DEBUG (MainThread) [zigpy_deconz.api] Command Command.add_neighbour (12, 1, 0x9C8F, *--REDACTED-MOTION_SENSOR-IEEE--*, 128)
DEBUG (MainThread) [zigpy_deconz.api] add neighbour response: [12, 1, 0x9c8f, *--REDACTED-MOTION_SENSOR-IEEE--*, 128]

Then it looks like HA is trying to send a request to one of the (Zigbee router) lights and it gets this:

DEBUG (MainThread) [zigpy_deconz.zigbee.application] Sending Zigbee request with tsn 1 under 2 request id, data: b'0001000000'
DEBUG (MainThread) [zigpy_deconz.api] Command Command.aps_data_request (20, 2, 0, <DeconzAddressEndpoint address_mode=2 address=0x3327 endpoint=1>, 260, 6, 1, b'\x00\x01\x00\x00\x00', 2, 0)
DEBUG (MainThread) [zigpy_deconz.zigbee.application] Sending Zigbee request with tsn 3 under 4 request id, data: b'0003000000'
DEBUG (MainThread) [zigpy_deconz.zigbee.application] Sending Zigbee request with tsn 5 under 6 request id, data: b'000500070003000400'
DEBUG (MainThread) [zigpy_deconz.zigbee.application] Sending Zigbee request with tsn 7 under 8 request id, data: b'0007000000'
DEBUG (MainThread) [zigpy_deconz.zigbee.application] Sending Zigbee request with tsn 9 under 10 request id, data: b'0009000000'
DEBUG (MainThread) [zigpy_deconz.zigbee.application] Sending Zigbee request with tsn 11 under 12 request id, data: b'000b00070003000400'
DEBUG (MainThread) [zigpy_deconz.api] APS data request response: [2, <DeviceState.APSDE_DATA_REQUEST_SLOTS_AVAILABLE|2: 34>, 2]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.aps_data_request (20, 4, 0, <DeconzAddressEndpoint address_mode=2 address=0x3327 endpoint=1>, 260, 8, 1, b'\x00\x03\x00\x00\x00', 2, 0)
DEBUG (MainThread) [zigpy_deconz.api] APS data request response: [2, <DeviceState.APSDE_DATA_REQUEST_SLOTS_AVAILABLE|2: 34>, 4]
DEBUG (MainThread) [zigpy_deconz.api] Device state changed response: [<DeviceState.128|APSDE_DATA_REQUEST_SLOTS_AVAILABLE|APSDE_DATA_CONFIRM|2: 166>, 0]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.aps_data_request (24, 6, 0, <DeconzAddressEndpoint address_mode=2 address=0x3327 endpoint=1>, 260, 768, 1, b'\x00\x05\x00\x07\x00\x03\x00\x04\x00', 2, 0)
DEBUG (MainThread) [zigpy_deconz.api] APS data request response: [2, <DeviceState.APSDE_DATA_REQUEST_SLOTS_AVAILABLE|APSDE_DATA_CONFIRM|2: 38>, 6]
DEBUG (MainThread) [zigpy_deconz.api] Device state changed response: [<DeviceState.128|APSDE_DATA_REQUEST_SLOTS_AVAILABLE|APSDE_DATA_CONFIRM|2: 166>, 0]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.aps_data_request (20, 8, 0, <DeconzAddressEndpoint address_mode=2 address=0xAD54 endpoint=1>, 260, 6, 1, b'\x00\x07\x00\x00\x00', 2, 0)
DEBUG (MainThread) [zigpy_deconz.api] APS data request response: [2, <DeviceState.APSDE_DATA_REQUEST_SLOTS_AVAILABLE|APSDE_DATA_CONFIRM|2: 38>, 8]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.aps_data_request (20, 10, 0, <DeconzAddressEndpoint address_mode=2 address=0xAD54 endpoint=1>, 260, 8, 1, b'\x00\t\x00\x00\x00', 2, 0)
DEBUG (MainThread) [zigpy_deconz.api] APS data request response: [2, <DeviceState.APSDE_DATA_REQUEST_SLOTS_AVAILABLE|APSDE_DATA_CONFIRM|2: 38>, 10]
DEBUG (MainThread) [zigpy_deconz.api] Device state changed response: [<DeviceState.128|APSDE_DATA_REQUEST_SLOTS_AVAILABLE|APSDE_DATA_CONFIRM|2: 166>, 0]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.aps_data_request (24, 12, 0, <DeconzAddressEndpoint address_mode=2 address=0xAD54 endpoint=1>, 260, 768, 1, b'\x00\x0b\x00\x07\x00\x03\x00\x04\x00', 2, 0)
DEBUG (MainThread) [zigpy_deconz.api] APS data request response: [2, <DeviceState.APSDE_DATA_REQUEST_SLOTS_AVAILABLE|APSDE_DATA_CONFIRM|2: 38>, 12]
DEBUG (MainThread) [zigpy_deconz.api] Device state changed response: [<DeviceState.128|APSDE_DATA_REQUEST_SLOTS_AVAILABLE|APSDE_DATA_CONFIRM|2: 166>, 0]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.aps_data_confirm (0,)
DEBUG (MainThread) [zigpy_deconz.api] APS data confirm response for request with id 4: d0
DEBUG (MainThread) [zigpy_deconz.api] Request id: 0x04 'aps_data_confirm' for <DeconzAddressEndpoint address_mode=ADDRESS_MODE.NWK address=0x3327 endpoint=1>, status: 0xd0
DEBUG (MainThread) [zigpy_deconz.zigbee.application] Error while sending 4 req id frame: TXStatus.NWK_ROUTE_DISCOVERY_FAILED
DEBUG (MainThread) [zigpy.device] [0x3327] Delivery error for seq # 0x03, on endpoint id 1 cluster 0x0008: message send failure
DEBUG (MainThread) [zigpy_deconz.zigbee.application] Sending Zigbee request with tsn 13 under 14 request id, data: b'000d000000'
DEBUG (MainThread) [zigpy_deconz.api] Command Command.aps_data_confirm (0,)
DEBUG (MainThread) [zigpy_deconz.api] APS data confirm response for request with id 6: d0
DEBUG (MainThread) [zigpy_deconz.api] Request id: 0x06 'aps_data_confirm' for <DeconzAddressEndpoint address_mode=ADDRESS_MODE.NWK address=0x3327 endpoint=1>, status: 0xd0
DEBUG (MainThread) [zigpy_deconz.zigbee.application] Error while sending 6 req id frame: TXStatus.NWK_ROUTE_DISCOVERY_FAILED
DEBUG (MainThread) [zigpy.device] [0x3327] Delivery error for seq # 0x05, on endpoint id 1 cluster 0x0300: message send failure
DEBUG (MainThread) [zigpy_deconz.api] Command Command.aps_data_request (20, 14, 0, <DeconzAddressEndpoint address_mode=2 address=0x3327 endpoint=1>, 260, 8, 1, b'\x00\r\x00\x00\x00', 2, 0)
DEBUG (MainThread) [zigpy_deconz.zigbee.application] Sending Zigbee request with tsn 15 under 16 request id, data: b'000f000700'
DEBUG (MainThread) [zigpy_deconz.api] APS data request response: [2, <DeviceState.APSDE_DATA_REQUEST_SLOTS_AVAILABLE|APSDE_DATA_CONFIRM|2: 38>, 14]
DEBUG (MainThread) [zigpy_deconz.api] Device state changed response: [<DeviceState.128|APSDE_DATA_REQUEST_SLOTS_AVAILABLE|APSDE_DATA_CONFIRM|2: 166>, 0]
DEBUG (MainThread) [zigpy_deconz.api] Command Command.aps_data_confirm (0,)
DEBUG (MainThread) [zigpy_deconz.api] APS data confirm response for request with id 10: d0
DEBUG (MainThread) [zigpy_deconz.api] Request id: 0x0a 'aps_data_confirm' for <DeconzAddressEndpoint address_mode=ADDRESS_MODE.NWK address=0xad54 endpoint=1>, status: 0xd0
DEBUG (MainThread) [zigpy_deconz.zigbee.application] Error while sending 10 req id frame: TXStatus.NWK_ROUTE_DISCOVERY_FAILED
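To see which destinations the failures cluster on, the aps_data_confirm lines can be tallied per address. This is a rough sketch that assumes the debug log keeps the exact line format shown above; the only status name it knows is the 0xd0 / NWK_ROUTE_DISCOVERY_FAILED pairing visible in these logs, and the function and table names are my own.

```python
import re
from collections import Counter

CONFIRM_LINE = re.compile(
    r"Request id: (0x[0-9a-fA-F]+) 'aps_data_confirm' for "
    r"<DeconzAddressEndpoint address_mode=\S+ address=(0x[0-9a-fA-F]+) endpoint=\d+>, "
    r"status: (0x[0-9a-fA-F]+)"
)

# Only the pairing seen in the logs above; other codes are printed as raw hex.
STATUS_NAMES = {0xD0: "NWK_ROUTE_DISCOVERY_FAILED"}


def tally_confirm_status(lines):
    """Count (destination address, status code) pairs in debug log lines."""
    counts = Counter()
    for line in lines:
        match = CONFIRM_LINE.search(line)
        if match:
            _request_id, address, status = match.groups()
            counts[(address.lower(), int(status, 16))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical file name: point this at wherever your HA log lives.
    with open("home-assistant.log", encoding="utf-8") as log:
        for (address, status), count in tally_confirm_status(log).most_common():
            print(address, STATUS_NAMES.get(status, hex(status)), count)
```

If every router address shows up with the same failure, that points at the coordinator side rather than at individual devices.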

Yep, I too run a ConBee II. I switched back to deCONZ yesterday as my thermostats are just about fully supported there now (I originally switched to ZHA because of the thermostat control). I like ZHA and I like that it does not require running another container, but I just kept losing some devices and would regularly need to re-pair. This last episode forcing a total re-pair of all devices led me to give deCONZ another go.

This last episode forcing a total re-pair of all devices led me to give deCONZ another go.

I’m contemplating the same thing… I lost many WAF points with this recent episode :wink: I get that occasional issues are part of the deal when using under-development software, but it has made me appreciate my Lutron and Hue stuff that “just works”. But they’re each locked into their own ecosystem and I want more… so here we are.

Funny about the deConz issue: I, too, started out with deConz because that was the first thing I found. I switched to ZHA because they didn’t yet support the CentraLite door/window sensors I got cheap when Lowes/Amazon cleared them out. A release a couple months ago listed support for them, so perhaps that’d work now. But you still have a container on your HA host running deConz… do you know if restarting HA (but not rebooting the host itself) restarts deConz?

I’m considering the option of having the ConBee II attached to something other than my HA host… by doing that, HA can be restarted/rebooted without disturbing the Zigbee mesh. But that’s more hardware and additional coordination/failure points.

I’m also wondering whether my ~85-device network would operate better as two ~45-device meshes run by 2 separate coordinators, mainly because I’ve seen reports that the Sylvania (LEDVANCE) lights may drop packets on very busy networks due to small RX buffer size.

Maybe this will motivate me to start a current Zigbee implementation pros/cons list thread… I know there’s data out there, but it’s all over the place and hard to tell what’s outdated/obsolete.

I’m under the impression no other containers restart when Home Assistant core restarts. So I think with both the deCONZ and the new OpenZWave add-ons, the mesh stays up during a core restart. I also have a second Pi running alongside my primary Home Assistant device. I too thought about breaking it off, but the deCONZ container really doesn’t seem to bog down the host unless I’m going into it via VNC. So I’m just going to keep it on my main HA host and add it to the stack.

I have the same problem with ZHA after my HA server was offline for some time.
Does anyone know how to fix this? Is this being worked on?

I’m using the latest HA with ZHA and a ConBee II stick.

FWIW, the situation that happened in the initial post with everything being offline and requiring re-pairing has not happened again, even with lots of reboots, restarts, HA updates, device adds/removals, etc.

Based on what I observed and what I’ve learned since then, I suspect that the issue was due to a change in the channel-configuration capabilities of the ZHA integration.

Details:

  1. I had tried to switch my Zigbee network to channel 25 by putting the ConBeeII stick into my PC, using deCONZ to update the channel, confirming I could see all my devices, and then moving the stick back to my Home Assistant host. I suspect (but can’t confirm) that this did, in fact, change the channel with an older version of HA where the ZHA integration didn’t attempt to change the channel.
  2. I updated to a new version of Home Assistant that had a change to allow some additional configuration of the ZHA integration via YAML, including channel number
  3. I suspect that this update caused the ZHA integration to switch my ConBeeII stick to the default channel (15), since I didn’t have one specified in my configuration.yaml file
  4. My solution was to re-pair my repeater devices to the ConBee (or each other) and then the Endpoint devices re-joined the network after a few minutes. My network is now running on channel 15 and has been reliable for ~5mo now.

thank you. if it happens again, i’ll try as you suggested. :+1: