Zigbee not healing

I am debugging a weird issue with my ZigBee network. I have a wide network (over 50 ZigBee nodes) and everything was working fine. All the current nodes are wall powered, so they all act as routers. They are Leviton switches and dimmers throughout.
I am using ZigBee2MQTT with an Ethernet coordinator, plus a few of the same Ethernet units programmed as routers for extra coverage. I use the map to see how everything is connected, and I can say it is pretty well covered. I am using channel 25 for ZigBee to stay clear of my Wi-Fi. All worked great for a while, until one of the breakers tripped and a few of the wall switches went offline (expected). The weird behavior is that all of the nodes (other switches and dimmers) that were connected directly to the switches that went out also went offline.
I expected the ones that lost power to go off, and the other ones to find another route back to the coordinator, but that did not happen.
Any ideas?
I tried the solution mentioned on the thread below by unplugging the coordinator, but no luck.
I am leaving the breaker tripped until I solve this, in case I can’t reproduce it. I think it is a corner case (maybe?).

This is my crazy network now. Aqara outlets, switches, dimmers, and breakers.

As I am aiming for a very reliable network given how busy it is, should I have more of the routers (the Ethernet ones with the antenna) around, and bind all switches to them when I reconnect?
I would expect not, since the ZigBee network should always find a route to the coordinator. I prefer to leave it self-healing rather than push for a route via binding, but I am not sure how to solve the issue at hand if another breaker trips sometime and I lose too many nodes again.

This does not directly address your problem. That said, with as large a zigbee2mqtt setup as you have, I would set up a 2nd zigbee2mqtt network with its own coordinator for testing. I do this with Docker: I have both a production and a test zigbee2mqtt container running on the same physical machine. Each zigbee2mqtt instance has its own USB coordinator (in my case), and each has a unique MQTT base topic, so I can feed devices from both into HA if I want.

I add new devices to the test zigbee2mqtt instance and see how they work and interact. In your case, join one or more of your Leviton switches to the test instance and then add one or more of your end devices to these routers. Watch the map, status and logs in the test setup as you take your routers offline and bring them back, and see how the network reconfigures. My experience is that the rerouting occurs fast enough that you might see a 1 or 2 second noticeable ‘hiccup’ in the end device, but as long as there is another route, the network is back to normal operation after this short event.
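Before pulling power on real routers, it can help to predict on paper which nodes should survive. The following is only a minimal sketch, assuming you have copied the links from the map by hand into a list of (node, node) pairs; the names `reachable_from` and `stranded_nodes` and the pair format are made up for illustration and are not part of zigbee2mqtt.

```python
from collections import deque


def reachable_from(coordinator, links, offline=()):
    """Return the set of nodes that still have some path to the coordinator.

    links is a hypothetical list of (node_a, node_b) pairs copied from the map.
    offline is the set of nodes that have lost power.
    """
    offline = set(offline)
    neighbors = {}
    for a, b in links:
        # A link is unusable if either end has lost power.
        if a in offline or b in offline:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search outward from the coordinator.
    seen = {coordinator}
    queue = deque([coordinator])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def stranded_nodes(coordinator, links, offline):
    """Return powered nodes that have no remaining path to the coordinator."""
    all_nodes = {n for link in links for n in link} | {coordinator}
    powered = all_nodes - set(offline)
    return powered - reachable_from(coordinator, links, offline)


if __name__ == "__main__":
    # Made-up example: dim1 hears two switches, dim2 hears only sw1.
    links = [
        ("coord", "sw1"),
        ("coord", "sw2"),
        ("sw1", "dim1"),
        ("sw2", "dim1"),
        ("sw1", "dim2"),
    ]
    print(stranded_nodes("coord", links, offline={"sw1"}))
```

If a node shows up as stranded here, it genuinely has no alternative neighbor in the links you recorded. If it does not show up as stranded but still drops off in the real test, the problem is more likely in how the devices reroute than in coverage.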

Additionally, you can test zigbee2mqtt version upgrades on your test network before rolling them out on your main network, at a pretty low cost in time and effort.

You do mention that you have some Aqara devices. From my experience with some devices of this brand, odd behaviors seem to occur when they route for, or are routed through, other manufacturers’ devices. If you search the zigbee2mqtt forums and this forum, you will find some lengthy discussions about standards compliance issues with some (not all) Aqara devices.

Good hunting!

Turned the breaker back on, and everything works… so it is a non-healing setup, which is not ideal. Even if I split them, any breaker going out in one of the coordinator networks will take out the nodes connected to it. Very strange ZigBee behavior. Not sure if it is the coordinator or some setting I have in my system.

That is why I recommended setting up a 2nd zigbee2mqtt instance for testing. Without one, figuring out which device is causing the issue is going to be more complex with your large number of nodes. Unfortunately, I have yet to find a good tool that shows a zigbee network’s routes over a period of time. With a test network with a limited number of nodes, I was able to set up some test configs with various routers and end devices and understand the network routes before and after I took a router offline. As I said, with the routers I have, the end devices found new routes to the coordinator within a short period of time (of course, if you have a battery powered end device that is sleeping, the zigbee2mqtt map will show it just floating in space until it decides to wake up).
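In the absence of such a tool, a before/after comparison can be done by hand. This is a rough sketch only, assuming you note down the links shown on the map as (node, node) pairs before and after taking a router offline; `diff_links` is a hypothetical helper, not something zigbee2mqtt provides.

```python
def diff_links(before, after):
    """Compare two hand-recorded snapshots of map links.

    Each snapshot is a list of (node_a, node_b) pairs. Direction is ignored,
    so ("sw1", "dim1") and ("dim1", "sw1") count as the same link.
    Returns (lost, gained) as sorted lists of sorted pairs.
    """
    before_set = {frozenset(pair) for pair in before}
    after_set = {frozenset(pair) for pair in after}
    lost = sorted(tuple(sorted(link)) for link in before_set - after_set)
    gained = sorted(tuple(sorted(link)) for link in after_set - before_set)
    return lost, gained


if __name__ == "__main__":
    # Made-up example: sw1 is powered off and dim1 re-joins through sw2.
    before = [("coord", "sw1"), ("sw1", "dim1"), ("coord", "sw2")]
    after = [("coord", "sw2"), ("dim1", "sw2")]
    lost, gained = diff_links(before, after)
    print("lost:", lost)
    print("gained:", gained)
```

A healthy reroute should show the links through the powered-off router under "lost" and at least one new link for each affected device under "gained". A device with lost links and nothing gained is the one to look at more closely.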

Good hunting!

Will keep at it. Thanks for the advice.