zigbee2MQTT Instability

Zigbee2MQTT and ember are simply broken… You are not the only one.

Time to move to ZHA?

To try to stabilise things:

I have moved zigbee2MQTT to a factory-fresh Home Assistant Green that does nothing but run zigbee2MQTT.

I have moved to an Ethernet PoE SMLIGHT Zigbee coordinator so I can place it in the optimal location for a house-wide zigbee network.

I have optimised WiFi and zigbee:

Hue on channel 25

zigbee2MQTT on 11

Wi-Fi channels 6 & 11

Struggling to keep zigbee2MQTT up and running without playing offline whack-a-mole.

I also had to reset most of the devices. Remove them completely and reconnect. A bit of struggle to get to some of those…

Well…. Not holding my breath, but all of a sudden zigbee2MQTT has been solid for the last 5 hours with only 2 devices dropping offline.

Error messages dropped to one every couple hours down from the constant deluge.

Fingers crossed

Ok, after the 18th rebuild of the zigbee2MQTT network this year it hit a record, running stable for 21 days, much better than the average of 13 days.

So that was progress until this morning, when multiple mains devices fell off and several battery devices stopped responding.

So I re-paired them again, but a daily 6-8% zigbee failure rate is not sustainable.

Tried a zigbee update on the SMLIGHT. I can see that the Home Assistant Green is connecting several times a minute, then disconnecting.

The zigbee2MQTT log reports:

[12:45:04] INFO: Preparing to start...
[12:45:04] INFO: Socat not enabled
[12:45:05] INFO: Starting Zigbee2MQTT...
Starting Zigbee2MQTT without watchdog.
[2024-11-03 12:46:25] error: 	z2m: Error while starting zigbee-herdsman
[2024-11-03 12:46:25] error: 	z2m: Failed to start zigbee
[2024-11-03 12:46:25] error: 	z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
[2024-11-03 12:46:25] error: 	z2m: Exiting...
[2024-11-03 12:46:25] error: 	z2m: Error: network commissioning timed out - most likely network with the same panId or extendedPanId already exists nearby (Error: AREQ - ZDO - stateChangeInd after 60000ms
    at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:59:23)
    at ZnpAdapterManager.beginCommissioning (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:370:31)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at ZnpAdapterManager.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:91:21)
    at ZStackAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:158:16)
    at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:137:29)
    at Zigbee.start (/app/lib/zigbee.ts:69:27)
    at Controller.start (/app/lib/controller.ts:161:27)
    at start (/app/index.js:154:5))
    at ZnpAdapterManager.beginCommissioning (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:372:23)
    at ZnpAdapterManager.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:91:21)
    at ZStackAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:158:16)
    at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:137:29)
    at Zigbee.start (/app/lib/zigbee.ts:69:27)
    at Controller.start (/app/lib/controller.ts:161:27)
    at start (/app/index.js:154:5)
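
For anyone hitting the same startup failure, here is a minimal Python sketch (not part of Zigbee2MQTT) that pulls the error lines out of a log like the one above and flags the commissioning timeout. The helper names are hypothetical, and the line format is assumed from this excerpt only, so other versions may log differently.

```python
import re

# Message text copied from the log excerpt above.
COMMISSIONING_TIMEOUT = "network commissioning timed out"

# Assumed line shape, taken from the excerpt:
# "[2024-11-03 12:46:25] error: <tab>z2m: Failed to start zigbee"
LINE_RE = re.compile(
    r"^\[(?P<ts>[\d\- :]+)\]\s+(?P<level>\w+):\s+z2m:\s+(?P<msg>.*)$"
)


def find_startup_errors(log_text):
    """Return (timestamp, message) pairs for every z2m error line."""
    errors = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("level") == "error":
            errors.append((match.group("ts"), match.group("msg")))
    return errors


def has_commissioning_timeout(log_text):
    """True if any error line carries the commissioning-timeout message."""
    return any(
        COMMISSIONING_TIMEOUT in msg for _, msg in find_startup_errors(log_text)
    )
```

Stack-trace lines and the add-on's own INFO lines do not match the pattern, so only the z2m error lines are returned.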

Given it is connecting to the same SMLIGHT, zigbee2MQTT has become delusional: it thinks the SMLIGHT it is configured to use is somehow a different one from the SMLIGHT it was connected to before.

I have restarted Home Assistant, hard rebooted the SMLIGHT and restarted zigbee2MQTT many, many times.

We have already ruled out wifi or other zigbee network clashes.

The SMLIGHT is running on PoE ethernet, away from all other electrical devices, and positioned centrally in the house and 75% of the devices, all of which were paired first, including 8 Ikea zigbee routers. Even so, I am struggling to find out why zigbee2MQTT is so entirely flaky.

Fortunately at re-build 5 or 6 I moved as much as I could back to a reliable Hue hub, so most of the lighting is running reliably as it has done for the best part of 10 years.
Yes, it is on zigbee channel 25, at the other end of the zigbee range from the SMLIGHT / zigbee2MQTT.

I’ll bring an Aqara hub online to manage blinds, door/window sensors and leak detectors while I try to find a way to get zigbee2MQTT stable or find another way to manage zigbee devices reliably.

If anyone has any ideas I would really like to get this working before I hit the anniversary of constant rebuilds of zigbee2MQTT and Home Assistant installs at Christmas.

Ok so it looks like zigbee2MQTT has got itself all confused.

So it looks like you have to reset the configuration.yaml file, removing pan_id, ext_pan_id and network_key and replacing them with

pan_id: GENERATE
ext_pan_id: GENERATE
network_key: GENERATE
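
Regenerating these values creates a new network, so it may be worth recording the existing ones before overwriting them. A minimal Python sketch, assuming the three keys sit under `advanced:` in configuration.yaml and are written either inline or as a block list; `extract_network_settings` is a hypothetical helper, not something Zigbee2MQTT ships.

```python
# Keys named in the post; assumed to live under `advanced:` in configuration.yaml.
NETWORK_KEYS = ("pan_id", "ext_pan_id", "network_key")


def extract_network_settings(config_text):
    """Collect the raw text of each network setting, including block-style lists."""
    settings = {}
    current = None  # key whose "- item" lines are currently being collected
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("- ") and current is not None:
            settings[current].append(line[2:].strip())
            continue
        current = None
        key, sep, value = line.partition(":")
        if sep and key in NETWORK_KEYS:
            value = value.strip()
            settings[key] = [value] if value else []
            # An empty value means the list follows on the next lines.
            current = key if not value else None
    return {key: ", ".join(values) for key, values in settings.items()}
```

The result is a plain dict that can be saved somewhere safe. The network key is a secret, so treat the copy accordingly.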

Well, that got zigbee2MQTT working again, but it seems to have gone the extra mile on the goof-up and removed all the devices.

Well, I guess at least zigbee2MQTT is stable as long as there are no zigbee devices :slight_smile:

I need to find a lower maintenance solution, well one that can connect to zigbee devices and not lose them every couple of weeks.

Fortunately Aqara and Hue hubs are solid working with zigbee devices so I can keep the core functionality working.

This is not a goof, it is expected. See: Zigbee network | Zigbee2MQTT

Did you try ZHA?

If not a goof then a design flaw: zigbee2MQTT decides the coordinator it is talking to is creating a second identical zigbee network, so instead of using exactly the same coordinator and zigbee network you have to create a whole new network to work around z2m’s phantom network, and deleting everything is by design.

Funny thing was that I had run zigbee2MQTT reliably for 18 months as a standalone install with NR and an MQTT broker, and I really liked the idea of bringing it all under Home Assistant so I could better monitor and manage updates.

Since Christmas 2023 I have had to wipe and rebuild Home Assistant several times - even purchased a Home Assistant green, then had to wipe and rebuild a couple of times to get a stable zigbee again.

So far this year I have had to re-pair approximately 1,400 devices. I started with 150-200 zigbee devices and have been reducing the number every two weeks as it all died. I farm the devices out to more reliable zigbee networks: I moved 100 devices off to Hue, which is totally solid. Walk into a room and the lights turn on every time.

I am now down to ONLY 40 router devices and 5 end devices and trying to get that stable.

I did have all the Aqara Blind controllers on zigbee2MQTT but every time zigbee2MQTT dies it is a pain to go around every room at night and in the morning and close and open them.

The 15 leak sensors, 16 door/window sensors and 3 vibration sensors have all been offline for months because before I get around to re-pairing them zigbee2MQTT has died again.

So today I brought out an Aqara E1 hub to run all the Aqara stuff and get some stability going.

With Hue running on 25, zigbee2MQTT on 15 and Aqara on 20 they should not clash.

Maybe it is just having zigbee2MQTT under Home Assistant that makes it so unreliable, so on the next big failure I’ll run up an external zigbee2MQTT and have it feed the Home Assistant mqtt broker.

I did try ZHA but with every device taking up to an hour to show it was paired it was getting really difficult to figure out what was working or not.

I am sure there is something really obvious and stupid here, but I tried to rule that out by using a Home Assistant Green freshly wiped to factory settings and then updated.

Spending 2-4 days a month just re-pairing devices and rebuilding Home Assistant / zigbee2MQTT is kinda defeating the benefits of home automation, and while it is so unreliable everything you build on top, like energy management/monitoring, security and light automations, gets broken every two weeks.

In the end it works out to be more work than the reward.

I need to find a more consistent/reliable way, maybe use Hubitat to run all the zigbee and have it publish to Home Assistant so I can manage from there.

I need to look around, 11 months is too long to get a system up and running.

It sounds a lot like you have issues with interference. Have you tried placing the coordinator differently? I had issues with devices dropping off myself, but after I placed the coordinator on a different floor it’s been solid.

Yes, gone down that path.

I have Wi-Fi on 6 and 11.

I have tried usb coordinators on long extensions. I have tried two different SMLIGHT Ethernet coordinators.

I have moved the location in the house three times.

I have tried changing channels.

All to no avail.

But Hue remains strong: with 100 devices, it has been solidly reliable for 10 years now.

Hue is on zigbee channel 25 and zigbee2MQTT is on 15. I have put Aqara on channel 20 because I need to keep the Aqara devices reliable; that zigbee network has no routers on it, all are end devices.

So if there is an interference issue it is only impacting zigbee2MQTT.

We shall see, having just completed rebuild 19 for the year, this time with 42 routers and 8 end devices. We will see how stable it remains. The record is three weeks so far.

Failing that I’ll run up Hubitat and use the zigbee on that and feed it back to the Home Assistant MQTT. I have three hubitats and can have all three running on 11, 15 and 20, move the aqaras across.

Have you tried running only one Zigbee network? No Aqara and Hue, just the Z2M network?

Yes.

At Christmas last year, for a month, I tried running only zigbee2MQTT. I wanted it to be the mother of all zigbee2MQTT networks, with 85 Hue routers, 15 Hue endpoints, and another 85 routers and 15 endpoints.

It was so unreliable my wife was getting very very angry because the lights stopped working - a total sense of humour failure :slight_smile:

So I moved back to Hue to ensure lighting was reliable, and it has been absolutely solid on channels 15 and 20, and for the last 6 months on channel 25, well out of the way of zigbee2MQTT.

Only after the 18th failure of zigbee2MQTT this year did I run up Aqara because I need the leak sensors, door sensors and window sensors, along with the vibration sensors on Parcel box and letterbox to be reliable.

This has also reduced the devices on zigbee2MQTT hopefully aiding its stability :slight_smile:

I am currently at 59 devices on zigbee2MQTT, 47 routers and 12 endpoints. I have 2 more router devices to add (they take several days to re-pair and usually need between 60 and 100 pairing attempts before they connect) and 7 more endpoints to add.

If I can get it to handle 68 devices (49 routers and 19 endpoints) and stabilise, that would be awesome; I can manage the others using the Hue and Aqara hubs.

But so far the longest it has remained reliable is 21 days; the average is about 13 days.

Need to get the bathroom lights working again. I’ll hold off on the endpoints to see if it stabilises. I haven’t had them running for months so no rush - nice to have, not key.

FWIW, I’ve been running docker.io/koenkk/zigbee2mqtt:latest for literally years without trouble, but this last week it keeps freezing up. Not sure what’s going on!

Since Christmas, and through many, many rebuilds, zigbee2MQTT has been unreliable: it normally runs for 2 weeks, then devices stop responding or drop off the coordinator.

The longest it has gone is 3 weeks.

Each time it fails I have rebuilt zigbee2MQTT or Home Assistant, moved between a VM and the Home Assistant Green, factory reset the Home Assistant Green a few times, changed coordinators, switched between usb and ethernet, changed channels, and moved most devices off to a Hue hub on channel 25, well away from 2.4GHz wifi and zigbee2MQTT.

I have gone through all the channel clashing and segregated everything out.

For this latest version I moved some critical devices across to a spare Aqara hub to keep door/window/leak sensors and blinds working.

I have loaded ~60 devices, ~50 routers and ~10 end devices, and it is stable after 15 days. If it gets to 21+ days I will move the Aqara devices back to zigbee2MQTT.

This year I have experienced the worst stability year out of 30 years of home automation and really hoping it will settle down so I can rebuild the basics back in.

This is definitely not normal, and the fact you said it also takes a day for paired devices to show up even when you tried ZHA suggests it’s not limited to Z2MQTT.

I know you checked already, but this points to an interference issue (or you’re reaching the max limit of directly connected devices on your coordinator).
If you’re running WiFi on channels 6 & 11, that doesn’t leave much room for the other 3 separate ZigBee meshes you’re running.

Appreciate the response :slight_smile:

Not normal at all.

I put ZHA on the newly factory wiped Home Assistant green.

Ideally I would only run one zigbee hub, zigbee2MQTT, but with zigbee2MQTT being so unstable all year I moved all the Hue lights over to the Hue hub on channel 25, out of the way of everything and one of the recommended channels for Hue. All ~100 devices have been totally solid for 9 months, with Hue motion sensors managing most of them.

I have temporarily put most of the Aqara devices on an Aqara hub on zigbee channel 20: the blind controllers, door/window sensors, leak sensors and vibration sensors, about 50+ end devices, and they have been solid.

Now running zigbee2MQTT channel 15, with a reduced subset of zigbee devices 60 in all, 51 router and 9 end devices. Has been rock solid for 17 days.

This weekend if still stable I will start moving the Aqara devices across, Blind Controllers (on every window in the house), wait a week, then door/window sensors, wait a week then leak sensors etc etc

Then I can shut down the aqara hub.

So in the 18 odd rebuilds of zigbee2MQTT / Home Assistant this year I have rarely had more than 60 devices with 83% being routers so I doubt the overloading was an issue, at least hope not :slight_smile:

I am hopeful this one will work: it is on a freshly factory reset Home Assistant Green with a fresh new SMLIGHT, and I have moved zigbee2MQTT to channel 15.

So far so good. I am starting to feel so hopeful that I have started implementing really basic automations again. I did not set them up for the last 4 wipes, as zigbee2MQTT normally failed totally within 14 days.

So hopefully the year will end much better than it started.

Not sure what is going on but I had run zigbee2MQTT with mqtt and Node Red for a couple of years and it was solid.

Really liked the idea of Home Assistant running them to manage the updates and bring it all together.

fingers and toes crossed.

If you really, really, really, want your zigbee stick to be successful and your main zigbee mesh, I would do these if I were you:

  • move my 2.4GHz wifi to channel 1 and 6, or ideally just channel 1.
  • and then move the hue hub to zigbee channel 15 and see how that goes.
  • and then move zha or z2m with usb stick to channel 25, along with the interference mitigation strategies you already know.

Again, depends on how bad you want your z2m zigbee stick to work.

… and if the above z2m + sonoff would work, you might not even need hue hub… but that’s further down the road.

Oh and, instead of jamming 50 or 100 devices together at once, maybe go slow, plan for like ~10 per week, and see how your zigbee network would evolve.

If the above would fail, maybe stay with Hue Bridge and Aqara hub, forget about z2m and move on??? Unless you have some specific reasons that you absolutely want to do USB sticks? (This last question ties back to the question above of how serious you want z2m to work.)

Yeah, I still say most likely suspect is interference.
I’m assuming you’re very familiar with this image.

If so, may I ask what prompted you to use Wifi channels 6 & 11? Wouldn’t 1 & 6 have cleared up (mostly) zigbee channels 20-26?
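
The overlap can also be worked out rather than read off a chart. A rough Python sketch, assuming a simplified model: Zigbee channels 11-26 centred at 2405 + 5·(ch − 11) MHz and 2 MHz wide, Wi-Fi channels centred at 2407 + 5·ch MHz and 22 MHz wide. Real Wi-Fi energy spills past these nominal edges (and 40 MHz channels are wider), so a "clear" result here is only a first approximation.

```python
def zigbee_band(channel):
    """(low, high) edges in MHz of a Zigbee channel (11-26), taken as 2 MHz wide."""
    centre = 2405 + 5 * (channel - 11)
    return centre - 1, centre + 1


def wifi_band(channel, width_mhz=22):
    """(low, high) edges in MHz of a 2.4 GHz Wi-Fi channel (1-13)."""
    centre = 2407 + 5 * channel
    return centre - width_mhz / 2, centre + width_mhz / 2


def clear_zigbee_channels(wifi_channels):
    """Zigbee channels whose band overlaps none of the listed Wi-Fi channels."""
    clear = []
    for z in range(11, 27):
        z_low, z_high = zigbee_band(z)
        overlaps = any(
            z_low < w_high and w_low < z_high
            for w_low, w_high in map(wifi_band, wifi_channels)
        )
        if not overlaps:
            clear.append(z)
    return clear


# Compare the two Wi-Fi plans discussed in the thread.
for plan in ([6, 11], [1, 6]):
    print(plan, clear_zigbee_channels(plan))
```

Under this model Wi-Fi 6 & 11 leaves Zigbee 11-15, 20, 25 and 26 clear, while Wi-Fi 1 & 6 leaves 15 and 20-26 clear. Channels 15, 20 and 25 sit in gaps in both plans, but with 6 & 11 channels 20 and 25 are isolated slots right next to a Wi-Fi channel edge.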

My secondary suspect is the sheer number of devices. Depending on the hardware/firmware version on your coordinator, you might have been running into limits for directly connected devices.
Have you tried building out your mesh with a few routing devices directly connected to the coordinator, then paired other devices via those routers? The Permit Join button allows you to specify via which device you want to pair if you click the dropdown.
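
The same thing can be requested over MQTT instead of the frontend dropdown. A hedged sketch that only builds the message: the topic and the `device` / `time` / `value` fields are how I understand the Zigbee2MQTT bridge request API, but the accepted fields have changed between releases, so treat the payload shape as an assumption and check the docs for your version. `permit_join_request` and the router name are hypothetical.

```python
import json


def permit_join_request(via_device, seconds=254, base_topic="zigbee2mqtt"):
    """Build the (topic, payload) pair for a permit-join request via one router."""
    topic = f"{base_topic}/bridge/request/permit_join"
    # Assumption: older releases expect "value", newer ones expect "time";
    # both are sent here. "device" is the friendly name of the router to join through.
    payload = {"value": True, "time": seconds, "device": via_device}
    return topic, json.dumps(payload)


# "hallway_router" is a made-up friendly name for illustration.
topic, payload = permit_join_request("hallway_router")
print(topic)
print(payload)
```

Publish the pair with whatever MQTT client you already use. Joining through a named router keeps the new device off the coordinator's own list of direct children.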

Apologies if these questions have been asked and answered before. It’s a long thread and it’s been an even longer day for me.