Zigbee network keeps crashing ... feels like I tried everything

Hi everyone,

I’m reaching out in some despair, as I can usually find my way by reading and rereading other people’s posts and issues.
But this time, I can’t make it work.

Context:
I’m running a small network of devices: a few Aqara Zigbee switches, a couple of Yeelight bulbs and a few plugs, both Zigbee and Wi-Fi. All of this runs on an RPi 3B+ with an SSD and a ZZh Zigbee stick.

I used to have my switches paired with the Xiaomi Gateway, itself connected to HA, with all automations done via Node-RED. That was more than 2 years ago, and it ran perfectly the whole time with literally no crash whatsoever.

I recently thought it was time to update HA, given how impressive the recent developments are (such a long way in 3 years).
So I went for a fresh HA install and it went rather smoothly. I took this opportunity to “get rid” of the gateway and pair my switches directly with HA through MQTT (Mosquitto and Z2M), still using Node-RED.

Since then, there hasn’t been a single day when I didn’t have to restart Z2M to make my network functional again.
I was working with an old Zigate V1, which is officially not supported, so I swapped it for the much-praised ZZh thinking it would make things better. Nope.

So I tried everything I could find:

  • extension cables (1.3 m, 5 m) in several positions far from the router and the RPi. Nothing.
  • having only the ZZh stick on the Pi (with the SSD). Nothing.
  • changing the Zigbee channel: I’m currently on 25, which has nothing around it (I can scan the Wi-Fi channels with my ISP router).
  • unplugging and replugging the stick, as well as pushing the little reset button. Still nothing.

No luck: after a few hours (around 6 to 8), the network breaks, I get “failed to execute LQI for coordinator” and nothing works.

There is nothing in the HA logs related to Z2M, only a couple of things related to the Xiaomi map extractor.
So I’ve enabled Herdsman logging, as I’ve seen requested in other posts, but I have to admit that at this point it’s all Greek to me…
If anyone has an idea, I’d love to hear it…

Thanks for reading through, that’s already something :wink:

Here is the last bit of the Z2M log in case it helps

2022-03-15T19:42:35.577Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- AF - incomingMsg - {"groupid":0,"clusterid":64704,"srcaddr":31227,"srcendpoint":1,"dstendpoint":1,"wasbroadcast":0,"linkquality":105,"securityuse":0,"timestamp":4227057,"transseqnumber":0,"len":70,"data":{"type":"Buffer","data":[28,95,17,23,10,247,0,65,61,100,16,0,3,40,25,152,57,0,0,0,0,149,57,44,233,181,61,150,57,0,192,15,69,151,57,0,0,0,0,5,33,1,0,154,32,0,8,33,22,1,7,39,0,0,0,0,0,0,0,0,9,33,2,4,11,32,0,155,16,0]}}
2022-03-15T19:42:35.579Z zigbee-herdsman:controller:log Received 'zcl' data '{"frame":{"Header":{"frameControl":{"frameType":0,"manufacturerSpecific":true,"direction":1,"disableDefaultResponse":true,"reservedBits":0},"transactionSequenceNumber":23,"manufacturerCode":4447,"commandIdentifier":10},"Payload":[{"attrId":247,"dataType":65,"attrData":{"type":"Buffer","data":[100,16,0,3,40,25,152,57,0,0,0,0,149,57,44,233,181,61,150,57,0,192,15,69,151,57,0,0,0,0,5,33,1,0,154,32,0,8,33,22,1,7,39,0,0,0,0,0,0,0,0,9,33,2,4,11,32,0,155,16,0]}}],"Command":{"ID":10,"name":"report","parameters":[{"name":"attrId","type":33},{"name":"dataType","type":32},{"name":"attrData","type":1000}]}},"address":31227,"endpoint":1,"linkquality":105,"groupID":0,"wasBroadcast":false,"destinationEndpoint":1}'
2022-03-15T19:42:35.585Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext []
2022-03-15T19:43:29.319Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,3,69,196,188,30,0,32]
2022-03-15T19:43:29.320Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,3,69,196,188,30,0,32]
2022-03-15T19:43:29.325Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 3 - 2 - 5 - 196 - [188,30,0] - 32
2022-03-15T19:43:29.326Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- ZDO - srcRtgInd - {"dstaddr":7868,"relaycount":0,"relaylist":[]}
2022-03-15T19:43:29.326Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext []
2022-03-15T19:43:29.330Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,62,68,129,0,0,0,0,188,30,1,1,0,102,0,115,198,115,0,0,42,28,95,17,4,10,1,255,66,33,100,16]
2022-03-15T19:43:29.331Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,62,68,129,0,0,0,0,188,30,1,1,0,102,0,115,198,115,0,0,42,28,95,17,4,10,1,255,66,33,100,16]
2022-03-15T19:43:29.333Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [1,3,40,29,152,57,195,245,240,64,149,57,42,76,9,65,5,33,12,0,154,32,0,8,33,92,17,9,33,0,2,188]
2022-03-15T19:43:29.334Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,62,68,129,0,0,0,0,188,30,1,1,0,102,0,115,198,115,0,0,42,28,95,17,4,10,1,255,66,33,100,16,1,3,40,29,152,57,195,245,240,64,149,57,42,76,9,65,5,33,12,0,154,32,0,8,33,92,17,9,33,0,2,188]
2022-03-15T19:43:29.336Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [30,29,151]
2022-03-15T19:43:29.337Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,62,68,129,0,0,0,0,188,30,1,1,0,102,0,115,198,115,0,0,42,28,95,17,4,10,1,255,66,33,100,16,1,3,40,29,152,57,195,245,240,64,149,57,42,76,9,65,5,33,12,0,154,32,0,8,33,92,17,9,33,0,2,188,30,29,151]
2022-03-15T19:43:29.338Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 62 - 2 - 4 - 129 - [0,0,0,0,188,30,1,1,0,102,0,115,198,115,0,0,42,28,95,17,4,10,1,255,66,33,100,16,1,3,40,29,152,57,195,245,240,64,149,57,42,76,9,65,5,33,12,0,154,32,0,8,33,92,17,9,33,0,2,188,30,29] - 151
2022-03-15T19:43:29.339Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- AF - incomingMsg - {"groupid":0,"clusterid":0,"srcaddr":7868,"srcendpoint":1,"dstendpoint":1,"wasbroadcast":0,"linkquality":102,"securityuse":0,"timestamp":7587443,"transseqnumber":0,"len":42,"data":{"type":"Buffer","data":[28,95,17,4,10,1,255,66,33,100,16,1,3,40,29,152,57,195,245,240,64,149,57,42,76,9,65,5,33,12,0,154,32,0,8,33,92,17,9,33,0,2]}}
2022-03-15T19:43:29.343Z zigbee-herdsman:controller:log Received 'zcl' data '{"frame":{"Header":{"frameControl":{"frameType":0,"manufacturerSpecific":true,"direction":1,"disableDefaultResponse":true,"reservedBits":0},"transactionSequenceNumber":4,"manufacturerCode":4447,"commandIdentifier":10},"Payload":[{"attrId":65281,"dataType":66,"attrData":{"3":29,"5":12,"8":4444,"9":512,"100":1,"149":8.581094741821289,"152":7.53000020980835,"154":0}}],"Command":{"ID":10,"name":"report","parameters":[{"name":"attrId","type":33},{"name":"dataType","type":32},{"name":"attrData","type":1000}]}},"address":7868,"endpoint":1,"linkquality":102,"groupID":0,"wasBroadcast":false,"destinationEndpoint":1}'
2022-03-15T19:43:29.350Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext []
2022-03-15T22:01:26.822Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - mgmtLqiReq - {"dstaddr":0,"startindex":0}
2022-03-15T22:01:26.823Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,3,37,49,0,0,0,23]
2022-03-15T22:01:37.835Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - mgmtLqiReq - {"dstaddr":0,"startindex":0}
2022-03-15T22:01:37.836Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,3,37,49,0,0,0,23]
Zigbee2MQTT:error 2022-03-15 23:01:43: Failed to execute LQI for 'Coordinator'
2022-03-15T22:01:44.927Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - mgmtLqiReq - {"dstaddr":31227,"startindex":0}
2022-03-15T22:01:44.928Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,3,37,49,251,121,0,149]
2022-03-15T22:01:55.938Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - mgmtLqiReq - {"dstaddr":31227,"startindex":0}
2022-03-15T22:01:55.939Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,3,37,49,251,121,0,149]
Zigbee2MQTT:error 2022-03-15 23:02:01: Failed to execute LQI for 'Prise Test 1'
2022-03-15T22:02:02.954Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - mgmtLqiReq - {"dstaddr":7868,"startindex":0}
2022-03-15T22:02:02.955Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,3,37,49,188,30,0,181]
2022-03-15T22:02:13.963Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - mgmtLqiReq - {"dstaddr":7868,"startindex":0}
2022-03-15T22:02:13.964Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,3,37,49,188,30,0,181]
Zigbee2MQTT:error 2022-03-15 23:02:19: Failed to execute LQI for 'Prise murale Chambre'
2022-03-15T22:08:40.682Z zigbee-herdsman:controller:endpoint Command 0x00158d000313e2b7/1 genOnOff.off({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false})
2022-03-15T22:08:40.683Z zigbee-herdsman:adapter:zStack:adapter sendZclFrameToEndpointInternal 0x00158d000313e2b7:7868/1 (0,0,1)
2022-03-15T22:08:40.684Z zigbee-herdsman:adapter:zStack:znp:SREQ --> AF - dataRequest - {"dstaddr":7868,"destendpoint":1,"srcendpoint":1,"clusterid":6,"transid":4,"options":0,"radius":30,"len":3,"data":{"type":"Buffer","data":[1,4,0]}}
2022-03-15T22:08:40.685Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,13,36,1,188,30,1,1,6,0,4,0,30,3,1,4,0,144]
2022-03-15T22:08:46.690Z zigbee-herdsman:controller:endpoint Command 0x00158d000313e2b7/1 genOnOff.off({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms)
Zigbee2MQTT:error 2022-03-15 23:08:46: Publish 'set' 'state' to 'Prise murale Chambre' failed: 'Error: Command 0x00158d000313e2b7/1 genOnOff.off({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms)'
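
For anyone who finds the raw `unpi` lines hard to read, here is a minimal sketch of how those byte arrays appear to be laid out. It assumes the usual TI Z-Stack serial framing: a start byte 254 (0xFE), a length byte, two command bytes, the payload, and a final checksum equal to the XOR of everything after the start byte. The function name `parse_unpi` and the type labels are illustrative, not part of zigbee-herdsman.

```python
from functools import reduce

SOF = 0xFE  # start-of-frame byte (254 in the log)

# Assumed names for the 3-bit type field; only 1 and 2 appear in the log above.
TYPE_NAMES = {0: "POLL", 1: "SREQ", 2: "AREQ", 3: "SRSP"}


def parse_unpi(frame):
    """Split one complete frame into its fields and verify the checksum."""
    if len(frame) < 5 or frame[0] != SOF:
        raise ValueError("not a complete frame")
    length, cmd0, cmd1 = frame[1], frame[2], frame[3]
    if len(frame) != 5 + length:
        raise ValueError("length byte does not match frame size")
    payload = frame[4:4 + length]
    fcs = frame[4 + length]
    # Checksum: XOR of the length, both command bytes and the payload.
    expected = reduce(lambda a, b: a ^ b, frame[1:4 + length], 0)
    return {
        "length": length,
        "type": cmd0 >> 5,         # upper 3 bits of the first command byte
        "subsystem": cmd0 & 0x1F,  # lower 5 bits of the first command byte
        "command": cmd1,
        "payload": payload,
        "fcs_ok": fcs == expected,
    }


# The srcRtgInd frame received at 19:43:29.319Z in the log above.
frame = parse_unpi([254, 3, 69, 196, 188, 30, 0, 32])
print(TYPE_NAMES[frame["type"]], frame)

# The first two payload bytes are the 16-bit short address, little-endian.
addr = int.from_bytes(bytes(frame["payload"][:2]), "little")
print(addr)
```

The decoded fields line up with the `parsed 3 - 2 - 5 - 196 - [188,30,0] - 32` line, and the address comes out as 7868, the `dstaddr` used for 'Prise murale Chambre'. Run on `[254,3,37,49,0,0,0,23]`, the same function reports type 1 (SREQ), so as far as this excerpt shows, from 22:01 onward requests are written to the stick and nothing comes back.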

overheating or underpowered?

You didn’t say which Xiaomi switch you have but I notice a lot of them seem to have this note on pairing:

You may have to unpair the switch from an existing coordinator before the pairing process will start. If you can’t do this, try to remove battery (if it has one), push the button (to completely discharge device), place the battery back and try pairing again.

Xiaomi devices, I know, are a bit weird. I’ve seen Koenkk and others mention a few times that Xiaomi did not actually implement the Zigbee spec completely, so they have some odd behaviors.

In addition to changing the channel, the other “clean slate” button (so to speak) is to change the encryption key as described here and then re-pair everything. That helped me get rid of unexplained errors when I changed coordinators, although I was using Z2M both before and after rather than going from a Xiaomi hub to Z2M, so it’s a slightly different use case.
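
For reference, a minimal sketch of generating a fresh key. It assumes the key is stored under `advanced.network_key` in Z2M’s `configuration.yaml` as a list of 16 byte values; that layout is an assumption here and worth checking against the current Z2M docs. The helper name is made up for the example.

```python
import secrets


def new_network_key():
    """Return 16 cryptographically random byte values (0-255)."""
    return list(secrets.token_bytes(16))


key = new_network_key()
# Print in a form that can be pasted into the YAML config (assumed layout).
print("advanced:")
print("  network_key: [" + ", ".join(str(b) for b in key) + "]")
```

Once the key changes, every device has to be re-paired, as mentioned above.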

Other than that, I’d actually suggest bringing these details over to either Z2M’s forum or Discord. HA is basically just an MQTT client in this case; Z2M is doing all the Zigbee work. So you might have better luck finding folks with answers there.

Indeed; I’ve started to lean toward this. I’d be fine moving everything to a NUC frankly, but just want to be sure this is not a problem I’ll have as well and I’d be doing it for nothing…
My network is honestly not that big and RAM + CPU usage in HA seems well in the ok range.

That’s true but on the other hand, I never had a problem pairing them.
I did remove them from the Gateway first, otherwise indeed, they don’t show up in Z2M.
And it’s not just the switches, Z2M tells me first “failed to execute LQI with Coordinator”. Which seems to mean it’s not only the switches?
The whole network map is broken within Z2M.
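
A rough way to test that reading: if the coordinator itself stops answering, the Herdsman debug log should show a long stretch with nothing received before the first failure. The sketch below reports the longest silence between consecutive debug lines. It relies only on the ISO timestamp at the start of each line, and the helper name is hypothetical.

```python
import re
from datetime import datetime

# Herdsman debug lines start with an ISO-8601 UTC timestamp.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z ")


def longest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silence."""
    best = None
    prev_time = prev_line = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # "Zigbee2MQTT:error ..." lines use another format
        now = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if prev_time is not None:
            gap = now - prev_time
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = now, line
    return best


# Three lines copied from the log in the first post.
sample = [
    '2022-03-15T19:43:29.350Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext []',
    '2022-03-15T22:01:26.822Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - mgmtLqiReq - {"dstaddr":0,"startindex":0}',
    '2022-03-15T22:01:26.823Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,3,37,49,0,0,0,23]',
]
gap, before, after = longest_gap(sample)
print(gap)
```

On the posted excerpt this gives a silence of roughly 2 h 18 min between the last parsed frame (19:43:29) and the first unanswered `mgmtLqiReq` (22:01:26), although the excerpt may simply have been trimmed. Note that the `Zigbee2MQTT:error` lines are in local time, one hour ahead of the UTC debug lines.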

But you’re right, might be more for a Z2M forum :slight_smile:

Thanks for reading and answering though.

I run Z2MQTT just fine on a pi 3B+ with a network of about 20 devices, and it is solid as a rock. I was using a CC2531 stick, but recently upgraded to the new Sonoff 3.0 stick and it is still stable.

Have you tried flashing the latest FW to the ZZH stick? You can get the latest update from the Z2M site.

Yes, that’s also what I feel. It shouldn’t be a power problem.
And yes, I flashed the ZZh when it arrived a week ago. You have to anyway: it comes with a dev firmware that only makes the LED blink.

Someone else pointed me to a possible power supply problem, as running an SSD could be too much for the Pi, and suggested plugging the SSD into a powered USB hub.

How are you running your Pi? With an SSD as well?

@PJBLinkHA I run mine off SSD as well. I am using the official raspi power supply.

Since installing it, I have run it on channel 11 only. I know it’s a pain to re-pair all devices, but maybe that is worth trying?

My power supply failed a couple of months ago, so I had to replace it.
I’ll try a different one just in case, but it was working well after I changed it and before I moved everything to Z2M.

As for the channel, 11 is quite busy around me. I can use my ISP-provided router to scan the area for the best channel to use, and 25 is completely empty.
Any chance it would just “not like” 25 for no reason? Or could there be some disturbance not related to other Wi-Fi networks?
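
On whether Wi-Fi can reach channel 25 at all, a back-of-the-envelope check is possible from the nominal centre frequencies: Zigbee channel k (11 to 26) sits at `2405 + 5 * (k - 11)` MHz and Wi-Fi channel n (1 to 13) at `2412 + 5 * (n - 1)` MHz. The sketch below assumes a 2 MHz wide Zigbee channel and a 22 MHz wide Wi-Fi channel, which is a simplification (40 MHz Wi-Fi and real spectral masks are ignored). The function names are illustrative.

```python
ZIGBEE_WIDTH_MHZ = 2   # assumed occupied bandwidth
WIFI_WIDTH_MHZ = 22    # assumed classic 2.4 GHz channel width


def zigbee_center_mhz(channel):
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels are 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    if not 1 <= channel <= 13:
        raise ValueError("only Wi-Fi channels 1-13 are modelled here")
    return 2412 + 5 * (channel - 1)


def overlapping_wifi_channels(zigbee_channel):
    """Wi-Fi channels whose band overlaps the given Zigbee channel."""
    z = zigbee_center_mhz(zigbee_channel)
    z_lo, z_hi = z - ZIGBEE_WIDTH_MHZ / 2, z + ZIGBEE_WIDTH_MHZ / 2
    hits = []
    for n in range(1, 14):
        w = wifi_center_mhz(n)
        w_lo, w_hi = w - WIFI_WIDTH_MHZ / 2, w + WIFI_WIDTH_MHZ / 2
        if w_lo < z_hi and w_hi > z_lo:
            hits.append(n)
    return hits


print(zigbee_center_mhz(25), overlapping_wifi_channels(25))
```

Under these assumptions Zigbee channel 25 (2475 MHz) only overlaps Wi-Fi channels 12 and 13, so a scan showing those empty supports the idea that Wi-Fi is not the culprit. It says nothing about interference from non-Wi-Fi sources.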

I don’t know, at this point, I can try anything …

I have a small network on a Sonoff 3.0 stick, and the latest Z2M introduced a little instability. Last month’s version never went down; this month, switches sometimes just stop working.

Interesting. Aqara switches?
Do the other devices stay linked to the coordinator when you pull the map?

Well, I’ve tried to rule out any potential power problem, but it doesn’t change anything.
What I’ve done since last time:

  • adding a powered USB hub for the SSD as suggested
  • changing the power supply for a brand new and official Raspberry one.
  • moving the SSD away from the PI and the Zigbee dongle on the other side so now physically it’s “Zigbee Dongle < RPi > SSD”, with like 15 cm between each (knowing that I already tried moving the Zigbee Dongle 3 meters away from the Pi and SSD and it didn’t change anything)

This made absolutely no difference: my Zigbee network is still crashing after 6 to 8, sometimes 12, hours.

I do have a few errors in the HA log for which I couldn’t find any source. Researching them leads to a lot of inconclusive answers, as those errors are too broad. I’m including them below in case it’s helpful.
The “Can’t read supervisor data” one is the annoying one: after it appears, I can’t access the Supervisor and I have to manually reboot the Pi by pulling the plug (a soft reboot doesn’t work, nor does the Supervisor command line, nor do the rebuild and repair commands, by the way).
BUT this error is not always there when my Zigbee network crashes, and vice versa.

I’m a bit desperate at this point because I’ve tried everything I could find, with no luck… Does anyone have an idea?

Logger: homeassistant.components.websocket_api.http.connection
Source: components/websocket_api/http.py:254
Integration: Home Assistant WebSocket API (documentation, issues)
First occurred: 09:41:43 (1 occurrences)
Last logged: 09:41:43

[547516776992] Disconnected: Did not receive auth message within 10 seconds

Logger: aiohttp.server
Source: /usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py:405
First occurred: 25 March 2022, 21:48:40 (2 occurrences)
Last logged: 01:38:31

Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py", line 334, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 551, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='Pause on PRI/Upgrade'

Logger: frontend.js.latest.202203012
Source: components/system_log/__init__.py:190
First occurred: 25 March 2022, 20:54:12 (5 occurrences)
Last logged: 25 March 2022, 20:54:21

:0:0 ResizeObserver loop completed with undelivered notifications.

Logger: homeassistant.components.hassio
Source: components/hassio/__init__.py:569
Integration: Home Assistant Supervisor (documentation, issues)
First occurred: 15:45:19 (2 occurrences)
Last logged: 16:16:00
Can’t read Supervisor data:

With some detail for this last one:

2022-03-26 15:06:09 ERROR (MainThread) [frontend.js.latest.202203012] TypeError: e is undefined
2022-03-26 15:06:13 ERROR (MainThread) [frontend.js.latest.202203012] TypeError: e is undefined
2022-03-26 15:11:29 ERROR (MainThread) [frontend.js.latest.202203012] :0:0 ResizeObserver loop completed with undelivered notifications.
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/45df7312_zigbee2mqtt/stats request
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_samba/stats request
2022-03-26 15:45:19 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data:
2022-03-26 16:03:12 WARNING (SyncWorker_4) [custom_components.xiaomi_cloud_map_extractor.camera] Unable to retrieve map data
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mariadb/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/45df7312_zigbee2mqtt/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_ssh/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_configurator/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/cebe7a76_hassio_google_drive_backup/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_samba/stats request
2022-03-26 16:16:00 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data:   
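
To see whether the Supervisor timeouts line up with the Zigbee crashes, the lines above can be grouped by timestamp. A minimal sketch, assuming the line layout shown in this excerpt (the regular expression and function name are illustrative):

```python
import re
from collections import defaultdict

# Matches lines such as:
# 2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
TIMEOUT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ERROR \(\w+\) "
    r"\[homeassistant\.components\.hassio\.handler\] "
    r"Timeout on /addons/([^/]+)/stats request"
)


def timeouts_by_time(lines):
    """Map each timestamp to the add-on slugs that timed out at that moment."""
    groups = defaultdict(list)
    for line in lines:
        m = TIMEOUT.match(line)
        if m:
            groups[m.group(1)].append(m.group(2))
    return dict(groups)


# A few lines copied from the log above.
sample = [
    "2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request",
    "2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/45df7312_zigbee2mqtt/stats request",
    "2022-03-26 15:45:19 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data:",
    "2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mariadb/stats request",
    "2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/45df7312_zigbee2mqtt/stats request",
]
for when, addons in timeouts_by_time(sample).items():
    print(when, len(addons), sorted(addons))
```

In the full excerpt, every add-on times out within the same second (five at 15:45:19, nine at 16:16:00), which suggests the whole Supervisor or host stalled rather than Zigbee2MQTT alone. Comparing those timestamps with the first `Failed to execute LQI` in the Z2M log would show whether the two events coincide.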

Geez, what a PITA…
Sorry, can’t add anything to what you’ve tried.

Hope you get it solved very soon!

Thanks, me too… :frowning:
I guess I’ll try to re-install HA. Maybe some corrupted file somewhere …

Not using Zigbee myself (for exactly these kinds of reasons), but I think the post above by @CentralCommand is important. Xiaomi doesn’t properly implement the Zigbee standard in their devices; as far as I remember, they modified some protocol timing to save battery life, making them a PITA to use with other controllers. They will obviously work with Xiaomi’s own gateway (which is probably all they and most users care about). If you keep having issues and don’t want to spend your time trying to work around that stuff, maybe going back to using them over their own gateway would be the most sensible thing to do?

Thanks for your reply.

I would agree if I weren’t also losing the “connection” with the coordinator in Z2M. Unless you think they can bring the whole network down?
And I also have Aqara devices (a motion sensor and a few contact sensors), which are indeed Xiaomi underneath, but they are widely used and quite well regarded. Nobody has reported such problems as far as I know?
And a lot of people use them as they are listed as compatible devices on Z2M website.

I might be wrong, but I think that if the Aqara devices were that unstable, I would have read about it, given how many hours I’ve spent researching the issue.

I’ll try a fresh install tonight as I’m kind of out of solutions for now. And I feel more and more that the Supervisor crash could be somehow related.
Just hoping the backups won’t bring the issue with them…

Just found a new error in the Supervisor log:

[supervisor.api.ingress] Ingress error: Cannot connect to host ssl:default [Connect call failed (‘’, 8099)].

Could this be related?

A lot of people have issues with Aqara devices when not used on their own hub. There’s even a note in the official HA ZHA integration docs (Note that some Zigbee devices are not fully compatible with all brands of Zigbee router devices. Xiaomi/Aqara devices are for example known not to work with Zigbee router devices …).

And since this is a fundamental problem with all Xiaomi / Aqara devices, HA isn’t the only affected platform. Here’s what Hubitat has to say about this. And Xiaomi isn’t the only bad player in the field. Ikea devices also stretched the standard and people have lots of problems with their devices when not used on their official hub for that reason. Zigbee is a mess, because the standard is not enforced and manufacturers basically do whatever they want.

Thanks for sharing. Interesting indeed.
Although, the two kinds of problems they mention (pairing and stability) are related to using certain types of coordinators or routers, and the list is quite precise. I’m not using any of them. The coordinator is a ZZh, which is highly compatible, and all my routers are Xiaomi, so they are compatible with each other at least.
Still I’ll order a non Xiaomi Zigbee device to be sure.

In the meantime, I re-installed and restored HA. Exact same issue…
I’m also trying this : How to increase the swap file size on Home Assistant OS
To solve my “Can’t read supervisor data” errors. I have a feeling it’s not unrelated.

I am having the same logs and problems.
My Raspberry Pi 4 crashes around 3 to 5 am (only when I have the stick plugged in; I have a flashed Sonoff Zigbee dongle 3.0).

At first, my Z2M wouldn’t even start. After adding a 1 m extension cable it started working, but only until these problems began.

I am waiting for a 5 V 4 A power supply to arrive so I can test it, but I have had no success with many different approaches.
I tried increasing the swap size and disabling UAS (since my Supervisor logs were really slow even before Z2M).

Please report back if anyone figures out how to solve this.