Zigbee network keeps crashing ... feels like I tried everything

I have a small network on a Sonoff 3.0 dongle, and the latest Z2M release introduced a little instability. Last month’s release never went down; this month, switches sometimes just stop working.

Interesting. Aqara switches?
Do the other devices stay linked to the coordinator when you pull the map?

Well, I’ve tried to rule out any potential power problem, but it doesn’t change anything.
What I’ve done since last time:

  • adding a powered USB hub for the SSD as suggested
  • changing the power supply for a brand new and official Raspberry one.
  • moving the SSD away from the Pi, with the Zigbee dongle on the other side, so physically it’s now “Zigbee Dongle < RPi > SSD” with about 15 cm between each (I had already tried moving the Zigbee dongle 3 meters away from the Pi and SSD, and it didn’t change anything)

This made absolutely no difference: my Zigbee network still crashes after 6 to 8, sometimes 12, hours.

I do have a few errors in the HA log for which I couldn’t find any source. Researching them leads to a lot of inconclusive answers, as those errors are too broad. I’m including them below in case it’s helpful.
The “Can’t read supervisor data” one is the most annoying: after it appears, I can’t access the Supervisor and I have to reboot the Pi manually by pulling the plug (a soft reboot doesn’t work, and neither do the Supervisor command line or the rebuild and repair commands, by the way).
BUT this error is not always there when my Zigbee network crashes, and vice versa.

I’m a bit desperate at this point because I’ve tried everything I read about and could think of, with no luck… Does anyone have an idea?


Logger: homeassistant.components.websocket_api.http.connection
Source: components/websocket_api/http.py:254
Integration: Home Assistant WebSocket API (documentation, issues)
First occurred: 09:41:43 (1 occurrences)
Last logged: 09:41:43

[547516776992] Disconnected: Did not receive auth message within 10 seconds


Logger: aiohttp.server
Source: /usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py:405
First occurred: March 25, 2022, 21:48:40 (2 occurrences)
Last logged: 01:38:31

Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py", line 334, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 551, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='Pause on PRI/Upgrade'


Logger: frontend.js.latest.202203012
Source: components/system_log/__init__.py:190
First occurred: March 25, 2022, 20:54:12 (5 occurrences)
Last logged: March 25, 2022, 20:54:21

:0:0 ResizeObserver loop completed with undelivered notifications.


Logger: homeassistant.components.hassio
Source: components/hassio/__init__.py:569
Integration: Home Assistant Supervisor (documentation, issues)
First occurred: 15:45:19 (2 occurrences)
Last logged: 16:16:00
Can’t read Supervisor data:

With some detail for this last one:

2022-03-26 15:06:09 ERROR (MainThread) [frontend.js.latest.202203012] https://192.168.1.50:8123/frontend_latest/bfb7ed92.js:1:1608 TypeError: e is undefined
2022-03-26 15:06:13 ERROR (MainThread) [frontend.js.latest.202203012] https://192.168.1.50:8123/frontend_latest/bfb7ed92.js:1:1608 TypeError: e is undefined
2022-03-26 15:11:29 ERROR (MainThread) [frontend.js.latest.202203012] :0:0 ResizeObserver loop completed with undelivered notifications.
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/45df7312_zigbee2mqtt/stats request
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_samba/stats request
2022-03-26 15:45:19 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data:
2022-03-26 16:03:12 WARNING (SyncWorker_4) [custom_components.xiaomi_cloud_map_extractor.camera] Unable to retrieve map data
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mariadb/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/45df7312_zigbee2mqtt/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_ssh/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_configurator/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/cebe7a76_hassio_google_drive_backup/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_samba/stats request
2022-03-26 16:16:00 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data:   
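
One thing that stands out in the detail above is that the stats requests for several add-ons time out in the same second, which hints at the Supervisor (or the storage under it) stalling rather than one add-on such as Zigbee2MQTT misbehaving. A minimal Python sketch to make that visible, assuming the log has been saved as text; the function name `group_addon_timeouts` and the `SAMPLE` excerpt are just for illustration, with the sample lines copied from the log above:

```python
import re
from collections import defaultdict

# Matches lines such as:
# 2022-03-26 15:45:19 ERROR (MainThread) [...hassio.handler] Timeout on /addons/core_mosquitto/stats request
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ERROR .*"
    r"Timeout on /addons/(?P<addon>[^/]+)/stats request"
)


def group_addon_timeouts(lines):
    """Return {timestamp: sorted add-on slugs whose stats request timed out then}."""
    bursts = defaultdict(set)
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            bursts[match.group("ts")].add(match.group("addon"))
    return {ts: sorted(addons) for ts, addons in bursts.items()}


# Excerpt copied from the log above (illustration only).
SAMPLE = """\
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/a0d7b954_nodered/stats request
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mosquitto/stats request
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_duckdns/stats request
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/45df7312_zigbee2mqtt/stats request
2022-03-26 15:45:19 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_samba/stats request
2022-03-26 15:45:19 WARNING (MainThread) [homeassistant.components.hassio] Can't read Supervisor data:
2022-03-26 16:16:00 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /addons/core_mariadb/stats request
"""

if __name__ == "__main__":
    for ts, addons in group_addon_timeouts(SAMPLE.splitlines()).items():
        print(ts, len(addons), ", ".join(addons))
```

If every burst lists most of the installed add-ons at once, that is a reason to look at the host (disk, memory, power) before blaming the Zigbee stack.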

Geez, what a PITA…
Sorry, can’t add anything to what you’ve tried.

Hope you get it solved very soon!

Thanks, me too … :frowning:
I guess I’ll try to re-install HA. Maybe there’s a corrupted file somewhere …

Not using Zigbee myself (for exactly these kinds of reasons), but I think the post above by @CentralCommand is important. Xiaomi doesn’t properly implement the Zigbee standard in their devices. As far as I remember, they modified some protocol timing to save battery life, making them a PITA to use with other controllers. They will obviously work with Xiaomi’s own gateway (which is probably all they and most users care about). If you keep having issues and don’t want to spend your time trying to work around that stuff, maybe going back to using them over their own gateway would be the most sensible thing to do?

Thanks for your reply.

I would agree if I wasn’t also losing the “connection” with the coordinator in Z2M. Unless you think they can bring the whole network down?
And I also have Aqara devices (a motion sensor and a few contact sensors), which yes, have Xiaomi behind them, but they are widely used and quite “renowned”. Nobody has reported such problems, as far as I know?
And a lot of people use them, as they are listed as compatible devices on the Z2M website.

I might be wrong, but I think that if the Aqara devices were that unstable, I would have read about it, since I’ve spent so many hours researching the issue.

I’ll try a fresh install tonight, as I’m kind of out of solutions for now. And I feel more and more that the Supervisor crash could be somehow related.
Just hoping the backups won’t bring the issue with them…

Just found a new error in the Supervisor log:

[supervisor.api.ingress] Ingress error: Cannot connect to host 172.30.33.3:8099 ssl:default [Connect call failed ('172.30.33.3', 8099)].

Could this be related?

A lot of people have issues with Aqara devices when not used on their own hub. There’s even a note in the official HA ZHA integration docs (Note that some Zigbee devices are not fully compatible with all brands of Zigbee router devices. Xiaomi/Aqara devices are for example known not to work with Zigbee router devices …).

And since this is a fundamental problem with all Xiaomi / Aqara devices, HA isn’t the only affected platform. Here’s what Hubitat has to say about this. And Xiaomi isn’t the only bad player in the field. Ikea devices also stretched the standard and people have lots of problems with their devices when not used on their official hub for that reason. Zigbee is a mess, because the standard is not enforced and manufacturers basically do whatever they want.

Thanks for sharing. Interesting indeed.
However, the two kinds of problems they mention (pairing and stability) are related to using certain types of coordinators or routers, and the list is quite precise. I’m not using any of them. The coordinator is a ZZh, which is highly compatible, and all my routers are Xiaomi, so they are compatible with each other at least.
Still I’ll order a non Xiaomi Zigbee device to be sure.

In the meantime, I re-installed and restored HA. Exact same issue…
I’m also trying this: How to increase the swap file size on Home Assistant OS
to solve my “Can’t read supervisor data” errors. I have a feeling they are related.

I am having the same logs and problems.
My Raspberry Pi 4 crashes around 3 to 5 am (only when I have the stick plugged in; I have a flashed Sonoff Zigbee dongle 3.0)

At first, my Z2M wouldn’t even start. After adding a 1 m extension cable it started working, but only until these problems began.

I am waiting for a 5 V 4 A power supply to arrive so I can test it, but I have had no success with many different approaches.
I tried increasing the swap memory and disabling UAS (since my Supervisor logs were really slow even before Z2M).

Please report back if anyone figures out how to solve this.

Hi there,

I’m still in testing phase but I think I found the issue.
It seems my SSD enclosure was faulty: working, but most likely generating a lot of errors.
I also moved the Zigbee dongle to the powered hub, which probably helped.
Then I did a fresh install, and it seems to be working so far. I’m reinstalling add-ons and devices one after the other, with a few days in between, so I can identify where crashes come from, if any.

I’m not there yet but seems on track.
Hope that helps, but it seems my problem was very specific…

I might have a “faulty” SATA-to-USB SSD adapter for my Pi. I was having a huge delay with the Supervisor logs, and after disabling UAS it started working. Weird that it affects the system only when adding a Zigbee dongle and running Z2M.
I will try the power supply I’m still waiting for, and after that (if it’s not solved) I might change the adapter.
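
For anyone wanting to try the same thing: the post above doesn’t say how UAS was disabled. One common way on Linux is the `usb-storage.quirks=VID:PID:u` kernel parameter, where the `u` flag tells the kernel to ignore UAS for that device. A small Python sketch that builds the parameter from one line of `lsusb` output; the function name `uas_quirk_parameter` and the `152d:0578` ID are only illustrative, so use the ID of your own adapter:

```python
import re

# lsusb prints lines like: "Bus 002 Device 002: ID 152d:0578 <vendor and product text>"
LSUSB_ID_RE = re.compile(r"\bID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def uas_quirk_parameter(lsusb_line):
    """Build the kernel parameter that makes usb-storage ignore UAS for this device."""
    match = LSUSB_ID_RE.search(lsusb_line)
    if match is None:
        raise ValueError("no 'ID vvvv:pppp' field found in lsusb line")
    vendor, product = (part.lower() for part in match.groups())
    # The trailing 'u' is the usb-storage quirk flag for "ignore UAS".
    return f"usb-storage.quirks={vendor}:{product}:u"


if __name__ == "__main__":
    # Hypothetical adapter ID, for illustration only.
    print(uas_quirk_parameter("Bus 002 Device 002: ID 152d:0578 Example SATA bridge"))
```

On a Raspberry Pi the resulting string is usually added to the kernel command line; where that file lives depends on the OS image, so check the documentation for your install before editing it.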

What errors did you see?

It’s hard to say what was related to what (see above, I posted the logs and some errors), but I had regular “can’t read supervisor data” errors for which I couldn’t find the source. The disk was sort of connecting and disconnecting, and sometimes the LED on the enclosure was only partially lit, which is not normal for this enclosure.
I changed it and since then, running smoothly.
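
A disk that keeps connecting and disconnecting usually leaves traces in the kernel log, so that is one place to look before swapping hardware. A rough Python sketch, assuming you can get the kernel log as text (for example by saving the output of `dmesg` to a file). The patterns are typical Linux USB and block-layer messages, not lines taken from this thread, and the name `count_usb_storage_events` is made up for the example:

```python
import re
from collections import Counter

# Typical kernel messages around a flaky USB disk (assumed patterns, adjust as needed).
PATTERNS = {
    "disconnect": re.compile(r"USB disconnect, device number \d+"),
    "reset": re.compile(r"reset .*USB device number \d+"),
    "uas_abort": re.compile(r"uas_eh_abort_handler"),
    "io_error": re.compile(r"I/O error"),
}


def count_usb_storage_events(log_text):
    """Count how many log lines match each suspicious pattern."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

Usage would be something like `count_usb_storage_events(open("dmesg.txt").read())`. A handful of resets over weeks may be harmless; repeated disconnects or I/O errors within hours point at the enclosure, cable, or power.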

@PJBLinkHA any update here? Things are working as they should?

Note that USB 3.0 ports are known to cause serious interference with Zigbee Coordinators. That is not an issue on a Raspberry Pi 3, as it has no USB 3.0 ports, but if you move to a NUC or another computer with only USB 3.0 ports, you should also consider putting the Zigbee Coordinator on a powered USB 2.0 hub so that it is not close to any USB 3.0 ports or to other peripherals and cables connected to USB 3.0 ports (SSDs connected to USB 3.0 ports in particular are known to cause a lot of interference for Zigbee).

Regardless of the actual root cause, always aim to keep the firmware of the Zigbee Coordinator updated, add more products acting as Zigbee Router devices, and implement workarounds for interference. See:

https://github.com/home-assistant/home-assistant.io/pull/18864

and

https://www.home-assistant.io/integrations/zha#best-practices-to-avoid-pairingconnection-difficulties

Understand and remember that Zigbee signals are weak, so they rely on a strong Zigbee mesh (meaning many Zigbee Router devices), and they are very sensitive to EMI/RFI interference. It is much easier to troubleshoot and find the real root cause if you have already optimized your setup and environment to work around that.

I’ve been meaning to report for a while but didn’t take the time, sorry.
Still, I’m not sure it’s going to be very helpful for anyone, as it was super specific to my case: my SSD enclosure was faulty, and the accumulating errors caused lag and corruption on the Pi. That was too much for the Zigbee network, which kept going down.
I’m lucky I noticed the LED blinking a bit weakly on the enclosure while doing my third re-install of HA; otherwise, I could have searched forever …
Since I changed the enclosure, not even one crash.

What also helped before and after that issue was:

  • moving the Zigbee dongle to a powered USB 2.0 hub (even a genuine Pi power supply is not enough for the ZZh, for example)
  • increasing the swap file. I have a Pi 3B, and HA has become a bit too much for 1 GB of RAM. This guide (How to increase the swap file size on Home Assistant OS) is super easy and works perfectly. It solved all of the other issues I had on the Pi.
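
To check whether memory is actually the bottleneck before and after increasing swap, something like the following Python sketch can help. It assumes a Linux shell with Python available and reads the standard `/proc/meminfo` fields (`MemAvailable`, `SwapTotal`, `SwapFree`); the function names are made up for the example:

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_summary(info):
    """Summarise available memory and swap usage in MiB (rounded down)."""
    total = info.get("SwapTotal", 0)
    used = total - info.get("SwapFree", 0)
    return {
        "mem_available_mib": info.get("MemAvailable", 0) // 1024,
        "swap_total_mib": total // 1024,
        "swap_used_mib": used // 1024,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        print(swap_summary(parse_meminfo(handle.read())))
```

If available memory stays close to zero and swap is mostly used, a larger swap file (or more RAM) is worth trying; if both look healthy, the problem is probably elsewhere.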

Hope this helps someone one day!
And thanks to all who took the time to answer :slight_smile:

@marcoskp, did you find your problem?

FYI, have summarized all my general tips here since my ZHA docs PR is taking forever to be reviewed:

https://github.com/zigpy/zigpy/wiki/General-tips-on-improving-Zigbee-network-range

Radio frequency interference impact on all 2.4GHz wireless devices is a well known issue with USB 3.0:

https://www.usb.org/sites/default/files/327216.pdf

…and how USB 3.0 interferes with Zigbee is actually covered in the linked tips :wink:

Workaround is to move the Zigbee USB dongle to a powered USB 2.0 hub or to a native USB 2.0 port.
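
To see why no Zigbee channel escapes this: Zigbee on 2.4 GHz uses IEEE 802.15.4 channels 11 to 26, with centre frequency 2405 + 5 × (channel − 11) MHz, so every channel sits inside the 2.4 to 2.5 GHz range that the linked USB 3.0 paper is about. A small Python sketch of that arithmetic (the function names are just for illustration):

```python
ZIGBEE_CHANNELS = range(11, 27)  # 2.4 GHz IEEE 802.15.4 channels 11..26


def zigbee_channel_mhz(channel):
    """Centre frequency in MHz of a 2.4 GHz IEEE 802.15.4 (Zigbee) channel."""
    if channel not in ZIGBEE_CHANNELS:
        raise ValueError("2.4 GHz Zigbee channels are 11 to 26")
    return 2405 + 5 * (channel - 11)


def channels_in_band(low_mhz=2400, high_mhz=2500):
    """List the Zigbee channels whose centre frequency lies in [low_mhz, high_mhz]."""
    return [ch for ch in ZIGBEE_CHANNELS if low_mhz <= zigbee_channel_mhz(ch) <= high_mhz]


if __name__ == "__main__":
    for ch in (11, 15, 20, 25, 26):
        print(f"channel {ch}: {zigbee_channel_mhz(ch)} MHz")
```

Changing the Zigbee channel can help with Wi-Fi overlap, but it does not move the network out of this band, which is why physical distance from USB 3.0 ports and cables is the usual workaround.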
