I’m trying to improve the reliability of my zigbee mesh
added extra repeaters (now 6 repeaters and 5 end devices; 2 are both)
Updated firmware in both Sonoff Dongle-P’s (coordinator and repeater)
changed my Wi-Fi channels (to 1 and 6) and changed the zigbee channel to 25
physically moved the coordinator Sonoff Dongle-P closer to the rest of the network (HA PC is in a corner of the house, so swapped to a 3m USB extension cable)
reset and added IKEA signal repeaters, now using all convenient power sockets.
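As a rough check on the channel change listed above, here is a minimal Python sketch that compares 2.4 GHz Wi-Fi and Zigbee channel frequencies. The centre frequencies follow the standard channel plans; the function names and the nominal widths (22 MHz for Wi-Fi, 2 MHz for Zigbee) are assumptions made for illustration, and real interference also depends on power, distance and traffic.

```python
def zigbee_centre_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Zigbee (IEEE 802.15.4) channel, 11 to 26."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_centre_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Wi-Fi channel, 1 to 13."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers Wi-Fi channels 1 to 13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel: int, wifi_channel: int,
             wifi_width_mhz: float = 22.0,   # assumed nominal width
             zigbee_width_mhz: float = 2.0   # assumed nominal width
             ) -> bool:
    """True if the two nominal frequency ranges intersect."""
    gap = abs(zigbee_centre_mhz(zigbee_channel) - wifi_centre_mhz(wifi_channel))
    return gap < (wifi_width_mhz + zigbee_width_mhz) / 2


if __name__ == "__main__":
    for wifi in (1, 6):
        print(f"Zigbee 25 vs Wi-Fi {wifi}: overlap = {overlaps(25, wifi)}")
```

Under these assumptions, Zigbee channel 25 (2475 MHz) sits well clear of Wi-Fi channels 1 and 6, so channel overlap alone seems an unlikely explanation for the low LQI.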
I have one link (coordinator to Sonoff BasicZBR3) which is 5m line of sight, yet usually reports 50-75 LQI - lower than those going through solid concrete walls.
… and still the Visualisation is showing only yellow, red and grey lines. Yes, I understand that LQI is a measure of the signal strength (though manufacturers can calculate it differently), and have read comments that the Visualisation is pretty but not particularly accurate … so I have set up a lovelace panel to summarise LQI and time last updated.
Then I looked at the graph of the LQI history. I had not realised that the IKEA repeaters all go unavailable at the same time, in a regular pattern of 2 hours on, 2 hours unavailable. Curiously the BasicZBR3 has the lowest LQI numbers, but stays connected.
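The regular on/off pattern described above can be measured rather than read off a graph. The following is a minimal Python sketch, assuming the LQI history has been exported as a time-sorted list of `(datetime, state)` pairs where the state is either a number as text or the string `"unavailable"`. That input format and both function names are hypothetical, not anything ZHA provides.

```python
from datetime import datetime, timedelta


def unavailable_intervals(samples):
    """Return (start, end) pairs for each run of 'unavailable' states.

    samples: list of (datetime, state) tuples sorted by time.
    A run still open at the last sample is omitted, since its end is unknown.
    """
    intervals = []
    start = None
    for when, state in samples:
        if state == "unavailable":
            if start is None:
                start = when          # first sample of a new outage
        elif start is not None:
            intervals.append((start, when))  # outage ended at this sample
            start = None
    return intervals


def cycle_lengths(intervals):
    """Time between the starts of consecutive unavailable runs."""
    starts = [begin for begin, _ in intervals]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]
```

Running `unavailable_intervals` on each device's history and comparing the start times would show whether the repeaters really drop out together, and `cycle_lengths` would show how steady the cycle is.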
Hmmm, thinking that the Sonoff was getting a steady (albeit weaker) signal because of being line-of-sight, I used an extension power cord to move Tradfri 03 to where the Sonoff was (line of sight 5m), and bumped the Sonoff a bit further away … and this is the result
OK, setting “Consider mains powered devices unavailable after” to 1 hour (3600 seconds) has made a noticeable improvement - the Tradfri repeaters now seem to be “unavailable” for 1 hour, but are still sticking to their 4 hour cycle.
Doh !! Having my HA server and zigbee coordinator located in my study (second bedroom) means that the link between coordinator and living room (where most zigbee devices are located) is critical to the entire network. Doesn’t matter how good the rest of the mesh is, if messages are struggling to get to/from the coordinator, or all the traffic is causing a bottleneck on one link.
So I have now installed proxmox and HA on an OptiPlex 7050 micro PC, restored HA backup on to it, and relocated it to the living room nearly centre of my little apartment. A couple of repeaters are already in obvious locations and I now have 3 more repeaters to place where they should make a good mesh.
I’m curious that this topic has been viewed 135 times in 3 weeks … but there are no replies except me adding more info.
Does this mean that I am the only person this is happening to, and no-one has any suggestions?
Or that no-one is looking at LQI? This seems surprising given the number of posts about device unavailability in zigbee.
The changes I made above have certainly improved the situation, but all my Tradfri signal repeaters and light bulb still go unavailable at the same time, and show as back online at the same time. My Sonoff repeater does not show this behaviour.
I haven’t had any issues with my network(s), but when I look at my two repeaters I see that they have no RSSI or LQI information for extended periods of time. I haven’t noticed (or known how to check) if that is causing any problems, though.
I farted around with ZHA for a long time trying to better understand it and how to monitor it. You too have spent a lot of time trying to analyze the LQI and connections between devices, with what appears to be little net success for the amount of effort. To some extent I feel bad, but my path, and my recommendation to you, is to move relatively slowly to Zigbee2MQTT. It is a happy place relative to ZHA. If I understand your setup, you too have Proxmox, so it is pretty easy to stand up a Debian VM running Docker and install Z2M (and, if needed, a Mosquitto MQTT server). Buy another TI 26xx based coordinator dongle, add one or two devices (including one of your Ikea routers) and monitor this for a while. (Sorry to be pedantic, but there is no such thing as a ‘repeater’ in Zigbee; the Ikea devices are acting as routers. To really understand mesh networking, it is important to grok the distinction.)
Good hunting, hope you find a solid Zigbee setup, I have with Zigbee2MQTT.
P.S. from my experience with ZHA, changing the Home Assistant ‘availability’ timeouts has no useful effect on whether a zigbee device is active on a network or whether you are seeing a true view of the network.
It’s rather difficult to manage complex networks of autonomous nodes - (n*(n-1)) meshes always give me headaches so my approach has been not to look too closely at the detail, but trust the individual nodes to “do the right thing” and just help them see enough RF energy from neighbours.
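To put a number on the n*(n-1) remark above, here is a tiny Python sketch. It counts every ordered (sender, receiver) pair among n nodes, which is an upper bound: in a real mesh only nodes within radio range of each other form a link. The function name and the node counts are picked purely for illustration.

```python
def directed_links(n: int) -> int:
    """Number of ordered (sender, receiver) pairs among n nodes."""
    return n * (n - 1)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(f"{n} nodes -> up to {directed_links(n)} directed links")
```

Doubling the node count roughly quadruples the number of possible links, which is why inspecting each one by hand stops being practical quite quickly.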
Adding and moving routers is basically what I’d do as well, which makes me wonder about an external factor like a RF noise source…
Repeated times between failure of exactly 600 seconds does sound like a protocol issue though - hence my IKEA firmware update thought (ZHA + TRÅDFRI + SilLabs coordinator works well here).
@donburch888 I do not see a mention by you of the type of USB cabling you are using between your Dell and the Zigbee coordinator. As is mentioned in this thread, poor USB 3 shielding is a documented cause of issues with Zigbee and other 2.4 GHz devices close to the source of the interference. Putting a USB 2.0 hub between the coordinator and the Dell, with at least a meter of cable and distance between the coordinator and the hub, might be worth trying. If your HA server setup looks anything like the referenced video, you have bigger issues.
802.15.4 mesh networks have been around a long time in tech years, are solid, and were designed not to require the level of management that, say, a Wi-Fi network does. If they had the issues that you are seeing with your ZHA setup, companies like Signify, Ikea and Aqara would not be selling the 100k’s of units they are.
If your zigbee ‘map’ shows only a single router device connecting your coordinator to the rest of your zigbee mesh, that is probably a very weak link in your setup. Even on my ZHA setup, when I had it running, the coordinator was connected to multiple routers and end devices, several of them with stucco walls in between. Below is a picture of a Z2M network map that is healthy and does not have a single point of failure between the coordinator and the remainder of the network:
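Separately from the map image, the single-point-of-failure idea can also be checked in code. The following is a minimal Python sketch, assuming the links from a network map have been written out by hand as a symmetric dictionary of neighbours. The data format, function names and device names are hypothetical, not an export format of ZHA or Z2M.

```python
from collections import deque


def reachable(links, start, removed=None):
    """Nodes reachable from start over the links, pretending `removed` is gone."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def single_points_of_failure(links, coordinator):
    """Nodes whose loss cuts at least one other node off from the coordinator."""
    everyone = reachable(links, coordinator)
    critical = []
    for node in sorted(everyone - {coordinator}):
        still_connected = reachable(links, coordinator, removed=node)
        # Anything missing, other than the removed node itself, was stranded.
        if everyone - still_connected - {node}:
            critical.append(node)
    return critical


if __name__ == "__main__":
    # Made-up example: the coordinator only talks to router_a.
    weak = {
        "coordinator": ["router_a"],
        "router_a": ["coordinator", "router_b", "bulb"],
        "router_b": ["router_a", "bulb"],
        "bulb": ["router_a", "router_b"],
    }
    print(single_points_of_failure(weak, "coordinator"))
```

An empty result means every device has a path to the coordinator that survives the loss of any one other device, at least according to the links that were entered.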
I enabled the LQI and RSSI entities for all my zigbee and Wi-Fi devices … but only my Wi-Fi devices are reporting a numeric value for RSSI. All my zigbee devices report RSSI as “unknown” (except for any that are powered off, which are “unavailable”), and have never reported any numeric value over the past month.
I am thinking now that maybe the Ikea repeaters are working, routing data packets correctly; but that the ZHA integration is erroneously flagging the Ikea repeaters as unavailable. Your post seems to confirm that the end devices are working as expected; and that it is the logging of LQI and RSSI which is the issue.