All Tradfri signal repeaters go unavailable together, every 4 hours

I’m trying to improve the reliability of my zigbee mesh

  • Added extra repeaters (now 6 repeaters and 5 end devices; 2 are both)
  • Updated firmware in both Sonoff Dongle-P’s (coordinator and repeater)
  • Changed my Wi-Fi channels (to 1 and 6) and changed the Zigbee channel to 25
  • physically moved the coordinator Sonoff Dongle-P closer to the rest of network (HA PC is in corner of house, so swapped to 3m USB extension cable)
  • reset and added IKEA signal repeaters, now using all convenient power sockets.
  • I have one link (coordinator to Sonoff BasicZBR3) which is 5m line of sight, yet usually reports 50-75 LQI - lower than those going through solid concrete walls.

… and still the Visualisation is showing only yellow, red and grey lines. Yes, I understand that LQI is a measure of the signal strength (though manufacturers can calculate it differently), and have read comments that the Visualisation is pretty but not particularly accurate … so I have set up a lovelace panel to summarise LQI and time last updated.
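For anyone who wants the same LQI / last-updated summary outside Lovelace, here is a minimal sketch in Python against Home Assistant's REST API (`GET /api/states`). It is not the panel described above. The `_lqi` entity ID suffix, the function names and the sample entities are assumptions for illustration, so adjust them to match your own entities.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_states(base_url, token):
    """Fetch every entity state from Home Assistant's REST API (GET /api/states)."""
    req = urllib.request.Request(
        f"{base_url}/api/states",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarise_lqi(states, now=None):
    """Return (entity_id, state, minutes since last update), stalest first."""
    now = now or datetime.now(timezone.utc)
    rows = []
    for s in states:
        # Assumption: LQI sensors have entity IDs ending in "_lqi".
        if not s["entity_id"].endswith("_lqi"):
            continue
        updated = datetime.fromisoformat(s["last_updated"])
        age_min = (now - updated).total_seconds() / 60
        rows.append((s["entity_id"], s["state"], round(age_min)))
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Hypothetical sample in the shape /api/states returns (entity names invented).
sample_states = [
    {"entity_id": "sensor.tradfri_01_lqi", "state": "unavailable",
     "last_updated": "2023-08-05T10:00:00+00:00"},
    {"entity_id": "sensor.basiczbr3_lqi", "state": "63",
     "last_updated": "2023-08-05T11:55:00+00:00"},
    {"entity_id": "sensor.room_temperature", "state": "21",
     "last_updated": "2023-08-05T11:59:00+00:00"},
]
sample_now = datetime(2023, 8, 5, 12, 0, tzinfo=timezone.utc)
for row in summarise_lqi(sample_states, sample_now):
    print(row)
```

With a real instance you would pass the output of `fetch_states(...)` instead of `sample_states`, using your own base URL and a long-lived access token.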

Then I looked at the graph of LQI history. I had not realised that the IKEA repeaters all go unavailable at the same time, in a regular pattern of 2 hours on, 2 hours unavailable. Curiously, the BasicZBR3 has the lowest LQI numbers but stays connected.


I have also noted that the long horizontal lines tend to indicate a lack of data points - not that the LQI stays the same !

Any suggestions about what might be causing this pattern ?

Hmmm, thinking that the Sonoff was getting a steady (albeit weaker) signal because of being line-of-sight, I used an extension power cord to move Tradfri 03 to where the Sonoff was (line of sight 5m), and bumped the Sonoff a bit further away … and this is the result

On the other hand, Visualisation now shows the first green link from the coordinator.

Even more curious, HA seems to be receiving data from the sensor connected to one of the Tradfri repeaters while it is unavailable !

Still 2 hours exactly connected, then 2 hours unavailable. Since all three Tradfri’s are affected at the same time I suspect it’s an issue at Home Assistant Integration or the Coordinator

Is it coincidence that 2 hours = 7200 seconds, which is in ZHA’s Global Options setting for mains powered devices ?

Wow, increasing from 7200 to 72000 seems to push the unavailable time out to 20 hours ! I wonder what happens if I reduce it to 3600 (1 hour) ? And I have turned all the debugging on again.

OK, setting “Consider mains powered devices unavailable after” to 1 hour (3600 seconds) has made a noticeable improvement - the Tradfri repeaters now seem to be “unavailable” for 1 hour, but are still sticking to their 4 hour cycle.

I am still no nearer to knowing

  • why there is a four hour cycle;
  • why all the Tradfri repeaters show unavailable at the same time; or
  • why there are so few data points when they are available (the horizontal lines generally indicate a gap between data points, not that the LQI stayed at the same value).
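Since the flat lines on the graph hide where data is actually missing, one way to make the gaps explicit is to look at the spacing between consecutive data points. A minimal sketch in Python, assuming you have exported the timestamps of the LQI readings somehow; the function name and the sample timestamps are made up for illustration:

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive points are more than max_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Hypothetical readings: a few points, then nothing for two hours.
sample_points = [
    datetime(2023, 8, 1, 0, 0),
    datetime(2023, 8, 1, 0, 5),
    datetime(2023, 8, 1, 0, 10),
    datetime(2023, 8, 1, 2, 10),
    datetime(2023, 8, 1, 2, 15),
]
for start, end in find_gaps(sample_points, timedelta(minutes=30)):
    print(f"no data from {start} to {end} ({end - start})")
```

Anything it reports is a stretch where the graph is just joining two distant points, not a period where LQI held steady.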

More research required …

Getting better all the time …

  • I upgraded to HA 2023.8 yesterday; and to 2023.8.1 at 12:35 today

  • Changed “Consider mains powered devices unavailable after” to 600 seconds

and the result is :

A few short 10 minute dropouts every 4 hours; and massively more data points logged.
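Putting the four values tried so far side by side, the reported unavailable time matches the setting itself in every case. A small sketch of that arithmetic in Python, with the observations copied from the posts above; the helper name is made up, and it says nothing about why the pattern occurs:

```python
def format_duration(seconds):
    """Render a whole number of seconds as hours when exact, otherwise as minutes."""
    if seconds % 3600 == 0:
        return f"{seconds // 3600} h"
    return f"{seconds // 60} min"


# (setting in seconds, unavailable time as reported in this thread)
observations = [
    (7200, "2 hours"),
    (72000, "20 hours"),
    (3600, "1 hour"),
    (600, "10 minutes"),
]
for setting, reported in observations:
    print(f"{setting:>6} s = {format_duration(setting):>6} | reported unavailable: {reported}")
```

That may point at how availability is being reported rather than at the radio link, though that is only a guess at this stage.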

Doh !! Having my HA server and zigbee coordinator located in my study (second bedroom) means that the link between coordinator and living room (where most zigbee devices are located) is critical to the entire network. Doesn’t matter how good the rest of the mesh is, if messages are struggling to get to/from the coordinator, or all the traffic is causing a bottleneck on one link.

So I have now installed proxmox and HA on an OptiPlex 7050 micro PC, restored HA backup on to it, and relocated it to the living room nearly centre of my little apartment. A couple of repeaters are already in obvious locations and I now have 3 more repeaters to place where they should make a good mesh.

I’m curious that this topic has been viewed 135 times in 3 weeks … but there have been no replies except me adding more info.

Does this mean that I am the only person this is happening for, and no-one has any suggestions ?
Or that no-one is looking at LQI ? This seems surprising given the number of posts about device unavailability in zigbee

The changes I made above have certainly improved the situation, but all my Tradfri signal repeaters and light bulb still go unavailable at the same time, and show as back online at the same time. My Sonoff repeater does not show this behaviour.

Well personally I have exclusively Tradfri Zigbee (repeaters and bulbs etc) on Z2M, not ZHA and see none of these issues you describe. Possibly same for a lot of others.

Good to see you documenting your progress though. :slight_smile:

I haven’t had any issues with my network(s), but when I look at my two repeaters I see that they have no RSSI or LQI information for extended periods of time. I haven’t noticed (or known how to check) if that is causing any problems, though.

There are several threads on the forum looking at Zigbee issues, however they tend to be for more unusual or specific issues.

I find IKEA TRÅDFRI switches (UK 240V) work well as Zigbee mesh repeaters, although I’d suggest enabling firmware updates in ZHA in case that is an issue. Updated firmware certainly fixed my issues with the 5-button remote.

My own setup moved from the Sonoff USB radio with a great aerial to a Yellow with only a PCB antenna, so has always included powered Zigbee devices in the same room to ensure the best mesh possible.

If this helps, :heart: this post!

I farted around with ZHA for a long time trying to better understand it and how to monitor it. You too have spent a lot of time trying to analyze the LQI and connections between devices, with what appears to be little net success for the amount of effort. To some extent I feel bad, but my path, and my recommendation to you, is to move relatively slowly to Zigbee2MQTT. It is a happy place relative to ZHA. If I understand your setup, you too have Proxmox, so it is pretty easy to stand up a Debian VM running Docker and install Z2M (and, if needed, a Mosquitto MQTT server). Buy another TI 26xx based coordinator dongle, add one or two devices (including one of your Ikea routers) and monitor this for a while. Sorry to be pedantic, but there is no such thing as a ‘repeater’ in Zigbee; the Ikea devices are acting as routers. To really understand mesh networking, it is important to grok the distinction.

Good hunting, hope you find a solid Zigbee setup, I have with Zigbee2MQTT.

P.S. from my experience with ZHA, changing the Home Assistant ‘availability’ timeouts has no useful effect on whether a zigbee device is active on a network or whether you are seeing a true view of the network.

It’s rather difficult to manage complex networks of autonomous nodes - (n*(n-1)) meshes always give me headaches so my approach has been not to look too closely at the detail, but trust the individual nodes to “do the right thing” and just help them see enough RF energy from neighbours.
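To put a number on why those meshes give headaches, here is a quick sketch in Python of how the n*(n-1) count of possible directed links grows. The node counts are arbitrary examples, not anyone's actual network:

```python
def directed_links(n):
    """Number of ordered node pairs (possible directed links) in a full mesh of n nodes."""
    return n * (n - 1)


for n in (3, 6, 12):
    print(f"{n} nodes -> {directed_links(n)} possible directed links")
```

Doubling the node count roughly quadruples the number of links to reason about, which is a decent argument for not inspecting each one.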

Adding and moving routers is basically what I’d do as well, which makes me wonder about an external factor like a RF noise source…

Repeated times between failure of exactly 600 seconds does sound like a protocol issue though - hence my IKEA firmware update thought (ZHA + TRÅDFRI + SilLabs coordinator works well here).

@donburch888 I do not see a mention by you of the type of USB cabling you are using between your Dell and the Zigbee coordinator. As is mentioned in this thread, poor USB 3 shielding is a documented cause of issues with Zigbee and other 2.4 GHz devices close to the source of the interference. Putting a USB 2.0 hub between the coordinator and the Dell, with at least a meter of cable between the coordinator and the hub, might be something to try. If your HA server setup looks anything like the referenced video, you have bigger issues :wink: .

802.15.4 mesh networks have been around a long time in tech years, are solid, and were designed not to require the level of management that, say, a Wi-Fi network does. If they had the issues that you are seeing with your ZHA setup, divisions like Signify, Ikea and Aqara would not be selling the 100k’s of units they are.

If your zigbee ‘map’ shows only a single router device connecting your coordinator to the rest of your zigbee mesh, that is probably a very weak link in your setup. Even on my ZHA setup, when I had it running, the coordinator was connected to multiple routers and end devices, several of them with stucco walls in between. Below is a picture of a Z2M network map that is healthy and does not have a single point of failure between the coordinator and the remainder of the network:

I enabled the LQI and RSSI entities for all my zigbee and wi-fi devices … but only my wi-fi devices are reporting a numeric value for RSSI. All my zigbee devices report RSSI as “unknown” (except for any that are powered off, which are “unavailable”), and have never reported any numeric value over the past month.

I am thinking now that maybe the Ikea repeaters are working, routing data packets correctly; but that the ZHA integration is erroneously flagging the Ikea repeaters as unavailable. Your post seems to confirm that the end devices are working as expected; and that it is the logging of LQI and RSSI which is the issue.

I have not yet found the setting to enable that. UPDATE: Done, and the firmware on the Tradfri devices (only a couple of months old) is up to date.

I have kept HA, HAOS and the zigbee integrations up to date. I updated the Sonoff Dongle-P firmware when I removed and rebuilt my ZHA mesh.