After suffering frequent intermittent errors with my TRVs, I tried to study the Zigbee network configuration, but found it hard to understand how it works and how to access it through HA.
My Zigbee network
I have a ZHA network with
ten Aqara E1 TRVs
eleven generic Chinese TS0601 TRVs
six more battery-powered end devices such as door and window detectors.
The network is in a house with thick walls, so it has an ample number of Zigbee repeaters:
six IKEA TRADFRI repeaters
an MHCOZY 4-way mains-powered relay switch, used to operate the zone valves
ten more Zigbee mains-powered smart plugs and switches of various brands that also act as repeaters.
The HA tools
…are hard to use…
For Zigbee devices, under Reconfigure → Manage network, there is some information, including a list of 'neighbours', each with a number that I take to be a measure of signal strength. However, there is no indication of which neighbours are actually being used to connect to the network. UNANSWERED
There does not seem to be any way of simply asking a Zigbee device which device (repeater) it connects to next, or which devices are connected to it (unless that is the definition of a 'neighbour'?). The device's main page always just says that the device is connected to the main controller. UNANSWERED
The only way of analysing the topology seems to be the 'visualisation' feature: a great idea, but hard to use. There used to be a search box, but it has been withdrawn from the latest version of HA for some reason, so finding a given node is difficult. My diagram fills more than a page. If I zoom in, I lose context. If I zoom out, everything gets obscured by labels. How do you find a node now? UNANSWERED
The 'physics' feature no longer works. I think it used to label the lines with a signal strength and colour them, but now they are all blue with no numbers. ANSWERED: instead of using the physics feature, hover over a line to see the signal strength.
It is not even clear that the visualisation feature is a correct representation of the actual topology. I found a TRV connected for no apparent reason to a repeater two rooms away rather than one in the same room. I tried turning off the repeater that was further away to see what would happen. Several minutes later the TRV was still online and still responding fine, but the visualisation feature (after several refreshes) still claimed that it was connected to the powered-off repeater! UNANSWERED
If the visualisation is to be believed, the network IS reconfiguring itself dynamically. When I recently added a few more repeaters, they were 'absorbed' into the network within a couple of hours. That is to say, they had taken a role in the network by connecting to other devices. However, the logic looks bonkers! Interconnections look arbitrary rather than based on each device connecting to the next nearest node. UNANSWERED
Wondering whether the Zigbee network was not updating quickly enough, I temporarily changed the network refresh rate (the 'Consider battery-powered devices unavailable after' setting) from 7200 seconds to 60 seconds. This had the unexpected effect of taking all the battery-powered devices offline altogether! I changed it back, and they rejoined after a few minutes. What happened here? UNANSWERED
I have looked in vain for ways to force or encourage a device to connect to the nearest repeater. Most sources say remove the device and then add it back in, but it seems (if the visualisation is to be believed) that the re-added device always insists on connecting to the same repeater as before, even when that repeater is turned off! SOLVED
When I tried removing, physically moving, then re-adding a repeater with a new name, it took all the end devices with it. Apparently none of them tried to find a closer repeater; they seem to 'stick with what they know' even after a name change! UNANSWERED
When the nearest repeater goes offline, the 4-way zone relay goes offline too, even though there are other repeaters well within range. UNANSWERED but no longer relevant; I swapped it for a WiFi switch.
Yes, it is a SONOFF USB controller stick, placed about 2 m away from the computer (a Dell OptiPlex 7080 Micro), which does not have a WiFi card fitted.
There are no other emitting devices nearby. I am not using Bluetooth.
The house has a TP-Link WiFi mesh whose nodes are nowhere near the computer or Zigbee controller. The computer is connected to the local network by Ethernet cable.
Older Aqara devices (using Zigbee 1.2 instead of Zigbee 3.0) were known to be hard-headed and would try to maintain a connection to the first device you paired them to. Remove them from the network, find the closest Zigbee router for each one, then click 'Add devices via this device' on that router's device page. That should sort out your issue.
Hover over the lines. You will see the signal strength.
For what it's worth, I've got a couple of bulbs which literally blew up and have been disconnected since last winter. ZHA visualisation still shows a connection with an LQI of 45 to the coordinator and 131 between them, after all these months!
They should not be, because the Zigbee controller is set to channel 25, which should be above any WiFi channel.
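For anyone checking the channel arithmetic: 2.4 GHz Zigbee channels 11 to 26 are centred at 2405 + 5(k - 11) MHz and are about 2 MHz wide, while 2.4 GHz WiFi channels 1 to 13 are centred at 2407 + 5n MHz. The sketch below assumes a worst-case WiFi channel width of 22 MHz (the 802.11b mask; OFDM modes use about 20 MHz) and ignores the weaker side lobes that real routers also emit, so treat it as a rough check rather than a spectrum analysis. Under those assumptions, channel 25 (2475 MHz) clears WiFi channels 1 to 11 but overlaps channels 12 and 13, which are permitted in the UK and Europe, while channel 15 (2425 MHz) sits in the narrow gap between WiFi channels 1 and 6.

```python
# Zigbee (IEEE 802.15.4, 2.4 GHz): channels 11-26, centre = 2405 + 5*(k - 11) MHz, ~2 MHz wide.
# WiFi 2.4 GHz: channels 1-13, centre = 2407 + 5*n MHz.
ZIGBEE_WIDTH_MHZ = 2.0
WIFI_WIDTH_MHZ = 22.0  # assumption: worst-case 802.11b-style channel width


def zigbee_centre(channel: int) -> int:
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels are 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_centre(channel: int) -> int:
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers WiFi channels 1-13")
    return 2407 + 5 * channel


def overlapping_wifi_channels(zigbee_channel: int, wifi_channels=range(1, 14)) -> list:
    """Return the WiFi channels whose band intersects the given Zigbee channel."""
    z = zigbee_centre(zigbee_channel)
    z_lo, z_hi = z - ZIGBEE_WIDTH_MHZ / 2, z + ZIGBEE_WIDTH_MHZ / 2
    hits = []
    for w in wifi_channels:
        c = wifi_centre(w)
        w_lo, w_hi = c - WIFI_WIDTH_MHZ / 2, c + WIFI_WIDTH_MHZ / 2
        if z_lo < w_hi and w_lo < z_hi:  # the two frequency intervals intersect
            hits.append(w)
    return hits


print(overlapping_wifi_channels(25))                # [12, 13]
print(overlapping_wifi_channels(25, range(1, 12)))  # [] (US-style channels 1-11)
print(overlapping_wifi_channels(15, [1, 6, 11]))    # []
```

So whether channel 25 is "above any WiFi channel" depends on which WiFi channels the mesh nodes (and the neighbours' routers) actually use.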
You seem to think it is a connectivity problem. Is that just a hunch or have you seen this sort of problem with TRVs before?
I don't think it is a connectivity problem because (mostly) the TRVs are online and a manual setting is immediately reflected on the dashboard. My own automation tries five times at two-minute intervals if a new setting is not confirmed by the actual reading, so that should overcome intermittent communication problems anyway.
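The retry logic described above can be sketched in Python as follows. This is not HA automation syntax and not the poster's actual automation: send_setpoint and read_back are hypothetical callables standing in for the service call and for the setpoint the TRV reports back, and the tolerance is an assumption. One consequence worth noting is that the whole retry window is only five attempts of two minutes, so a device that stays asleep or detached for longer than about 10 minutes will still end up unconfirmed.

```python
import time
from typing import Callable, Optional


def set_with_confirmation(
    send_setpoint: Callable[[float], None],
    read_back: Callable[[], Optional[float]],
    target: float,
    attempts: int = 5,
    interval_s: float = 120.0,
    tolerance: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send `target`, wait, and re-send until the device reports it back.

    Returns True as soon as the reported value matches within `tolerance`,
    or False after `attempts` tries (5 x 2 minutes = 10 minutes by default).
    """
    for _ in range(attempts):
        send_setpoint(target)
        sleep(interval_s)  # give a sleepy TRV time to wake up and report
        reported = read_back()
        if reported is not None and abs(reported - target) <= tolerance:
            return True
    return False


class FakeTRV:
    """Hypothetical stand-in for a TRV that ignores the first `drop` commands."""

    def __init__(self, drop: int):
        self.drop, self.sent, self.value = drop, 0, None

    def send(self, target: float) -> None:
        self.sent += 1
        if self.sent > self.drop:
            self.value = target

    def read(self) -> Optional[float]:
        return self.value


trv = FakeTRV(drop=2)
ok = set_with_confirmation(trv.send, trv.read, 21.0, sleep=lambda s: None)
print(ok, trv.sent)  # True 3
```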
That is very helpful. Will try that in situations that look particularly bonkers. After doing that, will it still re-assign the connection if the node goes offline, or a new node is added with stronger signal strength?
Following comments on this page and research elsewhere, I gathered some tips that might or might not be correct. Working with the as yet unverified theory that the problem is with the Zigbee network rather than the TRVs (or HA software), I today tried the following:
I moved the Zigbee coordinator to the centre of the house. It is connected by an extension cable to the computer running HA and about 2m from the box.
One source recommended connecting the SONOFF Dongle to a USB 2.0 port instead of USB 3.0. When I did that, the computer crashed and could not be restarted! That is weird and inexplicable, so I put it back on USB 3.0 and it works again.
I think that recommendation was based on USB 3.0 emitting more interference than USB 2.0; is that true? Anyway, the USB ports are next to each other, so changing socket does not move the dongle away from the emissions. I assume that using an extension lead, rather than changing socket, is the sensible solution.
One source says that Aqara TRVs 'prefer' middle or lower channels to the higher ones, so I tried changing the Zigbee channel from 25 to 15. Several sources recommend this channel as lying between WiFi channels 1 and 6 (and well away from 11).
In the Zigbee network settings, I reduced the wait time for battery devices from 7200 to 300 seconds (5 minutes). My theory is that the visualisation diagram was mostly wrong because it was 2 hours out of date. I still haven't really understood this. I previously tried an even lower number, but then it cut off all the Zigbee devices.
The house already has at least one Zigbee repeater in every room - either an IKEA Tradfri or a smart plug of some description. Can there be too many repeaters in a room??
I rebuilt the Zigbee network by adding repeaters and devices using the 'Add devices to this device' feature for a manual configuration. I started at the coordinator and added the nearest repeaters. To those I added the next nearest. Finally, I added the TRVs and other battery Zigbee devices to the repeater in the same room.
It is a 19th-century house with thick walls. By checking the LQI levels, I was able to confirm my theory that transmission is better vertically through a floor/ceiling than horizontally through walls. I cannot find any sources on what counts as an acceptable LQI, or on whether it measures the link to the next device or the whole route to the coordinator.
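On the LQI question: in IEEE 802.15.4, LQI is a value from 0 to 255 that a receiving radio assigns to frames from one specific neighbour, so each number describes a single link, not the whole route to the coordinator. The standard leaves the exact mapping to the chip vendor, which is probably why no universal "acceptable" threshold turns up in searches. The sketch below uses a simple illustrative heuristic, not the Zigbee routing algorithm (which sums per-link costs): a multi-hop route is treated as only as good as its weakest hop. The route and its values are made up.

```python
from typing import List, Tuple

# (from_device, to_device, LQI reported for that single link, 0-255)
Hop = Tuple[str, str, int]


def weakest_hop(route: List[Hop]) -> Hop:
    """Return the hop with the lowest LQI along a multi-hop route."""
    if not route:
        raise ValueError("route has no hops")
    for _, _, lqi in route:
        if not 0 <= lqi <= 255:
            raise ValueError(f"LQI out of range: {lqi}")
    return min(route, key=lambda hop: hop[2])


# Hypothetical route from a TRV to the coordinator; names and values are invented.
route = [
    ("Lounge TRV", "Lounge plug", 180),
    ("Lounge plug", "Hall repeater", 70),
    ("Hall repeater", "Coordinator", 200),
]
print(weakest_hop(route))  # ('Lounge plug', 'Hall repeater', 70)
```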
Result?
None of this made any difference at all.
It all works as it did before, which is to say most of the time but not reliably enough for a satisfactory automatic heating system.
Tearing our hair out now! At our wits' end!
Any ideas? Is it even a Zigbee network problem? The TRVs are giving the biggest difficulties; devices such as door closure sensors do not seem to be as unreliable…
Open questions
(Some new, some reformulations…)
Does it make any difference whether a SONOFF Zigbee 3.0 Dongle is plugged into a USB 2.0 or 3.0 port?
Why would plugging the SONOFF Zigbee 3.0 Dongle into a USB 2.0 port cause a Dell OptiPlex computer to crash and not restart?!?
Does rebuilding the network topology manually mean (a) it stays like that forever and a section will fail if a repeater fails; (b) it will reconfigure itself if there is a failure but return to my settings when a failed device returns; (c) it will go ahead and reconfigure itself any time anyway and ultimately ignore my topology.
Can I select a Zigbee device and have HA tell me (a) which device it is connected to upstream (the device page always indicates the main coordinator), and (b) which devices are connected to it downstream (there is something called 'neighbours', but I don't know if that is the same thing)? Yes, in theory these connections are shown on the visualisation diagram, but that is so busy that I cannot read it easily, and anyway I do not trust it for the reasons given in the original post.
How exactly does HA decide whether a battery-operated TRV is online (available) or offline (unavailable)? And if a device is 'unavailable', does that affect whether, or how, set-temperature commands are passed on?
Is it true that Aqara (or any other make of) TRVs 'prefer' middle or lower channels to the higher ones? Why?
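On the upstream/downstream question: as I understand it, the neighbour lists come from the Zigbee Mgmt_Lqi_rsp neighbour table, where each entry carries a relationship code (0 parent, 1 child, 2 sibling, 3 none of the above, 4 previous child). Sleepy end devices such as TRVs usually cannot be queried themselves, so the practical way to find a TRV's parent is to look for the router that lists it as a child. Router-to-router links are "siblings", and traffic between routers follows route tables that can change, so a router has no fixed "upstream" device. The table format, device names, and helper names below are all hypothetical; this is a sketch of the lookup, not an HA or zigpy API.

```python
from typing import Dict, List, TypedDict

CHILD = 1  # Mgmt_Lqi_rsp relationship codes: 0 parent, 1 child, 2 sibling, 3 none, 4 previous child


class Neighbour(TypedDict):
    name: str          # hypothetical: friendly name already resolved from the IEEE address
    relationship: int
    lqi: int


# Hypothetical snapshot: router name -> that router's neighbour table
Tables = Dict[str, List[Neighbour]]


def children_of(tables: Tables, router: str) -> List[str]:
    """End devices that `router` currently lists as its children."""
    return [n["name"] for n in tables.get(router, []) if n["relationship"] == CHILD]


def parents_of(tables: Tables, device: str) -> List[str]:
    """Routers whose table lists `device` as a child (normally zero or one)."""
    return sorted(
        router
        for router, entries in tables.items()
        if any(n["name"] == device and n["relationship"] == CHILD for n in entries)
    )


tables: Tables = {
    "Hall repeater": [
        {"name": "Hall TRV", "relationship": 1, "lqi": 150},
        {"name": "Landing plug", "relationship": 2, "lqi": 90},
        {"name": "Study TRV", "relationship": 4, "lqi": 60},  # previous child: not counted
    ],
    "Landing plug": [
        {"name": "Bedroom TRV", "relationship": 1, "lqi": 120},
        {"name": "Hall repeater", "relationship": 2, "lqi": 95},
    ],
}
print(children_of(tables, "Hall repeater"))  # ['Hall TRV']
print(parents_of(tables, "Bedroom TRV"))     # ['Landing plug']
```

Because such tables are snapshots from the last scan, a powered-off router's old entries may linger until the next successful scan, which might be related to the stale links reported elsewhere in this thread.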
I get some issues with my temperature sensors also.
I can't find it now, but fairly recently a sensor reported a flatline for a few hours.
And a few days before that, another temperature sensor did the same.
I believe it is largely due to them being end devices, and possibly because they are end devices at the edges of the home.
It's not very common to have a router outside the window, so the TRVs are always at the edges.
Door sensors could be closer to middle of the home but some of them probably are at the outer edges too.
But I believe they have one distinct difference: they have a wake-up action.
There is something physical done that can be used to wake them up.
TRVs and temperature sensors are asleep more than anything else. Temperature sensors have the advantage of at least reporting on a schedule.
The TRV is something we expect to respond while it is asleep, possibly at the edge of the home with less than ideal reception.
Not sure about the 'sleep mode' theory, but it is the case that battery-powered devices (is that what you mean by 'end device'?) have a preset time before they are considered unavailable. In your network settings, find 'Consider battery-powered devices unavailable after (seconds)'. This is typically set by default to several hours, though I have now reduced mine to 300 seconds (5 minutes). I do not know whether this wait time also applies to the device becoming available again. I do not know whether it is the interval between polls from the coordinator or the interval between spontaneous reports from the device. We need a Zigbee expert!
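To make the timeout question more concrete, here is a simplified model of how I understand a passive availability timeout for battery devices to work: a device is shown as available if a message from it has been received within the timeout, and becomes available again as soon as a new message arrives. This is an assumption about the behaviour, not ZHA's actual code, and the report intervals used are hypothetical because real TRVs vary. The second function suggests why the earlier 60-second experiment could take nearly every sleepy device offline: a device that only reports once an hour would be shown as unavailable about 98% of the time.

```python
from dataclasses import dataclass


@dataclass
class BatteryDevice:
    name: str
    last_seen: float  # time (seconds) of the last message received from the device


def is_available(device: BatteryDevice, now: float, timeout_s: float) -> bool:
    """Simplified model: available if any message arrived within the timeout window."""
    return now - device.last_seen <= timeout_s


def fraction_unavailable(report_interval_s: float, timeout_s: float) -> float:
    """Share of time a device that reports every `report_interval_s` is shown unavailable."""
    if timeout_s >= report_interval_s:
        return 0.0
    return 1.0 - timeout_s / report_interval_s


trv = BatteryDevice("Hall TRV", last_seen=1000.0)
print(is_available(trv, now=1200.0, timeout_s=300))  # True
print(is_available(trv, now=1400.0, timeout_s=300))  # False
print(fraction_unavailable(3600, 7200))              # 0.0
print(fraction_unavailable(600, 300))                # 0.5
```

In this model the timeout only needs to be comfortably longer than the device's longest silent period; shortening it makes problems visible sooner but does not make the device report more often.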
The location in the home is less relevant than the proximity to the nearest repeater. You need enough to cover everywhere there are devices.
True.
But you're not going to get a good mesh at the edge of your home unless you place several routers next to each other.
I don't know for sure, but my experience is that it's not just about closeness to the router; sometimes they route to other routers that are further away.
Possibly because the router disconnected or the end device did and accidentally connected to the wrong one.
But I believe a strong mesh around the end device makes it less likely for them to drop off or not respond.
If you can, test moving one TRV to the center of a room with lots of routers and see if it responds better there.
Obviously you would need to replace the TRV with a manual one during the test.
Only advice I can give you is not to do the above beyond short-term testing. You'll chew through your batteries on those devices.
Before you revert your change, try this: walk up to one of those unavailable TRVs and (short) press the button on it. Is it marked as Available before the 5 minutes are up? If not, then they are indeed dropping off the network (unless the battery hasn't dropped considerably already).
Much as I love Aqara stuff, I haven't heard good things about those TRVs. It might be time to shop around for a replacement.