Seeking advice to get to reliable ZigBee, incl. low level diagnosing

Before I lose my mind completely, I’d like to ask for some advice from the community, and maybe guidance on how to get to lower-level diagnosing of my ZigBee network.

I’ve been reading posts such as this great one on getting started / common problems with ZigBee, but as time goes by I don’t feel I’m getting to a better place.

I’m running Home Assistant on a Raspberry Pi 3B+, using a Sonoff 3.0 Dongle, running ZHA.
I’m in a three-story house. HA + Dongle are on the lowest level and I’ve been adding devices all over the place. Let’s call the floors 1, 2 and 3 (American style).

The devices on floor 1 are working OK for the most part, a couple on floor 2 are fine, but floor 3 is super flaky.
My first reaction has been adding mains-powered devices. I’ve been adding more and more of them, taking care to place them as close as possible to one another, trying to build a “path” of routes between them.
Most of the mains-powered devices are embedded relays, such as the Sonoff Zbmini2. I have several end devices as well. In total, I have a mixture of Xiaomi Home devices (door sensors, presence sensors, relays), Sonoff zbmini relays, and a couple of Tuya relays as well.

I was hoping that as I add more mains-powered relays, the network would become more reliable, but I’m experiencing almost the opposite.

I’ve read about interference with 2.4GHz WiFi, but unfortunately I’m a bit limited in what I can do about that, since I rely on my WiFi for other uses at home and the WiFi router I’m using (a 4-unit TP-Link Deco X20) chooses the channel automatically and doesn’t allow changing it.

To make matters a bit more complicated, I also have a Xiaomi Gateway 2 on floor 2, which runs its own ZigBee network, also integrated into HA (via the Xiaomi Aqara integration).

As an engineer, but a bit of a newbie on ZigBee, I started going down the route of looking for diagnostic tools to ping individual devices and understand whether anything is misbehaving, or where exactly the problem is coming from, so I can address it. I’m not finding that very easy to get into so far - currently exploring ZHA Toolkit.

Besides waiting for the network to fix itself, adding more and more mains powered devices, do you have any advice on how to start making some progress in achieving some stability in the network?

Thanks a lot,
Julian

Turning the co-ordinator OFF (and HASS) for over 60 minutes triggers a timer in Zigbee devices which effectively forces a recalculation of the radio mesh. If you’ve paired devices not in their final location, this might help.

I’d give up trying to apply Ethernet-type tracing to a mesh radio network. RF doesn’t propagate the way you think it does, so you’ll confuse yourself looking too closely - yes, a quick look for general gaps might help spot where to add more mains-powered devices (routers / repeaters), but no more.

Two networks on the same 802.15.4 RF frequency isn’t going to help. Turn the Xiaomi Gateway 2 OFF for 24h and see what happens. I suspect Xiaomi might be masking floor 3.

Oh, and make sure firmware updates are enabled. The IKEA 5-button remote works with the latest firmware.

Some folk report Z2MQTT works better with some devices than ZHA, but I suspect this is very variable.

If this helps, :heart: this post!


Hi, you might also have a look at Guide for Zigbee interference avoidance and network range/coverage optimization

@adamantivm Since most of your questions are relatively generic I posted my reply to the existing guide thread so it can help others too → Guide for Zigbee interference avoidance and network range/coverage optimization - #24 by Hedda


Thank you all for the awesome and encouraging responses.
I’ll keep posting what I discover and end up finding helpful.
I am also happy to share as much detail as anyone wants to hear about my specific set-up, just don’t want to overwhelm with unnecessary details.

These are generally the things I’m going to try next:

  • Try the 60-minute coordinator off trick suggested by @FloatingBoater
  • Grudgingly do this kind of 24hr removal of potentially interfering devices (the Xiaomi Gateway; one of the Decos is also kind of in the way on the 2nd floor) - this will be met with a good amount of protest from the family
  • (if I can succeed at learning how) re-purpose a CC2531 USB dongle I have to do packet sniffing around the house
  • Continue to add more mains-powered devices, trying to form alternate pathways around the house reaching the 3rd floor (e.g.: climbing through the sides, farther away from WiFi routers)

One note about my interference situation, for others facing similar things + the curious.

The main Zigbee network, based on the Sonoff 3, is running on channel 15.
The TP-Link Deco has selected channel 8 for the 2.4GHz WiFi.
The Xiaomi Gateway 2 ZigBee is on channel 20.
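
For the curious, here is a rough worked version of where those channels sit, using the usual 2.4GHz center-frequency formulas. It assumes the Deco is using a 20MHz-wide channel, which I have not confirmed, so treat it as a sketch rather than a measurement:

```latex
\begin{align*}
f_{\text{Zigbee}}(k) &= 2405 + 5\,(k - 11)\ \text{MHz}, \quad k = 11,\dots,26 \\
f_{\text{WiFi}}(n)   &= 2407 + 5\,n\ \text{MHz}, \quad n = 1,\dots,13 \\
f_{\text{Zigbee}}(15) &= 2425\ \text{MHz} \\
f_{\text{Zigbee}}(20) &= 2450\ \text{MHz} \\
f_{\text{WiFi}}(8)    &= 2447\ \text{MHz} \;\Rightarrow\; \text{roughly } 2437 \text{ to } 2457\ \text{MHz at an assumed 20 MHz width}
\end{align*}
```

On those assumptions, Zigbee channel 15 sits about 12MHz below the lower edge of the WiFi channel, while Zigbee channel 20 falls inside it. If the Deco uses a 40MHz-wide channel or picks a different channel on its own, the picture changes.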

Update on this topic: after being on the verge of ditching ZigBee altogether, I was finally able to get to a stable state - at least for now, one week and counting.

These are all the things I did, in descending order of how much difference I think each one made:

  1. I used a WiFi Analyzer to discover sources of interference and moved Zigbee routers to alternate locations to avoid them.
  2. I upgraded the firmware on the Sonoff USB Dongle
  3. I added Zigbee devices in their final location, using Settings > Devices and Services > Zigbee Dongle “Configure” > “+ Add Device”
  4. I managed to move the 2.4GHz channel on my mesh WiFi routers
  5. I used a Zigbee sniffer to understand in more detail how packets were being routed and dropped

I initially did the following, which didn’t make any difference at all:

  1. Turn off HASS for > 60 mins to make all Zigbee routers reconfigure
  2. Turn off the Xiaomi gateway for a few days to avoid interference with it (it is now ON again and not causing any trouble)

These things not only didn’t help but also made HASS work worse than before, so I disabled them again after doing some fruitless tests with them:

  1. Installing and enabling ZHA-Toolkit to do things like pinging devices
  2. Enabling debug logging for several ZHA components
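
For anyone who wants to try the debug logging anyway, this is a minimal sketch of what goes in configuration.yaml. It assumes the standard logger integration and the usual ZHA / zigpy logger names. The radio-specific library also has its own logger, and which one applies depends on the dongle variant, so I have left it out rather than guess:

```yaml
# Minimal sketch: raise ZHA-related loggers to debug, keep everything else at info.
logger:
  default: info
  logs:
    homeassistant.components.zha: debug
    zigpy: debug
```

As mentioned, this made things worse on my set-up, so it is probably worth removing again once you are done testing.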

At this point, things are much more stable than before.

I’d be happy to share details on any of the things I tried and learned along the way if it helps anybody.

Thanks everyone for the help.

BTW for anyone landing here and wanting to see more info, I highly recommend thoroughly reading @Hedda’s guide, including a thorough and thoughtful response to my conundrum here: Guide for Zigbee interference avoidance and network range/coverage optimization - #24 by Hedda

That’s the best source of comprehensive information about getting ZigBee working for you that there is, hands down.


It’s also worth noting that Aqara devices (and maybe those of some other manufacturers) are notorious for not changing their Zigbee routing once paired. I notice this if (say) a smart plug has been accidentally turned off… any Aqara devices routing through that to the coordinator also go offline.

As I added quite a few Aqara end devices early on (mostly contact sensors and temperature/humidity), I found it helpful to re-pair these once my network had more devices (and repeaters) on it.

I also live in a town house, but it’s a new build and the walls are paper thin. Interestingly though, my lowest devices (by LQI score) are not necessarily the furthest away, as the Sonoff coordinator seems fairly directional depending on how you set up the external antenna.

My network is pretty stable, with the worst LQI score around 120, and most devices around 150-250.

I also use the ZHA toolkit to monitor my network and send my phone a notification if a device has dropped off the network, like so:

(The script below is triggered by a time based automation that runs every hour, and I’ve also excluded some devices from being reported, like the IKEA remotes, as they can be very lazy and appear offline when they are not).

alias: Check for ZHA offline devices and notify
sequence:
  # Run both branches at once so the listener is already waiting
  # when the completion event is fired.
  - parallel:
      # Branch 1: wait for the device list, then notify for each offline device.
      - - wait_for_trigger:
            - platform: event
              event_type: zha_devices_ready
        - repeat:
            for_each: "{{ wait.trigger.event.data.devices }}"
            sequence:
              - if:
                  # Offline, and not one of the excluded devices.
                  - condition: template
                    value_template: >
                      {{ (repeat.item.available | bool() == false) and ((('IKEA
                      of Sweden TRADFRI on/off switch' not in repeat.item.name)
                      and ('IKEA of Sweden Remote Control N2' not in
                      repeat.item.name) and ('_NOT-USED' not in
                      repeat.item.name)))}}
                then:
                  - service: notify.mobile_app_ro_p30_pro
                    data:
                      title: Device Offline
                      message: >-
                        {{ "(%s) LQI: %s" % ( repeat.item.user_given_name,
                        repeat.item.lqi ) }}
      # Branch 2: ask ZHA Toolkit for the device list; the event named in
      # event_done carries the result.
      - service: zha_toolkit.zha_devices
        data:
          event_done: zha_devices_ready
mode: single
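
The hourly automation that triggers the script could look something like the sketch below. The entity ID script.check_for_zha_offline_devices_and_notify is an assumption based on the alias above, so check the actual ID of your script before using it:

```yaml
alias: Hourly ZHA offline check
trigger:
  # Fires once an hour, at minute 0.
  - platform: time_pattern
    minutes: 0
action:
  # Assumed entity ID; replace with the ID of your own script.
  - service: script.turn_on
    target:
      entity_id: script.check_for_zha_offline_devices_and_notify
mode: single
```

The time_pattern trigger is used here simply because it is the shortest way to express "every hour"; any other time-based trigger would do just as well.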
