Seeking advice to get to reliable ZigBee, incl. low level diagnosing

Update on this topic: after being on the verge of ditching ZigBee altogether, I was finally able to get to a stable state - at least for now, one week and counting.

These are all the things I did, in descending order of how much difference I think each one made:

  1. I used a WiFi Analyzer to discover sources of interference and moved Zigbee routers to alternate locations to avoid them.
  2. I upgraded the firmware on the Sonoff USB Dongle
  3. I added Zigbee devices in their final location, using Settings > Devices and Services > Zigbee Dongle “Configure” > “+ Add Device”
  4. I managed to move the 2.4 GHz channel on my mesh WiFi routers
  5. I used a Zigbee sniffer to understand in more detail how packets were being routed and dropped
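
In case it helps anyone with items 1 and 4: the reason the WiFi channel matters is that Zigbee and 2.4 GHz WiFi share the same band. Here is a rough Python sketch of the overlap arithmetic. The channel center frequencies are the standard ones (Zigbee channels 11 to 26, WiFi channels 1 to 13), but the function names are just made up for illustration, and the 22 MHz WiFi width is an assumption (real signals have wider skirts, and 40 MHz channels cover more), so treat the result as a starting point and not a guarantee.

```python
ZIGBEE_WIDTH_MHZ = 2   # each Zigbee (IEEE 802.15.4) channel is about 2 MHz wide
WIFI_WIDTH_MHZ = 22    # assumption: classic 22 MHz-wide 2.4 GHz WiFi channel


def zigbee_center_mhz(channel: int) -> int:
    """Center frequency of Zigbee channel 11..26, spaced 5 MHz apart."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11..26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of 2.4 GHz WiFi channel 1..13."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers WiFi channels 1..13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel: int, wifi_channel: int) -> bool:
    """True if the two channels' nominal frequency ranges intersect."""
    distance = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return distance < (ZIGBEE_WIDTH_MHZ + WIFI_WIDTH_MHZ) / 2


def clear_zigbee_channels(wifi_channels: list[int]) -> list[int]:
    """Zigbee channels that overlap none of the given WiFi channels."""
    return [
        z for z in range(11, 27)
        if not any(overlaps(z, w) for w in wifi_channels)
    ]


if __name__ == "__main__":
    # Example: neighbours and own routers on the usual WiFi channels 1, 6 and 11
    print(clear_zigbee_channels([1, 6, 11]))
```

You would pass in whatever WiFi channels your analyzer shows as busy near your Zigbee routers, then compare the output with the channel your coordinator is actually using.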

I initially did the following, which didn’t make any difference at all:

  1. Turned off HASS for more than 60 minutes to make all Zigbee routers reconfigure
  2. Turned off the Xiaomi gateway for a few days to avoid interference with it (it is now ON again and not causing any trouble)

These things not only didn’t help but also made HASS work worse than before, so I disabled them again after some fruitless tests:

  1. Installing and enabling ZHA-Toolkit to do things like pinging devices
  2. Enabling debug logging for several ZHA components

At this point, things are much more stable than before.

I’d be happy to share details on any of the things I tried and learned along the way if it helps anybody.

Thanks everyone for the help.