Two Philips Hue devices kicking each other off the Zigbee network

I recently bought my second Philips Hue device, a smart plug. It connects fine via ZHA… but about once a day it goes offline and takes my previously stable Hue Tap Dial with it. Only a reboot seems to solve the problem; otherwise the outage lasts for days.

The tap dial is fine as long as the smart plug is not plugged in. It’s been stable that way for weeks, both before and after I discovered this problem. But I have no idea how to diagnose this problem.

In the HA event log for the switch there’s nothing. It doesn’t even know that the device has stopped connecting (still get a little green LED when I press a button on the device, too). In the HA event log for the smart plug it gives me a specific date and time when the status became unavailable, but there’s nothing corresponding to that in the HA logs. About 25 minutes before I have some of these:

2024-01-12 23:16:04.391 WARNING (MainThread) [zigpy.zcl] [0x3175:1:0xfc00] Unknown cluster command 0 b'\x04\x00\x000\x00!\x00\x00'
2024-01-12 23:16:05.190 WARNING (MainThread) [zigpy.zcl] [0x3175:1:0xfc00] Unknown cluster command 0 b'\x04\x00\x000\x01!\x08\x00'
2024-01-12 23:16:05.977 WARNING (MainThread) [zigpy.zcl] [0x3175:1:0xfc00] Unknown cluster command 0 b'\x04\x00\x000\x01!\x10\x00'
2024-01-12 23:16:06.775 WARNING (MainThread) [zigpy.zcl] [0x3175:1:0xfc00] Unknown cluster command 0 b'\x04\x00\x000\x01!\x18\x00'
2024-01-12 23:16:06.880 WARNING (MainThread) [zigpy.zcl] [0x3175:1:0xfc00] Unknown cluster command 0 b'\x04\x00\x000\x03!\x19\x00'

But those are warnings not errors, and I have tons of log spam since my family still turns off the hard switches for devices regularly (working on it).
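For anyone digging through similar log spam, here is a minimal Python sketch that pulls out only the zigpy.zcl "Unknown cluster command" warnings for one device within a window before a dropout, and prints the payload as hex so the changing bytes are easier to compare. The regex only assumes the line format shown in the excerpt above. The names `parse_line` and `events_before` are made up for illustration, and `ASSUMED_DROPOUT` is a placeholder rather than the real time from my event log.

```python
import ast
import re
from datetime import datetime, timedelta

# Matches only the warning format shown in the excerpt above (an assumption:
# other zigpy log lines look different and are simply skipped).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"WARNING \(MainThread\) \[zigpy\.zcl\] "
    r"\[(?P<nwk>0x[0-9a-f]{4}):(?P<endpoint>\d+):(?P<cluster>0x[0-9a-f]{4})\] "
    r"Unknown cluster command (?P<command>\d+) (?P<payload>b'.*')$"
)


def parse_line(line):
    """Return a dict for a matching warning line, or None for anything else."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "time": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f"),
        "nwk": match["nwk"],
        "cluster": match["cluster"],
        # The log prints the payload as a Python bytes literal.
        "payload": ast.literal_eval(match["payload"]),
    }


def events_before(lines, nwk, dropout, window=timedelta(minutes=30)):
    """Collect warnings from device `nwk` in the `window` leading up to `dropout`."""
    events = []
    for line in lines:
        event = parse_line(line)
        if event is None or event["nwk"] != nwk:
            continue
        if dropout - window <= event["time"] <= dropout:
            events.append(event)
    return events


# Two lines copied from the excerpt, plus one made-up unrelated line.
SAMPLE_LOG = [
    r"2024-01-12 23:16:04.391 WARNING (MainThread) [zigpy.zcl] [0x3175:1:0xfc00] Unknown cluster command 0 b'\x04\x00\x000\x00!\x00\x00'",
    r"2024-01-12 23:16:05.190 WARNING (MainThread) [zigpy.zcl] [0x3175:1:0xfc00] Unknown cluster command 0 b'\x04\x00\x000\x01!\x08\x00'",
    "2024-01-12 23:20:00.000 INFO (MainThread) [example] unrelated line",
]

# Placeholder dropout time; substitute the timestamp from the device's event log.
ASSUMED_DROPOUT = datetime(2024, 1, 12, 23, 41)

for event in events_before(SAMPLE_LOG, "0x3175", ASSUMED_DROPOUT):
    print(event["time"], event["cluster"], event["payload"].hex(" "))
```

In practice the lines would come from the Home Assistant log file instead of `SAMPLE_LOG`. The hex view also makes it clear that the `0` and `!` in the payloads are just the bytes 0x30 and 0x21 printed as ASCII.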

My setup: ZHA with devices from Eglo and these two Philips Hue devices, connecting directly to my Sonoff ZBDongle-E (Zigbee 3.0 USB Dongle Plus, EFR32MG21 + CH9102F) on a Raspberry Pi 4 running Home Assistant.

Any advice or pointers would be very helpful.

What have you tried to provoke the issue? Does interacting with the Hue Tap Dial in any way trigger the issue with the smart plug?

Unfortunately no. I can’t reproduce it other than by waiting a few hours. I have a suspicion that it’s due to Zigbee leader elections.

E.g. my Wi-Fi drops out, causing my Shelly switches to change to local-only mode, which actually cuts power when the switch is in the “off” position (as opposed to the normal mode, which triggers automations in Home Assistant but leaves the light powered). That disconnects a central device from the Zigbee network, so the other nodes reorganize, and… these two get stuck in a race condition, I guess?

It would also help explain the incredibly short battery life (about one month) of my switch.

Short battery life is a common symptom of having too few Zigbee Router devices, or of EMI/RFI (electromagnetic and radio-frequency interference) causing bad reception, so messages have to be resent, which drains the battery. Best practice is for every battery-powered device to connect through a Zigbee Router device close to it, ideally in the same room, that relays/forwards its messages, rather than connecting directly to the Zigbee Coordinator.

Anyway, the most common causes are EMI and/or having too few Zigbee Router devices. Regardless of the root cause, before troubleshooting any deeper you should first take some basic actions to avoid EMI and add more Zigbee Router devices that act as repeaters/extenders. Try to follow all the best-practice tips in this guide, as simply doing so removes most known Zigbee issues; see → Zigbee networks: how to guide for avoiding interference + optimize using Zigbee Router devices (repeaters/extenders) to get a stable network with best possible range and coverage

I'd also suggest upgrading to a later Zigbee Coordinator firmware → https://darkxst.github.io/silabs-firmware-builder/ (more info here → GitHub - darkxst/silabs-firmware-builder: Silicon Labs firmware builder)

If you still have problems after doing all of that, I'd recommend factory resetting the Philips Hue smart plug and re-pairing it to ZHA for a new interview (there should be no need to remove it from ZHA first).

Also read these other ZHA documentation sections, which may be related: