Hey all,
For a few months now, my EcoSmart BR30 light bulbs, which are in ZHA groups per room, have been unreliable: they occasionally seem to lock up and stop responding to commands to turn on/off or to change color/brightness. If I wait a few seconds and try again, they usually go back to normal. It seems to be worst when I send multiple commands in quick succession, like turning off right after turning on. The issue doesn’t seem to reproduce when I control an individual bulb within the group, even when toggling really quickly. These bulb groups worked very reliably for a few months after I moved into this house in June and set up these new bulbs, but the problem first started some time around September or October. I don’t know exactly when; unfortunately, today has been my first free day to sit down and reach out for help. It has persisted through several HA updates and, of course, restarts.
I am running HA 2021.1.5 in a Docker container on a Linux desktop, using ZHA for Zigbee, with the HUSBZB-1 USB stick for Zigbee and Z-Wave.
My first suspicion was that this was a reception problem, as the house is large and old and we had WiFi reception issues due to some really thick and heavy walls, and the server with USB stick is in the basement. However, I tried a few things to address this with no change:
(1) I attached my Zigbee USB stick to a USB extender and taped the stick high on the wall, away from my computer.
(2) I found that the Zigbee channel that the stick was using overlapped in frequency with WiFi 2.4GHz channel 1, and at the time my WiFi was using channel 1 for 2.4GHz (on auto mode), so I configured my WiFi auto channel selector to exclude channel 1 and only choose from channels 6 or 11. Analyzing the nearby WiFi signals shows that none of my neighbors have strong channel 1 signals nearby that should be affecting anything.
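For anyone who wants to check their own channel overlap, here is a rough sketch of the arithmetic. The center-frequency formulas are the standard 2.4GHz ones; the 22 MHz WiFi width and 2 MHz Zigbee width are assumptions I picked for illustration, and the function names are made up for this example.

```python
def zigbee_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4GHz Zigbee (802.15.4) channel, 11 through 26."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4GHz channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4GHz WiFi channel, 1 through 13."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers WiFi channels 1 to 13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel: int, wifi_channel: int,
             wifi_width_mhz: float = 22.0, zigbee_width_mhz: float = 2.0) -> bool:
    """True if the two channels' bands overlap, given the assumed widths."""
    gap = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return gap < (wifi_width_mhz + zigbee_width_mhz) / 2


def overlapping_wifi_channels(zigbee_channel: int, candidates=(1, 6, 11)) -> list:
    """Which of the usual WiFi channels (1, 6, 11) overlap a given Zigbee channel."""
    return [w for w in candidates if overlaps(zigbee_channel, w)]
```

With these assumed widths, `overlapping_wifi_channels(11)` gives `[1]`, while Zigbee channel 15 lands in the gap between WiFi channels 1 and 6 and gives an empty list.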
Some other evidence further suggests that reception is not the issue:
(1) The overhead bulbs in my basement, right next to my Zigbee USB stick with no walls or any other form of interference between them, exhibit the same issue just as often as anywhere else in the house.
(2) When the issue happens, it affects all bulbs in the group equally. I’m not familiar with the deep implementation details of Zigbee multicast, but if there were reception issues, I’d expect that sometimes some members of the group would receive the message and others wouldn’t, even if they’re in the same room.
My current suspicion is some sort of software issue, either in the bulbs themselves (but why would it start after a few months?) or in ZHA (I wish I had noted whether this started after an HA upgrade). I have turned on ZHA debug logging, reproduced the issue, and captured this: http://sprunge.us/K1k0ts
If you search for APS_DEFRAG_DEFERRED, I believe that’s the point in the log where the issue occurred. On previous occasions when I’ve tried to debug this using the logs, I’ve seen the same status code come up when the issue happens.
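In case it helps anyone dig through a similar capture, here is a minimal sketch of a helper that finds every line containing a status string and keeps a few lines of context before each hit. It assumes nothing about the log format beyond plain text lines; the function name and the `before` parameter are made up for this example.

```python
from collections import deque


def find_status_context(lines, needle="APS_DEFRAG_DEFERRED", before=3):
    """Return (line_number, preceding_lines, matching_line) for each hit.

    `lines` can be any iterable of strings, such as an open log file.
    """
    recent = deque(maxlen=before)  # rolling window of the last few lines seen
    hits = []
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if needle in text:
            hits.append((lineno, list(recent), text))
        recent.append(text)
    return hits
```

Passing it an open file (for example, a locally saved copy of the debug log) returns each match along with the lines that led up to it, which makes it easier to see which command was in flight when the status appeared.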
Around this same time, I noticed that when my hand-rolled adaptive/Circadian lighting setup turned on these bulbs, they’d come on at an incorrect color temperature and only later be updated. That leads me to believe that the “turn on” command for these bulbs ends up being two commands under the hood, one that turns the light on and another that sets the temp/brightness, and that often the first command would succeed and then the second would fail.
What do you think? Thank you in advance for your help!