Zigbee button events mysteriously breaking

I am really not sure how to investigate or debug this - or even how to categorize it. Any pointer that helps me make heads or tails of this would be greatly appreciated.

Background

For the last six or seven years I have been running the same base setup: Dockerized Home Assistant interacting with a Zigbee network run by a Conbee II stick via the Deconz integration. The network consists of elements from a bunch of different vendors - such as IKEA, frient, and Aqara.

The primary way we interact with Home Assistant is via scheduled automations and IKEA I/O dimmer Zigbee buttons. The buttons are implemented as automations responding to Deconz events, with logic that toggles devices directly or via scenes depending on device states (e.g. “button A push: if light B is off, turn it on, otherwise turn on light C”).

Generally I only update things quite late - true both for Home Assistant and Deconz - prioritizing stability/security releases over feature ones.

Current version of Home Assistant in the docker container is 2024.2.5, Phoscon is 2.25.3 and the Zigbee network is on channel 25 to avoid wifi4 networks.

Issue

A month or so ago I started seeing really strange behaviour. As I recall, this did not coincide with any changes to Home Assistant configuration or updates. The first symptom was lots of our “off” buttons no longer having the usual visible effects (i.e. press the off button and the lights do not turn off as expected).

At first I assumed we had been really unlucky with lots of batteries depleting at the same time, so I gave them a fresh charge when I got around to it. This did not have any effect, so until I managed to set aside the time, we operated lights & everything else with direct control via the mobile app. When I did find the time to investigate, I noticed just how strange the behaviour actually was, by monitoring entity states via the mobile app as I pressed one of the borked off-buttons:

It isn’t that the event is not triggering; the press is just handled in a really strange way.

Example

Button Kitchen Door is an IKEA I/O dimmer switch. One automation responds to the deconz event of the on button being pressed by activating the Kitchen On scene, while another automation responds to the deconz event of the off button being pressed by activating the Kitchen Off scene.

At least that is how it has been configured for years and that is also what happens when the automations are triggered via the Home Assistant dashboard.

Unfortunately that is not what happens when you press the physical button. With the logbook open, pressing the physical button causes an absolute mess of activity:

  1. One of the three kitchen lights turns on - both reported by the log and observably. The “turned on” event has no “triggered by” section, which it normally has when the light is turned on by the scene activation, which in turn is triggered by the automation, triggered by the deconz event.

  2. A flurry of devices in other rooms report turning on in the log, but observably their state does not actually change. These log events also do not report being “triggered by” anything.

  3. In the log order, only after all of the above does the log entry from the deconz integration arrive: Button Kitchen Door “Short press” event for “Turn on” was fired.

  4. Immediately following in the log are first the notice of the automation being triggered by the event ‘deconz_event’, and then the activation of the Kitchen On scene, triggered by the automation, triggered by the Deconz event.

  5. While the manual activation of the scene would now normally be followed by “Kitchen Light [A-C] turned on triggered by service Scenes: Activate”, none of those are present. Instead, all the unrelated devices that previously logged “turned on” get “turned off” events logged - still with no triggering information.

The end result is that one of the three lights has been turned on.

The flow for the “off” button is exactly the same - except instead of unrelated devices first reporting an “on” event and then an “off” one, they first report “off” and then back “on” - while still not actually changing state. The one of three lights which turns on with the “on” button does not turn off with the “off” button.

Again: Triggering either the “on” or “off” button automations via the dashboard works just as it always has - no mess in the log, all lights responding as expected.
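
To make the "no triggered by" entries easier to pick out, the logbook could be exported as JSON and filtered for entries that carry no trigger attribution at all. What follows is only a minimal sketch under assumptions: it treats each entry as a plain dict, and it assumes attribution shows up as keys starting with `context_` (check this against real logbook output before relying on it). The function name and the sample entity IDs are made up for illustration.

```python
from typing import Iterable


def entries_without_trigger(entries: Iterable[dict]) -> list[dict]:
    """Return the entries that have no non-empty context_* field.

    Assumption: trigger attribution appears as keys prefixed with
    "context_". Verify the real key names in your own logbook export.
    """
    orphans = []
    for entry in entries:
        has_context = any(
            key.startswith("context_") and value
            for key, value in entry.items()
        )
        if not has_context:
            orphans.append(entry)
    return orphans


# Hypothetical sample data, not taken from a real log.
sample = [
    {"entity_id": "light.kitchen_a", "state": "on"},
    {
        "entity_id": "light.kitchen_b",
        "state": "on",
        "context_domain": "scene",
        "context_service": "turn_on",
    },
]

for entry in entries_without_trigger(sample):
    print(entry["entity_id"], entry["state"])
```

Running this over the entries logged around a single physical button press should list exactly the state changes that arrived without an automation or scene behind them.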

Looking at the deconz log while pressing the button reveals no surprises:

  1. The press event is received.

  2. If it was the “on” button then a “set state” event for the one light is received - if it was the “off” button then nothing happens after the button press event.

  3. No trace of all that unrelated on/off state reporting from the home assistant log.
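
When matching the deCONZ log against the `deconz_event` entries in Home Assistant, it can help to decode the numeric event code. The sketch below assumes the usual deCONZ convention, where the thousands digit is the button number and the remainder is the action (e.g. 1002 would be a short release of button 1). Which button number maps to “on” or “off” on a given remote is an assumption to verify for your device. The function and table names are illustrative only.

```python
# Action part of a deCONZ button event code (code % 1000).
# Only the four most common actions are listed; others are reported as unknown.
ACTIONS = {
    0: "initial press",
    1: "hold",
    2: "short release",
    3: "long release",
}


def decode_button_event(code: int) -> tuple[int, str]:
    """Split an event code such as 1002 into (button number, action name)."""
    button, action = divmod(code, 1000)
    return button, ACTIONS.get(action, f"unknown action {action}")


for code in (1002, 2002, 1001):
    button, action = decode_button_event(code)
    print(f"{code}: button {button}, {action}")
```

If the codes seen in the deCONZ log and in the Home Assistant event data decode to the same button and action, the button press itself is at least being passed through unchanged.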

Attempted remedies

I have updated both Home Assistant and Deconz to latest stable versions (listed in “Background”). I don’t see a way to explicitly update the Deconz integration, so I assume it is on latest for the given Home Assistant version. Even though they have not caused me any issues in the past and the Deconz log did not reveal any issues, I have tried to remove my Aqara thermostats from the network - as I read they might be a source of noise on the network.

None of this has had any effect on the issue.