ZHA + SkyConnect: some Zigbee devices stop working

Hi!

I have an issue that’s been annoying me for weeks and I haven’t found a fix. I have several Zigbee devices running on my HA with SkyConnect: Aqara button switches, Aqara temperature sensors, Philips dimmer switches and some TRADFRI bulbs.

Seemingly at random, some of the Aqara devices stop working, while the Philips and TRADFRI ones keep working. The only way to get them working again is to go into ZHA’s Add Device and re-add them. When I do this they pick up their old name and config and work again, until the next time they drop.

A couple of years ago I was using a Zigbee USB radio from AliExpress with a Pi3 and Zigbee2MQTT. I then moved to a Pi4 with SkyConnect, and after a few weeks this started to happen. I’ve just moved to a Pi5, still with SkyConnect, and the same thing keeps happening.

I’m not sure how to debug this. The only way I’ve found online to get any debugging output from ZHA is to enable Debug Logging, perform a few actions on the devices, and stop logging, which downloads a log text file I can inspect. After these devices become unresponsive, this is what I see in the log:

2024-04-16 08:02:01.188 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x708E](lumi.remote.b1acn01): last_seen is 35093.77426600456 seconds ago and ping attempts have been exhausted, marking the device unavailable
2024-04-16 08:02:01.189 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x708E](lumi.remote.b1acn01): Update device availability -  device available: False - new availability: False - changed: False
...
2024-04-16 08:03:58.211 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xD380](lumi.remote.b1acn01): last_seen is 38757.394718408585 seconds ago and ping attempts have been exhausted, marking the device unavailable
2024-04-16 08:03:58.212 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xD380](lumi.remote.b1acn01): Update device availability -  device available: False - new availability: False - changed: False
...
2024-04-16 08:04:02.241 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x76D3](RWL022): Device seen - marking the device available and resetting counter
2024-04-16 08:04:02.242 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x76D3](RWL022): Update device availability -  device available: True - new availability: True - changed: False
2024-04-16 08:04:03.187 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x9122](lumi.weather): Device seen - marking the device available and resetting counter
2024-04-16 08:04:03.188 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x9122](lumi.weather): Update device availability -  device available: True - new availability: True - changed: False
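
In case it helps anyone reading a longer log, the availability lines above can be condensed into one row per device. This is only a rough sketch: the function name `summarize` and the regular expressions are my own and assume the exact line format shown in the excerpt above, so they may need adjusting for other HA versions.

```python
import re

# Matches ZHA availability debug lines in the format shown in the excerpt above:
# "<timestamp> DEBUG ... [0xNWK](model): message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) DEBUG .*?"
    r"\[(?P<nwk>0x[0-9A-Fa-f]{4})\]\((?P<model>[^)]+)\): (?P<msg>.*)$"
)
LAST_SEEN_RE = re.compile(r"last_seen is (?P<secs>[\d.]+) seconds ago")


def summarize(lines):
    """Return {(nwk, model): {"status": str, "hours_silent": float or None}}.

    Hypothetical helper: the latest matching line for a device wins.
    """
    devices = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip "..." separators and unrelated log lines
        key = (m["nwk"], m["model"])
        entry = devices.setdefault(key, {"status": "unknown", "hours_silent": None})
        msg = m["msg"]
        seen = LAST_SEEN_RE.search(msg)
        if seen:
            # ZHA gave up pinging; convert the silence from seconds to hours
            entry["status"] = "unavailable"
            entry["hours_silent"] = round(float(seen["secs"]) / 3600, 1)
        elif msg.startswith("Device seen"):
            entry["status"] = "available"
            entry["hours_silent"] = 0.0
    return devices


if __name__ == "__main__":
    import sys

    # Usage: python summarize_zha.py home-assistant.log
    with open(sys.argv[1], encoding="utf-8") as fh:
        for (nwk, model), info in sorted(summarize(fh).items()):
            print(f"{nwk} {model:24} {info['status']:12} {info['hours_silent']}")
```

Run against the excerpt above, this shows both lumi.remote.b1acn01 buttons silent for roughly 10 hours (9.7 and 10.8), while RWL022 and lumi.weather are still being seen.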

I’m not sure what could be going on with these devices. It’s especially odd that after re-adding them they pick up the old config and work fine again. Could this be because of the TRADFRI bulbs? My understanding is that they act as routers, so the devices that stop working might be connected through a bulb rather than directly to the SkyConnect. I’ve also checked the LQI, which is good, and replaced all the batteries.

Any help is much appreciated!

Have you checked the GitHub issues page for reports about these Aqara devices?