A couple of days ago, I noticed that all entities coming from Zigbee devices become unavailable every X minutes (anywhere from a few minutes to a few hours) and then get their values back after ~10 seconds. This happens to all of these entities at the same time and has been happening for the past 3-4 days. I don’t think I changed or upgraded anything in the days before I noticed it.
I run HA (was 2024.10, upgraded to 2024.11.2 yesterday), Zigbee2MQTT and Mosquitto, all in Docker containers. Digging into the z2m logs, I noticed this:
[2024-11-16 22:47:13] debug: z2m:mqtt: Received MQTT message on 'homeassistant/status' with data 'offline'
[2024-11-16 22:47:29] debug: z2m:mqtt: Received MQTT message on 'homeassistant/status' with data 'online'
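To check whether the drops follow any pattern, the offline/online pairs could be pulled out of the z2m log with a short script. This is only a sketch, assuming every relevant z2m line looks exactly like the two above; `outages` and `intervals_minutes` are names made up for this example, not part of z2m.

```python
import re
from datetime import datetime

# Assumes z2m lines in the format shown above, e.g.
# [2024-11-16 22:47:13] debug: z2m:mqtt: Received MQTT message on 'homeassistant/status' with data 'offline'
STATUS_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\].*"
    r"on 'homeassistant/status' with data '(?P<state>online|offline)'"
)


def outages(lines):
    """Return (went_offline, came_back, seconds_down) for each offline -> online pair."""
    result = []
    offline_at = None
    for line in lines:
        m = STATUS_RE.match(line)
        if not m:
            continue  # not a homeassistant/status line
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if m["state"] == "offline":
            offline_at = ts
        elif offline_at is not None:
            result.append((offline_at, ts, (ts - offline_at).total_seconds()))
            offline_at = None
    return result


def intervals_minutes(outage_list):
    """Minutes between consecutive 'offline' events, to see how regular the drops are."""
    starts = [start for start, _, _ in outage_list]
    return [(b - a).total_seconds() / 60 for a, b in zip(starts, starts[1:])]


# Usage (hypothetical file name):
#   with open("z2m.log") as f:
#       found = outages(f)
#   for start, end, secs in found:
#       print(start, end, secs)
#   print(intervals_minutes(found))
```

For the two lines above this gives a single outage of 16 seconds.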
Then, in the Mosquitto logs, I found this:
2024-11-17 14:28:38: Received PINGREQ from 5vZtSyPpf2TLqj2qyKZZUb
2024-11-17 14:28:38: Sending PINGRESP to 5vZtSyPpf2TLqj2qyKZZUb
2024-11-17 14:28:44: Received PINGREQ from mqttjs_13d8b3d3
2024-11-17 14:28:44: Sending PINGRESP to mqttjs_13d8b3d3
2024-11-17 14:29:44: Received PINGREQ from mqttjs_13d8b3d3
2024-11-17 14:29:44: Sending PINGRESP to mqttjs_13d8b3d3
2024-11-17 14:30:08: Client 5vZtSyPpf2TLqj2qyKZZUb has exceeded timeout, disconnecting.
2024-11-17 14:30:08: Sending PUBLISH to mqttjs_13d8b3d3 (d0, q0, r0, m0, 'homeassistant/status', ... (7 bytes))
2024-11-17 14:30:18: New connection from 172.17.0.1:42324 on port 1883.
2024-11-17 14:30:18: New client connected from 172.17.0.1:42324 as 5vZtSyPpf2TLqj2qyKZZUb (p2, c1, k60).
2024-11-17 14:30:18: Will message specified (7 bytes) (r0, q0).
2024-11-17 14:30:18: homeassistant/status
As I understand it, every client sends PINGREQ to Mosquitto to keep the connection alive, and most of the time that is exactly what happens. I can see two clients, 5vZtSyPpf2TLqj2qyKZZUb and mqttjs_13d8b3d3; I assume the first is HA and the second is z2m.
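We can see that at some point the HA client stops sending pings (its last PINGREQ is at 14:28:38), and once the keep-alive timeout has passed, Mosquitto disconnects it at 14:30:08. That is 90 seconds, i.e. 1.5 times the 60-second keep-alive (I have the default 60 seconds in both HA and Mosquitto, and the reconnect line shows k60). At the same moment Mosquitto publishes the 7-byte will message on 'homeassistant/status' to z2m, which matches the 'offline' message z2m logs. 10 seconds after the disconnect, the client connects again.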
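To confirm this timeline for every disconnect rather than just one, the Mosquitto log could be scanned for "has exceeded timeout" lines and compared with the last packet received from the same client. A minimal sketch, assuming the timestamp format shown above; `timeout_gaps` is a hypothetical helper. It counts any "Received ... from" line, not only PINGREQ, because under the MQTT keep-alive rule any packet from the client resets the timer.

```python
import re
from datetime import datetime

# Assumes Mosquitto lines like "2024-11-17 14:28:38: Received PINGREQ from 5vZtSyPpf2TLqj2qyKZZUb"
TS = r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
RECEIVED_RE = re.compile(TS + r"Received (?P<packet>[A-Z]+) from (?P<client>\S+)")
TIMEOUT_RE = re.compile(TS + r"Client (?P<client>\S+) has exceeded timeout")

KEEPALIVE = 60  # seconds, the k60 value in the "New client connected" line


def parse_ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def timeout_gaps(lines):
    """Return (client, last_packet_time, disconnect_time, silent_seconds) per timeout."""
    last_seen = {}  # client id -> time of the last packet logged as received from it
    gaps = []
    for line in lines:
        m = RECEIVED_RE.match(line)
        if m:
            last_seen[m["client"]] = parse_ts(m["ts"])
            continue
        m = TIMEOUT_RE.match(line)
        if m and m["client"] in last_seen:
            dropped = parse_ts(m["ts"])
            seen = last_seen[m["client"]]
            gaps.append((m["client"], seen, dropped, (dropped - seen).total_seconds()))
    return gaps


# The broker drops a client that sends nothing for 1.5 x keep-alive,
# so with k60 a silent gap of about 90 s is expected before each drop.
# Usage (hypothetical file name):
#   with open("mosquitto.log") as f:
#       for client, seen, dropped, secs in timeout_gaps(f):
#           print(client, seen, dropped, secs, secs / KEEPALIVE)
```

If every gap comes out at about 90 seconds, the broker is behaving as specified and the question becomes why the HA client went quiet.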
Does anyone have an idea where to look next, or how to find out why this is happening?
I looked at the CPU and memory usage in HA: there’s a small spike in CPU usage whenever this happens, but even then it does not go above ~10%.
I’m running the system on a Raspberry Pi 4, with an SSD.
I also found this similar scenario, but in my case I could not correlate the moments when Mosquitto saves its database with the moments the entities become unavailable.
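If the database-save lines and the timeout lines end up in the same Mosquitto log, the comparison could be done mechanically: for every timeout, list the saves logged in the 90 seconds before it (the window in which the client went silent). This is a sketch only. The marker text for the save line is an assumption, so check the exact wording Mosquitto writes in your own log and adjust `SAVE_MARKER`; `saves_before_timeouts` is a made-up helper name.

```python
import re
from datetime import datetime, timedelta

LINE_RE = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (?P<msg>.*)")
# Assumption: the autosave line contains this text; verify against your own log.
SAVE_MARKER = "Saving in-memory database"
TIMEOUT_MARKER = "has exceeded timeout"


def saves_before_timeouts(lines, window_s=90):
    """Map each timeout disconnect to the DB saves logged in the window_s seconds before it.

    90 s is 1.5 x the 60 s keep-alive, i.e. the period in which the client was silent.
    """
    saves, timeouts = [], []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines without the expected timestamp prefix
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if SAVE_MARKER in m["msg"]:
            saves.append(ts)
        elif TIMEOUT_MARKER in m["msg"]:
            timeouts.append(ts)
    window = timedelta(seconds=window_s)
    return {t: [s for s in saves if t - window <= s <= t] for t in timeouts}


# Usage (hypothetical file name):
#   with open("mosquitto.log") as f:
#       for dropped, saves in saves_before_timeouts(f).items():
#           print(dropped, saves)
```

An empty list for every timeout would back up the impression that the saves and the disconnects are unrelated.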