My HA works fine, and running rock solid on a raspberry pi 3 for like 4 years now.
HA is run as a docker container plus this script GitHub - tribut/homeassistant-docker-venv: Run Home Assistant as non-root using the official docker image to run as user.
The os is a Raspbian, where Mosquitto is installed as a service.
Other docker containers are:
- postgres (where the database HA is using is hosted)
- cloudflare tunnel
- nodered
For some months now, I’ve been suffering from a very strange problem: often at 4:20 (or very close to this time) my HA loose connection to mosquitto for a bunch of seconds.
The result is that a lot of entities (zigbee2mqtt entities…) goes from their current status → unavailable → again the same status.
This is not causing serious troubles, since I don’t have automations that are fooled by this fault, but I really would like to understand what’s going on.
The problem occured on the following days:
- 07/08
- 08/08
- 09/08
- 28/08
- 7/09
- 12/10
- 14/10
- 17/10
- 19/10
- 6/11
- 7/11
- 10/11
- 11/11
- 12/11
- 15/11
- 18/11
I can’t find any recurring pattern/episode that could be a clue to find the culprit.
What I did on the 07/08?
Checking my git history, I see that, apart from a bunch of automations and script, I switched to 2022.8.1 version.
Anyway, I usually keep my system up to date, so I’m now running 2022.11.2 (and I have used 2022.9 and 2022.10 too).
What I have tried so far:
- update the raspbian
- update HA
- set a static IP Address (used to be dhcp’d)
- disable Fritz!box integration
- keep my cameras off
but now, I’m running out of ideas.
I will post a bunch of logs now, hoping that someone could help me to understand what’s going on here!!
Logs are from today occurrence, that happened at around 04:18
Home Assistant
Logs
2022-11-18 04:15:13.344 WARNING (MainThread) [homeassistant.components.media_player] Updating samsungtv media_player took longer than the scheduled update interval 0:00:10
2022-11-18 04:18:05.458 WARNING (Thread-8) [pychromecast.socket_client] [Bagno(192.168.1.73):8009] Heartbeat timeout, resetting connection
2022-11-18 04:18:05.863 WARNING (Thread-4 (_thread_main)) [homeassistant.components.mqtt.client] Disconnected from MQTT server 127.0.0.1:1883 (16)
2022-11-18 04:18:05.899 WARNING (Thread-9) [pychromecast.socket_client] [CC Casa(192.168.1.73):32051] Heartbeat timeout, resetting connection
2022-11-18 04:18:05.907 WARNING (Thread-7) [pychromecast.socket_client] [Cucina(192.168.1.190):8009] Heartbeat timeout, resetting connection
2022-11-18 04:18:06.156 ERROR (Thread-8) [pychromecast.socket_client] [Bagno([::ffff:c0a8:149]):8009] Failed to connect to service ServiceInfo(type=‘host’, data=(‘[::ffff:c0a8:149]’, 8009)), retrying in 5.0s
2022-11-18 04:18:06.361 WARNING (MainThread) [homeassistant.components.media_player] Updating samsungtv media_player took longer than the scheduled update interval 0:00:10
2022-11-18 04:18:06.382 WARNING (Thread-4 (_thread_main)) [homeassistant.components.mqtt.client] Disconnected from MQTT server 127.0.0.1:1883 (16)
2022-11-18 04:18:07.574 ERROR (MainThread) [homeassistant.components.template.light] Received invalid light is_on state: unavailable for entity light.ga_lampadina_comodino_sara. Expected: on, off, true, false
2022-11-18 04:18:07.681 ERROR (MainThread) [homeassistant.components.template.light] Received invalid light is_on state: unavailable for entity light.ga_lampadina_camera. Expected: on, off, true, false
2022-11-18 04:18:07.765 WARNING (MainThread) [homeassistant.helpers.entity] Update of media_player.tv_soggiorno is taking over 10 seconds
2022-11-18 04:18:07.768 WARNING (MainThread) [homeassistant.components.media_player] Updating samsungtv media_player took longer than the scheduled update interval 0:00:10
2022-11-18 04:18:07.772 WARNING (MainThread) [homeassistant.components.sensor] Updating energy sensor took longer than the scheduled update interval 0:00:30
2022-11-18 04:18:07.779 WARNING (MainThread) [homeassistant.components.light] Updating xiaomi_miio light took longer than the scheduled update interval 0:00:30
2022-11-18 04:18:07.784 WARNING (MainThread) [homeassistant.components.light] Updating xiaomi_miio light took longer than the scheduled update interval 0:00:30
2022-11-18 04:18:07.788 WARNING (MainThread) [homeassistant.components.light] Updating xiaomi_miio light took longer than the scheduled update interval 0:00:30
2022-11-18 04:18:07.791 WARNING (MainThread) [homeassistant.components.media_player] Updating spotify media_player took longer than the scheduled update interval 0:00:30
2022-11-18 04:18:08.004 WARNING (MainThread) [homeassistant.components.automation.z2m_last_update] Z2M: last update: Already running
2022-11-18 04:18:09.097 WARNING (MainThread) [homeassistant.components.automation.z2m_last_update] Z2M: last update: Already running
2022-11-18 04:18:09.102 WARNING (MainThread) [homeassistant.components.automation.z2m_last_update] Z2M: last update: Already running
2022-11-18 04:18:12.922 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [1856442056] Client unable to keep up with pending messages. Stayed over 512 for 5 seconds. The system’s load is too high or an integration is misbehaving
Journalctl
Logs
nov 18 04:15:13 rpicasina3 containerd[498]: time=“2022-11-18T04:15:12.737954896+01:00” level=error msg=“get state for 1f560b6d7c76163b531ec37775422be980e26a6bf0fbef34be8cbb4f04e753e6” error=“context deadline exceeded: unknown”
nov 18 04:15:14 rpicasina3 containerd[498]: time=“2022-11-18T04:15:13.558128870+01:00” level=warning msg=“unknown status” status=0
nov 18 04:15:14 rpicasina3 CRON[29098]: pam_unix(cron:session): session opened for user root by (uid=0)
nov 18 04:15:14 rpicasina3 CRON[29099]: pam_unix(cron:session): session opened for user pi by (uid=0)
nov 18 04:15:14 rpicasina3 CRON[29103]: (pi) CMD (/home/pi/bin/duia_myip)
nov 18 04:15:14 rpicasina3 CRON[29102]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
nov 18 04:15:14 rpicasina3 CRON[29098]: pam_unix(cron:session): session closed for user root
nov 18 04:15:19 rpicasina3 CRON[29099]: (CRON) info (No MTA installed, discarding output)
nov 18 04:15:19 rpicasina3 CRON[29099]: pam_unix(cron:session): session closed for user pi
nov 18 04:15:44 rpicasina3 systemd[1]: run-docker-runtime\x2drunc-moby-1f560b6d7c76163b531ec37775422be980e26a6bf0fbef34be8cbb4f04e753e6-runc.HOqdaV.mount: Succeeded.
nov 18 04:16:47 rpicasina3 systemd[1]: run-docker-runtime\x2drunc-moby-1f560b6d7c76163b531ec37775422be980e26a6bf0fbef34be8cbb4f04e753e6-runc.8IybrG.mount: Succeeded.
nov 18 04:17:09 rpicasina3 CRON[29201]: pam_unix(cron:session): session opened for user root by (uid=0)
nov 18 04:17:09 rpicasina3 CRON[29202]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
nov 18 04:17:10 rpicasina3 CRON[29201]: pam_unix(cron:session): session closed for user root
nov 18 04:18:57 rpicasina3 systemd[1]: run-docker-runtime\x2drunc-moby-1f560b6d7c76163b531ec37775422be980e26a6bf0fbef34be8cbb4f04e753e6-runc.XpighZ.mount: Succeeded.
nov 18 04:20:03 rpicasina3 CRON[29347]: pam_unix(cron:session): session opened for user pi by (uid=0)
nov 18 04:20:03 rpicasina3 CRON[29349]: (pi) CMD (/home/pi/bin/duia_myip)
nov 18 04:20:05 rpicasina3 CRON[29347]: (CRON) info (No MTA installed, discarding output)
nov 18 04:20:05 rpicasina3 CRON[29347]: pam_unix(cron:session): session closed for user pi
nov 18 04:25:02 rpicasina3 CRON[29628]: pam_unix(cron:session): session opened for user root by (uid=0)
nov 18 04:25:02 rpicasina3 CRON[29629]: pam_unix(cron:session): session opened for user pi by (uid=0)
nov 18 04:25:02 rpicasina3 CRON[29631]: (pi) CMD (/home/pi/bin/duia_myip)
nov 18 04:25:02 rpicasina3 CRON[29630]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
nov 18 04:25:02 rpicasina3 CRON[29628]: pam_unix(cron:session): session closed for user root
nov 18 04:25:04 rpicasina3 CRON[29629]: (CRON) info (No MTA installed, discarding output)
Please note that:
error=“context deadline exceeded: unknown”
seems to be a coincidence, since it happened only tonight. It never happened again in one of the days above.
Processor and memory
Mosquitto
please not that noone else is disconnecting at the same time
Logs
1668735014: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1668736815: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1668738616: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1668740417: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1668741482: Client 1YwNHsbb9Q4dZ7x0ODLOKr has exceeded timeout, disconnecting.
1668741482: Socket error on client 1YwNHsbb9Q4dZ7x0ODLOKr, disconnecting.
1668741487: New connection from 127.0.0.1 on port 1883.
1668741487: New client connected from 127.0.0.1 as 1YwNHsbb9Q4dZ7x0ODLOKr (c1, k60, u’rositaha’).
1668742218: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1668744019: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1668745820: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1668747621: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1668749422: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
Postgres
no logs recorded on the 18/11
Conclusion: I don’t know where else to search. The similar time seems to point to a scheduled job, but I cannot find it.
Can you help me?