I have HA running on a Raspberry Pi 4 and it runs quite stable since one year on a USB SSD.
However, at exactly 1:00 plus/minus 10 minutes each night, some of my entities (Warema awnings intgrated via MQTT) become unavailable, which I suspect comes from a crash of my MQTT broker (Mosquitto AddOn).
I saw that my CPU, RAM and temperature as well as network usage spike up around the time of the problem, like this:
How could I check if Mosquitto crashed during this time?
HA runs normally during the crash, as seen because all my other sensors (Zigbee, ESPHome, WiZ plugs, openWB) stay available and give normal values.
Check the HASS logbook for anytthing happening at 01:00:10 as that sounds like a daily scheduled systemd / cron job.
The other test is to power cycle everything and see if the issue moves in time - MQTT data can time-out (depending on retained and availability settings).
If this helps, this post!
PS The next question after Why? is ALWAYS What is in the logs?. The only question is how much help you need to interpret the logs.
I also suspect that, however Iām not the network administrator nor do I have access to the networking equipment, I just have an Ethernet port in my room where my HA and my PC get their Internet connection from.
How could I check if it really is a scheduled reboot of something?
Well that would interrupt any kind of operation, I would setup a very dumb script that curls every second to any http endpoint that returns the current datetime and writes the result to a file.
Then the morning after see if there are any holes in the sequence
I have the same problem: Every night at 04:18 (+/- 5 minutes), the Mosquitto Broker Add-on restarts. Excerpt from the Mosquitto log:
s6-rc: info: service legacy-services: stopping
time="2024-11-20T04:16:20+01:00" level=error msg="POST error: Post \"http://127.0.0.1:80/superuser\": read tcp 127.0.0.1:42690->127.0.0.1:80: read: connection reset by peer"
time="2024-11-20T04:16:20+01:00" level=error msg="POST error: Post \"http://127.0.0.1:80/acl\": dial tcp 127.0.0.1:80: connect: connection refused"
time="2024-11-20T04:16:20+01:00" level=error msg="Post \"http://127.0.0.1:80/superuser\": read tcp 127.0.0.1:42690->127.0.0.1:80: read: connection reset by peer"
2024-11-20 04:16:20: mosquitto version 2.0.18 terminating
time="2024-11-20T04:16:20+01:00" level=error msg="POST error: Post \"http://127.0.0.1:80/superuser\": dial tcp 127.0.0.1:80: connect: connection refused"
time="2024-11-20T04:16:20+01:00" level=error msg="POST error: Post \"http://127.0.0.1:80/acl\": dial tcp 127.0.0.1:80: connect: connection refused"
time="2024-11-20T04:16:20+01:00" level=error msg="Post \"http://127.0.0.1:80/superuser\": dial tcp 127.0.0.1:80: connect: connection refused"
2024-11-20 04:16:20: Saving in-memory database to /data//mosquitto.db.
[03:16:21] INFO: Service restart after closing
[03:16:21] INFO: Service restart after closing
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/mosquitto.sh
[04:16:32] INFO: Setting up user USER
[04:16:33] INFO: Certificates found: SSL is available
cont-init: info: /etc/cont-init.d/mosquitto.sh exited 0
cont-init: info: running /etc/cont-init.d/nginx.sh
cont-init: info: /etc/cont-init.d/nginx.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun mosquitto (no readiness notification)
services-up: info: copying legacy longrun nginx (no readiness notification)
[04:16:35] INFO: Starting NGINX for authentication handling...
s6-rc: info: service legacy-services successfully started
[04:16:35] INFO: Starting mosquitto MQTT broker...
2024-11-20 04:16:35: Warning: Mosquitto should not be run as root/administrator.
2024-11-20 04:16:35: mosquitto version 2.0.18 starting
2024-11-20 04:16:35: Config loaded from /etc/mosquitto/mosquitto.conf.
2024-11-20 04:16:35: Loading plugin: /usr/share/mosquitto/go-auth.so
2024-11-20 04:16:35: āāā Username/password checking enabled.
2024-11-20 04:16:35: āāā TLS-PSK checking enabled.
2024-11-20 04:16:35: āāā Extended authentication not enabled.
2024-11-20 04:16:35: Opening ipv4 listen socket on port 1883.
2024-11-20 04:16:35: Opening ipv6 listen socket on port 1883.
2024-11-20 04:16:35: Opening websockets listen socket on port 1884.
2024-11-20 04:16:35: Opening ipv4 listen socket on port 8883.
2024-11-20 04:16:35: Opening ipv6 listen socket on port 8883.
2024-11-20 04:16:35: Opening websockets listen socket on port 8884.
2024-11-20 04:16:35: mosquitto version 2.0.18 running
Excerpt from the HA-Core Log:
2024-11-20 04:16:24.615 ERROR (MainThread) [paho.mqtt.client] failed to receive on socket: [Errno 32] Broken pipe
What could this be?
Edit: Excerpt from the Supervisor Log:
2024-11-20 04:15:56.201 WARNING (MainThread) [supervisor.misc.tasks] Watchdog found a problem with core_mosquitto application!
2024-11-20 04:16:20.270 INFO (SyncWorker_0) [supervisor.docker.manager] Stopping addon_core_mosquitto application
2024-11-20 04:16:25.811 INFO (SyncWorker_0) [supervisor.docker.manager] Cleaning addon_core_mosquitto application
2024-11-20 04:16:28.533 INFO (MainThread) [supervisor.docker.addon] Starting Docker add-on homeassistant/aarch64-addon-mosquitto with version 6.4.1
2024-11-20 04:16:35.992 INFO (MainThread) [supervisor.auth] Auth request from 'core_mosquitto' for 'mqtt_user'
2024-11-20 04:16:36.505 INFO (MainThread) [supervisor.auth] Successful login for 'mqtt_user'
2024-11-20 04:16:38.349 ERROR (MainThread) [supervisor.services.modules.mqtt] There is already a MQTT service in use from core_mosquitto
I think I found the cause. When I installed Mosquitto years ago, options were written into the config.yaml (e.g., device_options, passlist, blocklist, etc.), which no longer belong there. I deleted the corresponding entries from the Mosquitto config.yaml and then restarted the entire system. Mosquitto has been running smoothly for two days now.
However, it is still very mysterious that watchdog caused a reboot every night at the same time due to alleged Mosquitto issues.