I recently noticed that Home Assistant was disconnecting from my Mosquitto MQTT broker on a fairly regular basis. I would see the following status messages many times throughout the day:
homeassistant/status offline
homeassistant/status online
I tried reducing the keepalive interval to the minimum of 15s, but that did not seem to help. Next I checked the logs for the mosquitto service, which I run in an LXC container on Proxmox (I'm not running the add-on), and noticed that almost all of my clients were being disconnected and reconnecting at the exact same time. These disconnects seemed to coincide with the following log message:
Saving in-memory database to /var/lib/mosquitto/mosquitto.db
I then checked the CPU usage on the LXC and noticed that it was hitting 100% whenever the DB was saved. Then it dawned on me: I only had a single core dedicated to the mosquitto broker container, and it looks like the broker was blocking on I/O during the save and probably dropping all of the clients (I did not run a debugger or anything to confirm this, but that is my suspicion). I went ahead and added a second core to the container (new total of 2) and the problem went away. I've been running without random disconnects for about 24 hours.
Prior to finding this, I searched around for tips on how to solve this problem, but none of them seemed relevant or worked for me. I wanted to share this in case anybody else is in a similar situation. If all of your clients are dropping at the same time your DB is being saved, check your system resources and the size of your DB (mine was very small, but some can get quite large). Hopefully that helps!
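If you want to check whether your own disconnects line up with the DB saves, here is a rough Python sketch of how I would go about it. It is not something I actually ran. It assumes each log line starts with a Unix timestamp followed by ": " (adjust the regex if your log looks different), and it treats any message containing "disconnect" as a client drop, which is my guess at a reasonable filter. The function name, the 5 second window, and the log path you pass in are all just placeholders.

```python
import re
import sys

# Text taken from the log message quoted above.
SAVE_MARKER = "Saving in-memory database"
# Assumption: any message containing this substring counts as a client drop.
DISCONNECT_MARKER = "disconnect"
# Assumption: lines look like "<unix timestamp>: <message>".
LINE_RE = re.compile(r"^(\d+): (.*)$")


def parse(lines):
    """Yield (timestamp, message) for every line that matches the assumed format."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            yield int(match.group(1)), match.group(2)


def disconnects_near_saves(lines, window=5):
    """Map each DB-save timestamp to the drop messages within `window` seconds of it."""
    events = list(parse(lines))
    saves = [ts for ts, msg in events if SAVE_MARKER in msg]
    drops = [(ts, msg) for ts, msg in events if DISCONNECT_MARKER in msg.lower()]
    return {
        save_ts: [msg for ts, msg in drops if abs(ts - save_ts) <= window]
        for save_ts in saves
    }


if __name__ == "__main__":
    # Usage: python check_saves.py /path/to/your/mosquitto.log
    with open(sys.argv[1]) as log_file:
        for save_ts, messages in disconnects_near_saves(log_file).items():
            print(f"save at {save_ts}: {len(messages)} drop(s) within the window")
```

If most saves show a cluster of drops right next to them, you are probably seeing the same thing I was, and it is worth looking at CPU usage in the container while a save is happening.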