I think I’ve got this problem solved.
First - I had previously noticed in the Mosquitto Broker logs that a number of switches were disconnecting from and reconnecting to the network. My brother-in-law suggested that this was because, by default, the switches periodically look for alternate APs that have a better signal.
In my ignorance, I thought that was silly: the devices aren’t moving around, so whatever AP they connected to when they were installed is the best signal they’re going to get. So I went into my router console and locked each device to a specific AP.
That was the wrong thing to do. Why? Because the switches continue to look for better AP signals every 44 minutes as long as SetOption57 is enabled.
Recall that I was having an issue with Mosquitto reporting that switches were failing to send their keep-alive signal and were being disconnected as a result.
In an effort to help with debugging, I created a pair of automations for each device: one to create a persistent notification when a device became unavailable, and another when it became available again.
I think some interference or collision activity associated with the 44-minute search process was taking longer than the 30-second keep-alive window that Tasmota uses by default. There is an option where you can change the value, but when the device restarts, the keep-alive goes back to 30.
Then my battery backup failed, and Home Assistant, my gateway, my router and my switch all went down. When we got everything up and running again, there was a massive number of connection failures on all the devices locked to the AP in the kitchen of my home. Devices locked to the AP in the bedroom end of the house had no issues. Before I noticed it, I had something like 27,000 notifications…
There was something screwy with the kitchen AP, so I took it offline and, one by one, removed the AP lock on every device locked to the kitchen AP. That calmed the noise as the devices, one by one, found the bedroom AP and connected - some with very poor signal strength.
Then I noticed that I no longer had the messages about Mosquitto not seeing a keep-alive from switches.
I let things stand for a couple of days, then brought the kitchen AP back online. Slowly, devices closer to this AP moved their connection - and no disconnects by Mosquitto.
Now the only thing I saw in the Mosquitto log was a lot of “already connected - closing connection” messages.
I don’t think these are terrible - they all occur within the same second, but I’ve got 60 Tasmota devices (and growing), so that’s a lot of “noise” in the log.
I started comparing the times when the same switch would appear in the log, and most of the time, the entries were separated by 44 minutes…
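For anyone who wants to do the same comparison without eyeballing the log, here is a rough Python sketch. It assumes the log uses Mosquitto's default epoch timestamps and that the lines look like `1700000000: Client <id> already connected, closing old connection.`; the function name `reconnect_gaps` is just something I made up for illustration, so adjust the pattern to whatever your log actually shows.

```python
import re
from collections import defaultdict

# Assumed line format (epoch timestamp, then the broker message), e.g.:
#   1700000000: Client sw1 already connected, closing old connection.
LINE_RE = re.compile(r"^(\d+): Client (\S+) already connected")


def reconnect_gaps(lines):
    """Return {client_id: [minutes between consecutive matching entries]}."""
    seen = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            seen[match.group(2)].append(int(match.group(1)))

    gaps = {}
    for client, stamps in seen.items():
        stamps.sort()
        # Difference between each entry and the next, rounded to whole minutes.
        gaps[client] = [round((b - a) / 60) for a, b in zip(stamps, stamps[1:])]
    return gaps
```

Feed it the lines of the log file (for example `reconnect_gaps(open("mosquitto.log"))`, with the path being wherever your broker writes its log) and look for clients whose gaps cluster around 44.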
I got rid of this “problem” by disabling SetOption57.
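With 60 devices, typing `SetOption57 0` into each console gets old fast. Below is a minimal sketch of doing it through Tasmota's web command endpoint (`http://<device>/cm?cmnd=...`). It assumes the devices are reachable by IP and have no web password set; the helper names and the example addresses are hypothetical, not anything from my setup.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def setoption57_url(host, enabled=False):
    """Build the web-command URL that sets SetOption57 on one device."""
    command = f"SetOption57 {1 if enabled else 0}"
    # quote_via=quote encodes the space as %20 rather than '+'.
    return f"http://{host}/cm?" + urlencode({"cmnd": command}, quote_via=quote)


def disable_rescan(hosts, timeout=5):
    """Send 'SetOption57 0' to each host; return {host: reply text or error}."""
    results = {}
    for host in hosts:
        try:
            with urlopen(setoption57_url(host), timeout=timeout) as reply:
                results[host] = reply.read().decode()
        except OSError as err:  # covers URLError and socket timeouts
            results[host] = f"failed: {err}"
    return results
```

Usage would be something like `disable_rescan(["192.168.1.50", "192.168.1.51"])` with your own device addresses; check the returned replies to see which devices actually accepted the command.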
So now, all I’m seeing in the Mosquitto log is “saving in-memory database” entries.
So - lessons learned?
- If your installation is large enough (covers a lot of square feet) and you have multiple APs, let Tasmota figure out what the best connection is. Leave SetOption57 alone for a few days until the AP selections settle out, then disable the option.
- Do not lock your devices to a single AP. Baaaad things happen if that AP crashes or has connectivity issues. (My gateway console even warned me about this possibility as I was locking each device…)
That’s all for now. Hope this long story has some benefit.
Best