I have an automation that alerts me if something goes wrong with a remote node's communication with Hass via MQTT. It works like a heartbeat monitor: if the heartbeat message isn't received and processed in time, the sensor ‘expires’.
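For anyone unfamiliar with the pattern, here is a minimal sketch of the expiry logic in plain Python. It is hypothetical and not how Hass implements it internally, though it is similar in spirit to the MQTT sensor's `expire_after` option. The class and method names are made up for illustration, and the clock is injectable so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable, Optional


class HeartbeatMonitor:
    """Hypothetical sketch: a sensor that 'expires' if no heartbeat
    arrives within expire_after seconds of the last one."""

    def __init__(self, expire_after: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.expire_after = expire_after
        self.clock = clock  # injectable so tests don't need to sleep
        self.last_seen: Optional[float] = None

    def on_heartbeat(self) -> None:
        # Called whenever a heartbeat message is received and processed.
        self.last_seen = self.clock()

    def is_expired(self) -> bool:
        # A sensor that has never seen a heartbeat counts as expired;
        # otherwise compare the time since the last one to the window.
        if self.last_seen is None:
            return True
        return self.clock() - self.last_seen > self.expire_after


if __name__ == "__main__":
    fake_now = [0.0]
    monitor = HeartbeatMonitor(expire_after=60, clock=lambda: fake_now[0])
    monitor.on_heartbeat()        # heartbeat at t=0
    fake_now[0] = 30.0
    print(monitor.is_expired())   # False: still inside the 60 s window
    fake_now[0] = 90.0
    print(monitor.is_expired())   # True: 90 s without a heartbeat
```

Note that an expiry like this only says that *some* link in the chain failed. It can't distinguish a silent node from a broker that isn't receiving messages or a Hass instance that isn't processing them.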
Anyway, after a few months I’ve come to realise that when the sensor expires, it often isn’t because the remote node has stopped sending the heartbeat. Either the broker isn’t able to listen for the messages for whatever reason, or Hass isn’t acting on them properly. When this happens, which isn’t often, it always seems to follow a spike in CPU load. The load soon falls away again, but the system doesn’t recover, and I have to restart the host before Hass starts behaving itself again.
Anyone else noticed this?