Getting an alert when a sensor fails to update

My HA environment is pretty much complete for now and my attention is moving to the topic of system monitoring. My HA environment uses Z-Wave, Zigbee, Shelly and OpenEnergyMonitor devices for sensing and control. Wherever possible, devices communicate with HA via MQTT.

I had a situation that caused me some grief this morning, and highlighted the need for some sort of monitor/alert system. I have a collection of OpenEnergyMonitor temperature/humidity sensors that talk to an emonHub instance running on a Pi3. This same Pi3 has a HA Supervised installation running openzwave and zigbee2mqtt. All three domains communicate via MQTT back to a central HA instance.

I noticed that the heating was running when it shouldn’t. A quick glance at HA showed that none of the temperature sensors had updated for some hours. Digging showed that whilst emonHub was running and connected to the MQTT broker, its connection to the RFM69 addon had failed and therefore emonHub wasn’t getting sensor updates.

The problem was fixed with a power cycle of the Pi, but it highlighted a problem that could have been picked up by a system monitor that raised a notification if it hadn’t seen any mqtt traffic from emonHub in a definable period of time.
I looked at tools like Nagios and Zabbix, but as far as I can tell they only monitor the connection to the MQTT broker, not the apps that should be sending data to it.

Has anybody explored this area?

If you are using MQTT Sensors and they are configured in YAML, you can add an option called expire_after.

For example, if you set expire_after to 900 seconds (15 minutes) it will expect to receive a value at least every 15 minutes (or sooner). If it fails to receive an update within that time period, it will set the sensor’s state to unavailable. This state-change can trigger an automation which can proceed to do whatever you feel is needed to mitigate the problem (minimally, to notify you that it’s happening).
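
To make the timeout behaviour concrete, here is a toy Python model of the logic described above. It is only an illustration, not Home Assistant's implementation: the class name `ExpiringSensor`, its methods and the injectable `clock` are all invented for this example, and reporting "unavailable" before the first reading is an assumption of the model.

```python
import time


class ExpiringSensor:
    """Toy model of a sensor with an expire_after timeout (hypothetical class)."""

    def __init__(self, expire_after, clock=time.monotonic):
        self.expire_after = expire_after  # seconds allowed between updates
        self._clock = clock               # injectable so the logic can be tested
        self._value = None
        self._last_update = None          # no reading received yet

    def update(self, value):
        """Record a new reading and restart the expiry timer."""
        self._value = value
        self._last_update = self._clock()

    @property
    def state(self):
        """Return the last value, or 'unavailable' once the timeout has passed."""
        if self._last_update is None:
            return "unavailable"
        if self._clock() - self._last_update > self.expire_after:
            return "unavailable"
        return self._value
```

With `expire_after=900`, a reading stays valid for 15 minutes after the last `update()`. After that, `state` reads "unavailable" until the next reading arrives, and that change is what an automation would trigger on.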

Thanks @123. I’ll take a look.

I was having trouble keeping a Raspberry Pi running that collects readings from a bunch of Bluetooth temperature sensors, so I wrote this very crude Python program to reboot the machine when the monitoring program sees that the collector server has published no MQTT packets for a set period. It runs alongside Home Assistant, not inside HA. I run it in a Docker container, but you could set it up to run in a tmux session or as a systemd service.

It is a brutal solution, as it just does a ‘sudo reboot’ on the machine. And I am not much of a coder, so…

But perhaps it will give you some ideas, good hunting!
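
For anyone who wants a starting point, here is a minimal sketch of that kind of "silence watchdog". It is not the program mentioned above, just one way the idea could look. The names `MqttSilenceWatchdog`, `reboot_host` and `run_forever` are made up for this example, and it assumes the script runs on the machine that should be rebooted. The MQTT wiring is left out: you would call `message_seen()` from your MQTT client's message callback (for example the `on_message` callback in paho-mqtt).

```python
import subprocess
import time


class MqttSilenceWatchdog:
    """Runs an action once when no message has been seen for `timeout` seconds."""

    def __init__(self, timeout, on_silence, clock=time.monotonic):
        self.timeout = timeout          # allowed silence in seconds
        self.on_silence = on_silence    # callable to run when the timeout is exceeded
        self._clock = clock             # injectable so the logic can be tested
        self._last_seen = clock()       # start counting from launch
        self._fired = False

    def message_seen(self):
        """Call this from the MQTT client's message callback."""
        self._last_seen = self._clock()
        self._fired = False             # re-arm after traffic resumes

    def check(self):
        """Call periodically; returns True if the action was triggered."""
        silent_for = self._clock() - self._last_seen
        if silent_for > self.timeout and not self._fired:
            self._fired = True
            self.on_silence()
            return True
        return False


def reboot_host():
    """The 'brutal' action: reboot this machine (assumes passwordless sudo)."""
    subprocess.run(["sudo", "reboot"], check=False)


def run_forever(watchdog, poll_interval=30):
    """Simple polling loop; MQTT subscription must be set up separately."""
    while True:
        watchdog.check()
        time.sleep(poll_interval)
```

Something like `run_forever(MqttSilenceWatchdog(600, reboot_host))` would reboot after ten minutes of silence. Swapping `reboot_host` for a function that sends a notification would give a gentler version of the same check.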

You might also be able to find a way to use the Linux/Raspberry Pi software watchdog features to monitor MQTT as well. I have not tried this, but it seems like it might be a possibility. Here are some links to write-ups on using watchdog. Note that the watchdog service and its name have changed between Raspberry Pi OS releases, so check that you are following the right instructions for your OS version.


https://linux.die.net/man/5/watchdog.conf
Running Forever with the Raspberry Pi Hardware Watchdog
https://diode.io/raspberry%20pi/running-forever-with-the-raspberry-pi-hardware-watchdog-20202/

It is not clear which version of the Pi and OS is needed for this setup, but I did the watchdog install above.
https://raspberrypi.stackexchange.com/questions/108080/watchdog-on-the-rpi4

WatchDog for Raspberry Pi
24 September 2016, by Hödlmoser
https://blog.kmp.or.at/watchdog-for-raspberry-pi/

Enabling Watchdog on Raspberry Pi
Arslan Zahid, Dec 30, 2019
https://medium.com/@arslion/enabling-watchdog-on-raspberry-pi-b7e574dcba6b