Troubleshooting a possible core bug with templates and/or utility_meter integration

Hello - I think I’ve stumbled on a possible bug in execution of template sensor triggers and/or utility_meter.

I have been attempting to create utility_meters that will track the cycles and run time of some sump pumps. My goal was to create three meters for each, tracking hourly, three-hour and daily totals. This allows me to easily create automations when things increase at a bad rate. After a number of issues with the actual triggers on some template sensors for the pumps, I thought I had everything working.

The approach I needed to settle on - after a number of things just didn’t seem to work as I expected - felt a bit like debugging by printf. I created a sensor for last_off_timestamp and last_on_timestamp that were triggered by state changes of on->off and off->on, respectively. Then had the duration just difference these. This allowed me to see the measures that I was deriving from and that got things working. Then, after figuring out that HA needs a full restart when you add utility_meter YAML, things seemed to work.

Almost. I noticed immediately tha the Shelly relay that drives one of the pump contactors no longer seemed to be getting log book entries. I have an automation that sends me a notification when the pump goes on, and that was working, so HA was seeing the switch transition, but no log entry was there. This was 3-4 days ago and I didn’t have time to troubleshoot. Then, a 3AM today in the middle of a prolific rainstorm, I decided to check it. I noticed that the problem was way more widespread. In fact there were no logbook entries at all (usually I can’t cope with the volume). Also, anything that relied on history wasn’t updating. For example, I have about 40 Rieman sum helpers that turn wattage into energy. All of these had no state history. Consequently the energy dashboard had no data for a few days. There are no entries in the overall system log (except the normal spurious thigns from AV receivers, etc, but no failures).

I started by just removing all the utility_meter YAML and restarting but the problem persisted. I then went back and removed all of the template sensors I had created during this attempt (I have others, I left those). After this the history came back.

As a career C/C++ programmer this feels in my world like what would happen if someone caught an exception and ate it. No logging and no recovery. It’s like something went wrong and just prevented a lot of things from being executed after that. I’m not well versed in the HA core (yet) so it will be slow going to troubleshoot where this is happening. If anyone has suggestions of where to start and - hopefully - any instrumentation I can get, it would be greatly appreciated.

I’m going to start now adding template entities back one at a time to see if I can find something that breaks it. After that I’ll try the same with the utility_meters.

Thanks,
Chris

Settings → System → logs

Yes, I’ve looked at the logs, there’s nothing there as I said. I think this means that whatever happened just wasn’t looked. Wasn’t sure if there was a way to change a logging level or something that might provide more.

Sorry, you mentioned the logbook a couple of times, I missed that you also mentioned the system log.