Constant issues with time-based automations. Is anyone actually using this in a mission-critical setting?

I have been trying to get Hass (Hassbian) working for a few weeks now. This system is replacing an Arduino timer that controls a couple of relays and switches running my fish room. Setup seems to be pretty straightforward, and the UI is nice.

But I seem to get a slew of errors in my log. The most frequent is “ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting”, but I also see a wide variety of other messages.

The errors aren’t a problem - I can spend a few hours each hunting them down and fixing them.

The problem is that the errors, when they happen at the wrong time, seem to stop my time-based automations from triggering. This means that my lights and water change system turn on but not off (or vice versa).

Looking around, I see a bunch of other similar issues (here and here for example).

Is random failure of time-based automations a known thing? Are you guys seriously using a system that can just randomly fail like this? Should I submit a bug report on GitHub (if so, where/how - sorry I am a bit of a noob at this)?

I am thinking about setting up a separate MQTT broker and writing a Python script to handle all the time-based stuff (and just using HA for logging/frontend UI), but this seems like a less-than-ideal solution.

Anyone else encountering stuff like this? Any suggestions?

Those timer out of sync errors are a symptom, not the problem.

Something is chewing up all of the available CPU time. The really bad thing is there is no way to tell what it is, and it probably happens to a lot more people than they realize (they get by with 90% CPU usage and don’t have time sensitive automations).

For me it was the embedded MQTT broker, which would slowly use more and more CPU until HA was pegged at 100% and then automations would start failing. As soon as I swapped it out for Mosquitto the combined CPU usage between the two of them dropped into the 2-5% range and everything works exactly as expected. However, I’ve seen other people say that for them it was discovery & media players.

It can also be caused by poor programming on the user's side: automations that run long, scripts that are trying to process video, etc. You can’t really blame HA for that, but there is no way to figure out what the problem is.

If you are using the embedded MQTT broker I would switch it out for a third-party one. If that doesn’t solve your problems then you can try disabling components/automations until you find out what the problem is. You can then create an issue (or add to an existing one) on GitHub once you know what your problem is.
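To judge whether disabling a component actually helps, it can be useful to count the "Timer got out of sync" errors per hour before and after each change. Here is a minimal sketch, assuming each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp (that format is an assumption, so adjust the pattern if your log looks different). The function name `sync_errors_per_hour` is made up for this example.

```python
import re
from collections import Counter

# Assumed line shape (not verified against any particular HA version):
# "2017-10-05 07:30:01 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2} .*Timer got out of sync")


def sync_errors_per_hour(lines):
    """Count 'Timer got out of sync' lines, keyed by 'YYYY-MM-DD HH'."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file works, because the function only iterates over lines: `sync_errors_per_hour(open("home-assistant.log"))`. Hours with many errors can then be compared against the times your automations were missed.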

Thanks for the suggestion! I’ve switched to Mosquitto. CPU usage has been hovering around 0 - 1%, so I don’t know if that is my problem.

Also - if I disable my automations one at a time, I am quite sure this will happen less often because there’s less “stuff” going on. I know because the trend sensor was kicking up a bunch of errors and was causing events to get missed all the time - removing my trend sensors reduced this, but things still get missed maybe 1-2% of the time.

I can’t have a LOW failure rate on things getting triggered, I need NO FAILURES.

With beta software there are no guarantees of zero failures, and the same goes for non-beta software. With a system this complex there is no chance that every variable can be covered. You can mitigate that somewhat with defensive programming and backup/redundant systems, but that entails more cost; the choice is yours at the end of the day. If the Arduino system works, keep it running, either while mimicking it with HA or as the mission-critical controller with HA only as a monitoring/reporting system until you are confident everything works as it should. In that case I would suggest adding MQTT to the Arduino and using it to keep HA informed of status, so that you can be notified when something is out of kilter.
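The "be notified when something is out of kilter" part can be as simple as a watchdog that expects a status message at a regular interval and raises a flag when one is overdue. This is a rough sketch of that idea only; the class name `HeartbeatWatchdog` and the timeout value are hypothetical, and wiring it to an actual MQTT subscription is left out.

```python
import time


class HeartbeatWatchdog:
    """Flags a peer as stale when no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock      # injectable so the logic can be tested without waiting
        self.last_beat = None   # no heartbeat seen yet

    def beat(self):
        """Call this whenever a status/heartbeat message arrives."""
        self.last_beat = self.clock()

    def is_stale(self):
        """True if no heartbeat has ever arrived or the last one is too old."""
        if self.last_beat is None:
            return True
        return self.clock() - self.last_beat > self.timeout
```

Whichever side is treated as the backup would call `beat()` from its message handler and poll `is_stale()` periodically, sending an alert or falling back to safe defaults when it returns `True`.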

See here for help in this field :smiley:

HA shouldn’t hide behind an “it’s just a beta” label, though.

Obviously I have failsafes in place (which is actually how I spotted this issue). I would like to move away from my old Arduino system (which is very difficult to expand/modify). Thanks for the link. HA is definitely - at least from a cursory glance - ideal for an aquarium controller.

Here is my as-yet not fully tested standalone MQTT timer:

import schedule  # third-party "schedule" package
import time
import paho.mqtt.publish as publish

def mqtt(topic, payload):
    try:
        publish.single(
            topic,
            payload,
            qos=0,
            hostname="192.168.1.103",
            port=1883,
            client_id="python",
            auth={'username':"homeassistant", 'password':'ha, you wish!'}
        )
    except Exception as e:
        print("Error: %s" % str(e))

def lightsOn():
    mqtt("home/fishroom/switch/1", "on")

def lightsOff():
    mqtt("home/fishroom/switch/1", "off")

def smallTankWaterChangeOn():
    mqtt("home/fishroom/switch/10", "on")

def smallTankWaterChangeOff():
    mqtt("home/fishroom/switch/10", "off")

def largeTankWaterChangeOn():
    mqtt("home/fishroom/switch/11", "on")

def largeTankWaterChangeOff():
    mqtt("home/fishroom/switch/11", "off")
    
def heartbeat():
    mqtt("homeassistant/mainframe/heartbeat", "OK")

# Heartbeat once a minute so a missing publisher can be detected.
schedule.every(1).minutes.do(heartbeat)

# Lights: on 07:30-08:30 and 12:30-22:00.
schedule.every().day.at("07:30").do(lightsOn)
schedule.every().day.at("12:30").do(lightsOn)
schedule.every().day.at("08:30").do(lightsOff)
schedule.every().day.at("22:00").do(lightsOff)

# Water changes start four times a day per tank.
schedule.every().day.at("07:45").do(smallTankWaterChangeOn)
schedule.every().day.at("11:45").do(smallTankWaterChangeOn)
schedule.every().day.at("14:45").do(smallTankWaterChangeOn)
schedule.every().day.at("18:45").do(smallTankWaterChangeOn)
schedule.every().day.at("07:30").do(largeTankWaterChangeOn)
schedule.every().day.at("11:30").do(largeTankWaterChangeOn)
schedule.every().day.at("14:30").do(largeTankWaterChangeOn)
schedule.every().day.at("18:30").do(largeTankWaterChangeOn)

# Hourly "off" commands. Note: the hour is counted from script start,
# not aligned to the wall clock.
schedule.every().hour.do(largeTankWaterChangeOff)
schedule.every().hour.do(smallTankWaterChangeOff)

while True:
    schedule.run_pending()
    time.sleep(1)
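One way to make a timer like this self-correcting is to derive the desired state from the current time and republish it regularly, instead of relying on a single "on" or "off" message arriving at exactly the right moment. The following is a sketch of that alternative, not what the script above does. `LIGHT_WINDOWS` and `desired_state` are names invented for this example, and it assumes the switch tolerates receiving the same "on"/"off" payload repeatedly.

```python
from datetime import time as dtime

# On-windows copied from the lights schedule above: (start, end) pairs.
LIGHT_WINDOWS = [(dtime(7, 30), dtime(8, 30)), (dtime(12, 30), dtime(22, 0))]


def desired_state(now, windows):
    """Return "on" if `now` falls inside any [start, end) window, else "off"."""
    for start, end in windows:
        if start <= now < end:
            return "on"
    return "off"
```

A job scheduled every minute could then call `mqtt("home/fishroom/switch/1", desired_state(datetime.datetime.now().time(), LIGHT_WINDOWS))` (after `import datetime`), so a single lost message is corrected a minute later. This matters more with `qos=0`, which is at-most-once delivery.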

Just wanted to give a quick update on this:

Switching from the built-in MQTT broker to Mosquitto completely fixed the timing issues. I have not missed a single event since switching. Huge thanks to nordlead2005 for suggesting this.