Timer got out of sync. Resetting - is there a solution?

Is there any solution to these? It’s really hard to filter through the logs when it’s constant:

2018-02-10 19:10:10 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
2018-02-10 19:10:13 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
2018-02-10 19:10:15 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting

Version 0.63 but it’s been ongoing forever. Not sure if it’s related, but there are also a ton of these too:

2018-02-10 19:11:46 WARNING (MainThread) [homeassistant.components.sensor] Updating snmp sensor took longer than the scheduled update interval 0:00:10
2018-02-10 19:12:08 WARNING (MainThread) [homeassistant.components.sensor] Updating snmp sensor took longer than the scheduled update interval 0:00:10
2018-02-10 19:12:28 WARNING (MainThread) [homeassistant.components.sensor] Updating snmp sensor took longer than the scheduled update interval 0:00:10
2018-02-10 19:12:49 WARNING (MainThread) [homeassistant.components.sensor] Updating snmp sensor took longer than the scheduled update interval 0:00:10

These are usually related to high CPU usage or high system load. Do you have a lot of other things running on your system?

Nope, it’s a dedicated VM with 4CPU cores and 2GB of RAM. Load doesn’t appear to be super high:

I just upgraded so I suspect the load is a hair higher right now. I’ve seen them forever though.

Just upped it to 4GB of RAM. Still seeing them.

This is my cpu usage on a single core 512MB Pi B+. If you are running at your levels on a quad core VM I think you need to take a look at the components you are running to see if they are hogging the cpu.

I mean, that’s the entire reason I run it on a beefy VM so that I CAN enable tons of components. Is there any way to figure out what might be causing the usage without disabling things one by one? That could take weeks, I have quite an extensive config: https://github.com/danodemano/Home-AssistantConfig

I think if you disable half of them, you should be able to see in which half the problem lies, and then continue dividing by half.

The other thing you might check is that logging is turned back to warning by default, but I doubt if that is all of the problem.

I faced the same problem and it turns out for me it was the embedded MQTT broker.

https://github.com/home-assistant/home-assistant/issues/12117

EDIT: I looked through your config and nothing jumps out as a potential problem. You could do what I did and try to setup a mirrored system that runs parallel to it. Not ideal, but it allows you to pump data through it and see if disabling something causes the problem to go away. If you get lucky maybe you’ll figure it out while you are setting up the duplicate system.

Too bad there isn’t a way to profile the HA runtime, then this would be easy.

So I messed around a bit, starting with the SNMP sensor that that one was throwing a lot of warning as well. That seems to have greatly reduced the timer out of sync errors. It also dropped my CPU/memory usage significantly:

I’ve only have 5 timer out of sync’s over the past 1.5 hours. Prior it was every few minutes at least.

I agree, a was to profile the runtime would be a HUGE help. I’m going to let it run like this for a day or two before making any other changes and see what happens. Thanks all!

2 Likes

I just started seeing this with the latest update. I have run merrily along on a rPi 3 for quite some time without problem.

After the update I am also seeing:

Updating rest binary_sensor took longer than the scheduled update interval
Error getting new camera image: RawResponseMessage(version=HttpVersion(major=1,....

Is it all related?

Wow, thats a massive difference. Yeah HA really shouldnt be using alot of CPU up, unless it’s triggering ffmpeg or doing some form of audio processing.

Mine sits at around 1%.

So I’m back to getting quite a few of the errors, and the CPU/memory usage is back up. This is a 48 hours graph showing my change yesterday:

What’s really interesting is that it spikes overnight. Seems to be ~3:15 for the CPU and a few minutes later for the RAM:

Now I’m really scratching my head as to where to go from here… Side bar, I do run node-red on the same machine. Is it possible it’s doing something?

I run Node-RED and my performance with HA was bad before adding Node-RED and Node-RED didn’t make it faster either. Under normal operation, my Node-RED instance takes less CPU than HA (which isn’t shocking, since javascript is a good bit faster).

Now, this isn’t to say that Node-RED isn’t at fault, just that it isn’t inherently at fault. If you have a timer in Node-RED that causes it to pump tons of service calls into HA it’ll probably bring HA down.

1 Like

Yeah looking at htop node-red is taking just a few percent of the CPU, WAY less than the HASS instance. What do you mean by timer, something that is constantly querying HASS? I have a couple of event states but most of it is triggering lights and such in HASS or checking if a switch is on or off when running a flow.

I just mean anything that can trigger messages in node-RED and cause your flow to call an excessive amount of service calls (the non-ws node uses the http REST API for service calls). A hypothetical would be you meant to trigger a service call in HA once per minute, but thanks to a typo in your flow/code instead you are making thousands per second. Not likely, but you should make sure node-RED is behaving.

1 Like

Ahh ok. I just went through all my flows and everything looks correct. I also just shut down node-red for a few minutes with no appreciable difference in CPU/RAM usage:

Is there any way to see what a specific HASS process is doing? 10428 seems to be the one stuck at the top for usage:

So, interestingly enough, it appears the overnight spike is MySQL. It’s still high right now so I jumped in to take a look and MySQL is hammering the CPU:

Furthermore, this is a HASS query that is all balled up:

Anyone know what that query is doing and why it takes it ~12 hours to run?

EDIT: I went ahead and opened this as an issue on Github: https://github.com/home-assistant/home-assistant/issues/12374 A 12 hour running query is a problem of some sorts.

Odd. I’ve moved over to a different platform (from pi) and stopped getting the timer out of sync errors, but through the night I keep having trouble with Bluetooth where homeassistant can’t connect to the dongle (so it then marks us away, then home, then away etc).

I’ve been trying to get my head around it because it only happens in the middle of the night.

If something else is suddenly hogging resources at that time, might explain it. Further investigation required :+1:

For those not following the github issue, this is (at least partially) related to the recorder purge. It seems that MySQL is taking 9-12 HOURS to run the purge query slamming one CPU core that entire time. There is a PR in right now that re-writes the query which should eliminate this problem: https://github.com/home-assistant/home-assistant/pull/12448

Once that rolls out we’ll see what my performance looks like and if the out of sync messages are still there.

4 Likes

Been letting this run for a while but still getting tons of these:

2018-02-27 15:16:24 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
2018-02-27 15:17:25 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
2018-02-27 15:31:46 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
2018-02-27 15:33:18 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
2018-02-27 16:09:35 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
2018-02-27 16:16:06 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
2018-02-27 16:32:35 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
2018-02-27 16:34:09 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
2018-02-27 17:10:12 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting

CPU and memory usage are quite low so it’s not related to load: