Major issues with high CPU utilization on HAOS

Hey guys, for weeks now I have been running into serious trouble with my HAOS VM instance every few days.
I’m running Home Assistant in a VM with HAOS, hosted on TrueNAS with an AMD 5600GT CPU. This CPU is relatively powerful compared to RPis and other SBCs.

What happens quite often: when I install any update that requires a restart (it doesn’t matter whether it is some HACS stuff, the CoreOS or anything else that wants the system rebooted), CPU usage goes through the roof after the reboot, up to the point where the system is basically unusable.

I tried basically everything I can think of: disabling integrations, restoring backups from before the update, and today also rolling back to a snapshot from before the update.
But nothing helps.

In the past, after trying 3-5 different things, one of them (sometimes a backup, sometimes a reboot, etc.) usually helped. But I cannot really dig into what the actual problem is.

I have read about the DNS issue and also disabled the DNS fallback, but that is not solving the issue for me now either.

I have Glances, and it shows that the load is caused by HA itself:

I wanted to check with py-spy, but I have no idea how to find the right PID.
This is what “top” from within the HA-terminal shows:

Not really useful, is it?
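For the PID question, one possible approach is to scan `/proc` for a process whose command line mentions Home Assistant. The sketch below is only an illustration under some assumptions: it uses nothing but the Python standard library, it assumes the process command line contains the string `homeassistant`, and it has to run somewhere that can actually see that process (for example inside the homeassistant container; an add-on terminal may only list its own processes). The function name `find_pids` is made up for this example.

```python
import os


def find_pids(needle="homeassistant", proc_root="/proc"):
    """Return sorted (pid, cmdline) pairs whose command line contains `needle`."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directory names are processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited meanwhile or is not readable
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if needle in cmdline:
            matches.append((int(entry), cmdline))
    return sorted(matches)


if __name__ == "__main__":
    for pid, cmdline in find_pids():
        print(pid, cmdline)
```

A PID found this way can then be passed to py-spy, e.g. `py-spy top --pid <PID>` for a live view or `py-spy dump --pid <PID>` for a one-off stack dump.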

This issue has wasted hours of my life. Maybe someone is able to help me find the issue and stop it.

Ok so you’ve isolated it to the homeassistant container. Great.

Unfortunately, that means it can literally be anything running as an HA integration, because of how the memory model works: integrations all run inside the one Home Assistant process.

You may want to check out the section in the cookbook about troubleshooting integrations; it’s near the bottom of the troubleshooting section…


As far as I can see, most of my CPU is used by something called “cycle 9”:

How can I debug this further? I have no idea how to proceed from here.


See what’s listed right under that cycle 9 proc, attached to it and using the rest of the CPU: MQTT. Either that proc is called by MQTT or it is calling MQTT.

What’s the last thing you did with MQTT?

You can probably get HA to start in safe mode and diagnose from there.

Nothing special - I have very few MQTT devices, and after I saw that I even disabled MQTT entirely, just for testing. It did not solve anything.
But I also assume some other integrations might use MQTT internally, and I cannot know which.


That’s why safe mode helps. You’re going to have to go in (in safe mode, so you can boot), then disable all the integrations and reboot. Then turn them on one at a time until you find it.

(yeah, you’re in a VM, but you’re now in good old-fashioned troubleshooting land, friend, sorry)
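If there are many integrations, the one-at-a-time search can be shortened by enabling half of them at once and halving the suspect list on each restart. The sketch below illustrates that bisection variant, not the exact procedure described above. It assumes a single integration is responsible, and `cpu_is_high` is a hypothetical callback standing in for the manual step (enable only the given integrations, restart, watch the CPU, answer True or False). The integration names are placeholders.

```python
def find_culprit(integrations, cpu_is_high):
    """Bisect `integrations` down to the single one that triggers high CPU.

    `cpu_is_high(enabled)` is a hypothetical callback: it should report
    whether CPU usage is high when only `enabled` integrations are active.
    Returns None if the problem does not reproduce with everything enabled.
    """
    candidates = list(integrations)
    if not candidates or not cpu_is_high(candidates):
        return None
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if cpu_is_high(first_half):
            candidates = first_half          # culprit is in the first half
        else:
            candidates = candidates[middle:]  # otherwise it is in the rest
    return candidates[0]


if __name__ == "__main__":
    # Placeholder names; the lambda simulates the manual CPU check.
    names = ["mqtt", "hacs", "glances", "landroid", "weather"]
    print(find_culprit(names, lambda enabled: "landroid" in enabled))
```

With five integrations this needs four checks in the simulated run instead of up to five; the saving grows with the number of integrations, at the cost of assuming there is exactly one troublemaker.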


I think I found it - my Landroid-Lawnmower was the troublemaker.
It is just incredibly annoying.

For now I’ll leave it off, and I should be good to go.


And you learned a new skill! Rock on.

Not so much :smiley:
At the end of the day, I saw that MQTT was involved, but this skill did not help - I just disabled integration after integration again…

The thing I really can’t understand is how this can still malfunction even though I rolled back the updates. That is the reason I did not disable it in the first place.
It has caused problems before, but since I had just rolled back the whole system (via a snapshot), I was basically 100% certain that this little update could not be causing the problem.

Because it’s not the update in question that’s doing it. Correlation is not causation.

It means that while one may have TRIGGERED the other, one is not the CAUSE of the other. You also learned where to look.
