High CPU load on HAOS for a while now, need help troubleshooting

Since the 27th of November my HAOS (on an RPi4) has been running at ~30% CPU usage instead of the earlier ~10%. Unfortunately I only just noticed, so I can’t recall whether I changed anything in particular on that date…

I have tried disabling addons and integrations, and also booting in safe mode, but the high CPU usage persists. I’ve browsed through the log files but nothing really stands out (to me at least).

I tried following this guide (Instructions to install Py-spy on HAOS) to install and run py-spy, but I ran into a similar (and still unanswered) problem to the last poster’s:

$ ./py-spy top  --pid 67
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ParseIntError { kind: PosOverflow }', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/proc-maps-0.2.1/src/linux_maps.rs:81:65
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: receiving on a closed channel

If I just run top (after docker exec -it homeassistant /bin/bash) I get the following:

…which doesn’t really tell me much. Maybe go2rtc is somewhat fishy? I’m not sure what it is, but it does not seem to be the source of the CPU usage anyway; that is homeassistant.

I have also tried running ha dns options --fallback=false (as mentioned here: https://www.reddit.com/r/homeassistant/comments/1bklg68/psa_got_consistently_high_cpu_usage_here_is_a_fix/) but that did not help either.

I’d be really happy if I could somehow avoid reinstalling everything I have built… Can anyone give me pointers on how to continue my troubleshooting?

(Lesson learned: perform regular backups, not just after fresh installation…)

(Some other things I have noticed, but which I don’t think are the cause of the problem:

  • after booting, my Zigbee2MQTT devices are not responsive until I restart Zigbee2MQTT. I have, however, tried stopping Zigbee2MQTT entirely and that did not help
  • I am not able to create a full or partial backup (the job fails, I have not dug deeper yet…))

Thanks in advance,
RD

Hello RD,

2024.5+: Tracking down instability issues caused by integrations might help.

But is anything not working?

Because I’m pretty convinced that sometime between June and October this year the updates to the core have caused a much higher baseline memory load.

Not necessarily a bad thing either. Unless you’re running on a Pi3…

This is also anecdotally backed up by the recent update of the minimum system requirements, which dropped anything less than a Pi3 and recommends a Pi4 or better.

But in any case, using MORE RAM/CPU isn’t necessarily an issue. Exhausting RAM/CPU is an issue. You want the box to use all the resources it gracefully can, because you paid for them. Unless the box suddenly starts crashing, I’d not worry about it. So is it crashing?

Unfortunately, yes. To start with, the system takes much longer to fully “come online”; for example, yesterday it took MQTT ~2h before it was fully functional (before that it repeatedly threw the error Error talking to MQTT: The client is not currently connected.). Additionally, Glances failed to start, and the system as a whole feels less responsive.

And it is such a large jump in CPU load (and, as a consequence, in temperature, for example) that even if everything was working well I’d be tempted to investigate: a jump from 10% to 30% is really large, and as far as I recall I didn’t install any new addons etc.

I could understand 5% or so perhaps, as you say, as HAOS grows it can of course need to consume more resources.

…Now I got Glances running btw, and this is what it shows: :-/

I’ll look into Profiler as per @Sir_Goodenough’s suggestion.

BR
RD

It seems the culprit has been found - with some help from Profiler and bdraco, who pointed out that MQTT was likely involved, I took a closer look and found an enormous number of devices and entities under the MQTT integration.

It is likely my LilyGO ESP32 (433.92 MHz) receiver that has somehow gone rogue and created all/most of these imaginary devices (as I for some reason set its topic to homeassistant)… [Strange, as it’s been running for a year+ without this kind of issue.]
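For anyone who wants to check for the same thing: below is a minimal sketch, not something I actually ran, that counts MQTT discovery topics per source so a flood from one device stands out. It assumes the stray entities arrived via Home Assistant’s MQTT discovery, where config topics look like homeassistant/<component>/[<node_id>/]<object_id>/config, and that you have a plain list of topic names (for example captured with an MQTT client). The function names and the sample topics are made up for illustration.

```python
from collections import Counter

DISCOVERY_PREFIX = "homeassistant"  # Home Assistant's default discovery prefix


def parse_discovery_topic(topic, prefix=DISCOVERY_PREFIX):
    """Return (component, node_id, object_id) for a discovery config topic, else None.

    Accepted shapes:
      <prefix>/<component>/<object_id>/config
      <prefix>/<component>/<node_id>/<object_id>/config
    """
    parts = topic.strip().split("/")
    if len(parts) not in (4, 5) or parts[0] != prefix or parts[-1] != "config":
        return None
    component = parts[1]
    node_id = parts[2] if len(parts) == 5 else ""  # node_id is optional
    object_id = parts[-2]
    return component, node_id, object_id


def summarize(topics, prefix=DISCOVERY_PREFIX):
    """Count discovery config topics per (component, node_id) pair."""
    counts = Counter()
    for topic in topics:
        parsed = parse_discovery_topic(topic, prefix)
        if parsed is not None:
            component, node_id, _ = parsed
            counts[(component, node_id)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample topics; real input would come from your own broker.
    sample = [
        "homeassistant/sensor/gateway_a/temp_1/config",
        "homeassistant/sensor/gateway_a/temp_2/config",
        "homeassistant/sensor/gateway_a/temp_3/config",
        "homeassistant/switch/plug_1/config",
        "zigbee2mqtt/livingroom_lamp",  # not a discovery topic, ignored
    ]
    for (component, node_id), n in summarize(sample).most_common():
        print(f"{n:5d}  {component}  {node_id or '(no node_id)'}")
```

A single (component, node_id) pair with hundreds or thousands of entries would be a strong hint about which device is spamming the discovery prefix.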

I disabled the LilyGO ESP32 and deleted all the homeassistant → sensor topics (through MQTT Explorer) and immediately saw a drop in CPU of ~15%! The ones that actually exist and are sent from other physical devices have come back, as they should.
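I did the cleanup by hand in MQTT Explorer, but the same idea can be scripted. A retained message is cleared by publishing an empty retained payload to the same topic, which mosquitto_pub does with -r -n. The sketch below only builds the commands as a dry run so they can be reviewed before running anything; the helper name, the localhost broker and the sample topics are assumptions, and any authentication options your broker needs (such as -u / -P) are left out.

```python
import shlex


def clear_commands(topics, prefix="homeassistant/sensor/", host="localhost"):
    """Build mosquitto_pub commands that clear retained messages under `prefix`.

    -r sets the retain flag and -n sends a zero-length payload; together they
    remove the retained message stored for that topic on the broker.
    """
    commands = []
    for topic in sorted({t.strip() for t in topics}):
        if topic.startswith(prefix):
            commands.append(
                f"mosquitto_pub -h {shlex.quote(host)} -t {shlex.quote(topic)} -r -n"
            )
    return commands


if __name__ == "__main__":
    # Hypothetical topics; replace with the list captured from your broker.
    topics = [
        "homeassistant/sensor/gateway_a/temp_1/config",
        "homeassistant/sensor/gateway_a/temp_2/config",
        "homeassistant/switch/plug_1/config",  # left alone: outside the prefix
    ]
    for cmd in clear_commands(topics):
        print(cmd)  # dry run: review the output before executing it
```

Devices that really exist should republish their discovery messages afterwards, which matches what I saw after the manual cleanup.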

(In the end I also decided to start semi-fresh from a backup that was a few months old. Having set up almost everything as I want it, I’m now at ~8% CPU on average, which is more in line with what it was before and what I am used to. We’ll see if I take the risk of reconnecting the LilyGO ESP32 to see what happens, after backing up my current setup, of course.)