Cannot upgrade beyond 2024.7.4. - High CPU issue on core

Hello experts,

Currently I am running my production HA on:

Core: 2024.7.4
Supervisor: 2024.08.0
Operating System: 13.1
Frontend: 20240710.0
6 cpu / 16G in a VM on Proxmox 8
AND, all works GREAT as long as I don’t upgrade to a later version.

To test, I restored a backup to another (freshly built) Proxmox host and ALSO to a mini-PC. Neither the virtual nor the physical machine can deal with the upgrade.

By now I have logging completely disabled on the test box and have disabled ALL add-ons. I have also reverted to SQLite and deleted the original database for good measure.

On the test box I went into the OS and was able to get into the homeassistant container with:
docker exec -it homeassistant /bin/bash

If I then run “top” the result is:
Mem: 3797928K used, 6385148K free, 4180K shrd, 156120K buff, 1057956K cached
CPU: 25% usr 0% sys 0% nic 73% idle 0% io 0% irq 0% sirq
Load average: 1.00 1.00 0.97 2/514 892
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
67 65 root R 3075m 30% 3 25% python3 -m homeassistant --config /config

The command “python3 -m homeassistant --config /config” sits at a minimum of 25% constantly, whilst on the prod machine it is never more than 2%.
Although 25% might not look like “a lot” it basically makes HA really (really, really) slow in response.
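A note on why 25% can hurt this much: if the %CPU figure is a share of total CPU capacity (the summary line with 25% usr and 73% idle suggests it is), then a constant 25% together with a load average of about 1.00 is consistent with one core running flat out. Home Assistant Core does most of its work on a single event loop, so one saturated core would be enough to make everything feel slow. A minimal sketch of the arithmetic, where the core count of 4 is an assumption (the test box's core count is not stated above):

```python
def cores_busy(percent_of_total: float, core_count: int) -> float:
    """Convert a share of total CPU capacity into an equivalent number of fully busy cores."""
    return percent_of_total / 100.0 * core_count


# Assumption: a 4-core test box. 25% of the total is then one whole core.
print(cores_busy(25, 4))  # 1.0
```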

So now the real questions:

  • How can I further diagnose what is going on?
  • Anyone else seeing this?
  • Although I disabled (actually removed most) add-ons, could it be an integration having this impact on Core?

Thanks for help, thoughts and inspiration.

Don’t look just at add-ons, disable custom integrations too.


Disabled a lot of stuff, but not there yet.

The interesting bit is that the high CPU sometimes starts immediately after startup; mostly it appears around the 20-minute mark, but this time it took almost one hour.

Hard to troubleshoot this, to be honest, because when it happens there is no way I can get to the logs anymore.
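Since the spike shows up at unpredictable times and the logs are out of reach afterwards, one option is to record CPU usage from a separate process so there is at least a timestamp for when it started. Below is a rough sketch, assuming it is run with python3 inside the homeassistant container (via docker exec) and that the Core process ID is known from top (67 in the output above). The function names and the log file path are made up for illustration. It reads the standard Linux /proc/<pid>/stat file and nothing specific to Home Assistant.

```python
import os
import time
from datetime import datetime


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain spaces
    # or parentheses, so cut at the last ')' before splitting the other fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the process state (field 3); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def watch(pid: int, logfile: str, interval_s: float = 60.0) -> None:
    """Append one timestamped CPU reading per interval to logfile until interrupted."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    stat_path = f"/proc/{pid}/stat"
    with open(stat_path) as fh:
        previous = cpu_ticks(fh.read())
    while True:
        time.sleep(interval_s)
        with open(stat_path) as fh:
            current = cpu_ticks(fh.read())
        # Percentage of ONE core used during the interval (100 = one core saturated).
        percent = (current - previous) / ticks_per_s / interval_s * 100
        previous = current
        with open(logfile, "a") as out:
            out.write(f"{datetime.now().isoformat(timespec='seconds')} {percent:.1f}\n")
```

Calling something like watch(67, "/config/cpu_watch.log") right after a restart would then leave a file showing when the usage jumped. Note that this figure is relative to a single core, so a saturated core shows up as roughly 100 rather than as the 25% seen in top.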

OK, for those interested: I was finally able to upgrade to 2024.9.1 and have now been running for half a day without issues.

In the end I don’t think I can pinpoint one source, but for sure none of the integrations, add-ons or HACS (or the components installed through HACS) were the issue.

I removed several “yaml” based integrations in my dev system and that eventually stabilized the installation.
I replicated this on prod, removed a BUNCH of other integrations that I basically did not use, and upgraded successfully.
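For anyone who wants to narrow this kind of problem down more systematically than removing things in bulk, a halving strategy is one possibility. The sketch below is only an illustration and assumes a single entry is responsible, which I could not confirm in my case. The function name and the misbehaves callback are hypothetical. In practice each call would mean enabling only that subset, restarting, and waiting long enough (up to an hour, going by the above) to see whether the CPU spikes.

```python
from typing import Callable, Optional, Sequence


def find_culprit(
    candidates: Sequence[str],
    misbehaves: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Binary-search for a single entry whose presence triggers the problem."""
    if not candidates or not misbehaves(candidates):
        return None
    remaining = list(candidates)
    while len(remaining) > 1:
        half = remaining[: len(remaining) // 2]
        # Keep whichever half still reproduces the problem.
        remaining = half if misbehaves(half) else remaining[len(half):]
    return remaining[0]
```

With N candidates this needs about log2(N) test rounds after the initial check, instead of one round per entry. If several entries only cause trouble in combination, it will not find them.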

The only thing that is still left (I already had it with the older releases too) is a slow memory leak.

So that probably will be a different topic at some point.

Regarding the increased memory usage: apparently it has to do with Proxmox.

So I will ignore it for now and see how it affects stability.