High CPU usage -- potential steps to diagnose?

I’m running HA on an Android device using termux and udocker, so I acknowledge that this is not supported. So no help is expected.

CPU usage is normally very low (one core < 20% utilized). Occasionally after a restart CPU usage is notably higher (3+ cores ~100% utilized) and stays at that level, even after running for hours. Usually a restart returns the system to the lower CPU level. It really does seem that there are two levels, determined by something at system startup. The processes using high CPU, according to ps, are “python3 -m homeassistant --config /config”. No news there.

I have installed the Profiler integration and used “Log current asyncio tasks”, “Log event loop scheduled” and “Log thread frames” to create logs in both high and low cpu usage situations. I see no notable, consistent differences between the log output.
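For comparing the high and low states, a single thread-frame dump can easily miss the busy code, so taking several snapshots and counting where each thread is caught may help. Below is a minimal sketch of that general technique using Python's `sys._current_frames()`. It is not the Profiler integration's own code, and it assumes you have some way to run it inside the Home Assistant process (for example from a small custom component), which I have not tried on a udocker setup. The function names `sample_hot_frames` and `report` are made up for illustration.

```python
import sys
import threading
import time
from collections import Counter


def sample_hot_frames(samples=20, interval=0.05):
    """Count which function each thread is executing across several snapshots."""
    counts = Counter()
    me = threading.get_ident()
    for _ in range(samples):
        # Map thread idents to readable names (e.g. worker thread names).
        names = {t.ident: t.name for t in threading.enumerate()}
        for ident, frame in sys._current_frames().items():
            if ident == me:
                continue  # skip the sampling thread itself
            code = frame.f_code
            key = (names.get(ident, "?"), code.co_filename, code.co_name)
            counts[key] += 1
        time.sleep(interval)
    return counts


def report(counts, top=10):
    """Print the most frequently observed (thread, function) pairs."""
    for (thread, filename, func), hits in counts.most_common(top):
        print(f"{hits:4d}  {thread:<20} {func}  ({filename})")
```

A thread that shows up in the same function in nearly every sample, in the high-CPU state but not the low one, would be the thing to look at.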

Other than the CPU utilization, I see no differences in function.

Any suggestions for using Profiler or other tools to diagnose what is happening?

It’s not a critical problem for me – it usually takes only one or two restarts to return the system to an ultra-low cpu usage state (where it remains until a restart). But if I can spot a problem, maybe it is useful to the community.

Thanks for any pointers.

Try looking at the CPU load for each app/add-on one at a time, in the Apps/Add-ons area. If one seems high, try stopping it for a few minutes, then restarting it. Watch the CPU and memory it consumes. That should help you pinpoint a possible CPU or memory hog.

As a “container” installation, I don’t have Apps or Add-ons.

I can try disabling integrations to see if the problem is related to one of them.

What router do you have? I have an OpnSense. When I upgrade my router (and reboot it), I also get high CPU usage in Home Assistant. A restart of Home Assistant solves the problem.

In my case I shut down Home Assistant, upgrade OpnSense, reboot OpnSense and then start Home Assistant.

I disabled every integration and still have the same problem! Might even be worse – so far every restart has ended up with high CPU utilization. Will continue restarts to see if that changes. Previously it was maybe 50/50 – or possibly slightly more likely that it would be low CPU.

I guess I’ll also create a new HA configuration and see if I have the problem with a fresh setup.

@Ronny1978 It’s a TP-Link Deco. I’ve never sensed any relationship between HA and the router. I’m not sure I’ve rebooted it recently, but will try that.

@Ronny1978 router reboot does not appear to make any difference. A good thing to check!

Since you are down to just a basic HA setup (no integrations), it has to be network related, such as DNS or mDNS retries.

I tried with a fresh configuration and had the same result – sometimes it starts and has low CPU utilization, sometimes it ends up with high CPU utilization.

I didn’t even go past the Welcome screen on HA, so presumably very little was happening!

Any way to test and/or fix that? I’ve seen that mentioned in other high cpu posts – but it’s not clear what to do about it. I don’t think I can run “ha dns options --fallback=false” at my normal prompt, since home assistant is run in a udocker “container”. Can I edit a configuration file directly?

Try the profiler as mentioned here: 2024.5+: Tracking down instability issues caused by integrations
What does top -em -co%CPU look like?

Thanks @Impact Where am I running that command? A quick scan suggests that I need to run qcachegrind to view the callgrind output. Is that correct?

In the part of the UI the link takes you to :)
It’s one of the tools that can read these files, yeah.

I don’t know if this is helpful information, but if I enter a shell within the container to look at /etc/resolv.conf it shows:
nameserver 8.8.8.8
nameserver 8.8.4.4

You would have to ping and see the time from inside the HA container/VM.
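If ping is not available inside the container, timing a name lookup from Python (which the container certainly has) is one alternative. This is only a rough sketch: `time_dns_lookup` is a made-up helper, the hostnames are arbitrary examples, and it measures whatever the system resolver does, which may differ from the resolver path Home Assistant itself uses.

```python
import socket
import time


def time_dns_lookup(hostname, port=443):
    """Return seconds taken to resolve hostname, or None if resolution fails."""
    start = time.perf_counter()
    try:
        socket.getaddrinfo(hostname, port)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None
    return time.perf_counter() - start


if __name__ == "__main__":
    # Example hostnames only; substitute whatever you want to check.
    for host in ("localhost", "home-assistant.io"):
        elapsed = time_dns_lookup(host)
        if elapsed is None:
            print(f"{host}: lookup failed")
        else:
            print(f"{host}: resolved in {elapsed * 1000:.1f} ms")
```

Running it in both the high-CPU and low-CPU states would show whether lookups are slow or failing in one and not the other.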

Profiler does not appear to be reporting on whatever it is that is consuming CPU. The results I see (via cachegrind or callgrind) appear to be very similar in the low CPU and high CPU cases. I’ve been running the profiler for 300 seconds – the resulting total ns is always around there (e.g., 301,503,025,573 (100.0%) PROGRAM TOTALS).

Results always follow this pattern:

242,410,849,114 (80.40%)  ~:<method 'poll' of 'select.epoll' objects>
 31,762,376,436 (10.53%)  ~:<built-in method select.select>
  2,292,716,142 ( 0.76%)  /usr/local/lib/python3.14/asyncio/base_events.py:_run_once
  1,072,131,002 ( 0.36%)  ~:<built-in method _io.open>
    947,257,157 ( 0.31%)  /usr/local/lib/python3.14/asyncio/events.py:__lt__
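To put a number on how much of that profile is just waiting, the lines above can be parsed and the poll/select entries summed. This is a small sketch using only the figures posted here. Treating `select.epoll` poll and `select.select` as "waiting" is my assumption about what those entries represent, and the helper names are made up.

```python
import re

# Matches lines like "  2,292,716,142 ( 0.76%)  some/label"
LINE_RE = re.compile(r"^\s*([\d,]+)\s+\(\s*([\d.]+)%\)\s+(.*)$")

SAMPLE = """\
242,410,849,114 (80.40%)  ~:<method 'poll' of 'select.epoll' objects>
 31,762,376,436 (10.53%)  ~:<built-in method select.select>
  2,292,716,142 ( 0.76%)  /usr/local/lib/python3.14/asyncio/base_events.py:_run_once
  1,072,131,002 ( 0.36%)  ~:<built-in method _io.open>
    947,257,157 ( 0.31%)  /usr/local/lib/python3.14/asyncio/events.py:__lt__
"""

PROGRAM_TOTAL_NS = 301_503_025_573  # from the PROGRAM TOTALS line
WAIT_MARKERS = ("select.epoll", "select.select")  # assumed to be idle waits


def parse_rows(text):
    """Return a list of (nanoseconds, label) tuples from annotate-style lines."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append((int(match.group(1).replace(",", "")), match.group(3)))
    return rows


def waiting_share(rows, total_ns):
    """Fraction of total time spent in entries that look like blocking waits."""
    waited = sum(ns for ns, label in rows if any(m in label for m in WAIT_MARKERS))
    return waited / total_ns


rows = parse_rows(SAMPLE)
print(f"waiting share: {waiting_share(rows, PROGRAM_TOTAL_NS):.1%}")
```

If roughly nine tenths of the profiled time is spent blocked in poll/select in both states, the profile is mostly showing an idle loop, which would be consistent with the extra load sitting somewhere the profile does not cover. That is an inference, not something the output proves.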

During the sample period:
low CPU – only one core generally active, < 20% utilization
high CPU - 3 or 4 cores nearly 100% utilization

So I would expect to see something noticeably different in the Profiler output.

Are there parameters I can adjust to have it monitor additional processes?

It might not be python (core) process related. Can you check top -em -co%CPU or similar like asked?

I don’t know how to do that within cachegrind. When I look at the processes that are running (using ps, top, htop, etc) - I see:

Or maybe this is helpful from top -H:

  PID   TID USER     PR  NI CPU% S     VSS     RSS PCY Thread          Proc
25440 25869 u0_a220  20   0  12% R 631808K 375428K  fg SyncWorker_6    python3
25440 25870 u0_a220  20   0  12% R 631808K 375428K  fg SyncWorker_6    python3
25440 25871 u0_a220  20   0  12% R 631808K 375428K  fg SyncWorker_6    python3

Not always the same SyncWorker_n – the next run of HA it was 1.

There may be a relationship with the number of cores I let HA use. I was limiting it to 4 of 8, and would get three cores at ~100% (each shows as 12%, i.e. 100/8). When I let it use all 8 cores, I now have 7 threads using ~100% CPU.

  PID   TID USER     PR  NI CPU% S     VSS     RSS PCY Thread          Proc
27501 27978 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27982 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27981 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27977 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27980 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27983 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27979 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27501 u0_a220  20   0   0% S 762148K 373932K  fg python3         python3
15615 15615 u0_a220  20   0   0% S  35640K   2316K  fg sshd-session    /data/data/com.termux/files/usr/libexec/sshd-session
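For anyone wanting to check the arithmetic, here is a small sketch that parses the `top -H` rows above and converts the per-thread percentages into "cores' worth" of load. It assumes, as described in the post, that this `top` reports CPU% relative to all 8 cores (so about 12% is one fully busy core); the function names are made up for illustration.

```python
SAMPLE = """\
  PID   TID USER     PR  NI CPU% S     VSS     RSS PCY Thread          Proc
27501 27978 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27982 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27981 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27977 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27980 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27983 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27979 u0_a220  20   0  12% R 762148K 373932K  fg SyncWorker_5    python3
27501 27501 u0_a220  20   0   0% S 762148K 373932K  fg python3         python3
"""

N_CORES = 8  # assumption: CPU% is normalised across all 8 cores


def parse_top_threads(text):
    """Parse 12-column `top -H` rows into dicts, skipping the header line."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 12 or not parts[0].isdigit():
            continue
        rows.append({
            "pid": int(parts[0]),
            "tid": int(parts[1]),
            "cpu": float(parts[5].rstrip("%")),
            "state": parts[6],
            "thread": parts[10],
        })
    return rows


def cores_in_use(rows, n_cores=N_CORES):
    """Convert summed whole-system percentages into an equivalent core count."""
    return sum(r["cpu"] for r in rows) / 100 * n_cores


rows = parse_top_threads(SAMPLE)
busy = [r for r in rows if r["cpu"] > 0]
print(len(busy), "busy threads,", round(cores_in_use(busy), 2), "cores' worth")
```

Since `top` appears to round 12.5% down to 12%, the result comes out slightly under the number of busy threads.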

Yeah so it’s definitely HA/core related and the profiler should catch that. I’m not a big fan of it though. It crashes if you try to use it for memory for example.