CPU consumption keeps growing to max every 24h

htop consistently shows the python3 homeassistant process consuming 100% of available CPU a few hours after the device boots.

While CPU is at 100%, hass response to the user is sluggish to the point of being unusable.

CPU consumption remains at 100% until device is restarted.

On restart, hass performs great again, and CPU consumption resets back to a few percent.

CPU gradually increases and then jumps back up to 100% again after a few hours.

hass is running on a pi-3 device as Core inside the standard Docker container. Installed using the default instructions. Installed with the April 2023 release.

I’m keen to keep using the pi for other things in tandem with hass, so need to avoid the HAOS approach.

Any tips on what might be causing the problem?

This chart illustrates the problem. Red is total CPU consumed by all processes. Blue is the homeassistant process. The device is reset at 05:00.

It’s likely you have an integration that is leaking either memory or tasks.

You’ll need to use py-spy and the profiler integration to find it
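For anyone unsure what "leaking tasks" means in practice, here is a minimal sketch. It is not taken from Home Assistant or any integration: `leaky_poll` and `sample_task_counts` are hypothetical names, and the failure mode shown (a retry task created on every failed poll and never cancelled) is only one assumed way such a leak could arise. The point is that the number of live asyncio tasks keeps climbing, and the event loop has more and more to do.

```python
import asyncio


async def leaky_poll(device_reachable: bool) -> None:
    """Hypothetical poller that schedules a retry on failure and forgets it."""
    if not device_reachable:
        # The bug being illustrated: this task is never awaited or cancelled.
        asyncio.create_task(asyncio.sleep(3600))


async def sample_task_counts(rounds: int) -> list[int]:
    """Poll an unreachable device several times, recording live task counts."""
    counts = []
    for _ in range(rounds):
        await leaky_poll(device_reachable=False)
        counts.append(len(asyncio.all_tasks()))
    # Cancel the leaked tasks so the demo exits cleanly.
    for task in asyncio.all_tasks():
        if task is not asyncio.current_task():
            task.cancel()
    return counts


counts = asyncio.run(sample_task_counts(5))
print(counts)  # the count rises with every failed poll
```

A steadily rising task count over hours would be consistent with the slow CPU climb described above, though the profiler output is what actually pins down the culprit.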

As stated, you probably have a memory leak. I had this problem a while back, I think it was the NMAP tracker, but I can’t remember. As a safety measure, I created an automation to restart my raspberry pi if cpu usage ever went over 90% for thirty minutes and send me an alert that it was going to restart.
I also have alerts for if CPU usage stays over 60% for 30 minutes, and another if it stays over 80% for 30 minutes.
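The "over X% for thirty minutes" rule can be sketched in plain Python as below. In Home Assistant itself this would normally be an automation rather than a script, so treat this only as an illustration of the logic; the class name `SustainedThreshold` and the sample readings are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedThreshold:
    """Fires once a value has stayed above `limit` for `hold_seconds`."""

    limit: float
    hold_seconds: float
    _above_since: Optional[float] = None

    def update(self, value: float, now: float) -> bool:
        """Feed one reading; return True when the rule should fire."""
        if value <= self.limit:
            # Dropping below the limit resets the timer.
            self._above_since = None
            return False
        if self._above_since is None:
            self._above_since = now
        return now - self._above_since >= self.hold_seconds


# Restart rule from the post: above 90% for thirty minutes.
restart_rule = SustainedThreshold(limit=90.0, hold_seconds=30 * 60)

# Hypothetical (seconds, cpu_percent) readings.
samples = [(0, 95.0), (600, 97.0), (1200, 50.0),
           (1800, 96.0), (3000, 99.0), (3600, 99.0)]
fired = [restart_rule.update(cpu, t) for t, cpu in samples]
print(fired)
```

The dip to 50% at the 20 minute mark resets the timer, so the rule only fires once the CPU has been above 90% for a full, uninterrupted thirty minutes.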

Thanks; I’ll gather some more info using these tools.

I’ve made some progress with the analysis tools.

Here’s the visual output from the callgrind file, viewed in KCachegrind.

It looks like good stuff, although I’m unsure what to do with it.

How can I use it to trace the leaking integration?
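One way to read profiler data without a GUI is Python's standard `pstats` module: add up the time spent per source file and see which library floats to the top. The sketch below profiles a stand-in workload (`busy_work`, a made-up function) so that it runs on its own. Assuming the profiler also wrote a cProfile-format file, `pstats.Stats("path/to/that/file")` should load it in place of the live profiler object, but that path is an assumption to check against your own output.

```python
import cProfile
import pstats
from collections import defaultdict


def busy_work(n: int) -> int:
    """Stand-in workload so the example has something to profile."""
    return sum(i * i for i in range(n))


def own_time_by_file(stats: pstats.Stats) -> list[tuple[str, float]]:
    """Sum each function's own time per source file, largest first."""
    totals: dict[str, float] = defaultdict(float)
    # stats.stats maps (filename, line, function) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    for (filename, _line, _func), (_cc, _nc, tottime, _ct, _callers) in stats.stats.items():
        totals[filename] += tottime
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


profiler = cProfile.Profile()
profiler.enable()
busy_work(200_000)
profiler.disable()

ranking = own_time_by_file(pstats.Stats(profiler))
for filename, seconds in ranking[:5]:
    print(f"{seconds:8.4f}s  {filename}")
```

File paths usually include the package name, so a library that dominates the ranking is a reasonable first suspect.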

…and here’s the py-spy output:

Something is flooding your network with mdns traffic (or something is processing it over and over)

I suspect devolo_plc_api as the source

I’d open an issue on the library’s GitHub repository, 2Fake/devolo_plc_api (devolo PLC device API in Python).

There was recently a change there: "Reduce zeroconf traffic" by Shutgun (Pull Request #129, 2Fake/devolo_plc_api). I’m not sure whether you have that version. It’s not immediately obvious whether the change caused the issue or fixed it.

Edit: It looks like that version of the lib is not in a production release yet ("Bump devolo_plc_api to 1.3.1" by Shutgun, Pull Request #93099, home-assistant/core), so it might be fixed already but not released.

– Don’t try this unless you know what it does – Here be dragons –

You could try installing that PR as a custom component to see if it fixes the issue

cd /config
curl -o- -sSL https://gist.githubusercontent.com/bdraco/43f8043cb04b9838383fd71353e99b18/raw/core_integration_pr | bash /dev/stdin -d devolo_home_network -p 93099

– Don’t try this unless you know what it does – Here be dragons –

This is great; thanks!

I’ll try disabling the Devolo integration. See if that helps.

(Devolo is the least important element of my setup.)

BTW - how could you tell it was mdns & probably Devolo? Would help grow my understanding of hass and its environment.

It was the name of the functions and the name of the library in the top output.

I maintain zeroconf so I was quite familiar with the naming :wink:

Good progress. I disabled Devolo plus a couple of default unused integrations (Plex, Radio Browser).

CPU consumption looking much better.

Gonna try a couple more tests next, try narrowing things down further.

Yep, it was the Devolo integration. With a subtlety.

CPU consumption grows over time IF:
Devolo integration enabled by auto-discovery AND
Several Devolo devices found AND
at least one Devolo device is then unplugged.

The key bit is unplugging a discovered device. This leads to steadily growing CPU consumption.

Disabling the unplugged devices may help a little. Over a 15-minute test, the CPU seemed to improve slightly, although it did not properly recover in that time.

Disabling the unplugged devices and then restarting hass worked fine. CPU consumption back to normal.

Thanks again @bdraco - you were spot on.

Don’t forget to open an issue for the library if you haven’t already, so others can benefit from the discovery.

@bdraco I’m happy to raise an issue; just want it to be in the right place for good effect.

I had a quick look at the Devolo library (2Fake/devolo_plc_api on GitHub). From the readme.md, it looks to be something standalone, which may have been integrated into hass (?)

If so, will there be some additional hass code somewhere else, which integrates devolo_plc_api into hass? And could it be this hass code which is causing the leak?

It’s likely all in the library code, so I would open the issue there.

Please let me know once you do and I’ll offer to assist finding it if the authors have any trouble locating the leak.

Done. See new issue here: https://github.com/2Fake/devolo_plc_api/issues/132