Possible memory leak and CPU usage increasing over time - how to debug?

Hi,

I noticed an alarming increase in memory usage along with a high CPU load. I’m running Supervisor 229 and Home Assistant 0.113.3 with no custom integrations (HACS is still installed, but I have already removed every custom integration that was installed through it). Home Assistant runs on an Intel NUC (Intel i7 CPU) and “eats” about 40% of the available CPU power. Memory usage is about 500 MB and increasing steadily. See these graphs from the last 7 days.


The dips in memory usage are from Home Assistant restarts.

I use the following integrations:

  • Brother printer
  • iOS
  • Homekit Bridge
  • IKEA TRADFRI
  • iRobot
  • MQTT
  • Sonos
  • Ubiquiti UniFi
  • Z-Wave

I profiled with py-spy for 5 minutes; this is the result: https://app.box.com/s/dttogbrxax81fl176yr3l4tabzt05pke
Sorry, you need to download the file from there, as it is no longer clickable.

Any advice on where I should start, or on how to do CPU profiling? Home Assistant still works and is very fast. It is used daily and is in “production” use.
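For the memory side, one generic way to look for a leak in any Python process is the standard library's tracemalloc module: take two snapshots some time apart and list the allocation sites that grew. This is only a minimal sketch of that technique, not something specific to Home Assistant; the helper name `top_growth` and the list that simulates a leak are made up for illustration.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the allocation sites that grew between two snapshots."""
    # compare_to() yields StatisticDiff entries, largest change first.
    stats = after.compare_to(before, "lineno")
    # Keep only sites that allocated more memory, not those that freed it.
    return [s for s in stats if s.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start()
    before = tracemalloc.take_snapshot()

    # Hypothetical stand-in for a leak: objects that are never released.
    leaked = [bytes(1024) for _ in range(1000)]

    after = tracemalloc.take_snapshot()
    for stat in top_growth(before, after):
        print(stat)
```

In a long-running process the two snapshots would be taken minutes or hours apart, and the same file and line showing up at the top again and again is the hint worth following.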

The Ikea integration was reported to be the issue here:

Sorry to interrupt, but what do you use to collect and draw the performance graphs for your Docker containers?

Thanks, I will definitely give this one a try. TRADFRI mostly worked, but I planned to move these lights to a Zigbee2MQTT gateway anyway.

I’ll temporarily remove this integration and report back in a few days.

The setup is a bit “involved” :slight_smile:.

I use cAdvisor to collect all the data, Prometheus as the database, and Grafana with the following dashboard to visualize it: https://grafana.com/grafana/dashboards/893
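If you want the raw numbers behind such a dashboard, the same data can be pulled from the Prometheus HTTP API. The following is a rough sketch that only builds the query URLs. It assumes Prometheus listens on http://localhost:9090 and that the container is named `homeassistant`; both are guesses and need adjusting. The metric names `container_cpu_usage_seconds_total` and `container_memory_usage_bytes` are the ones cAdvisor exports, and the function names are my own.

```python
from urllib.parse import urlencode


def cpu_query(container, window="5m"):
    """PromQL for CPU cores used by one container, averaged over a window."""
    return (
        "sum(rate(container_cpu_usage_seconds_total"
        f'{{name="{container}"}}[{window}]))'
    )


def memory_query(container):
    """PromQL for the current memory usage of one container, in bytes."""
    return f'container_memory_usage_bytes{{name="{container}"}}'


def query_range_url(base_url, promql, start, end, step="60s"):
    """Build a /api/v1/query_range URL; start and end are Unix timestamps."""
    params = urlencode(
        {"query": promql, "start": start, "end": end, "step": step}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


if __name__ == "__main__":
    # Assumed values: local Prometheus, container named "homeassistant".
    base = "http://localhost:9090"
    print(query_range_url(base, cpu_query("homeassistant"), 0, 3600))
    print(query_range_url(base, memory_query("homeassistant"), 0, 3600))
```

Opening one of the printed URLs in a browser or with curl returns the time series as JSON.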

It did help for a few hours. Up to about 8pm the CPU load was far lower than it had ever been before. But at almost exactly 8pm it went through the roof.
Memory is also still increasing, but only slightly. This graph shows the last 24 hours.


At the same time the received network traffic went up too, so I wonder if the cause is an external source. I’ll have to dig deeper.
IMHO removing TRADFRI did help for the first few hours, so it was one step. But other steps are possibly needed too.

Give the py-spy application mentioned here a go and post the report in that thread. bdraco can use it to hunt down the issue:

Somehow I missed the thread about CPU usage in 0.113. Thanks for linking it. Just to make sure it is not related, I’ve now also removed HomeKit Bridge. Currently Home Assistant uses about 25% of the i7 CPU – still a lot, but much less than before.
I’ll wait another day to see how it changes over the next hours.

Ok, looks like I’ve hit a brick wall here…

Just to make sure, I removed HomeKit Bridge and moved back to 0.112.4. This is the result over 12h (the dip in memory consumption marks the point where I switched to 0.112.4).

So about 10% CPU usage on a [email protected]. Way too much IMHO but better than this result on 0.114.0 for about 11 hours:

There aren’t many integrations left to blame… :slight_smile:

I also ran py-spy on 0.114.0 while the CPU usage was over 50%. I let it run for about 10 minutes; this is the resulting file. Sonos and mDNS seem to be involved, but I can’t really interpret the results.
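For anyone who wants to repeat such a recording, here is a minimal sketch that assembles the py-spy command line. It assumes py-spy is installed somewhere the Home Assistant process is visible and that you already know its PID. The helper name, the PID 1234 and the output file name are placeholders. It uses py-spy's `record` subcommand with the `--pid`, `--duration` and `--output` options, and it only prints the command rather than running it.

```python
import shlex


def py_spy_record_cmd(pid, seconds=300, output="profile.svg"):
    """Build the argument list for a timed py-spy flame graph recording."""
    return [
        "py-spy", "record",
        "--pid", str(pid),            # process to sample
        "--duration", str(seconds),   # sampling time in seconds
        "--output", output,           # flame graph file to write
    ]


if __name__ == "__main__":
    # Placeholder PID; 600 seconds matches a 10-minute run.
    cmd = py_spy_record_cmd(1234, seconds=600)
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` once the PID is filled in.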

Any more ideas? I’m starting to think about building a completely new environment, but I have hundreds of entities and many dozens of automations, so this would be a big project :frowning:.

Please post the py-spy result in the other topic so that bdraco can interpret it.

Ok, thanks @tom_l. Let us close this topic and I’ll add my findings in the other one.

I’ve now removed most of the integrations. UniFi was producing about 1 Mbps of steady network traffic, so I removed it as well. But the CPU usage is still high.