Home Assistant - High Memory Usage

danielo515 · October 9, 2020, 6:01am

Oh, silly me. I had just assumed that connecting through the add-on put me inside the Home Assistant container. This time I followed your instructions step by step. I’ll report back.
By the way, I find it easier to locate the process with ps than with htop or top; there are not many processes in the container anyway:

bash-5.0# ps -ef
PID   USER     TIME  COMMAND
    1 root      0:00 s6-svscan -t0 /var/run/s6/services
   32 root      0:00 s6-supervise s6-fdholderd
  187 root      0:00 udevd --daemon
  219 root      0:00 s6-supervise home-assistant
  221 root      3h22 python3 -m homeassistant --config /config
  325 root      0:00 /bin/bash
  342 root      0:00 ps -ef

Here is the output. I’m not sure whether these warnings make the results inaccurate:

py-spy> Sampling process 100 times a second for 120 seconds. Press Control-C to exit.

py-spy> 1.00s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.10s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.42s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.37s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.46s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.33s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.28s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.26s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.11s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.28s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.12s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.18s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.24s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.21s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.30s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.13s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.00s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.00s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.00s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> 1.01s behind in sampling, results may be inaccurate. Try reducing the sampling rate
py-spy> Wrote flamegraph data to '/config/www/spy-0.116.0.svg'. Samples: 12000 Errors: 0

Here is the resulting svg:

According to the UI of the new release, they are not using as much memory…

So I guess it tries to cache all the memory it can because, hey, it is there!

bdraco · October 9, 2020, 3:30pm

Can you provide a py-spy dump as well?

danielo515 · October 9, 2020, 4:16pm

Do I need to run another command or does the one I already ran produce the desired output?

By the way, the actual problem was not the switch but a Raspberry Pi running my main Hassio instance. Every time I plug it into the main network, it starts flooding the network, and every computer on my home network spikes its CPU, including the Raspberry Pi itself (I can see it in its Glances, for example). If I isolate the Raspberry Pi on a switch, I can connect to it normally and everything is fine. I’m not sure whether running several Home Assistant instances can cause this, but I had two Hassio instances running on different machines for months without problems.

bdraco · October 9, 2020, 4:31pm

It sounds like you might have some type of broadcast traffic loop.

Can you run py-spy while it’s happening?

danielo515 · October 9, 2020, 5:19pm

That is what I thought, but how can a single device produce this?
Can you give me the exact py-spy parameters you suggest running? Thanks

bdraco · October 9, 2020, 5:34pm

py-spy record --pid 208 --duration 60 --output www/snapshot.svg
py-spy dump --pid 208
py-spy top --pid 208   (hit Ctrl+C after 60 seconds, then copy and paste the output)

Adjust the output location and PID to match your system.
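Instead of reading the PID off `ps -ef` by hand, it can be looked up programmatically. A minimal sketch, assuming a Linux `/proc` filesystem inside the container and that the target is the `python3 -m homeassistant` process; `read_cmdline` and `find_pids` are hypothetical helpers, not part of py-spy or Home Assistant.

```python
import os
from typing import List


def read_cmdline(raw: bytes) -> str:
    """Turn the NUL-separated contents of /proc/<pid>/cmdline into one string."""
    return " ".join(arg.decode(errors="replace") for arg in raw.split(b"\0") if arg)


def find_pids(needle: str, proc_root: str = "/proc") -> List[int]:
    """Return the PIDs whose command line contains `needle` (hypothetical helper)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo" or "self"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                cmdline = read_cmdline(f.read())
        except OSError:
            continue  # the process exited or its cmdline is not readable
        if needle in cmdline:
            pids.append(int(entry))
    return sorted(pids)
```

Run inside the container, `find_pids("-m homeassistant")` should return the Home Assistant PID (221 in the `ps -ef` output earlier in this thread), which can then be passed to `--pid`.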

danielo515 · October 9, 2020, 7:02pm

Sorry, I just removed the IP from the Raspberry Pi and it is not happening anymore.
Do you think the memory issue can still be debugged? Or is it something normal?

bdraco · October 9, 2020, 7:32pm

If it’s not happening anymore, there isn’t much to be done, since we would need a recording made while the issue is occurring.

danielo515 · October 10, 2020, 7:23am

Yep, that is what I was thinking. However, it was probably not a problem with Hassio per se, because it was affecting the entire network. Even the router became unresponsive at some point.
But what about the memory? People report that 2 GB is enough to run Home Assistant, and I assigned 4 GB to the VM, which (according to the hypervisor, Proxmox) is using 3.5 GB.

bdraco · October 10, 2020, 2:13pm

2GiB is usually more than enough unless you have thousands of entities. It could be a case of https://www.linuxatemyram.com/
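The linked page makes the point that Linux lends otherwise idle RAM to the disk cache, so a "used" figure that counts cache can look alarming while applications still have plenty available. A minimal sketch of the two ways of computing usage from `/proc/meminfo`, assuming a kernel new enough to report `MemAvailable` (3.14 or later); the function names are hypothetical, and "total minus MemAvailable" is only an approximation of memory the kernel cannot reclaim.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(info: Dict[str, int]) -> Dict[str, float]:
    total = info["MemTotal"]
    # Naive view: everything that is not completely free counts as used,
    # including page cache and buffers the kernel can hand back on demand.
    naive_used = total - info["MemFree"]
    # Cache-aware view: roughly what applications hold that cannot be reclaimed.
    cache_aware_used = total - info["MemAvailable"]
    return {
        "naive_used_pct": 100.0 * naive_used / total,
        "cache_aware_used_pct": 100.0 * cache_aware_used / total,
    }


def read_meminfo(path: str = "/proc/meminfo") -> Dict[str, int]:
    with open(path) as f:
        return parse_meminfo(f.read())
```

With made-up values of MemTotal 4000000 kB, MemFree 400000 kB and MemAvailable 3000000 kB, the naive figure is 90% while the cache-aware figure is 25%.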

danielo515 · October 12, 2020, 5:44am

Yep, that’s probably it. I don’t have enough entities to overwhelm an RPi instance, so I don’t think they would eat the RAM of a 4 GB VM.
Thanks

tgermain · October 16, 2020, 6:45am

Hello,

I came across this topic because I also have a memory issue; see Help: My HA is restaring, why?

As far as I can see, the host is killing HA because it consumes too much memory (by the end, it reaches 90%).

How can I find out what is causing the issue?

moto2000 · October 16, 2020, 6:51am

Did you take a look at the link in bdraco’s post above? I’m pretty certain that explains what you are seeing.

tgermain · October 16, 2020, 7:10am

Actually, yes, I know high memory usage is not an issue by itself. But my memory usage keeps growing over time, and at some point Linux kills HA:

[91096.975793] Out of memory: Kill process 6177 (python3) score 475 or sacrifice child
[91096.985885] Killed process 6177 (python3) total-vm:3798692kB, anon-rss:481348kB, file-rss:0kB, shmem-rss:0kB

Maybe my question was not as precise as it should have been: how can I find out which integration, component, or other part is causing the issue?
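When reading kernel lines like the ones above, `anon-rss` is the anonymous memory (heap and similar) the process had resident when it was killed, while `total-vm` is its whole virtual address space, which includes mappings never backed by RAM and so overstates real usage. A small sketch that extracts those fields from a "Killed process" line; the regular expression assumes the field order shown in this log, which may differ on other kernel versions, and `parse_oom_kill` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches the start of a kernel "Killed process" line; later fields are ignored.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line: str) -> Optional[dict]:
    """Return PID, process name and sizes in MiB, or None if the line does not match."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "name": m["name"],
        "total_vm_mib": int(m["total_vm"]) / 1024,
        "anon_rss_mib": int(m["anon_rss"]) / 1024,
    }
```

For the line in this post, that gives roughly 470 MiB of anon-rss against about 3.6 GiB of total-vm.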

bdraco · October 16, 2020, 10:19am

Two potential options are to write some code to use objgraph to dump every object in memory every 30 seconds and see where the growth is, or remove integrations one by one until the issue goes away.
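objgraph’s `show_growth()` prints the object types whose counts increased since the previous call, which is the kind of comparison described here. A stdlib-only sketch of the same idea, built on `gc.get_objects()` rather than on objgraph itself (so, like objgraph, it only sees objects tracked by the garbage collector); the helper names are hypothetical, and hooking this into the running Home Assistant process is not shown.

```python
import gc
import time
from collections import Counter
from typing import List, Tuple


def type_counts() -> Counter:
    """Count live objects tracked by the garbage collector, keyed by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before: Counter, after: Counter, limit: int = 10) -> List[Tuple[str, int, int]]:
    """Return (type name, current count, increase) for the types that grew most."""
    increased = after - before  # Counter subtraction keeps only positive deltas
    return [(name, after[name], delta) for name, delta in increased.most_common(limit)]


def watch(interval_s: float = 30.0) -> None:
    """Print the fastest-growing object types every `interval_s` seconds, forever."""
    previous = type_counts()
    while True:
        time.sleep(interval_s)
        current = type_counts()
        for name, count, delta in growth(previous, current):
            print(f"{name:30} {count:10d} +{delta}")
        previous = current
```

A type whose count keeps rising across many snapshots, rather than fluctuating, is the one worth tracing back to an integration.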

tgermain · October 17, 2020, 7:12am

Well, yeah, I was afraid that would be the answer. I have about 30 integrations (and of course some of them, like MQTT, are core to my automations), and I need at least 12 hours to figure out whether there is a memory leak.

Anyway, thanks for the response. I’m going to start with the least important ones!
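Over a long test window, what separates a leak from cache that settles is a steady upward trend in the process’s resident memory. Sampling `VmRSS` from `/proc/<pid>/status` periodically and fitting a line gives a rough growth rate that can make each multi-hour trial easier to compare. A minimal sketch, assuming Linux `/proc` access; the helper names are hypothetical, and a positive slope over a short window is not proof of a leak.

```python
from typing import Sequence, Tuple


def vmrss_kb(status_text: str) -> int:
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("VmRSS not found")


def read_vmrss_kb(pid: int) -> int:
    with open(f"/proc/{pid}/status") as f:
        return vmrss_kb(f.read())


def growth_rate_mib_per_hour(samples: Sequence[Tuple[float, int]]) -> float:
    """Least-squares slope of RSS over time.

    samples: (seconds since start, RSS in kB) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    cov = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    slope_kb_per_s = cov / var
    return slope_kb_per_s * 3600 / 1024  # kB/s -> MiB/hour
```

Comparing the rate with and without a given integration enabled is one way to apply the remove-one-at-a-time approach.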

AFARGAS · October 25, 2020, 7:03pm

Reducing add-ons helped me.

Trevor · November 1, 2020, 10:17am

Hi Brian
Did you notice a difference in behaviour after shifting to a supervised VM?

Kariharri · November 3, 2020, 7:42am

I’ve been struggling with high memory usage for some months now. I’m running supervised HA on an RPi 3B+ with Raspbian. I really can’t track down the root cause of the increased memory usage, but it started this summer; everything had worked fine for months before that…
I’ve tried a number of things to resolve this issue, but nothing seemed to help until this: I increased the cache size to twice the RAM available on the RPi 3B+. Since then, RAM and cache usage have been steady, and I haven’t seen the constant increase in cache and RAM usage that I had before. I know this might be killing my SD card, but it’s better than daily forced reboots. Maybe I need to migrate my setup to SSD boot.

moto2000 · November 11, 2020, 5:11am

I would recommend SSD boot setup, it’s pretty easy to do nowadays and the speed difference is noticeable.