To be honest it all seems pretty normal, with no clear memory spikes. Swap does seem to gradually decrease over the day, with a strange drop around 7pm, but then at 7am (when the Google Drive snapshot backup add-on runs) it spikes back above its previous level. I’m guessing there’s a tiny memory leak when the add-on runs, which adds more swap use than the decrease that happens during the day:
I’m going to disable the autobackup today, and check the results tomorrow morning.
I’ve been fighting this very same issue on 100.2 since it was released. I’ve been disabling things slowly with each passing day, trying to figure out the cause. So far, no luck. I’m worried that Hass may simply no longer be viable on a Pi 3B. It might need a Pi 4 with 2-4GB of RAM minimum, or a NUC.
Having limited knowledge in this area, I am sorry that I have not been able to provide the proof you speak of. However, I believe I have done everything recommended to me in this thread. If you can elaborate on how to use gdb to debug this issue, I will gladly oblige.
It gets complicated if you want to use something like valgrind, which would normally flag memory leaks quickly: as far as the underlying Python interpreter is concerned the memory is not leaked, so valgrind will not report it unless the interpreter itself is leaking. Besides, you would need to recompile Python with debug code enabled.
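If recompiling isn’t an option, one lighter-weight thing to try (just a sketch, not something wired into Home Assistant itself) is Python’s built-in tracemalloc module, which tracks allocations at the Python level and therefore sees growth that valgrind would consider perfectly legitimate:

```python
import time
import tracemalloc

# Trace Python-level allocations (this adds some CPU and RAM overhead).
tracemalloc.start(25)            # keep up to 25 stack frames per allocation
baseline = tracemalloc.take_snapshot()

time.sleep(600)                  # let the suspected leak run, e.g. while a snapshot/backup happens

current = tracemalloc.take_snapshot()
# Print the ten code locations whose allocations grew the most since the baseline.
for stat in current.compare_to(baseline, "lineno")[:10]:
    print(stat)
```

It only sees memory owned by Python objects, though, so a leak inside a C extension would still be invisible to it.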
I just realized that swap is increasing by itself day by day, and Hassio restarts make it worse.
I did the host reboot (“Hass.io > Host system > Reboot”) and afterwards swap went from 89% to 29%!
I think something should be done about this in the next Hassio update.
I tried to clear the swap file, but unfortunately the “swapoff -a” command (as root) is not permitted in Hassio.
So, for now, as far as I can see, the only way to clear swap is to reboot the host.
I created an automation to reboot the host when swap usage is more than 80%.
My automation is the following (in automations.yaml):
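Roughly along these lines; the exact sensor entity (here a systemmonitor sensor.swap_use_percent) and the hassio.host_reboot service call are the parts that depend on your own setup:

```yaml
# Assumes the systemmonitor integration is configured with a swap_use_percent sensor.
- alias: Reboot host when swap usage is above 80%
  trigger:
    - platform: numeric_state
      entity_id: sensor.swap_use_percent
      above: 80
  action:
    - service: hassio.host_reboot   # reboots the host, which clears swap
```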
Nothing should be denied to root, if you are truly root and not root in a chroot or Docker container. Either that, or one of the system virtual filesystems is not mounted, like /proc, /sys et al.
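For reference, from a genuine root shell on the host OS (not the SSH add-on’s container) clearing swap normally looks like the snippet below; it needs enough free RAM to absorb the pages being swapped back in:

```bash
swapoff -a   # disable swap, pulling swapped-out pages back into RAM
swapon -a    # re-enable the swap devices/files defined for the system
```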
I have had exactly the same issue since 100.x: swap increasing day by day.
Did somebody open an issue on GitHub? No matter, I already did.
PS: do any of you have Speedtest running?
I already started the service as I had some DSL issues. The swap increased by 0.4%.
Speedtest runs every 2 hours. Perhaps it is one of the possible memory leakers.
Hi, did you figure out how to fix this? I’ve been dealing with a constant increase in memory use on my NUC (Hassio under Proxmox) for a few months, and just realized today that the memory use increases significantly (2 GB to 4 GB, 5 GB to 6 GB…) exactly when I run an automation that takes a full snapshot of my config…
It sounds fairly closely related to your problem, that’s why I’m asking…
Well, I’ve since moved over to Hassio running in Docker on an old laptop, and as @nickrout is suggesting, the system never grinds to a halt or throws OOM errors, so I kind of forgot about it.
I don’t use the Google Drive backup solution, but I was noticing significant resource use while creating my snapshots (to the point of generating timeout warnings). The memory and swap were released after snapshot creation had finished, though. Maybe it is the Google upload portion of the code that is at fault?
I optimised my snapshots considerably and can no longer even notice the resource use. It may be of some use to you.
Well, first night without the snapshot automation, and memory use stayed totally stable, no increase at all. There is definitely an increase in memory use, which never comes back down, caused by the snapshot service…
@nickrout I think it’s a problem because the only way for memory use to decrease back to a normal level (around 2 GB / 25%) is to reboot my Proxmox VM; otherwise memory use keeps increasing until it reaches 100%. I am not an expert and I get no OOM errors, but I understand that this is definitely not normal behavior.
This is how Linux works. It caches data in RAM and discards it when it needs more memory. If something uses a lot of RAM, there is no need to release that RAM until something else needs it.
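If you want to see that distinction yourself, the “free” command makes it fairly clear (just a quick illustration to run on the host):

```bash
free -h
# "buff/cache" is page cache the kernel can drop at any moment, not leaked memory;
# "available" is the realistic amount of RAM that programs can still claim.
```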
The precipitous drop occurred 4/21 at 3:30 PM, the last time I restarted the host system HA is running on. As you can see, that dropped swap usage from nearly 100% to 0%, where it stayed for quite a while before climbing a little from what I believe is just regular usage.
The three big spikes upward, where swap memory is claimed and then never released, all occurred when a snapshot was taken. I use the Hass.io Google Drive Backup add-on to schedule a backup nightly. I checked the timestamps of the snapshot files, and each of those steep bumps corresponds exactly to when a snapshot was created.
The third and final bump occurs at a different time than usual because I took it manually this morning. I explicitly stopped the Google Drive Backup add-on and went to the normal snapshot page to take one manually. I wanted to see whether this swap issue was due to how the add-on takes snapshots and uploads them to Google Drive, or to a problem with the native snapshot process. As you can see, the behavior is the same with the native snapshot process: a big bump in swap usage that is never reclaimed until I restart the host.
I don’t really know what to make of this. It seems like a bug in the snapshot process; perhaps it’s a memory leak? Some others have raised the totally fair question of “what problems does this actually cause?” The honest answer is that I don’t really know. I only recently started monitoring swap usage because I noticed Glances was yelling at me about its high usage. I had been running near 100% most of last week, which would seem to suggest it’s not actually an issue.
Then again, I had also noticed some flakier-than-usual behavior out of HA last week, such as occasional network drops and lost connections to my lights. This week things have seemed more stable. Are these two things related? Again, I can’t really tell; correlation does not equal causation.
This looks enough like a memory leak that it might be worth submitting a bug report for that alone, as memory leaks are always bad news. What do others think?