Hassio swap use keeps on climbing despite no memory spikes

Ok so here’s the results of monitoring the system at the time of the spikes:

And here’s the results of top 10 minutes later:

Finally here’s the results of some of those commands:

To be honest it all seems pretty normal, with no clear memory spikes happening. Swap does gradually decrease over the day, with a strange drop around 7pm, but then at 7am (when the Google Drive snapshot backup addon runs) it spikes back beyond the previous level. I’m guessing there’s a small memory leak when the addon runs, which adds more swap use than the gradual decrease during the day recovers:
[swap graph, 24 hrs]

I’m going to disable the autobackup today, and check the results tomorrow morning.

I’ve been fighting this very same issue on 100.2 since it was released. I’ve been disabling things slowly with each passing day, trying to figure out the cause. So far, no luck. I’m worried that Hass may simply no longer be viable on a Pi 3B. It might need a Pi 4 with 2-4GB of RAM minimum, or a NUC.

If there truly is a memory leak, then just getting more memory is not a solution. It will still run out.

Until someone actually proves a memory leak with proper debugging tools (e.g. gdb), nothing can be done.

I had a similar problem on a Pi 3B, with swap file usage rising to 100%. I upgraded to a Pi 4 with 4GB RAM and the system is now very stable with 0% swap file use.


Having limited knowledge in this, I am sorry that I have not been able to provide the proof you speak of. However I believe I have done everything recommended to me in this thread. If you can please elaborate on how to use gdb to debug this issue, then I will gladly oblige.

First we need to verify it’s HA that’s doing it.

Does stopping/killing HA remove the swap allocation? If so, that would point the finger fairly well. If not, it is likely elsewhere.
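A quick sanity check, as a rough sketch (the container name homeassistant is an assumption; verify it with docker ps, and note the Supervisor may restart it on its own, so take the second reading promptly):

  # Note swap use, stop the HA container, wait, and compare.
  free -m                      # check the "used" value on the Swap line
  docker stop homeassistant    # container name is an assumption; check with: docker ps
  sleep 30
  free -m                      # if swap use barely drops, the culprit is probably elsewhere
  docker start homeassistant

Swap pages belonging to a process are released when it exits, so a large drop here would point at HA itself.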

This might help see what is using swap space:
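One common way to do that is to read each process’s VmSwap value out of /proc; a minimal sketch, run as root on the host:

  # List swap use per process, largest consumers first.
  for pid in /proc/[0-9]*; do
      swap=$(awk '/^VmSwap:/ {print $2}' "$pid/status" 2>/dev/null)
      if [ -n "$swap" ] && [ "$swap" -gt 0 ]; then
          printf '%8s kB  %s\n' "$swap" "$(cat "$pid/comm")"
      fi
  done | sort -rn

If a single PID dominates the list, check whether it belongs to the Home Assistant container or to one of the add-ons.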

Then there is this.
http://tech.labs.oliverwyman.com/blog/2008/11/14/tracing-python-memory-leaks/

It gets complicated if you want to use something like valgrind, which would otherwise flag memory leaks quickly: as far as the underlying Python interpreter is concerned the memory is not leaked, so valgrind will not report it unless the Python interpreter itself is leaking. Besides, you need to recompile Python with debug code enabled.

I tried the “swapoff -a && swapon -a” command as root, but unfortunately the “swapoff” command is not permitted in Hassio.

I just realized that swap is increasing by itself day by day, and Hassio restarts make it worse.
I did the host reboot (“Hass.io>Host system>Reboot”) and after it swap went from 89% to 29%!

I think something will have to be done about this in the next Hassio update.

I tried to clear the swap file, but unfortunately the “swapoff -a” command (as root) is not permitted in Hassio.

So, as far as I can see, rebooting the host is the only way to clear the swap.
I created an automation to reboot the host when swap usage is more than 80%.
My automation is as follows (in automations.yaml):

- id: 'xxxxxxxxxxxxxx'
  alias: SWAP reboot
  description: ''
  trigger:
    - platform: numeric_state
      entity_id: sensor.coreos_swap_used_percent
      above: '80'
  condition: []
  action:
    - service: notify.ios_iphone_jm
      data:
        message: SWAP more than 80%
    - service: hassio.host_reboot

To run this automation you need the Glances add-on running, since it provides the sensor.coreos_swap_used_percent entity used in the numeric_state trigger.


Nothing should be denied to root, if you are truly root and not root inside a chroot or a Docker container. Either that, or one of the system virtual partitions is not mounted, like /proc, /sys et al.

What error does it give you?

I have exactly the same issue since 100.x: increasing swap day by day.
Did somebody open an issue on GitHub? No matter - I already did.

PS: do any of you have Speedtest running?
I already started that service because I had some DSL issues, and swap increased by 0.4%.
Speedtest runs every 2 hours. Perhaps it is one of the possible memory leakers.

I’ve got the same problem, but my system hangs every ~2h. Running on an RPi 3B using deCONZ.

EDIT:
I think the deCONZ addon is the problem! Look at my thread for more information.

Did any of you get this sorted? I have the same issue: every few weeks the swap file hits 100%, with no obvious culprit. I’m using a NUC with Ubuntu.

Hi, did you figure out how to fix this? I’ve been dealing with a constant increase in memory use on my NUC (Hassio under Proxmox) for a few months, and just realized today that the times the memory use jumps significantly (2GB to 4GB, 5GB to 6GB…) are exactly when I run an automation which does a full snapshot of my config…

It sounds closely related to your problem, which is why I’m asking…

Why do you think that is a problem? Is your system grinding to a halt? Getting OOM errors?

Well, I’ve since moved over to Hassio running in Docker on an old laptop, and as @nickrout suggests, the system never grinds to a halt or gets OOM errors, so I kind of forgot about it.

However, to answer your question: yes, I do think this is directly related to snapshot creation. Even after moving to a different Google Drive auto backup tool (GitHub: sabeechen/hassio-google-drive-backup), the swap use still climbs around the same time a snapshot is being created, so I’m guessing it’s by design:


[swap graph, 24 hrs, 08.04.20]

I don’t use the Google Drive backup solution, but I was noticing significant resource use while creating my snapshots (to the point of generating timeout warnings). The memory and swap were released after snapshot creation finished, though. Maybe it is the Google upload portion of the code at fault?

I optimised my snapshots considerably and can no longer even notice the resource use. It may be of some use to you.

Well, first night without the snapshot automation, and memory use stayed totally stable, with no increase at all - the snapshot service definitely causes an increase in memory use that never comes back down…

@nickrout I think it’s a problem because the only way for the memory use to decrease / get back to normal (around 2GB / 25%) is to reboot my Proxmox VM; otherwise the memory use keeps increasing until it reaches 100%. I am not an expert and have no OOM errors, but I understand that this is definitely not normal behavior.

This is how linux works. It caches stuff in RAM and discards stuff when it needs more memory. If something uses a lot of RAM there is no need to release that RAM until something else needs it.
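As a quick illustration, running this on the host makes the distinction visible:

  # "available" is the real headroom; a small "free" value alongside a large
  # "buff/cache" value is normal Linux behavior, not a leak.
  free -h

Pages that were swapped out long ago and never touched again also just sit in swap, so a persistently high swap-used figure on its own is not evidence of memory pressure.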

Wanted to add to and bump this thread. Here’s a graph of my swap memory usage over the past 48 hours.

The precipitous drop occurred 4/21 at 3:30 PM, the last time I restarted the host system HA is running on. As you can see, that dropped swap usage from nearly 100% to 0% for quite a while, before it climbed a little from what I believe is just regular usage.

The three big spikes upward where swap memory is claimed and then never released all occurred when a snapshot was taken. I use the Hass.io Google Drive Backup add-on to schedule a backup to occur nightly. I checked the timestamps of the snapshot file and each of those steep bumps corresponds exactly to when the snapshot was created.

The third and final bump occurs at a different time than usual because I took the snapshot manually this morning. I explicitly stopped the Google Drive Backup addon and went to the normal snapshot page to take one by hand. I wanted to see if this swap issue was due to how the addon takes snapshots and uploads them to Google Drive, or a problem with the native snapshot process. As you can see, the behavior is the same with the native snapshot process - a big bump in swap usage that’s never reclaimed until I restart the host.

I don’t really know what to make of this. It seems like a bug in the snapshot process; perhaps it’s a memory leak? Some others have raised the totally fair question of “what problems does this actually cause?” The honest answer is that I don’t really know. I only recently started monitoring swap usage because I noticed Glances was yelling at me about its high usage. I had been running near 100% most of last week, which would seem to suggest it’s not actually an issue.

Then again, I had also noticed some flakier-than-usual behavior out of HA last week, such as occasional network drops and lost connections to my lights. This week things have seemed more stable. Are these two things related? Again, I can’t really tell; correlation does not equal causation.

This looks enough like a memory leak that it might be worth submitting a bug report for that alone, as memory leaks are always bad news. What do others think?

Did you not read nick’s post above yours?