Hassio swap use keeps on climbing despite no memory spikes

paulcam · September 30, 2019, 8:08am

Your graphs don't show buffer and cache memory. It is possible that, for performance, some large file is being memory mapped and large portions of it accessed, which allocates memory for buffers and cache. If there is unused stuff in memory, the OS will swap it to disk to make room for currently active stuff, even if that is just cache/buffers.
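To see how much RAM is going to cache and buffers rather than to processes, something like the following rough sketch could be run on the host. It only assumes the standard /proc/meminfo field names (MemTotal, MemFree, Buffers, Cached, SwapTotal, SwapFree); the function names are made up for illustration.

```python
def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields; values are in kB."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Split RAM into cache/buffers, free and everything else, plus swap in use."""
    cache = fields.get("Buffers", 0) + fields.get("Cached", 0)
    free = fields.get("MemFree", 0)
    return {
        "cache_and_buffers_kb": cache,
        "free_kb": free,
        # Rough figure: whatever is neither free nor cache/buffers.
        "other_used_kb": fields.get("MemTotal", 0) - free - cache,
        "swap_used_kb": fields.get("SwapTotal", 0) - fields.get("SwapFree", 0),
    }
```

On the box itself you would feed it the real file, e.g. `summarize(parse_meminfo(open("/proc/meminfo").read()))`. A large cache_and_buffers_kb next to a growing swap_used_kb would fit the explanation above.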

The only way to know for sure is to log into the box and run “top” or “htop” and then hit SHIFT+M to sort by memory, placing the most intense users at the top.

IIRC the Pi 3B+ has 1 GB of RAM, so swapping when running large applications with a 2 GB database is probably normal. We all love “the little Pi that could”, but the reality is that in a lot of cases it really can’t.

paulcam · September 30, 2019, 8:41am

I just checked mine:
22544 root 20 0 352.5m 189.8m 24.6m S 1.6 1.2 165:44.75 /usr/local/bin/python3 /usr/src/homeassistant/homeassistant/main.py --config /config

The 352.5m in the VIRT column says that HA has reserved about 352 MB of virtual address space, of which 189.8 MB (RES) is actually resident in RAM because it has been touched (Linux only backs pages with physical memory once they are used). Finding out which files it has mapped is a little trickier; when I checked in /proc I did not see anything but Python libraries. I was expecting to find the SQLite DB but didn’t, so maybe I’m looking in the wrong place.
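For anyone who wants to repeat the check, the mapped files live in /proc/<pid>/maps. A minimal sketch that totals the mapped size per file, assuming the usual six-column maps layout (address range, perms, offset, dev, inode, path); `mapped_files` is just an illustrative name:

```python
from collections import defaultdict


def mapped_files(maps_text):
    """Sum mapped bytes per file-backed path from /proc/<pid>/maps content."""
    sizes = defaultdict(int)
    for line in maps_text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping, no path column
        path = parts[5].strip()
        if not path.startswith("/"):
            continue  # pseudo-entries such as [heap] or [stack]
        start, end = (int(x, 16) for x in parts[0].split("-"))
        sizes[path] += end - start
    # Largest mappings first.
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
```

With the PID from the top line above this would be `mapped_files(open("/proc/22544/maps").read())`, run as root. Note that it reports mapped address space, not how much of each file is actually resident.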

Also of interest: it’s running from /usr/src, which is terribly bad practice, and it is running as root. This might be a side effect of how the Docker image was set up rather than something inherent to HA’s install. I’ll have to check.

theCheek · September 30, 2019, 9:30am

Unfortunately I don’t see anything relevant in top, and htop isn’t present. This is probably because I’m running hass.io, so everything is locked down. Here’s what I get:
[Screenshot: top on hassio]

paulcam · September 30, 2019, 9:33am

Interesting. It’s probably “the other” ssh you need. Is it port 2222? I’m not familiar with HassIO.

theCheek · September 30, 2019, 9:36am

I’m not aware of any ‘other’ ssh. I’ve only managed to access via SSH with the Hass.io add-on “SSH Server”, which defaults to the usual port 22. What is this ‘other’ you speak of? (Now I feel like I’m in an episode of Lost!)

paulcam · September 30, 2019, 9:37am

I think this is relevant.

theCheek · September 30, 2019, 9:40am

ooh nice find! Ok, I’ll have to wait till I get home as I clearly need physical access to get this working. Will report back asap. Thanks for your help so far

theCheek · October 1, 2019, 3:21pm

So I think I may have found the issue, but it may also just be anecdotal. Your comments about the memory size of the Pi made me think about how it must be managing a homeassistant_v2.db that had grown to 1.9 GB. The fact that HA took forever to load the history looked like a symptom, so I just went ahead and deleted the DB, then put much stricter controls on the recorder component (5 days of history, motion sensors excluded).

24 hours later, and swap has been constant around 12%. Will give it a bit more time but I think the DB was definitely a part of it.

theCheek · October 2, 2019, 4:30am

Ok I spoke too soon. This morning swap use had spiked at 7am to 50%.

But I think I found something. I run the Auto backup to Google Drive add-on. It runs at 4am, taking a snapshot of my configuration, then uploads it to Google Drive. The upload seems to finish around 7am. Coincidence? I’m thinking that as the snapshot size is around 300 MB, maybe the add-on loads it into memory while uploading it to Google Drive, forcing the Pi to utilise swap?

nickrout · October 2, 2019, 4:41am

Does memory/swap use go back down when the backup finishes running?

theCheek · October 2, 2019, 4:42am

No it doesn’t seem to.

paulcam · October 2, 2019, 7:42am

Once swapped out, memory stays swapped out until something tries to access the pages or the pages are freed. If it was the backup/upload using cache memory, you would see the cache memory drop significantly after the process exits. That memory would then be free for stuff to be swapped back in if it is needed, but freeing it would not immediately cause that to happen.

Using lots of swap is not, in itself, an issue. There are often many MB of pages in RAM that are accessed rarely or never again, and moving them out of physical RAM can improve performance.

However your original post mentioned that it will continue to rise over some days until things start to get very slow. This is a symptom of memory leakage. It’s not an uncommon bug, a process allocates memory but does not free it when done, often throwing the reference to it away and leaving it allocated forever until that process is exited. This is not only an issue for languages like C/C++ with manual memory management, but exists in managed applications like Python or Java too.

If you restart services (without rebooting) does the swap usage drop? I expect you would need access to the actual host to do so.

Still, the best way to find out is to run “top” on the actual underlying host to see what is using the most memory.
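top sorts by resident memory, which does not say who owns the swapped pages. As a rough complement, the VmSwap line in /proc/<pid>/status can be read for every process. This is only a sketch: the function names are invented, it assumes a kernel that exposes VmSwap, and kernel threads (which have no Vm* lines) are counted as 0.

```python
import glob


def parse_status(text):
    """Extract Name, VmRSS and VmSwap (kB) from /proc/<pid>/status content."""
    info = {"Name": "?", "VmRSS": 0, "VmSwap": 0}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        if key == "Name":
            info["Name"] = value.strip()
        elif key in ("VmRSS", "VmSwap"):
            info[key] = int(value.split()[0])
    return info


def top_by(statuses, field="VmSwap", limit=10):
    """Return the processes with the largest value of `field`, biggest first."""
    return sorted(statuses, key=lambda s: s[field], reverse=True)[:limit]


def read_all_statuses(pattern="/proc/[0-9]*/status"):
    """Read the status file of every process currently visible in /proc."""
    result = []
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                result.append(parse_status(f.read()))
        except OSError:
            continue  # the process exited while we were reading
    return result
```

On the host, `top_by(read_all_statuses())` gives the ten biggest swap users. Running it once a day and comparing would show which process is the one that keeps growing.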

theCheek · October 2, 2019, 7:18pm

I hear you mate, but I seem to be having an issue getting it working with the public key method described in your link. No doubt I’m doing something wrong. I used PuTTY to create a public key, copied it into an authorized_keys file with ANSI encoding and Unix line breaks, put it in the root of a FAT-formatted USB stick, and imported it from the Hassio system page, but no joy.

theCheek · October 3, 2019, 4:23am

Managed to figure it out. I knew I was just being a muppet (forgot to change the USB stick’s name to CONFIG).

Here’s my top, sorted by memory as advised:

theCheek · October 4, 2019, 9:34am

Unfortunately the top results above don’t really tell me much. Memory is well within the capabilities of the Pi, so I can’t really figure out where the memory leak is. Tomorrow morning I’ll run it at the time when swap seems to increase most dramatically. Unless you have any other suggestions @paulcam ?

paulcam · October 4, 2019, 10:06am

Sorry, I was about to reply with that suggestion, but didn’t hit send and then forgot… here is my reply

Interesting. It doesn’t look all that bad, aside from the swap. Load average looks a little high, but the Pi 3B+ is quad-core, so it would take 4.0 to be fully loaded. It might be interesting to capture it when the swap is spiking.

It looks like you have enough memory to do a forced swap purge, if you want to try that.
As root:
swapoff -a
swapon -a

It might take a while, and if you don’t have enough RAM it will error out or terminate some processes.
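A quick sanity check before trying it is to compare the swap currently in use with what the kernel reports as MemAvailable. The sketch below does that from /proc/meminfo text. The 100 MB headroom is an arbitrary assumption of mine, not a rule, and MemAvailable is only present on reasonably recent kernels.

```python
def meminfo_kb(text, key):
    """Return the value in kB of one /proc/meminfo field, or None if absent."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == key:
            return int(rest.split()[0])
    return None


def swapoff_looks_safe(meminfo_text, headroom_kb=100 * 1024):
    """True if the swapped-out pages should fit back into available RAM."""
    total = meminfo_kb(meminfo_text, "SwapTotal")
    free = meminfo_kb(meminfo_text, "SwapFree")
    available = meminfo_kb(meminfo_text, "MemAvailable")
    if None in (total, free, available):
        raise ValueError("SwapTotal, SwapFree or MemAvailable missing")
    swap_used = total - free
    # headroom_kb is an assumed safety margin, not a kernel requirement.
    return swap_used + headroom_kb <= available
```

If `swapoff_looks_safe(open("/proc/meminfo").read())` comes back False, I would stop some services first rather than run swapoff.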

paulcam · October 4, 2019, 1:27pm

This link might help you diagnose as well:
https://www.tecmint.com/commands-to-monitor-swap-space-usage-in-linux/

theCheek · October 7, 2019, 4:27am

Ok, so here are the results of monitoring the system at the time of the spikes:

And here are the results of top 10 minutes later:

Finally, here are the results of some of those commands:

To be honest it all seems pretty normal, with no clear memory spikes happening. Swap does seem to gradually decrease over the day, with a strange drop around 7pm, but then at 7am (when the Google Drive snapshot backup add-on runs) it spikes beyond the previous level. I’m guessing there’s a tiny memory leak when the add-on runs, which causes a bigger rise in swap use than the decrease that happens during the day:
[Image: swap graph 24hrs]

I’m going to disable the autobackup today, and check the results tomorrow morning.

aLTeReGo · October 23, 2019, 11:40pm

I’ve been fighting this very same issue on 100.2 since it was released. I’ve been disabling things slowly with each passing day, trying to figure out the cause. So far, no luck. I’m worried that Hass may just no longer be viable on a Pi 3b. It might need a Pi 4 with 2-4 GB of RAM minimum, or a NUC.

nickrout · October 24, 2019, 5:43am

If there truly is a memory leak, then just getting more memory is not a solution. It will still run out, just later.

Until someone actually proves a memory leak with proper debugging tools (e.g. gdb), nothing can be done.