To be honest it all seems pretty normal, with no clear memory spikes. Swap does seem to gradually decrease over the day, with a strange drop around 7pm, but then at 7am (when the Google Drive snapshot backup add-on runs) it spikes back above its previous level. I’m guessing there’s a tiny memory leak when the add-on runs, which adds more swap use than the decrease that happens during the day:
I’m going to disable the autobackup today, and check the results tomorrow morning.
I’ve been fighting this very same issue on 100.2 since it was released. I’ve been disabling things slowly with each passing day, trying to figure out the cause. So far, no luck. I’m worried that Hass may simply no longer be viable on a Pi 3B. It might need a Pi 4 with 2-4GB of RAM minimum, or a NUC.
Having limited knowledge in this area, I am sorry that I have not been able to provide the proof you speak of. However, I believe I have done everything recommended to me in this thread. If you can elaborate on how to use gdb to debug this issue, I will gladly oblige.
It gets complicated if you want to use something like valgrind, which would normally flag memory leaks quickly: as far as the underlying Python interpreter is concerned the memory is not leaked, so valgrind will not report it unless the interpreter itself is leaking. Besides, you would need to recompile Python with debug code enabled.
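If recompiling isn’t an option, one lighter-weight thing to try (just a sketch, not something wired into Home Assistant itself) is Python’s built-in tracemalloc module, which tracks allocations at the Python level and therefore sees growth that valgrind would consider perfectly legitimate:

```python
import time
import tracemalloc

# Trace Python-level allocations (this adds some CPU and RAM overhead).
tracemalloc.start(25)            # keep up to 25 stack frames per allocation
baseline = tracemalloc.take_snapshot()

time.sleep(600)                  # let the suspected leak run, e.g. while a snapshot/backup happens

current = tracemalloc.take_snapshot()
# Print the ten code locations whose allocations grew the most since the baseline.
for stat in current.compare_to(baseline, "lineno")[:10]:
    print(stat)
```

It only sees memory owned by Python objects, though, so a leak inside a C extension would still be invisible to it.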
I just realized that swap is increasing by itself day by day, and Hassio restarts make it worse.
I did the host reboot (“Hass.io > Host system > Reboot”) and afterwards swap went from 89% to 29%!
I think something should be done about this in the next Hassio update.
I tried to clear the swap file, but unfortunately the “swapoff -a” command (as root) is not permitted in Hassio.
So, for now, as far as I can see, the only way to clear swap is to reboot the host.
I created an automation to reboot the host when swap usage is more than 80%.
My automation is the following (in automations.yaml):
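Roughly along these lines; the exact sensor entity (here a systemmonitor sensor.swap_use_percent) and the hassio.host_reboot service call are the parts that depend on your own setup:

```yaml
# Assumes the systemmonitor integration is configured with a swap_use_percent sensor.
- alias: Reboot host when swap usage is above 80%
  trigger:
    - platform: numeric_state
      entity_id: sensor.swap_use_percent
      above: 80
  action:
    - service: hassio.host_reboot   # reboots the host, which clears swap
```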
Nothing should be denied to root, if you are truly root and not root in a chroot or Docker container. Either that, or one of the system virtual filesystems is not mounted, like /proc, /sys et al.
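For reference, from a genuine root shell on the host OS (not the SSH add-on’s container) clearing swap normally looks like the snippet below; it needs enough free RAM to absorb the pages being swapped back in:

```bash
swapoff -a   # disable swap, pulling swapped-out pages back into RAM
swapon -a    # re-enable the swap devices/files defined for the system
```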
I have had exactly the same issue since 100.x: swap increasing day by day.
Did somebody open an issue on GitHub? No matter, I already did.
PS: do any of you have Speedtest running?
I already started the service as I had some DSL issues. The swap increased by 0.4%.
Speedtest runs every 2 hours. Perhaps it is one of the possible memory leakers.
Hi, did you figure out how to fix this? I’ve been dealing with a constant increase in memory use on my NUC (Hassio under Proxmox) for a few months, and just realized today that the memory use increases significantly (2 GB to 4 GB, 5 GB to 6 GB…) exactly when I run an automation that takes a full snapshot of my config…
It sounds fairly closely related to your problem, that’s why I’m asking…
Well, I’ve since moved over to Hassio running in Docker on an old laptop, and as @nickrout is suggesting, the system never grinds to a halt or throws OOM errors, so I kind of forgot about it.
I don’t use the Google Drive backup solution, but I was noticing significant resource use while creating my snapshots (to the point of generating timeout warnings). The memory and swap were released after snapshot creation had finished, though. Maybe it is the Google upload portion of the code that is at fault?
I optimised my snapshots considerably and can no longer even notice the resource use. It may be of some use to you.
Well, first night without the snapshot automation, and memory use stayed totally stable, no increase at all. There is definitely an increase in memory use, which never comes back down, caused by the snapshot service…
@nickrout I think it’s a problem because the only way for memory use to decrease back to a normal level (around 2 GB / 25%) is to reboot my Proxmox VM; otherwise memory use keeps increasing until it reaches 100%. I am not an expert and I get no OOM errors, but I understand that this is definitely not normal behavior.
This is how Linux works. It caches data in RAM and discards it when it needs more memory. If something uses a lot of RAM, there is no need to release that RAM until something else needs it.
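If you want to see that distinction yourself, the “free” command makes it fairly clear (just a quick illustration to run on the host):

```bash
free -h
# "buff/cache" is page cache the kernel can drop at any moment, not leaked memory;
# "available" is the realistic amount of RAM that programs can still claim.
```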
The precipitous drop occurred 4/21 at 3:30 PM, the last time I restarted the host system HA is running on. As you can see, that dropped swap usage from nearly 100% to 0%, where it stayed for quite a while before climbing a little from what I believe is just regular usage.
The three big spikes upward, where swap memory is claimed and then never released, all occurred when a snapshot was taken. I use the Hass.io Google Drive Backup add-on to schedule a backup nightly. I checked the timestamps of the snapshot files, and each of those steep bumps corresponds exactly to when a snapshot was created.
The third and final bump occurs at a different time than usual because I took it manually this morning. I explicitly stopped the Google Drive Backup add-on and went to the normal snapshot page to take one manually. I wanted to see whether this swap issue was due to how the add-on takes snapshots and uploads them to Google Drive, or to a problem with the native snapshot process. As you can see, the behavior is the same with the native snapshot process: a big bump in swap usage that is never reclaimed until I restart the host.
I don’t really know what to make of this. It seems like a bug in the snapshot process; perhaps it’s a memory leak? Some others have raised the totally fair question of “what problems does this actually cause?” The honest answer is that I don’t really know. I only recently started monitoring swap usage because I noticed Glances was yelling at me about its high usage. I had been running near 100% most of last week, which would seem to suggest it’s not actually an issue.
Then again, I had also noticed some flakier-than-usual behavior out of HA last week, such as occasional network drops and lost connections to my lights. This week things have seemed more stable. Are these two things related? Again, I can’t really tell; correlation does not equal causation.
This looks enough like a memory leak that it might be worth submitting a bug report for that alone, as memory leaks are always bad news. What do others think?