Out of memory in 2 days

Dear all,
Please help me figure out why RAM usage keeps increasing on my ESXi VM running HAOS.
Look at the attached picture and its ramp.
I have some errors in the log that I cannot interpret, and I seriously don't understand this RAM behaviour.
Please share your input. This forces me to reboot the system every couple of days; if I don't, my HA freezes or does not load historical data, etc.
Thank you so much.


2024.5+: Tracking down instability issues caused by integrations.


The hard way
Start bare bones
Monitor
Install HA
Monitor
Install 1 addon or integration
Monitor
Repeat

A less hard way is to disable one addon or integration at a time and monitor.

It is hard to believe that HA itself has a memory leak, because that would affect loads of users.

If you run HAOS or with addons, go to the supervisor in devices and enable the memory and cpu sensors for all addons. Then you can add addon usage to the graph. It might be an addon misbehaving.

The template error is most likely not it. Template errors cannot be fixed from screenshots of errors. Post the template code if you want us to look at it. But likely an entity is unavailable and you did not specify a default value for when that happens. So either add an availability template or use a default, e.g. use float(0) instead of float in all places.
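To illustrate the two options above, here is a minimal sketch of a template sensor that guards against an unavailable source entity. The entity `sensor.source_power` and the sensor name are hypothetical placeholders, not taken from the original poster's configuration.

```yaml
template:
  - sensor:
      - name: "Power in kW"  # hypothetical name
        unit_of_measurement: "kW"
        # Option 1: mark this sensor unavailable while the source has no numeric state
        availability: "{{ states('sensor.source_power') | is_number }}"
        # Option 2: float(0) falls back to 0 instead of raising a template error
        state: "{{ states('sensor.source_power') | float(0) / 1000 }}"
```

Either option on its own should silence the error; using both keeps the sensor from reporting a misleading 0 while the source is down.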

Cool, should I stick to the “Tracking down a memory leak of Python objects” chapter?

Thanks for your hint. I’ve installed glances, and looking at the memory allocation per process, Home Assistant is the only one whose memory is constantly growing. An addon is not the culprit, since the resources allocated per addon stay almost constant.

Start Home Assistant in safe mode. This disables all third-party integrations.

If you do not see any memory leak then you know it is one of your custom integrations. If you do continue to see memory issues then it is either a core integration or you have a loop in one of your automations or scripts that never exits.

You can exit safe mode by restarting normally.
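As an illustration of the "loop that never exits" case, here is a sketch of a script whose `repeat` loop is bounded by an iteration cap. The script name and `binary_sensor.front_door` are hypothetical; the point is only that a `while` loop without a cap like `repeat.index <= 20` can keep running indefinitely if its condition never changes.

```yaml
script:
  wait_for_door_closed:  # hypothetical script
    sequence:
      - repeat:
          while:
            # Keep looping while the door is open...
            - condition: state
              entity_id: binary_sensor.front_door
              state: "on"
            # ...but give up after 20 iterations so the loop always ends
            - condition: template
              value_template: "{{ repeat.index <= 20 }}"
          sequence:
            - delay: "00:00:30"
```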

Are you saying the system reboots itself? In that case there may be something in the logs. If you’re manually rebooting, what happens if you don’t? There’s nothing wrong per se with using most of the memory because unused memory is wasted memory.

Do you have any addons installed? Some can hog memory, Studio Code Server or Firefox for example.
There are hidden (disabled) sensors for memory consumption for each addon, like “sensor.studio_code_server_ram_percentage” (the name can be a bit different, since I translated it from Slovenian). Find those sensors, enable them for all addons and monitor whether any of them rises. If you find one, just make an automation which restarts that addon, say, each night at 00:00:

    action: hassio.addon_restart
    data:
      addon: a0d7b954_vscode
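For completeness, a full automation wrapping that action might look roughly like the sketch below. The alias is made up, and the `triggers`/`actions` keys assume Home Assistant 2024.10 or later; older releases use `trigger`/`platform` and `service` instead.

```yaml
automation:
  - alias: "Restart Studio Code Server nightly"  # hypothetical alias
    triggers:
      # Fire once a day at midnight
      - trigger: time
        at: "00:00:00"
    actions:
      # Restart the addon identified by its slug
      - action: hassio.addon_restart
        data:
          addon: a0d7b954_vscode
```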

In my graph, memory rises during the day and falls when the addon is restarted.
I’d say perhaps RAM rises when I open the addon and it doesn’t clean up after itself when I close it… but then again, something similar happened with the Firefox addon, which I hadn’t opened for weeks, and it still rose each day.


Hi, what happens is simply that HA starts to lag and doesn’t show trends, historical data and such. Sometimes even the lights are not available anymore.
If I look at the ESXi console, it reports “out of memory” process killed…bla bla.
So I must restart manually to get everything back in operation.

Do you have the Visual Studio Code addon? I had something similar and it turned out to be the VS Code addon for me. It’s a known issue, but not everyone experiences it.


Really?! Let me give it a try.

I wrote that 3 days ago and asked you if you have vscode installed, but obviously you missed that…?

See here, also for a link to the GitHub issue: What is causing this memory leak?

Did the test, nothing changed.
The only repeating error I have is related to a template sensor I have somewhere.
That is exactly the problem: somewhere.
I activated the logger with debug level for homeassistant.helpers, but I didn’t get any hint about which sensor is causing the issue.
I made many attempts but nothing changes. It ramps up like hell for no apparent reason.
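For reference, enabling debug logging for the template-related loggers in configuration.yaml looks roughly like this. Which of the two namespaces actually names the offending entity is an assumption; the `homeassistant.helpers` namespace mentioned above would cover the first one.

```yaml
logger:
  # Keep everything else quiet so the template messages stand out
  default: warning
  logs:
    # Template rendering helpers
    homeassistant.helpers.template: debug
    # The template integration (template sensors, binary sensors, ...)
    homeassistant.components.template: debug
```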

To determine whether, say, Studio Code Server causes the problem, you must do the following:

  • find and enable sensor.studio_code_server_memory_percentage (or a similar name…); the sensor is disabled by default, so you must search among disabled sensors and enable it;
  • make a graph card with this sensor;
  • wait a couple of days and watch whether the graph rises.

If VS Code is not the culprit, do the same with all installed addons; perhaps you’ll find the one.
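A graph card for this could be as simple as the sketch below. The first entity ID is the one mentioned above and may differ on your system; the second is a hypothetical example of adding another addon to the same graph.

```yaml
type: history-graph
title: Addon memory  # hypothetical title
hours_to_show: 48
entities:
  - entity: sensor.studio_code_server_memory_percentage
  # Hypothetical second addon sensor, added the same way
  - entity: sensor.firefox_memory_percentage
```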

You won’t find any error in logs or similar about this.

Thanks for the hint.
One question: I’m 95% sure there is a problem in some core template or similar, because looking at glances, the only value rising is the memory of the homeassistant container…
I have an error in the log linked to a misconfigured template, and I’m afraid that could be the culprit, but I cannot tell with certainty which template is faulty.

Hi,
I’m 99% sure that one of my automations or scripts is looping.
What should I do? Should I disable all automations first and then see which of them is probably causing the loop?

No, be a bit more clever.
When you do not know where the fault is, cut your problem in half by disabling half of it. If the problem is gone, you know it is in the disabled part; if it is still there, it is in the part left enabled. Just repeat this process on the faulty half until only the problem one is left.

Update: I discovered that this behaviour has been there since January 2023, but clearly I didn’t pay enough attention. It explains the times I rebooted because of freezes without investigating.

Now I’ve cleaned up unused automations and kept only the relevant ones. I did some housekeeping on custom components and scripts, removed all the errors in the log linked to template sensors, and removed components causing even minor errors in the log, without success.

The pattern is still there: a never-ending sawtooth.

I don’t know what else to do. As another test I’m now running without camera streams, with all camera entities disabled, but it looks like that hasn’t helped either.