I’ll try to execute this during a memory leak event, because so far nothing is standing out.
@bdraco, currently experiencing a memory leak.
Aside from the “tracking down a memory leak of python objects” procedure, what more can I do to help troubleshoot this?
I did disable some templates, and it seemed more stable. But I ended up reactivating them after having to restart HA for an upgrade and for another integration.
Update: it might be the renders after all. My uptime is 18 hours, and these are the templates with the highest render counts (over 6k):
2024-10-25 19:02:56.141 CRITICAL (SyncWorker_1) [homeassistant.components.profiler] RenderInfo object in memory: <RenderInfo Template<template=({% set barTH = [960, 965, 970] %}
{% set windTH = 40 %}
{% set rainrateRainy = states('input_number.rain_rate_rainy') %}
{% set rainratePouring = states('input_number.rain_rate_pouring') %}
{% set bar = states('sensor.home_weather_station_relative_pressure') | float %}
{% set rainrate = states('sensor.home_weather_station_rain_rate') | float %}
{% set temp = states('sensor.home_weather_station_outdoor_temperature') | float %}
{% set wind = states('sensor.home_weather_station_wind_speed') | float %}
{% set wsStatus = states('input_boolean.weather_stations_is_installed') %}
{% if bar < 0 or rainrate < 0 or temp < -40 or wsStatus == 'off' %}
{{ states('weather.ipma_weather_daily') }}
{% else %}
{% set state = "clear" %}
{% if rainrate == 0 %}
{% if bar < barTH[0] %}
{% set state = "cloudy" %}
{% elif bar < barTH[1] %}
{% set state = "partly-cloudy" %}
{% elif bar < barTH[2] %}
{% set state = "few" %}
{% elif wind > windTH %}
{% set state = "windy" %}
{% else %}
{% set state = "clear" %}
{% endif %}
{% else %}
{% if temp < 0 %}
{% set state = "snowy" %}
{% else %}
{% if rainrate < rainrateRainy %}
{% set state = "light-rain" %}
{% elif rainrate < rainratePouring %}
{% set state = "rainy" %}
{% elif rainrate > rainratePouring and wind > windTH %}
{% set state = "stormy" %}
{% else %}
{% set state = "pouring" %}
{% endif %}
{% endif %}
{% endif %}
{{ state }}
{% endif %}) renders=6162> all_states=False all_states_lifecycle=False domains=frozenset() domains_lifecycle=frozenset() entities=frozenset({'sensor.home_weather_station_wind_speed', 'input_number.rain_rate_pouring', 'input_boolean.weather_stations_is_installed', 'sensor.home_weather_station_outdoor_temperature', 'sensor.home_weather_station_rain_rate', 'sensor.home_weather_station_relative_pressure', 'input_number.rain_rate_rainy'}) rate_limit=None has_time=False exception=None is_static=False>
2024-10-25 19:02:56.141 CRITICAL (SyncWorker_1) [homeassistant.components.profiler] RenderInfo object in memory: <RenderInfo Template<template=({% set barTH = [960, 965, 970] %}
{% set windTH = 40 %}
{% set rainrateRainy = states('input_number.rain_rate_rainy') %}
{% set rainratePouring = states('input_number.rain_rate_pouring') %}
{% set bar = states('sensor.home_weather_station_relative_pressure') | float %}
{% set rainrate = states('sensor.home_weather_station_rain_rate') | float %}
{% set temp = states('sensor.home_weather_station_outdoor_temperature') | float %}
{% set wind = states('sensor.home_weather_station_wind_speed') | float %}
{% set wsStatus = states('input_boolean.weather_stations_is_installed') %}
{% if bar < 0 or rainrate < 0 or temp < -40 or wsStatus == 'off' %}
{{ states('weather.ipma_weather_daily') }}
{% else %}
{% set state = "Clear" %}
{% if rainrate == 0 %}
{% if bar < barTH[0] %}
{% set state = "Cloudy" %}
{% elif bar < barTH[1] %}
{% set state = "Partly Cloudy" %}
{% elif bar < barTH[2] %}
{% set state = "Mostly Sunny" %}
{% elif wind > windTH %}
{% set state = "Windy" %}
{% else %}
{% set state = "Clear" %}
{% endif %}
{% else %}
{% if temp < 0 %}
{% set state = "Snowy" %}
{% else %}
{% if rainrate < rainrateRainy %}
{% set state = "Light Rain" %}
{% elif rainrate < rainratePouring %}
{% set state = "Rainy" %}
{% elif rainrate > rainratePouring and wind > windTH %}
{% set state = "Stormy" %}
{% else %}
{% set state = "Pouring" %}
{% endif %}
{% endif %}
{% endif %}
{{ state }}
{% endif %}) renders=6162> all_states=False all_states_lifecycle=False domains=frozenset() domains_lifecycle=frozenset() entities=frozenset({'sensor.home_weather_station_wind_speed', 'input_number.rain_rate_pouring', 'input_boolean.weather_stations_is_installed', 'sensor.home_weather_station_outdoor_temperature', 'sensor.home_weather_station_rain_rate', 'sensor.home_weather_station_relative_pressure', 'input_number.rain_rate_rainy'}) rate_limit=None has_time=False exception=None is_static=False>
2024-10-25 19:02:56.141 CRITICAL (SyncWorker_1) [homeassistant.components.profiler] RenderInfo object in memory: <RenderInfo Template<template=({% set barTH = [960, 965, 970] %}
{% set windTH = 40 %}
{% set rainrateRainy = states('input_number.rain_rate_rainy') %}
{% set rainratePouring = states('input_number.rain_rate_pouring') %}
{% set bar = states('sensor.home_weather_station_relative_pressure') | float %}
{% set rainrate = states('sensor.home_weather_station_rain_rate') | float %}
{% set temp = states('sensor.home_weather_station_outdoor_temperature') | float %}
{% set wind = states('sensor.home_weather_station_wind_speed') | float %}
{% set wsStatus = states('input_boolean.weather_stations_is_installed') %}
{% if bar < 0 or rainrate < 0 or temp < -40 or wsStatus == 'off' %}
{{ states('weather.ipma_weather_daily') }}
{% else %}
{% set state = "☀️" %}
{% if rainrate == 0 %}
{% if bar < barTH[0] %}
{% set state = "☁️" %}
{% elif bar < barTH[1] %}
{% set state = "🌥️" %}
{% elif bar < barTH[2] %}
{% set state = "⛅" %}
{% elif wind > windTH %}
{% set state = "🌬️" %}
{% elif is_state('sun.sun', 'below_horizon') %}
{% set state = "🌙" %}
{% else %}
{% set state = "☀️" %}
{% endif %}
{% else %}
{% if temp < 0 %}
{% set state = "🌨️" %}
{% else %}
{% if rainrate < rainrateRainy %}
{% set state = "🌦️" %}
{% elif rainrate < rainratePouring %}
{% set state = "🌧️" %}
{% elif rainrate > rainratePouring and wind > windTH %}
{% set state = "⛈️" %}
{% else %}
{% set state = "⛈️" %}
{% endif %}
{% endif %}
{% endif %}
{{ state }}
{% endif %}) renders=6618> all_states=False all_states_lifecycle=False domains=frozenset() domains_lifecycle=frozenset() entities=frozenset({'sensor.home_weather_station_wind_speed', 'input_number.rain_rate_pouring', 'input_boolean.weather_stations_is_installed', 'sun.sun', 'sensor.home_weather_station_outdoor_temperature', 'input_number.rain_rate_rainy', 'sensor.home_weather_station_relative_pressure', 'sensor.home_weather_station_rain_rate'}) rate_limit=None has_time=False exception=None is_static=False>
2024-10-25 19:02:56.141 CRITICAL (SyncWorker_1) [homeassistant.components.profiler] RenderInfo object in memory: <RenderInfo Template<template=({% if states('input_boolean.weather_stations_is_installed') == 'off' %}
{{ states('weather.ipma_weather_daily') }}
{% elif states('sensor.home_weather_station_rain_rate') | float > states('input_number.rain_rate_pouring') | float %}
pouring
{% elif states('sensor.home_weather_station_rain_rate') | float > states('input_number.rain_rate_rainy') | float %}
rainy
{% elif states('sensor.home_weather_station_rain_rate') | float > 0 %}
light-rain
{% elif states('sensor.home_weather_station_uv_index') | float > 3 and states('sensor.home_weather_station_solar_radiation') | float > 200 %}
sunny
{% elif states('sensor.home_weather_station_relative_pressure') | float < 1005 and states('sensor.home_weather_station_humidity') | float > 85 and is_state('sun.sun', 'below_horizon') %}
cloudy
{% elif states('sensor.home_weather_station_solar_radiation') | float > 10 and
states('sensor.home_weather_station_solar_radiation') | float <= 100 and
states('sensor.home_weather_station_uv_index') | float <= 3 %}
cloudy
{% elif states('sensor.home_weather_station_solar_radiation') | float > 100 and states('sensor.home_weather_station_uv_index') | float <= 3 %}
partly-cloudy
{% elif states('sensor.home_weather_station_wind_speed') | float > 40 and states('sensor.home_weather_station_rain_rate') | float > states('input_number.rain_rate_rainy') | float %}
stormy
{% elif states('sensor.home_weather_station_wind_speed') | float > 40 %}
windy
{% elif states('sensor.home_weather_station_uv_index') | float <= 1 and states('sensor.home_weather_station_solar_radiation') | float < 100 and is_state('sun.sun', 'below_horizon') %}
clear-night
{% else %}
{{ states('weather.ipma_weather_daily') }}
{% endif %}) renders=6294> all_states=False all_states_lifecycle=False domains=frozenset() domains_lifecycle=frozenset() entities=frozenset({'sensor.home_weather_station_wind_speed', 'input_number.rain_rate_pouring', 'input_boolean.weather_stations_is_installed', 'sun.sun', 'input_number.rain_rate_rainy', 'sensor.home_weather_station_rain_rate', 'sensor.home_weather_station_relative_pressure', 'sensor.home_weather_station_uv_index', 'sensor.home_weather_station_humidity', 'sensor.home_weather_station_solar_radiation'}) rate_limit=None has_time=False exception=None is_static=False>
2024-10-25 19:02:56.141 CRITICAL (SyncWorker_1) [homeassistant.components.profiler] RenderInfo object in memory: <RenderInfo Template<template=({{ 'on' in area_entities('Kitchen') | select('match', 'light.*') | map('states')
or
'on' in area_entities('Living Room') | select("match", "light.*") | map('states')
or
'on' in area_entities('Dining Room') | select("match", "light.*") | map('states')
or
'on' in area_entities('Entrance') | reject("match", ".*econdary.*") | select("match", "light.*") | map('states')
}}) renders=38> all_states=False all_states_lifecycle=False domains=frozenset() domains_lifecycle=frozenset() entities=frozenset({'light.kitchen_light'}) rate_limit=None has_time=False exception=None is_static=False>
2024-10-25 19:02:56.141 CRITICAL (SyncWorker_1) [homeassistant.components.profiler] RenderInfo object in memory: <RenderInfo Template<template=({{ 'on' in area_entities('Master Bedroom') | reject('match', '.*controller.*')| select('match', 'switch.*') |
map('states') | reject('in', ['unavailable', 'unknown', 'none'])
or
'on' in area_entities('Master Bedroom') | reject('match', '.*remote.*') | select('match',
'light.*') | map('states')| reject('in', ['unavailable', 'unknown', 'none']) }}) renders=30>
I’m considering moving these templates to an automation that runs every minute. Do you folks think that would be better? At least it would have fewer updates.
Do whatever you want to do, just understand that it will only reduce you to 1440 updates a day (one render per minute × 60 minutes × 24 hours). Also, these templates won’t cause memory leaks (with or without your changes) because they aren’t creating runaway data. More renders will typically increase your CPU usage, not memory.
I’m also facing a severe memory leak, with usage growing from 1.9 GB to 3.3 GB within 3 hours (I added an automation to restart the host before it gets stuck in swap).
I ran the profiler for an hour, and these were the largest diffs (aggregated and sorted):
('coroutine', 1707249, 614890)
('Context', 580482, 210352)
('method', 586198, 206621)
('Task', 569158, 205032)
('FutureIter', 569132, 205029)
('builtin_function_or_method', 581303, 205002)
('Future', 569166, 204961)
('ReferenceType', 602029, 204960)
('TimerHandle', 6322, 4733)
('State', 5953, 1783)
('tuple', 155190, 1410)
('dict', 181399, 1135)
('ReadOnlyDict', 10589, 778)
('Element', 6412, 645)
('Handle', 326, 218)
('CaseInsensitiveDict', 465, 145)
('BLEDevice', 476, 77)
('BluetoothServiceInfoBleak', 472, 76)
('deque', 1295, 73)
('partial', 6488, 72)
('frame', 59, 59)
('AdvertisementData', 148, 57)
('traceback', 58, 44)
('DNSAddress', 66, 36)
('SaveUpdateState', 36, 36)
('memoryview', 47, 32)
('managedbuffer', 43, 32)
('InstanceState', 103, 31)
('TemplateState', 57, 30)
('States', 55, 27)
('WSMessage', 84, 25)
('TransportSocket', 101, 23)
('_SelectorSocketTransport', 89, 23)
('KeyedRef', 824, 22)
('SelectorKey', 121, 21)
('socket', 131, 20)
('StreamReader', 32, 20)
('HassJob', 4266, 19)
('lock', 599, 19)
('CIMultiDict', 255, 19)
('ResponseHandler', 64, 19)
('HttpResponseParser', 64, 19)
('MemoryBIO', 38, 18)
('TimerContext', 63, 17)
('ReceiveMessage', 82, 15)
('LogEntry', 25, 15)
('StateAttributes', 12, 12)
('Struct', 1021, 11)
('DenonAVRTelnetProtocol', 11, 11)
('frozenset', 6975, 10)
('DNSNsec', 42, 10)
('UrlMappingMatchInfo', 10, 10)
I’m not sure what to do with it, as nothing seems glaring.
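For context, triples in this shape (type name, current total, increase) are what objgraph-style growth tracking produces, and to my understanding the profiler’s object logging is built on the same idea. A minimal standalone sketch, assuming objgraph is installed:

```python
import objgraph  # pip install objgraph

# The first call records a baseline of per-type object counts; thanks to
# its persistent internal stats, later calls report deltas against it.
objgraph.growth(limit=None)

# ... let the suspect code run for a while ...

# Each row is (type_name, current_total, increase_since_baseline),
# sorted by increase: the same triples shown in the list above.
for name, total, delta in objgraph.growth(limit=20):
    print(f"('{name}', {total}, {delta})")
```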
Looks like something is creating a massive amount of tasks and futures. That probably means it’s also using a lot of CPU time, which might make it easier to track down.
Try running the profiler.start service and looking at the callgrind file in qcachegrind. You can likely trace it back to the integration creating the most tasks.
If you are stuck, post the callgrind file here and I’ll dig through it.
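For anyone following along: a callgrind-format file can also be produced from plain cProfile data with pyprof2calltree, which I believe is the same conversion the profiler integration performs. A minimal sketch, with a hypothetical workload standing in for the code under investigation:

```python
import cProfile
import pstats

from pyprof2calltree import convert  # pip install pyprof2calltree

def workload() -> None:
    # Hypothetical stand-in for the code being profiled.
    total = 0
    for i in range(200_000):
        total += i * i

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Convert the cProfile stats to callgrind format so the file can be
# opened and browsed in qcachegrind/kcachegrind.
convert(pstats.Stats(profiler), "callgrind.out.workload")
```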
Thank you!
See the callgrind file here.
IIUC, create_task is almost entirely called by denonavr.api.DenonAVRTelnetApi._handle_disconnected.
I’ll try disabling that integration.
It sure looks like there is no guard to prevent that from being called if a task is already created and not done(). Did you open an issue for the library?
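For illustration, the missing guard being described might look like this; a hypothetical sketch, not the actual denonavr code:

```python
import asyncio

class TelnetReconnector:
    """Hypothetical sketch: allow only one live reconnect task at a time."""

    def __init__(self) -> None:
        self._reconnect_task: asyncio.Task | None = None

    def _handle_disconnected(self) -> None:
        # Without this check, every disconnect callback spawns another
        # task, matching the runaway coroutine/Task/Future counts above.
        if self._reconnect_task is not None and not self._reconnect_task.done():
            return
        self._reconnect_task = asyncio.get_running_loop().create_task(
            self._reconnect()
        )

    async def _reconnect(self) -> None:
        await asyncio.sleep(5)  # placeholder back-off before reconnecting
```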
Yes, the library owner has already committed a fix (though no new release yet).
I’m wondering if there is a way for HA to be more defensive by tracking memory usage per integration. That could make it far easier to pinpoint issues, and HA could also disable integrations when it detects such anomalies.
I came here via a hint because my memory usage increased starting with the 2024.11 installation.
Petro suggested following some of the hints above to dig into the causing integration, or whatever else is the reason for this.
Unfortunately, and sorry for that, I don’t understand which of the hints would give me that information. At least going by the descriptions, neither the debug mode (for system crashes, restarts, …) nor the memory leak analysis fits (there is no increasing leak here; the high usage is present directly from system start).
But most probably I missed, or don’t yet understand, the right one.
Can someone give me a small hint on which procedure to follow to get more info about the doubling (and more) of my HA Blue memory?
Use the “tracking down a memory leak of python objects” procedure.
Then try this comment
That will at least give you logs that you can post here for further help
So, here we go
I do not understand how to read it to find where the memory is allocated.
I started directly after a system reboot (the device, not only Core). There it was around 45-50% (as always in the past). I left the system alone with the profiler on. At the end I clicked through all pages and areas in the HA UI to see changes. Most of the time it stayed around 55%. I ended at 80-90% (I have a gut feeling about when it jumped rapidly, but have to double-check) before I stopped the profiler log.
Can you or someone else see from the logs where all the additional memory is taken?
It’s kind of a pain in the ass to read, but each memory growth entry is a tuple of the following: the object type name, the total current count, and the increase since the last cycle. The data is sorted by what had the largest growth. So we are simply looking for entries after the phrase Memory Growth whose first or second numbers are large. The first one is when it starts; you can ignore that one. It tells you everything that’s loaded at start, though.
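To aggregate those triples across a whole log (the way the earlier post in this thread did), a short script is enough. A sketch, assuming the log lines contain the tuples in exactly the pasted form:

```python
import re
import sys
from collections import defaultdict

# Matches triples like ('coroutine', 1707249, 614890); the exact log
# layout around them is an assumption based on the output pasted above.
TUPLE_RE = re.compile(r"\('([^']+)',\s*(\d+),\s*(\d+)\)")

def aggregate(log_path: str) -> list[tuple[str, int, int]]:
    growth: dict[str, int] = defaultdict(int)  # summed increases per type
    latest: dict[str, int] = {}                # most recent total per type
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            for name, total, delta in TUPLE_RE.findall(line):
                growth[name] += int(delta)
                latest[name] = int(total)
    return sorted(
        ((name, latest[name], delta) for name, delta in growth.items()),
        key=lambda row: row[2],
        reverse=True,
    )

if __name__ == "__main__":
    for name, total, delta in aggregate(sys.argv[1])[:30]:
        print(f"('{name}', {total}, {delta})")
```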
At 12:21:41, there’s a huge jump in a lot of data. Same thing at 14:43:11. I’m not seeing anything that stands out. Maybe Nick has some insight.
Nothing stands out in the data. Usually when there is an obvious leak, each cycle will show more of the same object types over and over. 'set', 120957, 2 is a bit high, but that’s not a specific cause for concern.
I’d do a profiler.start next.
Sadly, this one looks like one of the harder ones to track down. It may require figuring out what specific event causes the memory use to increase, so it can be replicated.
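One generic way to do that replicate-and-measure loop, independent of HA’s own profiler services, is Python’s built-in tracemalloc; a rough sketch, where the middle step is whatever event you suspect:

```python
import tracemalloc

tracemalloc.start(25)  # keep up to 25 stack frames per allocation
baseline = tracemalloc.take_snapshot()

# ... reproduce the suspected event here ...

snapshot = tracemalloc.take_snapshot()
# Show the ten allocation sites that grew the most since the baseline.
for stat in snapshot.compare_to(baseline, "traceback")[:10]:
    print(stat)
    for line in stat.traceback.format():
        print("   ", line)
```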
Currently my gut feeling is that it comes perhaps from there, and perhaps from an integration or the UI.
I have now enabled all memory and CPU sensors and will track them in parallel.
Here we go: VSC. The add-on now takes more and more memory, never releases it back, and adds even more on re-opening it. Only an add-on restart brings the memory back. The whole system otherwise gets more and more unresponsive, most probably because of memory swapping, etc., including automatic hard system reboots.
Here is a start of the VSC UI: 40%. And Core (blue) and Supervisor (green) got swapped to disk, I think. Orange is the total, up to over 95%, which is of course not healthy for stability.
It seems to be only the VSCode add-on, and surprisingly it appears to be a known problem here, here, or here.
Question, because I don’t see any reaction there at all: is Frenck aware of it, or perhaps on vacation, or has someone else taken over?
I know this is OT if this thread is only about integrations causing problems, but I wanted to let you and others who are seeing similar behavior know, so they can stop digging with the profiler. Thanks for your help anyway, Petro and bdraco.
That seems like it’s a problem with VSCode plugins or VSCode itself. I usually have to close and restart it a few times a day when using it locally, depending on how much I’ve used it. I’d guess it’s the same when running it on a server. It’s probably something that has to be fixed upstream, and likely nothing that can be done in the add-on itself.
Could be, but I’m not sure, because the VSC add-on was already up to date for the last few weeks. The problem only started (at least here) after updating Core and HAOS. So it could perhaps also be new memory handling in HAOS/Supervisor.
I did try to change the templates, but I keep experiencing the memory leaks.
Here is what I captured.
I didn’t include the first entry’s items; I’m not sure if it is important to view those (e.g., dict had 232,627; list 105,698).
Anything that I should look for?
| Object | Total Instances (Last Entry) | Sum of New over 2h | Log Entries |
|---|---|---|---|
| NodeStrClass | 19,192 | 2,956 | 1 |
| NodeDictClass | 4,042 | 753 | 1 |
| tuple | 156,527 | 317 | 10 |
| NodeListClass | 863 | 272 | 1 |
| cell | 77,540 | 215 | 12 |
| Context | 11,197 | 194 | 10 |
| Input | 278 | 161 | 1 |
| HassJob | 7,496 | 129 | 16 |
| ReferenceType | 34,956 | 128 | 18 |
| builtin_function_or_method | 14,645 | 122 | 7 |
| frozenset | 7,482 | 122 | 5 |
| State | 6,799 | 119 | 9 |
| function | 134,007 | 119 | 12 |
| partial | 7,600 | 118 | 10 |
| method | 17,333 | 113 | 12 |
| set | 20,348 | 103 | 6 |
| BinaryExpression | 477 | 92 | 1 |
| TimerHandle | 1,377 | 77 | 5 |
| BindParameter | 604 | 62 | 4 |
| traceback | 201 | 52 | 7 |
That looks like something is holding on to parsed YAML for longer than it should.
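If so, the holder can often be identified by walking back-references from one of the leaked YAML nodes. A sketch with objgraph, which would need to run inside the HA process (NodeStrClass is, to my understanding, HA’s annotated-YAML string type; rendering the graph also requires graphviz):

```python
import objgraph  # pip install objgraph

# Grab one of the leaked objects; taking the last one is just a heuristic
# for picking a recently created instance.
leaked = objgraph.by_type("NodeStrClass")[-1]

# Walk the reference chain back to a module-level holder and render it;
# the graph usually names whatever is keeping the parsed YAML alive.
chain = objgraph.find_backref_chain(leaked, objgraph.is_proper_module)
objgraph.show_chain(chain, filename="nodestrclass_chain.png")
```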