0.117.0 continual memory increase

nontijt · November 11, 2020, 7:32am

Thank you for the explaining on how to setup custom components like that.
This part has been frustrating me more than once, because I didnt seem to have the required components exposed. GHeheh… now I know.

I have copied the files, and it is being used by HA. Under settings I see DSMR options.

I do NOT see as much increase of cpu load as before. So it does look like you improved this a lot for me.
After reboot the load has not yet stabalized fully, but this looks very proising. I will update this later today.

Just a sanity question on my stats above.
the IOwait spikes. are these completly as excpected? CPU=Pentium gold 5400, or is it HDD performance depending?

RobBie1221 · November 11, 2020, 8:09am

I’m not the IO Wait expert, but I think it’s HDD related. What still changed in this integration is that force_update is True, so where previously recorder only wrote changes to the database, now every datapoint is written (so roughly every 30 seconds when this is set as minimum time between entity updates).

nontijt · November 11, 2020, 8:40am

Just to confirm. You fixed the DSMR issue for me.
I see no significal difference between enabled/disabled now.

RobBie1221 · November 11, 2020, 8:57am

Good to hear. The PR has been merged and I think it’ll make it in 0.118.

nontijt · November 11, 2020, 9:26am

Some questions about the dsmr improvement you did, out of curiosity.
If I set the minimum time to 60 seconds, and there is a new measurement every 10 seconds, will that result in an update to the database every 10 seconds?
Or does it work like, it updates the database every 60 seconds, with 5 or 6 new datapoints?

what is the reason for wanting force_update: true with a fixed time. from your perspective.
If force_update = false, what will be the frequency of putting data in the database? other negatives?
Or are there other things depending and hooking into the dsmr database values, i.e. templates.?

Regardless. many thanks for this fix now already. much appreciated. !!

RobBie1221 · November 11, 2020, 9:44am

Databases (like recorder integration and influxdb integration) listen to state changed events. When they receive an event for an entity, they will write a datapoint. One remark there, “write” does not necessarily mean an actual write, the recorder can be configured with commit_interval and also influxdb buffers data not to trigger writes too much.

If force_update = False, there is no saying in when the database (recorder or influxdb) gets updated. If the value does not change, no state changed event is raised and this will essentially mean the recorder and influxdb will not see that data was received because they listen to state changed events. This is the reason that typically for sensor entities, force_update is set True, because without databases can miss points for hours even though data was received.

With this new feature, the minimum time between updates is configurable. Data that is received in between is flushed, so it will not back-write the intermediate data that has been received.

The way it is in 0.117, every update from meter generates a state event for all 30 entities. Some meters report data every 5 seconds, so that means 30 events every 5 seconds. Apparently for some hardware, like Rpi3, this is too much to handle. That’s why this minimum time between updates was added. When your hardware allows it, you can set the value to 0 and basically disable it and get every update in your database.

nontijt · November 11, 2020, 4:39pm

Thank you very much for the detailed explanation ! Much appreciated.
I understand now why my system was having trouble handling it, and why this is starting 0.117.

On thing of extra information for you on that. I dont think in this case the issue is only with limited hardware like Rpi’s. I am running a 4core intel with 32G Ram, 2xSSD’s in ZFS_mirror.

Hmmbob · November 11, 2020, 5:31pm

Rob, would/could this potentially also produce massive logging if the Home Assistant log_lvl is set to debug (==default)?

RobBie1221 · November 11, 2020, 5:40pm

I can imagine yes. I think debug level will log all state events. I have to say, with or without this change, if I set global level to debug, my logging goes wild anyway. I try to set a specific component or core if really neccessary to debug at most.

Hmmbob · November 11, 2020, 6:06pm

Agree, but it is the default, so if one does not set this themselves to info or higher…

https://github.com/home-assistant/core/blob/dbe89c1e021c86cd958fb36782cb45de1365932f/homeassistant/components/logger/init.py#L25

RobBie1221 · November 11, 2020, 7:06pm

I know, I have to say I was surprised it was the default, I personally consider it bad practice to keep logger with default debug always on.

Perhaps no one bothered yet to change the default…

nontijt · November 19, 2020, 7:09am

hi Rob,

just a quick question. Can you confirm the DSMR fix/improvement is merged in 0.118 ?

RobBie1221 · November 19, 2020, 8:04am

Yes, it’s merged. You can check the release notes all the way on the bottom under “All changes”. If you search for “dsmr” you can find the pull request.

nontijt · November 19, 2020, 8:49am

Great! thanks for the tip as well.
I never noticed the “All changes” dropdown. good info.

matthijsberg · January 6, 2021, 9:32pm

Hi. I’m migrating to a new server (from Ubuntu to Debian with the super visor supported). On the new server I found that I have 100% CPU usage (4Core i7) and memory (32GB) filles in under 30 minutes until everything freezes. Removing the DSMR component from the integrations GUI panel resolves it.

It’s a brand new install where I restore the HASSIO config and my add-on containers.

Version 2020.12.2

Config:

- platform: dsmr
  #  name: "Slimme Meter"
  #port: /dev/ttyUSB0
  port: /dev/serial/by-id/usb-FTDI_FT232R_USB_UART_A52SJSS1-if00-port0
  dsmr_version: 5
  reconnect_interval: 60

NOTE: the cable is not attached to the new system and it works, without memory leak, on the old system. However I had the same a week ago. Restoring my broken network connections (DNS) most likely resolved that, but I can’t get it to work here.

I have Influx configured too, but on the system where it works CPU averaging < 10% .

Any pointer would be greatly appreciated.

matthijsberg · January 7, 2021, 10:59am

I did some debugging too, when running docker stats it is clearly the HA container, not the supervisor on my case. I installed pys-py and that gave me a SVG trace that I don’t understand, but perhaps someone here has a better understand of that.

top gives me:

top - 11:55:46 up 18:13,  2 users,  load average: 1.58, 2.02, 1.76
Tasks: 261 total,   2 running, 259 sleeping,   0 stopped,   0 zombie
%Cpu(s): 18.0 us,  6.3 sy,  0.0 ni, 75.5 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
MiB Mem :  32058.7 total,  11841.0 free,   4527.0 used,  15690.6 buff/cache
MiB Swap:    976.6 total,    976.6 free,      0.0 used.  26858.8 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                               
14622 root      20   0 3330440   3.0g  47588 R 112.0   9.5   3:01.89 python3                                                                                                                                                               
15883 root      20   0  909100 102636  28444 S  44.2   0.3   0:21.65 node                                                                                                                                                                  
  312 root      20   0  152860  84960  83544 S  22.9   0.3  14:38.34 systemd-journal                                                                                                                                                       
  546 root      20   0  225824   4920   3132 S  13.3   0.0   7:23.14 rsyslogd                                                                                                                                                              
  707 root      20   0 2204056 130072  50604 S  10.0   0.4  14:05.86 dockerd                                                                                                                                                               
14129 root      20   0  111976  10276   3908 S   1.0   0.0   0:01.74 containerd-shim                                                                                                                                                       
   37 root      20   0       0      0      0 S   0.3   0.0   0:00.42 ksoftirqd/5                                                                                                                                                           
  666 root      20   0 1711724  61032  26312 S   0.3   0.2  10:23.28 containerd                                                                                                                                                            
14099 matthijs  20   0   11464   3892   3056 R   0.3   0.0   0:00.71 top                                                                                                                                                                   
    1 root      20   0  169824  10532   7864 S   0.0   0.0   0:02.50 systemd                                                                                                                                                               
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd                                                                                                                                                              
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                                                                                                                
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                                                                                                            
    6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-kblockd                                                                                                                                                  
    8 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                                                                                                                                          
    9 root      20   0       0      0      0 S   0.0   0.0   0:00.59 ksoftirqd/0                                                                                                                                                           
   10 root      20   0       0      0      0 I   0.0   0.0   0:22.70 rcu_sched                                                                                                                                                             
   11 root      20   0       0      0      0 I   0.0   0.0   0:00.00 rcu_bh                                                                                                                                                                
   12 root      rt   0       0      0      0 S   0.0   0.0   0:00.19 migration/0

a sudo py-spy top --pid 14622 gives me:

Collecting samples from 'python3 -m homeassistant --config /config' (python v3.8.6)
Total Samples 27700
GIL: 0.00%, Active: 7095.00%, Threads: 82

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                                                                                                               
5500.00% 5500.00%   15021s    15021s   _worker (concurrent/futures/thread.py:78)
400.00% 400.00%    1056s     1056s   run_once (pychromecast/socket_client.py:573)
200.00% 200.00%   489.9s    491.0s   readinto (socket.py:669)
197.00% 197.00%   475.5s    475.5s   _stream_worker_internal (homeassistant/components/stream/worker.py:240)
148.00% 148.00%   444.7s    444.7s   checkcache (linecache.py:74)
100.00% 100.00%   274.8s    274.8s   _discover_thread (pysonos/discovery.py:124)
100.00% 100.00%   274.7s    274.7s   loop (paho/mqtt/client.py:1167)
100.00% 100.00%   272.3s    272.3s   run (zeroconf/__init__.py:1354)
100.00% 100.00%   243.7s    243.7s   run (homeassistant/components/recorder/__init__.py:357)
100.00% 100.00%   239.3s    239.3s   create_connection (urllib3/util/connection.py:86)
 54.00%  54.00%   102.4s    102.4s   open (serial/serialposix.py:265)
 15.00%  15.00%   33.90s    33.90s   flush (logging/__init__.py:1065)
  5.00%   5.00%   11.05s    11.05s   open (serial/serialposix.py:268)
  0.00%   0.00%   10.03s    10.03s   _stream_worker_internal (homeassistant/components/stream/worker.py:79)
  1.00%  68.00%    9.48s    139.9s   coro (asyncio/coroutines.py:124)
  0.00%   0.00%    8.99s     9.00s   filter (logging/__init__.py:805)
  2.00%  91.00%    8.58s    245.3s   _run (asyncio/events.py:81)
  2.00%  12.00%    5.78s    53.52s   exception (logging/__init__.py:1469)
  5.00%   5.00%    5.62s     6.76s   extract (traceback.py:359)
  0.00%   0.00%    4.99s     9.42s   create_task (asyncio/base_events.py:431)
  0.00%   0.00%    3.98s     4.00s   load_json (homeassistant/util/json.py:31)
  3.00%   3.00%    3.49s     3.49s   mux_video_packet (homeassistant/components/stream/worker.py:213)
  0.00%   5.00%    3.14s    21.85s   makeRecord (logging/__init__.py:1544)
  0.00%   0.00%    2.92s     2.92s   update (netdisco/tellstick.py:42)
  0.00%   0.00%    2.77s     2.77s   create_dsmr_protocol (dsmr_parser/clients/protocol.py:42)
  1.00%   1.00%    2.65s     2.65s   create_dsmr_reader (dsmr_parser/clients/protocol.py:51)
  1.00%   1.00%    2.52s     2.52s   __init__ (logging/__init__.py:290)
  0.00% 148.00%    2.52s    451.4s   extract (traceback.py:362)
  1.00%   1.00%    2.50s     2.50s   extract (traceback.py:336)
  1.00%   2.00%    2.27s    21.40s   connect_and_reconnect (homeassistant/components/dsmr/sensor.py:186)
  0.00%   0.00%    2.26s     2.26s   update (netdisco/daikin.py:49)
  0.00%   0.00%    2.20s     2.20s   handle_read (zeroconf/__init__.py:1408)
  0.00%   0.00%    2.19s     2.19s   _run_once (asyncio/base_events.py:1843)
  0.00%   0.00%    2.19s     2.19s   update (netdisco/smartglass.py:135)
  1.00%  67.00%    2.17s    130.4s   create_serial_connection (serial_asyncio/__init__.py:410)
  0.00%   0.00%    2.12s     2.12s   update (netdisco/gdm.py:85)
  0.00%   0.00%    2.07s     2.07s   update (netdisco/lms.py:47)
  1.00%   2.00%    2.01s     4.53s   _call_soon (asyncio/base_events.py:739)
  0.00%   0.00%    1.85s     2.82s   _run_once (asyncio/base_events.py:1860)
  1.00%   1.00%    1.84s     1.84s   _some_str (traceback.py:155)
  1.00%   5.00%    1.79s     4.52s   line (traceback.py:288)
  0.00%   0.00%    1.72s     1.72s   scan (netdisco/ssdp.py:255)
  0.00%   0.00%    1.69s     1.69s   format (traceback.py:423)
  0.00%   0.00%    1.64s     1.64s   extract (traceback.py:361)
  0.00%   3.00%    1.54s    10.11s   serial_for_url (serial/__init__.py:85)
  1.00%   1.00%    1.52s     1.52s   __init__ (logging/__init__.py:329)
  0.00%   0.00%    1.52s     1.52s   _write_to_self (asyncio/selector_events.py:140)
  0.00%  61.00%    1.48s    115.9s   serial_for_url (serial/__init__.py:88)
  0.00%   0.00%    1.43s     1.93s   create_dsmr_protocol (dsmr_parser/clients/protocol.py:39)
  0.00%   0.00%    1.35s     1.35s   print_exception (traceback.py:105)
  2.00%   2.00%    1.33s     1.33s   formatTime (logging/__init__.py:604)
  0.00%   0.00%    1.21s     1.21s   acquire (logging/__init__.py:899)
  0.00%   1.00%    1.20s     2.35s   extract (traceback.py:353)
  2.00%   2.00%    1.19s     1.19s   connect_and_reconnect (homeassistant/components/dsmr/sensor.py:183)
  0.00%   0.00%    1.15s     1.15s   add (_weakrefset.py:84)

matthijsberg · January 7, 2021, 11:14am

Last update before I reset my system en go on, when is remove the “Netbeheer” integration from the GUI, but not from the sensors part in the config, I get the py-spy output below. Noticeable differences in the the lines about “serial”. Is it possible the DSMR part is waiting for the serial to come available and hangs there? There are no DSMR processes too.

Collecting samples from 'python3 -m homeassistant --config /config' (python v3.8.6)
Total Samples 27700
GIL: 0.00%, Active: 7095.00%, Threads: 82

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                                                                                                               
5500.00% 5500.00%   15021s    15021s   _worker (concurrent/futures/thread.py:78)
400.00% 400.00%    1056s     1056s   run_once (pychromecast/socket_client.py:573)
200.00% 200.00%   489.9s    491.0s   readinto (socket.py:669)
197.00% 197.00%   475.5s    475.5s   _stream_worker_internal (homeassistant/components/stream/worker.py:240)
148.00% 148.00%   444.7s    444.7s   checkcache (linecache.py:74)
100.00% 100.00%   274.8s    274.8s   _discover_thread (pysonos/discovery.py:124)
100.00% 100.00%   274.7s    274.7s   loop (paho/mqtt/client.py:1167)
100.00% 100.00%   272.3s    272.3s   run (zeroconf/__init__.py:1354)
100.00% 100.00%   243.7s    243.7s   run (homeassistant/components/recorder/__init__.py:357)
100.00% 100.00%   239.3s    239.3s   create_connection (urllib3/util/connection.py:86)
 54.00%  54.00%   102.4s    102.4s   open (serial/serialposix.py:265)
 15.00%  15.00%   33.90s    33.90s   flush (logging/__init__.py:1065)
  5.00%   5.00%   11.05s    11.05s   open (serial/serialposix.py:268)
  0.00%   0.00%   10.03s    10.03s   _stream_worker_internal (homeassistant/components/stream/worker.py:79)
  1.00%  68.00%    9.48s    139.9s   coro (asyncio/coroutines.py:124)
  0.00%   0.00%    8.99s     9.00s   filter (logging/__init__.py:805)
  2.00%  91.00%    8.58s    245.3s   _run (asyncio/events.py:81)
  2.00%  12.00%    5.78s    53.52s   exception (logging/__init__.py:1469)
  5.00%   5.00%    5.62s     6.76s   extract (traceback.py:359)
  0.00%   0.00%    4.99s     9.42s   create_task (asyncio/base_events.py:431)
  0.00%   0.00%    3.98s     4.00s   load_json (homeassistant/util/json.py:31)
  3.00%   3.00%    3.49s     3.49s   mux_video_packet (homeassistant/components/stream/worker.py:213)
  0.00%   5.00%    3.14s    21.85s   makeRecord (logging/__init__.py:1544)
  0.00%   0.00%    2.92s     2.92s   update (netdisco/tellstick.py:42)
  0.00%   0.00%    2.77s     2.77s   create_dsmr_protocol (dsmr_parser/clients/protocol.py:42)
  1.00%   1.00%    2.65s     2.65s   create_dsmr_reader (dsmr_parser/clients/protocol.py:51)
  1.00%   1.00%    2.52s     2.52s   __init__ (logging/__init__.py:290)
  0.00% 148.00%    2.52s    451.4s   extract (traceback.py:362)
  1.00%   1.00%    2.50s     2.50s   extract (traceback.py:336)
  1.00%   2.00%    2.27s    21.40s   connect_and_reconnect (homeassistant/components/dsmr/sensor.py:186)
  0.00%   0.00%    2.26s     2.26s   update (netdisco/daikin.py:49)
  0.00%   0.00%    2.20s     2.20s   handle_read (zeroconf/__init__.py:1408)
  0.00%   0.00%    2.19s     2.19s   _run_once (asyncio/base_events.py:1843)
  0.00%   0.00%    2.19s     2.19s   update (netdisco/smartglass.py:135)
  1.00%  67.00%    2.17s    130.4s   create_serial_connection (serial_asyncio/__init__.py:410)
  0.00%   0.00%    2.12s     2.12s   update (netdisco/gdm.py:85)
  0.00%   0.00%    2.07s     2.07s   update (netdisco/lms.py:47)
  1.00%   2.00%    2.01s     4.53s   _call_soon (asyncio/base_events.py:739)
  0.00%   0.00%    1.85s     2.82s   _run_once (asyncio/base_events.py:1860)
  1.00%   1.00%    1.84s     1.84s   _some_str (traceback.py:155)
  1.00%   5.00%    1.79s     4.52s   line (traceback.py:288)
  0.00%   0.00%    1.72s     1.72s   scan (netdisco/ssdp.py:255)
  0.00%   0.00%    1.69s     1.69s   format (traceback.py:423)
  0.00%   0.00%    1.64s     1.64s   extract (traceback.py:361)
  0.00%   3.00%    1.54s    10.11s   serial_for_url (serial/__init__.py:85)
  1.00%   1.00%    1.52s     1.52s   __init__ (logging/__init__.py:329)
  0.00%   0.00%    1.52s     1.52s   _write_to_self (asyncio/selector_events.py:140)
  0.00%  61.00%    1.48s    115.9s   serial_for_url (serial/__init__.py:88)
  0.00%   0.00%    1.43s     1.93s   create_dsmr_protocol (dsmr_parser/clients/protocol.py:39)
  0.00%   0.00%    1.35s     1.35s   print_exception (traceback.py:105)
  2.00%   2.00%    1.33s     1.33s   formatTime (logging/__init__.py:604)
  0.00%   0.00%    1.21s     1.21s   acquire (logging/__init__.py:899)
  0.00%   1.00%    1.20s     2.35s   extract (traceback.py:353)
  2.00%   2.00%    1.19s     1.19s   connect_and_reconnect (homeassistant/components/dsmr/sensor.py:183)
  0.00%   0.00%    1.15s     1.15s   add (_weakrefset.py:84)

SiriusGen · February 7, 2021, 12:29am

Just a note to say that there seems to be a memory increase again, with the release of 2021.2.1. It also seems to be related to the Brother Printer integration like it was (for me) last time. Removing this integration for now stops the increase.

shukkar · March 10, 2021, 7:04pm

I am seeing this issue in core-2021.3.3 (as well as core-2021.x.x) with ONVIF integration. With two ONVIF cameras (Intel NUC, supervised install), core memory goes from 20% to 100% used in a few hours and causes core to crash and restart. Removing ONVIF cameras fixes this. Adding only one ONVIF camera causes slower increase in memory usage, but eventually it goes to 100% (including SWAP) and crashes core.

jaimefdeoliveira · March 12, 2021, 2:29pm

I have the same problem but without any integration. after install the core-2021.3.3.