Memory leak

Has anybody else experienced excessive memory usage?

After upgrading to 0.112 I noticed that my HA is using more and more memory. It climbs until it reaches approximately 100%, then it becomes unresponsive… and reboots itself.

Still an issue after patching to 0.112.2. How can I trace what’s going wrong?

/KBrygger

Start with your text logs.

Well… Of course :slight_smile:

[740581.573262] usb 1-1.5: USB disconnect, device number 123
[740581.876484] usb 1-1.5: new full-speed USB device number 124 using dwc_otg
[740582.011185] usb 1-1.5: New USB device found, idVendor=1cf1, idProduct=0030, bcdDevice= 1.00
[740582.015211] usb 1-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[740582.019162] usb 1-1.5: Product: ConBee II
[740582.021132] usb 1-1.5: Manufacturer: dresden elektronik ingenieurtechnik GmbH
[740582.023150] usb 1-1.5: SerialNumber: DE1963500
[740582.026309] cdc_acm 1-1.5:1.0: ttyACM2: USB ACM device
[740589.253292] usb 1-1.5: USB disconnect, device number 124
[740589.556485] usb 1-1.5: new full-speed USB device number 125 using dwc_otg
[740589.690947] usb 1-1.5: New USB device found, idVendor=1cf1, idProduct=0030, bcdDevice= 1.00
[740589.695054] usb 1-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[740589.699126] usb 1-1.5: Product: ConBee II
[740589.701133] usb 1-1.5: Manufacturer: dresden elektronik ingenieurtechnik GmbH
[740589.703180] usb 1-1.5: SerialNumber: DE1963500
[740589.707605] cdc_acm 1-1.5:1.0: ttyACM2: USB ACM device
[740635.589367] usb 1-1.5: USB disconnect, device number 125
[740635.886454] usb 1-1.5: new full-speed USB device number 126 using dwc_otg
[740636.020944] usb 1-1.5: New USB device found, idVendor=1cf1, idProduct=0030, bcdDevice= 1.00
[740636.025126] usb 1-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[740636.029534] usb 1-1.5: Product: ConBee II
[740636.031719] usb 1-1.5: Manufacturer: dresden elektronik ingenieurtechnik GmbH
[740636.033956] usb 1-1.5: SerialNumber: DE1963500
[740636.037364] cdc_acm 1-1.5:1.0: ttyACM2: USB ACM device
[740640.308349] avahi-daemon invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[740640.312777] avahi-daemon cpuset=/ mems_allowed=0
[740640.314964] CPU: 3 PID: 345 Comm: avahi-daemon Tainted: G         C        4.19.126-v7 #1
[740640.319273] Hardware name: BCM2835
[740640.321458] [<801122a8>] (unwind_backtrace) from [<8010d4f8>] (show_stack+0x20/0x24)
[740640.325809] [<8010d4f8>] (show_stack) from [<80972b7c>] (dump_stack+0xd4/0x118)
[740640.330163] [<80972b7c>] (dump_stack) from [<8027487c>] (dump_header+0x80/0x270)
[740640.334586] [<8027487c>] (dump_header) from [<802739ec>] (oom_kill_process+0xe8/0x38c)
[740640.339174] [<802739ec>] (oom_kill_process) from [<80274664>] (out_of_memory+0x280/0x380)
[740640.343869] [<80274664>] (out_of_memory) from [<8027a4cc>] (__alloc_pages_nodemask+0xb90/0x111c)
[740640.348704] [<8027a4cc>] (__alloc_pages_nodemask) from [<8026fc0c>] (filemap_fault+0x54c/0x70c)
[740640.353563] [<8026fc0c>] (filemap_fault) from [<802a8e10>] (__do_fault+0x60/0x188)
[740640.358554] [<802a8e10>] (__do_fault) from [<802ad9e8>] (handle_mm_fault+0x850/0xd30)
[740640.363630] [<802ad9e8>] (handle_mm_fault) from [<8098ee3c>] (do_page_fault+0x154/0x3a0)
[740640.368831] [<8098ee3c>] (do_page_fault) from [<801169f0>] (do_PrefetchAbort+0x5c/0xec)
[740640.374116] [<801169f0>] (do_PrefetchAbort) from [<80101f24>] (ret_from_exception+0x0/0x1c)
[740640.379484] Exception stack(0xb5491fb0 to 0xb5491ff8)
[740640.382214] 1fa0:                                     00000007 7eb9cbaf 00000001 00000057
[740640.387563] 1fc0: 01d87c48 01dd6fa0 7eb9cbc8 01d87f00 0003a430 01d8a630 00000000 00000000
[740640.392877] 1fe0: 76f7410c 7eb9cba8 76f606b0 76e44680 20000010 ffffffff
[740640.395989] Mem-Info:
[740640.398767] active_anon:88474 inactive_anon:89113 isolated_anon:0
[740640.398767]  active_file:355 inactive_file:638 isolated_file:0
[740640.398767]  unevictable:0 dirty:0 writeback:0 unstable:0
[740640.398767]  slab_reclaimable:9271 slab_unreclaimable:16641
[740640.398767]  mapped:429 shmem:655 pagetables:1810 bounce:0
[740640.398767]  free:1039 free_pcp:484 free_cma:0
[740640.413724] Node 0 active_anon:353896kB inactive_anon:356452kB active_file:1392kB inactive_file:1968kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1216kB dirty:0kB writeback:0kB shmem:2620kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[740640.423033] Normal free:4140kB min:3868kB low:4832kB high:5796kB active_anon:353896kB inactive_anon:356452kB active_file:1752kB inactive_file:1964kB unevictable:0kB writepending:0kB present:970752kB managed:946152kB mlocked:0kB kernel_stack:4296kB pagetables:7240kB bounce:0kB free_pcp:2456kB local_pcp:780kB free_cma:0kB
[740640.432311] lowmem_reserve[]: 0 0
[740640.434576] Normal: 150*4kB (UEH) 165*8kB (EH) 130*16kB (UEH) 0*32kB 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4064kB
[740640.439199] 1920 total pagecache pages
[740640.441481] 279 pages in swap cache
[740640.443654] Swap cache stats: add 1586522, delete 1586243, find 161263/1617086
[740640.445892] Free swap  = 0kB
[740640.448151] Total swap = 236536kB
[740640.450264] 242688 pages RAM

I don’t know if this makes sense. But here it says ‘Out of memory’ :frowning:

/KBrygger
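
For anyone trying to read a dump like the one above: the Mem-Info counters are in pages, so a little arithmetic makes them easier to interpret. The sketch below is only an illustration. It assumes 4 kB pages (which matches the log, where active_anon:88474 pages is reported as 353896kB), and the function names are made up for this example.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages (88474 pages == 353896 kB in the log above)

# Counters copied from the Mem-Info block of the OOM report above.
MEM_INFO = """\
active_anon:88474 inactive_anon:89113 isolated_anon:0
 active_file:355 inactive_file:638 isolated_file:0
 free:1039 free_pcp:484 free_cma:0
"""
TOTAL_RAM_PAGES = 242688  # from the line "242688 pages RAM"


def parse_page_counters(text):
    """Return {counter_name: pages} for every 'name:number' pair in text."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)\b", text)}


def summarize(text, total_pages):
    """Convert the main page counters to kB and compute the anonymous share."""
    c = parse_page_counters(text)
    anon_kb = (c["active_anon"] + c["inactive_anon"]) * PAGE_KB
    file_kb = (c["active_file"] + c["inactive_file"]) * PAGE_KB
    total_kb = total_pages * PAGE_KB
    return {
        "anon_kb": anon_kb,          # process heap/stack, not backed by files
        "file_kb": file_kb,          # page cache that could still be dropped
        "free_kb": c["free"] * PAGE_KB,
        "total_kb": total_kb,
        "anon_share": anon_kb / total_kb,
    }


summary = summarize(MEM_INFO, TOTAL_RAM_PAGES)
for key, value in summary.items():
    print(key, value)
```

Roughly three quarters of RAM is anonymous memory, almost nothing is left in the page cache, and the log also shows "Free swap  = 0kB". That pattern is consistent with one or more processes growing their heap, but the dump as pasted does not say which process is responsible.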

I was referring to the Home Assistant text logs, assuming you’re only running home assistant.

Running Hass.io or cores as it is called now :slight_smile:

2020-07-06 16:00:57 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for hacs which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
2020-07-06 16:00:57 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for apple_tv which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
2020-07-06 16:00:59 WARNING (Recorder) [homeassistant.components.recorder] Ended unfinished session (id=173 from 2020-07-04 14:28:10.883302)
2020-07-06 16:01:10 WARNING (MainThread) [homeassistant.setup] Setup of person is taking over 10 seconds.
2020-07-06 16:01:10 WARNING (MainThread) [homeassistant.setup] Setup of hassio is taking over 10 seconds.
2020-07-06 16:01:15 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for unifigateway which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
2020-07-06 16:01:25 WARNING (MainThread) [homeassistant.setup] Setup of timer is taking over 10 seconds.
2020-07-06 16:01:25 WARNING (MainThread) [homeassistant.setup] Setup of zone is taking over 10 seconds.
2020-07-06 16:01:26 ERROR (MainThread) [homeassistant.components.media_player] The samsungtv platform for the media_player integration does not support platform setup. Please remove it from your config.
2020-07-06 16:01:26 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for mitemp_bt which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
2020-07-06 16:01:28 ERROR (MainThread) [snitun.client.client_peer] Challenge/Response error with SniTun server
2020-07-06 16:01:28 ERROR (MainThread) [hass_nabucasa.remote] Connection problem to snitun server
2020-07-06 16:01:34 ERROR (MainThread) [pyhaversion] Timeout error fetching version information from Hassio, 
2020-07-06 16:01:38 WARNING (MainThread) [homeassistant.components.sensor] Setup of sensor platform unifigateway is taking over 10 seconds.
2020-07-06 16:01:46 WARNING (MainThread) [homeassistant.components.climate] Setup of climate platform netatmo is taking over 10 seconds.
2020-07-06 16:01:46 WARNING (MainThread) [homeassistant.components.sensor] Setup of sensor platform netatmo is taking over 10 seconds.
2020-07-06 16:01:46 ERROR (MainThread) [homeassistant.config_entries] Error setting up entry 192.168.0.3 for synology_dsm
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/usr/local/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 725, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 403, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 428, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='192.168.0.3', port=5000): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/synology_dsm/synology_dsm.py", line 256, in _execute_request
    url, params=encoded_params, timeout=self._timeout, **kwargs
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='192.168.0.3', port=5000): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 220, in async_setup
    hass, self
  File "/usr/src/homeassistant/homeassistant/components/synology_dsm/__init__.py", line 161, in async_setup_entry
    await api.async_setup()
  File "/usr/src/homeassistant/homeassistant/components/synology_dsm/__init__.py", line 253, in async_setup
    await self._hass.async_add_executor_job(self._fetch_device_configuration)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/src/homeassistant/homeassistant/components/synology_dsm/__init__.py", line 321, in _fetch_device_configuration
    self.utilisation = self.dsm.utilisation
  File "/usr/local/lib/python3.7/site-packages/synology_dsm/synology_dsm.py", line 351, in utilisation
    data = self.get(SynoCoreUtilization.API_KEY, "get")
  File "/usr/local/lib/python3.7/site-packages/synology_dsm/synology_dsm.py", line 179, in get
    return self._request("GET", api, method, params, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/synology_dsm/synology_dsm.py", line 223, in _request
    response = self._execute_request(request_method, url, params=params, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/synology_dsm/synology_dsm.py", line 275, in _execute_request
    raise SynologyDSMRequestException(exp)
synology_dsm.exceptions.SynologyDSMRequestException: {'api': None, 'code': -1, 'reason': 'Unknown', 'details': "ReadTimeout = HTTPConnectionPool(host='192.168.0.3', port=5000): Read timed out. (read timeout=10)"}
2020-07-06 16:02:30 ERROR (MainThread) [homeassistant.core] Error doing job: Future exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/mitemp_bt/sensor.py", line 775, in update_ble
    discover_ble_devices(config, aeskeys, whitelist)
  File "/config/custom_components/mitemp_bt/sensor.py", line 683, in discover_ble_devices
    sensors[t_i], mac, config, temp_m_data[mac]
  File "/config/custom_components/mitemp_bt/sensor.py", line 499, in calc_update_state
    entity_to_update.schedule_update_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 416, in schedule_update_ha_state
    assert self.hass is not None
AssertionError
2020-07-06 16:11:56 WARNING (SyncWorker_7) [homeassistant.components.ios.notify] The given target has reached the maximum number of notifications allowed per day. Please try again later.
2020-07-06 16:11:56 WARNING (SyncWorker_7) [homeassistant.components.ios.notify] iOS push notification rate limits for kaspar_iphone7: 153 sent, 150 allowed, 0 errors, resets in 9:48:03
2020-07-06 17:23:20 WARNING (MainThread) [homeassistant.components.script] Script script.1554376161801 already running.
2020-07-06 17:24:10 WARNING (MainThread) [homeassistant.components.script] Script script.1554376161801 already running.
2020-07-06 17:25:28 WARNING (Thread-227) [pychromecast.socket_client] [Køkken TV(192.168.0.192):8009] Heartbeat timeout, resetting connection
2020-07-06 17:25:58 ERROR (Thread-227) [pychromecast.socket_client] [Køkken TV(192.168.0.192):8009] Failed to connect to service Chromecast-cd34de6dfbf91dcf5005ac2f7ffcb965._googlecast._tcp.local., retrying in 5.0s
2020-07-06 20:07:36 WARNING (MainThread) [homeassistant.helpers.entity] Updating state for device_tracker.samsung_rasmus (<class 'homeassistant.components.unifi.device_tracker.UniFiClientTracker'>) took 0.623 seconds. Please create a bug report at https://github.com/home-assistant/home-assistant/issues?q=is%3Aopen+is%3Aissue+label%3A%22integration%3A+unifi%22
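
With a log this long it can help to count warnings and errors per logger, so that the noisiest integrations stand out. This is a rough sketch, not an official tool: the regular expression is an assumption based on the line format visible in the log above, and SAMPLE holds a few of those lines purely for demonstration. In practice you would pass it the lines of your own log file.

```python
import re
from collections import Counter

# Assumed line format, taken from the log above:
# "2020-07-06 16:01:10 WARNING (MainThread) [homeassistant.setup] message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\]"
)


def count_by_logger(lines):
    """Count WARNING and ERROR entries per (level, logger) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Traceback lines and other continuation lines do not match and are skipped.
        if match and match.group("level") in ("WARNING", "ERROR"):
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


SAMPLE = """\
2020-07-06 16:01:10 WARNING (MainThread) [homeassistant.setup] Setup of person is taking over 10 seconds.
2020-07-06 16:01:10 WARNING (MainThread) [homeassistant.setup] Setup of hassio is taking over 10 seconds.
2020-07-06 16:01:28 ERROR (MainThread) [hass_nabucasa.remote] Connection problem to snitun server
Traceback (most recent call last):
2020-07-06 17:23:20 WARNING (MainThread) [homeassistant.components.script] Script script.1554376161801 already running.
""".splitlines()

for (level, logger), n in count_by_logger(SAMPLE).most_common():
    print(f"{n:3d}  {level:7s}  {logger}")
```

A high count does not prove that a logger's integration is leaking memory; it only suggests where to look first.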

Try to get a py-spy recording. It may be that the integration that is leaking RAM is also using lots of CPU time. https://github.com/benfred/py-spy

py-spy record --pid 200 --output issuexxx.svg --duration 120
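
Since py-spy samples where CPU time goes rather than measuring memory directly, it can also be useful to confirm how fast the process itself is growing. Below is a minimal sketch that reads VmRSS from /proc/<pid>/status. It assumes a Linux host, and the function names and sampling defaults are invented for this example.

```python
import re
import time


def rss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return rss_kb(fh.read())


def sample_rss(pid, samples=5, interval_s=60.0):
    """Collect (timestamp, rss_kb) pairs, waiting interval_s between reads."""
    readings = []
    for i in range(samples):
        readings.append((time.time(), read_rss_kb(pid)))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings


def growth_kb_per_hour(readings):
    """Average growth between the first and last reading, in kB per hour."""
    (t0, r0), (t1, r1) = readings[0], readings[-1]
    if t1 <= t0:
        return 0.0
    return (r1 - r0) / (t1 - t0) * 3600


# Example usage (200 is the same placeholder PID as in the command above):
#   readings = sample_rss(200, samples=10, interval_s=300)
#   print(growth_kb_per_hour(readings))
```

Steady growth over several hours points at a leak in that process; a flat line suggests the memory is going somewhere else, for example another container or add-on.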

I’m seeing the same thing:

I do reboot Home Assistant every night (or rather, I delete the container and reprovision it with the latest version), so I don't get to the point where it becomes unresponsive. But I do see the memory leak.

Running 111.4 on a NUC and seeing the same. Rebooted for the first time in about a week and it’s clearly evident on the chart.

With all the tweaks and upgrades every 3 weeks, I might be better off with a clean install. Just start over fresh.
I also lost my BT connections for my Xiaomi temp sensors, so to start over I get a clean install and everything should work… I hope :slight_smile:

I had all sorts of problems with HA leaking memory until I forced hass to use jemalloc.
There is a GitHub ticket about it here, including the latest comment from balloob that it’s now the default in the HA container.

But yeah, I strongly recommend installing jemalloc and using LD_PRELOAD to force it to be the default memory allocator, if you’re not already doing so. It seemed to fix the problems I had with leaking memory.
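
If you are not sure whether jemalloc is actually loaded in a running process, one way to check on Linux is to look for it among the files mapped into that process. This is a minimal sketch under stated assumptions: it relies on /proc/<pid>/maps, it assumes the library's file name contains "jemalloc", and the path in the example is hypothetical.

```python
def uses_jemalloc(maps_text):
    """Return True if any mapped file name in /proc/<pid>/maps contains 'jemalloc'."""
    for line in maps_text.splitlines():
        if "/" not in line:
            continue  # anonymous mappings, [heap], [stack], ...
        file_name = line.rsplit("/", 1)[-1]
        if "jemalloc" in file_name:
            return True
    return False


def process_uses_jemalloc(pid="self"):
    """Check a live process (Linux only); pid='self' checks this interpreter."""
    with open(f"/proc/{pid}/maps") as fh:
        return uses_jemalloc(fh.read())


# Hypothetical excerpts, for illustration only; real paths and addresses differ.
WITH_JEMALLOC = (
    "76e00000-76e60000 r-xp 00000000 b3:02 4242 /usr/lib/libjemalloc.so.2\n"
    "76f00000-76f20000 r-xp 00000000 b3:02 4243 /lib/libc.so.6\n"
)
WITHOUT_JEMALLOC = "76f00000-76f20000 r-xp 00000000 b3:02 4243 /lib/libc.so.6\n"

print(uses_jemalloc(WITH_JEMALLOC))
print(uses_jemalloc(WITHOUT_JEMALLOC))
```

Run process_uses_jemalloc with the PID of the Home Assistant process to check a live system. If it returns False after you set LD_PRELOAD, the variable probably did not reach the process environment.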

I’m using the latest container from hass, but I still see the memory leak.

Newbie alert! I have similar issues. My supervisor seems to keep eating memory. See the Glances screenshot below:

I tried disabling a lot of add-ons, but that didn’t help. I want to rule out integrations, but with integrations moving to the interface instead of YAML, I don’t know how to disable them temporarily.

This is a dump from py-spy

But I don’t have any idea how to debug this. Does anyone have an idea how to continue? How can I disable the integrations temporarily from the interface? I don’t want to lose all settings/devices by deleting them.

Great job on getting a py-spy dump.

Disabling integrations likely won’t help you because this appears to be a problem with the supervisor.

Please open an issue here https://github.com/home-assistant/supervisor/issues/new and include everything from your post.

I did that. Thanks. I guess I will have to wait. I’ll try to restore an older backup.

YES! I’ve been experiencing this same problem for the last couple of weeks. The memory use builds to around 90%, the load suddenly goes from around 0.2 to between 30 and 60 and the whole system becomes unresponsive.
I’ve been disabling random things, but haven’t isolated the issue yet.

I started experiencing these problems not long after changing over to MySQL instead of the default SQLite, and booting from SSD, so I was wondering if it was related to one of those.
I don’t think I made any other changes around the time that the problems started.

I had memory issues with the speedtest integration. Is anyone here using it?

No, not me.

I also have problems, and I have the Speedtest integration set up. I have now deactivated it and see no improvement. However, several integrations may be affected.

After a few tests I have to change my statement. I think a very large part of the memory growth was due to the “speedtest” module.

@phaeton How’s your HA running now? I saw you said it had been running ok for a day.

I restored the backup, including the first option: 0.112.x. This didn’t work; I had the same issues. Yesterday I moved over to Proxmox (which was on my list anyway), created a virtual machine, and restored only the files and settings. This has worked so far: no problems for 23 hours, which is the longest uptime in weeks. I think something went wrong along the way with my install and the reinstall fixed it. I doubt the stability has anything to do with Proxmox, but since I wanted Proxmox anyway, I installed it.