Thread leak?

Hi,
I'm running HA 2022.7.5 in a Python venv (Python 3.9) on a Raspberry Pi 4 with 4 GB RAM and storage on SSD.
Once in a while I get this message in the log:

2022-08-02 04:50:05 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/srv/homeassistant2/lib/python3.9/site-packages/homeassistant/data_entry_flow.py", line 222, in async_init
    flow, result = await task
  File "/srv/homeassistant2/lib/python3.9/site-packages/homeassistant/data_entry_flow.py", line 249, in _async_init
    result = await self._async_handle_step(flow, flow.init_step, data, init_done)
  File "/srv/homeassistant2/lib/python3.9/site-packages/homeassistant/data_entry_flow.py", line 359, in _async_handle_step
    result: FlowResult = await getattr(flow, method)(user_input)
  File "/home/homeassistant/.homeassistant/custom_components/bosch_shc/config_flow.py", line 193, in async_step_zeroconf
    self.info = await self._get_info(discovery_info.host)
  File "/home/homeassistant/.homeassistant/custom_components/bosch_shc/config_flow.py", line 225, in _get_info
    return await self.hass.async_add_executor_job(
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/homeassistant/.homeassistant/custom_components/bosch_shc/config_flow.py", line 74, in get_info_from_host
    information = session.mdns_info()
  File "/srv/homeassistant2/lib/python3.9/site-packages/boschshcpy/session.py", line 277, in mdns_info
    return SHCInformation(
  File "/srv/homeassistant2/lib/python3.9/site-packages/boschshcpy/information.py", line 80, in __init__
    self.get_unique_id(zeroconf)
  File "/srv/homeassistant2/lib/python3.9/site-packages/boschshcpy/information.py", line 138, in get_unique_id
    self._listener = SHCListener(zeroconf, self.filter)
  File "/srv/homeassistant2/lib/python3.9/site-packages/boschshcpy/information.py", line 32, in __init__
    ServiceBrowser(zeroconf, "_http._tcp.local.", handlers=[self.service_update])
  File "/srv/homeassistant2/lib/python3.9/site-packages/zeroconf/_services/browser.py", line 511, in __init__
    self.start()
  File "/usr/lib/python3.9/threading.py", line 874, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

At around 04:50 the BoschSHC integration could not start a new thread.

This time I monitored the system after restarting it on 2022-07-28 at 17:50. I installed py-spy and found an ever-increasing number of threads with this name:
Thread 0xBE80E440 (active): "zeroconf-ServiceBrowser-_http._tcp-2954"
After starting the HA service I have 3 such threads, and the count increases by 2 every 40-50 minutes. The maximum was 312 "zeroconf-ServiceBrowser-_http._tcp-" threads this morning at 04:50, the same time the error above appeared in the log. I set up a sensor to monitor the count:

  - platform: command_line
    name: ThreadWatch1
    command: "py-spy dump --pid $(pgrep -u homeassistant hass ) |grep zeroconf-ServiceBrowser-_http._tcp | wc -l"
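The grep/wc pipeline above counts one thread name. As a rough sketch (not part of the original setup), the same py-spy dump text could be grouped by thread name in Python, which makes it easier to see which family of threads is growing. The function name `count_threads_by_group` and the sample text are made up for illustration; the header line format is taken from the dump shown in this post.

```python
import re
from collections import Counter

# Matches py-spy header lines such as:
#   Thread 0xBE80E440 (active): "zeroconf-ServiceBrowser-_http._tcp-2954"
THREAD_LINE = re.compile(r'^Thread\s+\S+\s+\((?P<state>[^)]*)\):\s+"(?P<name>[^"]+)"')


def count_threads_by_group(dump_text: str) -> Counter:
    """Count threads per name, with the trailing numeric id stripped."""
    groups = Counter()
    for line in dump_text.splitlines():
        match = THREAD_LINE.match(line.strip())
        if match:
            # "zeroconf-ServiceBrowser-_http._tcp-2954" -> "zeroconf-ServiceBrowser-_http._tcp"
            group = re.sub(r"-\d+$", "", match.group("name"))
            groups[group] += 1
    return groups


# Hypothetical sample in the same shape as the py-spy output in this thread.
sample = '''Thread 0xBE80E440 (active): "zeroconf-ServiceBrowser-_http._tcp-2954"
    run (zeroconf/_services/browser.py:530)
Thread 0xBE70D440 (idle): "zeroconf-ServiceBrowser-_http._tcp-2990"
    run (zeroconf/_services/browser.py:530)
Thread 0xB6F2A010 (idle): "MainThread"
'''
counts = count_threads_by_group(sample)
print(counts.most_common())
```

In practice the text would come from saving the output of `py-spy dump --pid ...` and passing it to the function; a group whose count only ever goes up is the likely leak.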

Stacktrace for all 312 threads is similar:

Thread 0xBE80E440 (active): "zeroconf-ServiceBrowser-_http._tcp-2954"
    run (zeroconf/_services/browser.py:530)
        Arguments:
            self: <ServiceBrowser at 0x6f1db328>
        Locals:
            event: (("ds416._http._tcp.local.", "_http._tcp.local."), <ServiceStateChange at 0xae016178>)
    _bootstrap_inner (threading.py:954)
        Arguments:
            self: <ServiceBrowser at 0x6f1db328>
    _bootstrap (threading.py:912)
        Arguments:
            self: <ServiceBrowser at 0x6f1db328>

My first suspect was the custom component for BoschSHC, but I disabled the integration and the number of threads was still increasing after a restart.
Another suspect was my Synology DS416, because its name was listed in the stack trace. The Synology also acts as DHCP and DNS server in my network.
I disabled the Synology integration as well; the number of threads was still increasing.

Any idea what causes the threads being started and obviously never terminated?

Armin

Do you have a repeat while/until loop in your automations or scripts that never exits?

Hi
I have only a small number of automations, nothing complex, just switching lights based on input from IR-Receivers or time.
Also have the same on my test system ( Pi4 8GB, 64-bit system, Python venv), on the test system I currently have no automations or scripts.

Integrations:
HACS
BoschSHC ( Custom Component)
DWD ( from HACS)
Hyperion
IPP ( Monitoring ink in my printer)
MQTT ( Monitoring my Pi4’s using RPi Reporter MQTT2HA Daemon)
Nmap tracker ( using ping, that might be a suspect)
Philips Hue
Tankerkoenig

Armin

On my test system I removed the
default_config:
entry and explicitly added to configuration.yaml only the integrations I think I really need. Will see where this takes me… just after restart I now have 1 thread instead of 3.

Armin

Based on the name it is not too surprising: it is “zeroconf” creating the threads. After removing default_config I saw no additional threads being created. I then added the integrations listed on the default_config page one by one, and after adding “zeroconf” to my configuration.yaml the threads with the names listed in the initial message were created again, 3 at startup and 2 more every 40-50 minutes.

Meanwhile I created an issue on GitHub.

Armin

To close the loop on this.

Hi
thanks for looking into this. I’m just surprised that the threads are still increasing after I disabled the BoschSHC integration, restarted HA and checked that the integration is not started

Armin

It’s certainly possible you have another integration that is doing the same thing. It’s easy to forget to cancel the service browsers after using them.

Thank you @armin-gh for reporting this issue. I fixed the reported bug in the 3rd party library.

It looks like the solution in the lib was to call zeroconf.close(). Unfortunately that won’t actually work, because Home Assistant limits zeroconf to a single instance and .close() is blocked to prevent closing the shared instance.

You can cancel the browser with the below example instead:

    browser = ServiceBrowser(....)

    browser.cancel()
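To illustrate why a missing cancel leaks threads, here is a minimal sketch that uses only the standard library. `FakeBrowser` is a hypothetical stand-in that, like the ServiceBrowser in the traceback above, starts its own thread in `__init__`; it is not zeroconf code, and the function names are made up for the example.

```python
import threading


class FakeBrowser(threading.Thread):
    """Hypothetical stand-in for a browser object that runs its own thread."""

    def __init__(self) -> None:
        super().__init__(name="fake-browser", daemon=True)
        self._cancel_requested = threading.Event()
        self.start()  # the thread starts as soon as the object is created

    def run(self) -> None:
        # Loop until cancel() is called; a real browser would handle events here.
        while not self._cancel_requested.wait(timeout=0.05):
            pass

    def cancel(self) -> None:
        self._cancel_requested.set()
        self.join()


def alive_browsers() -> int:
    return sum(1 for t in threading.enumerate() if isinstance(t, FakeBrowser))


def lookup_leaky() -> None:
    FakeBrowser()  # never cancelled: the thread outlives this call


def lookup_clean() -> None:
    browser = FakeBrowser()
    try:
        pass  # use the browser here
    finally:
        browser.cancel()  # always stop the thread, even if an error occurs


for _ in range(3):
    lookup_clean()
after_clean = alive_browsers()

for _ in range(3):
    lookup_leaky()
after_leaky = alive_browsers()
print(after_clean, after_leaky)
```

Each call to the leaky variant leaves one more thread behind, which matches the slow, steady growth described earlier in the thread. Wrapping the usage in try/finally is one way to make sure the cancel call is never skipped.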

Thanks for looking into this again.

By the way, on my test system the number of threads is increasing even with this very minimal configuration.yaml, with ALL (!) other integrations disabled and all custom integrations removed.
groups.yaml, scripts.yaml and scenes.yaml are empty, and the automation is disabled.

The number of “zeroconf-ServiceBrowser” threads increases by 1 approximately every 40-45 minutes.
My test system has meanwhile been updated to 2022.9.5.

# Configure a default setup of Home Assistant (frontend, api, etc)
default_config:

group: !include groups.yaml
automation: !include automations.yaml
script: !include scripts.yaml
scene: !include scenes.yaml

recorder:

http:
  ssl_certificate: /etc/letsencrypt/fullchain.cer
  ssl_key: /etc/letsencrypt/my.key


sensor:
  - platform: command_line
    name: ThreadWatch1
    command: "py-spy dump --pid $(pgrep -u homeassistant hass ) |grep zeroconf-ServiceBrowser-_http._tcp | wc -l"

Armin

My fault. Thank you @bdraco
I have now changed the 3rd party library to cancel the browser instead.

Is there a sensor that can show how many active threads there are? Something has thread leaks in my system and eventually HA stops operating entirely. I thought it was also the memory, so I switched to an RPi4 8GB (from an RPi3 1GB), which definitely sped things up! However, eventually the same issues arise (not being able to restart the host, no access to the supervisor, etc.).

Install the profiler integration. It has a service to dump the active threads to the log.
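For a feel of what such a thread dump tells you, the sketch below groups the live threads of the current Python process by name using only the standard library. It is not the profiler integration's code; `live_thread_groups` and the `demo-worker` threads are made up for the example.

```python
import re
import threading
from collections import Counter


def live_thread_groups() -> Counter:
    """Group this process's live threads by name, ignoring numeric suffixes."""
    return Counter(
        re.sub(r"[-_]\d+$", "", thread.name) for thread in threading.enumerate()
    )


# Start a few demo threads that block until the event is set.
stop = threading.Event()
workers = [
    threading.Thread(target=stop.wait, name=f"demo-worker-{i}") for i in range(4)
]
for worker in workers:
    worker.start()

groups = live_thread_groups()
print(groups.most_common(3))

# Release and clean up the demo threads.
stop.set()
for worker in workers:
    worker.join()
remaining = live_thread_groups()["demo-worker"]
```

Comparing two such snapshots taken some time apart shows which group is growing.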

Thanks bdraco for the tip on the profiler… very useful, and I dumped everything. It looks like it might be a file descriptor leak and not a thread leak. Sockets can’t be opened, which appears to have caused lots of blocked threads and everything to fail (e.g. I cannot reboot the host remotely). I’ll have to track that down… I uninstalled some integrations in the hope that I can narrow down what might be behaving badly.
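One way to check the file descriptor suspicion, sketched under the assumption of a Linux host with /proc mounted: count the entries in /proc/<pid>/fd and group them by kind. The function name `open_fd_summary` is made up for the example, and reading another process's descriptors needs the same user or root.

```python
import os
from collections import Counter
from typing import Optional


def open_fd_summary(pid: Optional[int] = None) -> Optional[Counter]:
    """Summarise a process's open file descriptors by kind (Linux /proc only)."""
    fd_dir = f"/proc/{pid if pid is not None else os.getpid()}/fd"
    if not os.path.isdir(fd_dir):
        return None  # not Linux, or the process does not exist
    kinds = Counter()
    for entry in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            continue  # descriptor was closed while we were looking
        # Targets look like "socket:[12345]", "pipe:[678]" or a file path.
        kind = "file" if target.startswith("/") else target.split(":", 1)[0]
        kinds[kind] += 1
    return kinds


# With no argument this inspects the current process.
summary = open_fd_summary()
print(summary)
```

A "socket" count that keeps climbing between snapshots would point at leaked connections rather than leaked threads.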