Improved Stability

There is currently a number of custom_integrations that are not coded thread safe. They do bad things with threads. They have got away with it until the latest version of HA which finished the switch-over to python 3.12. This switch has been in the making foe months and developers of anything connecting to HA have had all this time to fif it. Most have, some have not.

BDraco has mapped out instructions on how to identify the ones that are bad.

In the mean time, HACS and everything in HA Docs tells you that Custom Integrations, themes, python scripts, blueprints, add-ons, and all that stuff you add that doesn’t come included in core can cause instability in your system and if you have problems, check those first. In this case remove all the custom-integrations and you will have a stabile system.

4 Likes

Thanks, that sounds like movement in the right direction.

BUT if you’re referring to the AssertionError, no, that was happening to me long before 2024.4. Actually, I don’t know that has happened to me since I upgraded (but that has only been about a week).

And again, I’m not making this about one issue, those are examples. Ideally, no one integration or device should be able to bring the system down.

Edit: also, note example 2, a core integration, still took me down. Those issues will happen, I know that. HA should too.

I am not a dev at all, just a user like you but have heard rumors that the thread unsafe actions may be detectable when they happen and if that is the case, I’m sure they will come up with something to lock out the offender as it happens and keep the core from crashing.

This is the nature of this kind of application, though. Open source and all of the Custom, whatever’s are not tested by Nabu Casa, rather by the code owner/contributor. The switch to the new python has shown several problems with contributed code.

1 Like

This could help you find the issue

1 Like

My instance locked again this morning, with Profiler running. The UI would not come up, nor were any automations running.

The last hour+ from the log is below, unedited. I can’t say I understand, but it’s almost like it got caught up trying to reconnect to my Sony Bravia Smart TV’s Chromecast and stopped responding to anything else. Does this make sense to anyone else?

It does seem to support what I’ve said…a Core component hangs up and took my whole instance down, again.

2024-04-26 05:22:02.317 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:22:02.458 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:22:02.486 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:22:02.748 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:25:16.814 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:25:17.034 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:25:17.045 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:25:17.383 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:28:31.828 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:28:31.988 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:28:32.004 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:28:32.300 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:30:43.830 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:30:44.007 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:30:44.018 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:30:44.446 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:33:56.625 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:33:56.631 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:33:56.647 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:33:56.758 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:37:07.422 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:37:07.438 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:37:07.455 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:37:07.584 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:39:18.041 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:39:18.052 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:39:18.069 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:39:18.170 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:41:25.125 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:41:25.142 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:41:25.159 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:41:25.286 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:43:35.812 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:43:35.829 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:43:35.840 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:43:35.946 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:47:53.317 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:47:53.333 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:47:53.355 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:47:53.512 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:50:03.978 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:50:03.984 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:50:04.006 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:50:04.139 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:53:54.968 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:53:54.999 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:53:55.000 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:53:55.117 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:56:05.965 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:56:05.992 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:56:06.008 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:56:06.171 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 05:59:16.940 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 05:59:16.956 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 05:59:16.967 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 05:59:17.146 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:02:28.672 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 06:02:28.679 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 06:02:28.695 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:02:28.826 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:04:39.272 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 06:04:39.278 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 06:04:39.284 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:04:39.389 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:07:53.329 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 06:07:53.350 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 06:07:53.366 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:07:53.528 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:11:54.362 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Heartbeat timeout, resetting connection
2024-04-26 06:11:54.368 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:11:57.489 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:14:08.376 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 06:14:08.393 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 06:14:08.409 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:14:08.570 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:17:25.693 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 06:17:25.709 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 06:17:25.715 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:17:25.850 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:21:26.659 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Heartbeat timeout, resetting connection
2024-04-26 06:21:26.665 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:21:26.671 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error writing to socket.
2024-04-26 06:21:26.786 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:23:40.280 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 06:23:40.296 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 06:23:40.302 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:23:40.437 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:25:47.349 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 06:25:47.355 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 06:25:47.371 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:25:47.503 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:29:04.621 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 06:29:04.632 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 06:29:04.633 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:29:04.753 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:31:18.471 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 06:31:18.487 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 06:31:18.498 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:31:18.608 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:33:25.517 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 06:33:25.523 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 06:33:25.539 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:33:25.693 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!
2024-04-26 06:35:36.534 ERROR (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-04-26 06:35:36.551 WARNING (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection
2024-04-26 06:35:36.562 INFO (Thread-11) [pychromecast.controllers] Receiver:channel_disconnected
2024-04-26 06:35:36.713 INFO (Thread-11) [pychromecast.socket_client] [TV(192.168.2.229):8009] Connection reestablished!

In 2024.5.x we will introduce a debug mode that can catch many non-thread-safe operations. Coupled with turning on asyncio debug mode this will catch ~90% of threading implementation errors in integrations

If you have an integration blocking the event loop or doing non-thread-safe operations:

  1. Install profiler integration Link to Integrations: add integration – My Home Assistant
  2. Enable asyncio debug service as soon as possible after startup Profiler - Home Assistant
  3. Watch logs for RuntimeError: Non-thread-safe operation and long asyncio delays
  4. Download and post logs with full trace

For 2024.5.x and later Home Assistant debug mode can also be enabled in configuration.yaml

homeassistant:
  debug: true
6 Likes

I checked the entire log from the crash on 4/26, there were no “RuntimeError” or “thread-safe” in the logs (or variances thereof). I also checked my current logs (running since 4/26 with profiler active and no crashes) and also came up empty.

So perhaps there is something else going on here?

If debug mode was not on it won’t be looking for or blocking the non-thread-safe operations

So I need to wait for 2024.5 and enable that debug mode?

asyncio debug mode can be already be turned on with the profiler integration today by following the instructions in Improved Stability - #11 by bdraco

For Home Assistant’s debug mode, you’ll need to wait until 2024.5 is released tomorrow.

So we’re going in circles. Yes, I did that, and it is active now (my 4 day 22 MB log file) as well as in the log file from the crash mentioned above.

In that case, I think waiting until tomorrow’s release and enabling Home Assistant debug is the way to go.

All has been well for weeks, and then HA hung on my again this morning. Of course I did not have debugging turned on at the time.

From looking at the log, HA seems to have lost network connectivity entirely around 5 a.m. I restarted the container when I got up and it came up fine, and I have no evidence of anything else having network difficulty (and my HA runs in Docker on Synology DSM, I would usually get an email if the DDNS disconnects). But in addition to Chromecast errors similar to before I have a slew of timeouts, “took longer than scheduled update interval” or flat out “cannot connects” starting around that time, covering local resources (e.g. MQTT) and remote/internet ones. Very strange.

There is an updated guide available with more ways to track down problem integrations:

6 Likes

Thanks. I’m running with debug enabled now. We’ll see what happens…in a few weeks… :slight_smile:

I did find that it seems to be failing Friday mornings between 5-6, which is when I had the DSM Hyperbackup integrity check running. Maybe it’s a coincidence as I can’t think of why that would impact anything, but I rescheduled that as well, so at least I can see if it starts failing at a different time.

Six weeks, to be exact. Now HA stopped around 1 a.m. which is precisely when I rescheduled the integrity check on my system to run. And, as it has been running for 6 weeks now without affecting HA. Further, the integrity check was done at 2:30 but HA was still unresponsive when I woke up at 5:30 and restarted it.

This problem is vexing me. I think I have the trigger now but why it’s happening I have no idea. If it were conflicts, I’d expect it to clear up and resume when the task was done.

Anyway, it won’t let me post the full logs, but around the 1 a.m. there were a ton of these (which I get tons of now all day long fwiw):

2024-07-12 01:01:50.439 WARNING (MainThread) [homeassistant.util.loop] Detected blocking call to open inside the event loop by integration 'ps4' at homeassistant/components/ps4/__init__.py, line 176: games = load_json_object(g_file) (offender: /usr/src/homeassistant/homeassistant/util/json.py, line 78: with open(filename, mode="rb") as fdesc:), please create a bug report at https://github.com/home-assistant/core/issues?q=is%3Aopen+is%3Aissue+label%3A%22integration%3A+ps4%22
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/src/homeassistant/homeassistant/__main__.py", line 223, in <module>
    sys.exit(main())
  File "/usr/src/homeassistant/homeassistant/__main__.py", line 209, in main
    exit_code = runner.run(runtime_conf)
  File "/usr/src/homeassistant/homeassistant/runner.py", line 190, in run
    return loop.run_until_complete(setup_and_run_hass(runtime_config))
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 672, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 639, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1988, in _run_once
    handle._run()
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 680, in _async_handle_interval_callback
    self.config_entry.async_create_background_task(
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 1164, in async_create_background_task
    task = hass.async_create_background_task(target, name, eager_start)
  File "/usr/src/homeassistant/homeassistant/core.py", line 860, in async_create_background_task
    task = create_eager_task(target, name=name, loop=self.loop)
  File "/usr/src/homeassistant/homeassistant/util/async_.py", line 37, in create_eager_task
    return Task(coro, loop=loop, name=name, eager_start=True)
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 1050, in _async_update_entity_states
    await entity.async_update_ha_state(True)
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 941, in async_update_ha_state
    await self.async_device_update()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 1289, in async_device_update
    await self.async_update()
  File "/usr/src/homeassistant/homeassistant/components/ps4/media_player.py", line 160, in async_update
    self._parse_status()
  File "/usr/src/homeassistant/homeassistant/components/ps4/media_player.py", line 166, in _parse_status
    self._games = load_games(self.hass, cast(str, self.unique_id))
  File "/usr/src/homeassistant/homeassistant/components/ps4/__init__.py", line 176, in load_games
    games = load_json_object(g_file)

Then at 1:05 things start reporting delays and/or connection failures, like:

2024-07-12 01:05:22.488 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.refrigerator is taking over 10 seconds
2024-07-12 01:06:06.563 ERROR (MainThread) [homeassistant.components.fully_kiosk] Timeout fetching 192.168.2.222 data

Then from 1:14 until 5:30 (when I restarted) all I have is repeats of:

2024-07-12 05:24:09.849 ERROR (Thread-12) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error reading from socket.
2024-07-12 05:24:09.854 WARNING (Thread-12) [pychromecast.socket_client] [TV(192.168.2.229):8009] Error communicating with socket, resetting connection

Hi, FYI what this category is about

Be aware that this is not the right place for posting this and expect something is done about it.
Just saying…

Got to be that guy eh? Ok, I’ll bite.

FYI - the request is to improve stability. Except at least half the people believe there are no stability issues because THEY don’t have stability issues. Have I proved it yet? No, probably not to them.

FYI - a good portion of those will say there are only stability issues from 3rd party plugins. Except the one legit error I have is from a built in integration.

FYI - I’ve also gotten more help here on this post than I’ve got from any post I’ve ever made in this community. I’ll take what I can get.

Thanks for your help.

No clue what you mean by that, I was just trying to help you by telling that a feature request is not the best way to get help with your current problem but hey, just have a ball!

All your posts here have been so helpful. Thanks.