Runs great for 17 hours after a reboot, then all hell breaks loose

I’m running Home Assistant 2023.5.3, Supervisor 2023.04.1, Operating System 10.1, as a VM on ESXi on a NUC i5. A few weeks ago, I started having to restart Home Assistant once a day, every day, because tons of entities (camera feeds, door sensors, door locks, device trackers, etc.) would suddenly stop showing in the browser.

When I went to automations, almost none of the triggers would be displayed. All of these problems disappear after a reboot for approximately 17 hours, then errors start showing up in the log from every browser IP address, such as my main computer and my 3 wall tablets. When that happens, refreshing the browser doesn’t help, only a reboot of Home Assistant. The error from the computers accessing Home Assistant from browsers is:

2023-05-17 17:08:40.253 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140609537941440] Error handling message: Unknown error (unknown_error) Jeff from 192.168.1.205 (Mozilla/5.0 (Linux; Android 10; Lenovo TB-X505F Build/QKQ1.191224.003; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/112.0.5615.135 Safari/537.36)
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/connection.py", line 180, in async_handle
    handler(self.hass, self, schema(msg))
  File "/usr/src/homeassistant/homeassistant/components/config/entity_registry.py", line 80, in websocket_list_entities_for_display
    + ",".join(
  File "/usr/src/homeassistant/homeassistant/components/config/entity_registry.py", line 83, in <genexpr>
    if entry.disabled_by is None and entry.display_json_repr is not None
  File "/usr/src/homeassistant/homeassistant/helpers/entity_registry.py", line 224, in display_json_repr
    dict_repr = self._as_display_dict
  File "/usr/src/homeassistant/homeassistant/helpers/entity_registry.py", line 206, in _as_display_dict
    if (precision := sensor_options.get("display_precision")) is not None:
RecursionError: maximum recursion depth exceeded while calling a Python object

The error shows up with the IP addresses of my main computer and the 3 wall tablets, and it continues throughout the log until I reboot. Then all is good for roughly 17 hours. I’m not sure how to troubleshoot this issue because, unless I’m overlooking something, there doesn’t seem to be any specific integration that is producing an error.

As I said, this has gone on for a few weeks. A couple of days ago, the NVMe drive went out in the NUC. I ordered a new one, reinstalled ESXi, created the VM and restored Home Assistant without issues. I was hoping my problem would go away, but it is still with me every single day. I’m not looking to be spoon-fed the solution, but can you at least point me in the right direction on where to start? My logging level is currently set to info. Should I post my whole log file? Not sure if this will work, but here’s a Dropbox link to my whole log file, from one reboot to just before I rebooted again because of the issue. The errors in question start at 17:05:25.

Link Removed

This may be stupid, but just spitballing here: the line before the error says Home Assistant is “Setting up sensor.upnp”. I wasn’t sure what that was about, so I looked at my integrations and one was set up for upnp for a wireless access point I was using. That wireless router went out a few weeks ago, and I replaced it with a different model.

I just now disabled that upnp device on the integrations page. Is it possible that the error shown in my first post about “maximum recursion depth exceeded while calling a Python object” was due to exceeding some maximum limit while looking for this hardware that I no longer have on my network? …and it takes roughly 17 hours to reach that limit?

In the log file linked above there are 1172 occurrences of the text string “Setting up sensor.upnp”.
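
For anyone wanting to repeat that count on their own log, here is a minimal sketch that counts the lines containing the string and groups them by hour, which shows whether the message repeats steadily from boot or only appears near the failure. It assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp like the error above; the function names and the log path in the usage note are placeholders, not anything Home Assistant provides.

```python
from collections import Counter

PHRASE = "Setting up sensor.upnp"  # the text string counted in the post above


def occurrences_per_hour(lines, phrase=PHRASE):
    """Count lines containing `phrase`, keyed by their 'YYYY-MM-DD HH' prefix."""
    counts = Counter()
    for line in lines:
        if phrase in line:
            # The first 13 characters of a timestamped line are date plus hour.
            counts[line[:13]] += 1
    return counts


def occurrences_in_file(path, phrase=PHRASE):
    """Same count, reading from a log file (path is supplied by the caller)."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return occurrences_per_hour(log, phrase)
```

Calling `occurrences_in_file("home-assistant.log")` on a copy of the log and printing the sorted items gives one count per hour; `sum(counts.values())` gives the overall total.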

Do you have a loop in your automations or scripts that never exits?

There shouldn’t be. None have a repeat while/until loop.

Very possibly the upnp integration then. How long ago did you disable it?

It sounds like it goes nuts at about the same time every day.
Do you restart it at the same time every day?
If so, could you add a restart at about 15 hours in, to see whether it is something that builds up, like a memory leak, or a time-specific event?

Are there any integrations you have set up or enabled in the last little while, around the time the issue started happening?

Do you have any custom components? At a stretch, if you can’t find anything else, start by disabling custom components. If it stays up, enable them one by one.

It can be annoying, but I’ve found a number of uptime and performance related issues with custom components.

You might want to keep an eye on memory consumption over those 17 hours. And maybe I/O. But it does sound as though you are on the right track already.
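
One way to watch memory over those hours, as a rough sketch: on Linux, the resident memory of a process can be read from `/proc/<pid>/status`. This assumes you have shell access to wherever the Home Assistant process actually runs and that you look up its PID yourself; the function names, sample count, and interval are illustrative.

```python
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (resident memory, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # some processes (e.g. kernel threads) have no VmRSS line


def sample_rss(pid, samples=3, interval_s=60.0):
    """Yield (unix_time, rss_kb) pairs for a Linux process with the given PID."""
    for i in range(samples):
        with open(f"/proc/{pid}/status", encoding="ascii") as status:
            yield time.time(), parse_vmrss_kb(status.read())
        if i < samples - 1:
            time.sleep(interval_s)
```

If the numbers climb steadily between restarts, that points toward something accumulating rather than a time-specific event.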

You should post a GitHub issue, because the integration shouldn’t do this. But no programmer is perfect. Nevertheless if they don’t know it’s happening, it will recur for someone else.

For me, “maximum recursion depth exceeded while calling a Python object” was caused by a Linkplay custom component. I think it is fixed in the latest release, but I just went way back in time to the version that worked for me.

However, in this instance it looks like the upnp sensor may be stacking aiohttp requests. That is a similar problem, and it will eventually cause this error.
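
To illustrate how something that "stacks" can take hours to fail, here is a toy sketch in plain Python. It is not the upnp integration's code and not a confirmed diagnosis; it only shows that if each repeated setup adds one more wrapper layer around a callable, calls keep working until the chain is about as deep as Python's recursion limit (1000 by default), and then every call raises `RecursionError`. All names are hypothetical.

```python
import sys


def wrap(inner):
    """Return a new callable that adds one extra stack frame around `inner`."""
    def wrapper(value):
        return inner(value)
    return wrapper


def layers_until_recursion_error():
    """Keep adding wrapper layers until calling the chain raises RecursionError."""
    handler = lambda value: value  # innermost callable, does the real work
    max_layers = 2 * sys.getrecursionlimit()  # safety bound for the loop
    for layers in range(1, max_layers + 1):
        handler = wrap(handler)  # simulates one more repeated "setup"
        try:
            handler(0)
        except RecursionError:
            return layers  # first layer count at which a single call fails
    return None
```

`layers_until_recursion_error()` returns a number a little below `sys.getrecursionlimit()`. Whether the real integration builds up a chain like this is exactly what a GitHub issue would need to establish.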

Roughly 11 hours ago. I’m hoping that will resolve it.

It’s not at a specific time, but it does seem to always be around 17 hours after a restart. I can probably automate a restart every 16 hours if I can’t get to the root of the problem.

Well, I made it 24 hours without having to reboot Home Assistant for the first time in a few weeks. It seems disabling the UPnP integration for that wireless router that no longer exists resolved it. I have posted a GitHub issue for that UPnP/IGD integration. Once I’m sure the code owner doesn’t need me to do any troubleshooting, I’ll delete that device integration, which I should have done when I got rid of that device.