Running latest 2022.5 version on Docker on Ubuntu x86 computer.
When (re)starting the docker container, everything works great, all works and everything.
After some time, about an hour or so, various. things stop working:
HomeAssistant is not accessible through :8123 port
Iāve also added HealthChecks.io support - health checks are stopped being fired.
Automations stop working
On the other hand - some other things keep āworkingā:
Docker container is alive and well
Logs of background services are still being written (i.g. INFO (Thread-14) [pychromecast.socket_client] [Entryway clock(192.168.1.123):8009] Connection reestablished!)
This started sometime with 2022.5 (not sure exactly when).
Iāve tried to change the logs to debug (all of them, via `logger: default: debug") ā but no correlated logs appear at that time, it looks like everything works well.
How can I debug the issue?
Anyone experienced something similar?
Some extra data if needed:
Version core-2022.5.4
Installation Type Home Assistant Container
Development false
Supervisor false
Docker true
User root
Virtual Environment false
Python Version 3.9.9
Operating System Family Linux
Operating System Version 5.15.0-27-generic
CPU Architecture x86_64
Timezone Asia/Jerusalem
How are you checking this issue? In browser, through app, both?
If you have Linux, you may try nmap haserverip
This will list all available ports and might tell something about problem
I use portainer. Portainer allows access to live log of docker container. This would be good for checking this problem
Could this be memory issue or storage problem.
Iāve seen similar issue on raspberry pi when sd card storage is failing.
Have you reboot host server?
Iāve seen host server updates create havoc with HA docker container. Rebooting twice(not sure why) resolved this. As a rule I expect updates on host kernel to cause this and usually notice it immediately.
Have you tried go back yo older HA version? Does this change issue. Would help verify if itās version related
Iāve had the same issue with a integration that was choking.
HA received so many updates from the sensors of that integration that it stopped working after x time.
For me it was about 12~18 hrs.
Perhaps you could disable some integrations or automations to see if any of them is causing this issue.
It could be a simple bad logic flow in an automation causing this.
I recently had an issue with an automation for a phone charger that tried to switch it on constantly until the device was detected as charging. It would eat up all the allocated RAM at a rate of about 1Gb/10 mins and render my HA inaccessible within an hour, but seemed to half limp along. Like a zombie
So with that in mind, Iād recommend restarting HA, de-activate all automations, restart again and monitor it. If itās stable then restart them 1/2 at a time until you hone down what one may be causing the hang-up.
If disabling all the automations donāt resolve it then you at least know they arenāt the cause and you can try disabling the integrations to see if a device is the source of the problem
Iāve had some issues with Docker containers on Synology that stop responding (or rather, stop accepting network connections). When it happens, there are some messages being logged about TCP socket memory being depleted.
Actually Iām suprised I didnāt think of it myslelf.
The issue started after I added a new integration. Disabling it solved the issue. Thanks everyone
I was developing a custom integration and was using Python locks to try to coordinate threads. Got some reports of Home Assistant locking up (eg UI stopped responding, required HA to be restarted) - since this started occurring immediately after that change - I suspected it. Also Iāve seen so many deadlocks in my career - I was pretty sure that was the culprit. I reworked the app to avoid needing a lock at all.
We may want to do a code search on that integration looking for locks or other multi threading constructs. What integration was it?
I removed the mutex and added āPARALLEL_UPDATES=1ā to let HASS prevent multiple access to the API.
Will continue investigating this weird case, I am not sure how HASS infrastructure access our API objects and whether itās from different threads or different workers.
It can and does. Be glad to look at it. For me what I saw doing callbacks into
schedule_update_ha_state()
With a lock, and then having HA call into entity methods / properties in parallel that would use the lock, because HA has some internal locks it uses. So the HA lock was allocated by the method/property call and the HA lock was required for schedule_update_ha_state. Hence the deadlock. This was a year ago, so the details may not be 100%.