Here I see that at some point HA stops registering the sensor and then, after about 15 minutes, starts again, usually coming down from high CPU usage. This makes me think that some process (maybe an add-on) spirals out of control and then recovers. While HA is unresponsive, ssh also becomes unresponsive, so I cannot look at top, for example.
I have no (simple) idea how to debug this, because the log files (both supervisor and core) show nothing relevant. Does anyone have any suggestions?
I am not entirely sure whether this has been happening since 0.110, though. I have been having issues for a few weeks and recently moved from Ubuntu to a Proxmox setup because of the (reverted!) deprecation announcement. So I am not sure whether the move is the cause or whether it's some HA bug.
Home Assistant is built around an event loop. This means that there is only ever a single task running at the core of the system. When a task needs to do I/O, it schedules an I/O task and suspends itself until the I/O is done, which lets other tasks run in the meantime.
I don’t know if it is the cause here, but one reason for Home Assistant not responding for several seconds is an incorrectly coded integration doing I/O directly inside its task, instead of scheduling an I/O task. When that happens, the whole event loop blocks and no other tasks are processed until the I/O is done.
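To make the difference concrete, here is a minimal sketch in plain asyncio (not Home Assistant code, and Python 3.9+ for `asyncio.to_thread`). All names in it (`slow_io`, `heartbeat`, `blocking_task`, `well_behaved_task`, `longest_gap`) are made up for illustration. A heartbeat task ticks every 50 ms, and the longest gap between ticks shows how long the loop was stalled.

```python
import asyncio
import time


def slow_io(seconds):
    """Hypothetical stand-in for blocking I/O (a synchronous network or disk call)."""
    time.sleep(seconds)


async def heartbeat(ticks, stop):
    """Record a timestamp every 50 ms; large gaps mean the loop was blocked."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def blocking_task():
    # Wrong: the blocking call runs inside the event loop,
    # so nothing else can run until it returns.
    slow_io(0.5)


async def well_behaved_task():
    # Better: the blocking call runs in a worker thread and this task
    # suspends itself, so the loop keeps serving other tasks.
    await asyncio.to_thread(slow_io, 0.5)


async def longest_gap(task_factory):
    """Run one task next to the heartbeat and return the longest gap between ticks."""
    ticks = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0.1)   # let the heartbeat get going
    await task_factory()
    await asyncio.sleep(0.1)   # let it tick a few more times
    stop.set()
    await hb
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


def main():
    bad = asyncio.run(longest_gap(blocking_task))
    good = asyncio.run(longest_gap(well_behaved_task))
    print(f"longest heartbeat gap with blocking I/O:  {bad:.2f} s")
    print(f"longest heartbeat gap with offloaded I/O: {good:.2f} s")


if __name__ == "__main__":
    main()
```

With the blocking version the heartbeat should go silent for roughly the full half second, while the offloaded version should keep ticking at about the normal interval. Scale that half second up to a slow network call or a hung device and you get the kind of freeze described above.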
A good first step is to make sure you don’t have any warnings about I/O in the event loop in your logs. If you do, getting those fixed should be step 1.
I noticed something similar this last week. Maybe related? I’m running hass supervised on a NUC. I noticed my NUC's fan was at full speed following a restart and would not come down, and the temperature was spiking. Upon investigation, the python3 process for the supervisor was using 100% CPU and not going down. Since I’m running in Docker, I restarted the supervisor and everything went back to normal. No idea what caused the problem, as I didn’t see anything in the logs, but it seemed to be a pretty good band-aid.