Home Assistant becomes unresponsive at a certain time

sesardelaisla · April 22, 2022, 12:24pm

Hello All,

I have been experiencing strange HA behaviour, apparently since Core version 2022.4. It happens almost every day, and always at around 20:30-21:00 my time. The HA dashboard becomes slow and unresponsive, and devices take longer than usual to react to commands and automations. Restarting Core solves the problem, and it doesn’t occur again until the next day at around the same time.

Every time it happens, I don’t see any abnormal CPU usage. My gear is a Celeron NUC with an SSD and 8GB RAM, and it’s been working like a charm for a year or so. I don’t see any relevant errors in the logs. That doesn’t mean there aren’t any, just that I haven’t been able to find anything related to this problem.

My recorder database is set to auto-purge every 30 days. The logger is set to errors only. Given that the unresponsiveness starts at a certain time, I guess it could be related to an automation that is triggered at a specific time. However, I haven’t found anything wrong in the automations triggered around that time slot.

In the attached screenshot, you can see normal CPU (figure 1) and RAM (figure 2) usage at 21:37, which is when light.garage (3) was turned on by motion detection. I could tell the problem had already started because the light usually turns on at once, but this time it took about 2-3 seconds. I then checked the dashboard on both my phone and my wife’s, and both companion apps were very slow when navigating menus and toggling switches. A couple of minutes later (around 21:41), I restarted Core, which is why CPU and RAM usage suddenly change to higher and lower values.

Is there any way to check whether some subprocess, automation, integration, device or whatever is stuck when the issue occurs? Any suggestion that helps to solve or at least narrow down the issue will be much appreciated!
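One way to look for a stuck automation would be to raise the log level for automations and scripts only, while leaving everything else at errors. The snippet below is a minimal sketch for configuration.yaml, assuming the built-in logger integration is what is already set to errors only; the two component namespaces are the ones Home Assistant uses for automations and scripts.

```yaml
# configuration.yaml (sketch; adjust to your existing logger section)
logger:
  default: error            # keep the current "errors only" behaviour
  logs:
    # Verbose output only for the parts suspected of misbehaving
    homeassistant.components.automation: debug
    homeassistant.components.script: debug
```

With this in place, an automation that keeps running or repeating should show up as a stream of entries in the log around the time the slowdown starts.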

sesardelaisla · April 26, 2022, 2:24pm

Answering myself, just in case it helps someone else.

Finally, I managed to find out what the problem was. I suspected that the problem was related to automations, because it usually happened at the same time, and I have some automations which run around sunset.

Last night, when the issue occurred again, I tried reloading automations only instead of restarting the whole Core, and the problem disappeared at once. Then I checked the automations that were triggered around that time and did some tests to narrow down the bad one. I found that one specific automation had a “repeat until a certain condition is met” step. However, the condition was the wrong one and could never be met, so the step looped forever. After setting the right condition, everything is working flawlessly.
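To illustrate the kind of mistake involved, here is a purely hypothetical sketch; it is not the original automation, and the action, entity and condition are made up for the example. The `until` template compares against a state string that the entity never reports, so the sequence repeats every 2 seconds indefinitely.

```yaml
# Hypothetical example of a "repeat until" step that never ends
- repeat:
    sequence:
      - service: light.turn_on
        target:
          entity_id: light.garage
      - delay: 2
    until:
      # Light states are lowercase ('on' / 'off') and the comparison is
      # case-sensitive, so 'ON' never matches and the loop never exits.
      - "{{ is_state('light.garage', 'ON') }}"
```

In a case like this, changing the compared state to 'on' would be enough for the loop to exit after the first pass in which the light reports that state.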

By the way, my bad for not checking the logbook earlier. After solving the problem, I realized that it was full of entries related to the automation step that was being repeated over and over…

Conclusion: Be very careful when using “repeat until” conditions, as they can end up in an endless loop!

tom_l · April 27, 2022, 1:37am

That’s why I always like to use a loop count as an extra “get out” condition:

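  - repeat:
      sequence:
      # Poll the entity so its state is refreshed on each pass
      - service: homeassistant.update_entity
        entity_id: media_player.viera_st50_series
      - delay: 2
      until:
      - condition: or
        conditions:
        # Safety exit: give up after 15 iterations (repeat.index starts at 1)
        - "{{ repeat.index >= 15 }}"
        # Normal exit: the media player reports 'on'
        - "{{ is_state('media_player.viera_st50_series', 'on') }}"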

Just in case the entity the condition depends upon is unavailable for some reason.

Worst case, the loop does not do what is intended (which is usually noticed and can be debugged), but at least it does not loop forever.
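The same guard can also be written with `while` instead of `until`. This is only a sketch of an alternative, reusing the entity and the limit of 15 from the example above. The conditions under `while` are checked before each pass and all of them must be true for the loop to continue, so the counter acts as the “get out” condition here too.

```yaml
# Sketch: the same polling loop, guarded with "while" instead of "until"
- repeat:
    while:
      # Keep going only while the media player is not yet 'on' ...
      - "{{ not is_state('media_player.viera_st50_series', 'on') }}"
      # ... and no more than 15 passes have been started
      - "{{ repeat.index <= 15 }}"
    sequence:
      - service: homeassistant.update_entity
        target:
          entity_id: media_player.viera_st50_series
      - delay: 2
```

One difference from the `until` version: because the check runs first, the sequence is skipped entirely if the media player is already on when the step is reached.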