HassIO stops responding every so often

Just read the news: the deprecation of Home Assistant Supervised on generic Linux is on hold. Another twist in the story! :smiley:

I’m curious what sort of diagnosis and resolution people have reached for this problem.

I was having the same problem with my install on a Raspberry Pi 3B. In my case, removing the androidtv integration for talking to my Nvidia Shield TV over adb fixed the problem, but I’m not really sure why it fixed the issue.

It’s just started happening to me as well. It’s much more often than once a day though. It doesn’t seem to last past an hour or two anymore.

In case this helps anyone else, I disabled UPnP on my network and the lock-ups stopped immediately.

I have to retract that. It’s still locking up, just far less frequently.

I might have the same or a similar problem.
It happens after some days.
I’m running Hassio on a Raspberry Pi 3+, latest version. For months there were no problems.

Now my network monitoring triggers because ICMP fails. SSH and core also fail.
Sometimes a few ICMP responses get through again, but core/SSH do not come back. Only a power reset helps, and core runs again after ha core rebuild.

How to debug this situation?
There seem to be no logs available.

Is this solved for you guys? Any hints? I have had the same problems for 2 months now. Changing the SD card and the Pi did not help. Switching to MariaDB did not help either.

I disabled most of my add-ons, such as Node-RED, ADB and Google Drive Backup. deCONZ is still running because I need my lights.

Suddenly my CPU usage goes up until hass freezes.

Very frustrating considering all worked fine before.

This is unlikely to be the answer you are looking for, but after months of not getting anywhere on this issue, in the end I wiped my Pi and did a fresh install of Home Assistant. I had obviously saved copies of my YAML files, so getting back to my setup didn’t take that long. It is worth noting that I did not ‘restore from snapshot’ but reintroduced my old YAML bit by bit to the new install. I wanted to see if anything I had done had caused these problems… but now I don’t think it was.

That was a couple of months ago and HA has been running flawlessly since then. Integrations etc. are all the same as before, so I’m putting this down to some kind of HA error introduced over the last couple of years. I had been running that instance of HA since 2017 and many big changes have happened to the codebase in that time. I figured a fresh install would do no harm… and, in my case, that seems to have been the right approach.

Thank you for your reply.

It seems you are right. I started over from scratch, without restoring a snapshot backup.
The problem seems to be gone now for 3 days in a row (HA used to freeze every day).
I don’t see memory or CPU spikes or unavailable deCONZ lights anymore.

Most of my automations, add-ons, Lovelace etc. are up and running again, but there is still a lot of work to do.

I really don’t think I messed something up in my config; something broke in the core of HA in an update some time ago. So for me this is solved!

I had an issue affecting how quickly my devices switch, especially deCONZ Zigbee devices. At times the Zigbee devices show ‘unavailable’ for a while and then come back again. The delays are usually not long, but sometimes it takes 1 or 2 minutes before a device switches.

I notice that responses are usually slow while the green light on my RPi 3B+ is on, and that devices usually react once the light dies down. Before I upgraded to 0.114, this problem did not happen with Zigbee, although Tuya was slow at times. 0.109 seemed stable. So I concur with some of you that something has been broken in the newer releases.

I hope someone can fix this… I feel these are critical fundamentals that should be robust and stable, as this affects many people who have put hours of work into their setup and now find disruptive issues in their home automation. I will try to ‘rebuild’ my YAML again, but it will take time and cause disruptions, which I need to plan for.

Hi all,

I see the same behavior as described above. After some time the system becomes unresponsive for some hours, and if I wait long enough it gets back to normal again. I can also confirm that this has only been happening for a few months, so maybe it really is related to some bigger change.

Additional add-ons I’ve installed are:

  • deconz
  • Samba
  • Node-Red
  • SSH / Webterminal
  • mqtt server
  • Gdrive snapshot

I have already run hassio both on an SD card and on an SSD (USB disk), on a Raspberry Pi 3 with only the ConBee stick connected to it.

Steps I am planning to do to investigate further are:

  • Try to access the new Observer Plugin when this state occurs again
  • Run deCONZ on a separate Raspberry Pi and integrate it into hassio

Same problem here; in my case I have had this problem since January 2020. I reinstalled everything two months ago, backing up YAML etc., but the issue is still there. I remember reading about a Foscam stream issue that freezes the system, but I don’t know if this is the case here. The IP is still visible on the local network, but the web GUI and apps are not working.

Raspberry Pi 3 B+

Same problem. I have RPi 3B+ and SSD. HassIO randomly freezes. There is nothing I can do then. Only ping works.

I had performance issues, as posted here, which showed up as high CPU loads. These were caused by IO waits. That is not something you might expect when using an SSD, but it is perhaps worth monitoring for a while.

I used a command line sensor since IO wait is not available by default. You can create the sensor by adding the code below to your configuration.yaml:

command_line:
  - sensor:
      name: CPU IO Wait
      command: top -b -n1 | grep ^CPU | awk '{printf("%.0f"), $10}'
      scan_interval: 3
      unit_of_measurement: "%"
      value_template: '{{ value }}'

As you can see in the topic, it showed a direct relation with the CPU loads.

Edit: Since 2023.6 the command line sensor syntax has changed. This edit on June 8 '23 replaces the old YAML.

I too am having this issue, on a Celeron NUC with an SSD. I am running hassio (you can rename it to whatever you want ;)). In the last 3-4 months it’s been getting more and more unresponsive. Suddenly it’s not accessible, but sometimes everything works again a few minutes later; other times it takes hours. What’s odd is that the main UI is not accessible at all, but if I ask Alexa to turn something on (a Node-RED script that switches on a Tasmota light, for example) it still works. I’m loath to start over and redo it, but I’m thinking this might be the only course of action to take.

So TL;DR, is it a hardware issue (limitation)?
You mention using an SSD instead of an SD card, but would the same problems occur with a mechanical HDD, e.g. with HA in a VM or Docker?
Thank you

Yes, it is a limitation in SD card performance. I fixed it by using an SSD (solid-state drive), but I expect you can also use an HDD.

In summary:
My HA froze from time to time, and all connections of the ConBee II stick also became unavailable from time to time. I found out that both issues had one thing in common: a high CPU load (not to be mistaken for the CPU usage percentage) before or at the moment of the issue.
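To illustrate the difference between load and usage: on Linux the load average counts processes that are runnable or stuck in uninterruptible sleep (typically waiting on disk I/O), so it can be high while CPU usage stays low. Below is a minimal Python sketch, assuming a Linux host with Python 3 available (for example over SSH). The helper name `load_per_core` is my own and not part of Home Assistant.

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores.

    Values that stay well above 1.0 suggest more work is queued
    (running or blocked on I/O) than the cores can handle.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    one_minute, five_minutes, fifteen_minutes = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"1-minute load per core: {load_per_core(one_minute, cores):.2f}")
```

A Raspberry Pi 3 has four cores, so a 1-minute load of 4.0 corresponds to 1.0 per core in this sketch.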

I found out that the high CPU load was caused by a high CPU IO wait (as shown in the images). What basically happens is that your CPU is waiting for read/write actions to finish. Currently I have a maximum IO wait of 2% (using an SSD), whereas with an SD card it was quite often at 100%, and for longer periods.
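For anyone who wants to check the IO wait figure without going through `top`, here is a rough Python sketch of how the percentage can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. It assumes a Linux host and Python 3.9 or newer; the function names are hypothetical helpers, not an existing API.

```python
import time


def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    deltas = [new - old for old, new in zip(before, after)]
    # Counter order: user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest counters follow but are already included in user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_cpu_counters(path: str = "/proc/stat") -> list[int]:
    """Read the first line of /proc/stat, which is the aggregate cpu line."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


if __name__ == "__main__":
    first = read_cpu_counters()
    time.sleep(1)  # sampling interval chosen arbitrarily for this sketch
    second = read_cpu_counters()
    print(f"IO wait: {iowait_percent(first, second):.1f}%")
```

The counters in `/proc/stat` are cumulative since boot, which is why the sketch works on the difference between two samples rather than on a single reading.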

Disclaimer: the cause of the load is/was not necessarily HA itself. Limiting the frequency of logbook writes reduced the issue a little. I use several add-ons which could also very well be the cause of the amount of IO. However, I wanted to keep them, so I decided to increase performance instead of lowering the IO.

So I am not saying my solution is the only or the only right solution; it’s basically a tip for troubleshooting.

Recap
The first step I’d advise when troubleshooting an unresponsive HA is to see if the CPU is obstructed in any way. By default the sensor platform ‘systemmonitor’ gives you all the tools except IO wait. Therefore I created the command line sensor shown above.

As mentioned above, I had the exact same problem. Moving to a new A2 SD card and adding every integration from scratch (without any snapshots) solved the issue for me.

There are too many SD card classifications… SDHC II U3 C10 V60 A2 :sweat_smile:

Thanks again (: