HassIO stops responding every so often

doxyl · March 2, 2020, 12:43pm

I have been using HassIO since the beginning on a Raspberry Pi 4 4GB model. The unit uses the RPi4 power supply and has always been directly connected via ethernet into my router. The unit also has a dedicated static IP (set both on my router and on HassIO).

I’ve been running this configuration for half a year and it’s been great. However, this past weekend I was away, went to check the status of my lights, and could not access Home Assistant. I logged in to the Orbi router and saw that the Raspberry Pi was showing as not connected (I only have 4 ethernet connections, so it was easy to check).

Came home the next day, could not access it whatsoever so I pulled the plug and let it reboot. Rebooted fine, but could not find any logs. I went in and updated everything (no auto-updates) so everything is new and updated and restarted. Went to bed last night, and woke up to it being unresponsive again. Tried going through SSH to see if I could find anything but I could not.

The home-assistant.log file only shows information from the current boot, and since I restarted it this morning it had no logs.
I only run Philips Hue lights (6 of them), my garage door, and a boolean switch to activate some MQTT sensors for use in Node-RED, so nothing too heavy.

Already tried searching for this issue but only came up with threads that were created >2 years ago.

Any idea of what is happening here? Any tips on how I should debug this?

EDIT:
After the problem started a month ago, I went ahead and created a Node-RED automation to reboot my Pi every day around 6PM. I have not encountered the freezing/crashing problem since. While this is definitely not a permanent fix, it works for the time being until a proper fix is deployed.

Piggyback · March 2, 2020, 2:24pm

I ran into similar issues some time ago and noticed my database was growing because I tracked all devices. Could it be you are running out of space? (It doesn’t really sound like it, as I would expect it not to boot, but it’s worth checking.)

doxyl · March 2, 2020, 4:53pm

Here is the output of df:
[Screenshot of df output, 2020-03-02 11-52-18]

In /root/config, the size of home-assistant_v2.db is only 24.8M, which doesn’t seem like a lot.
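For anyone who wants to check this without a screenshot, here is a minimal Python sketch that reports overall disk usage and the database size. The function name `disk_report` is made up for illustration, and the paths in the usage note are assumptions based on the ones mentioned above; adjust them for your install.

```python
import os
import shutil


def disk_report(mount_point, db_path=None):
    """Return disk usage for mount_point and, optionally, the size of db_path."""
    usage = shutil.disk_usage(mount_point)  # named tuple: total, used, free (bytes)
    report = {
        "total_mb": usage.total / 1e6,
        "free_mb": usage.free / 1e6,
        "used_pct": 100.0 * usage.used / usage.total,
    }
    # Only report the database size if the file is actually there.
    if db_path is not None and os.path.exists(db_path):
        report["db_mb"] = os.path.getsize(db_path) / 1e6
    return report
```

Calling `disk_report("/", "/root/config/home-assistant_v2.db")` would return something comparable to the df figures plus the database size. The percentage may differ slightly from what df prints, since df excludes reserved blocks.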

pdkwork · March 3, 2020, 12:00pm

I am also facing the same issue as you. I have a remote NUC running Hass.io in Docker and intermittently I can’t access it. I have had to install a TP-Link switch on the power to the NUC to be able to reboot it. I also have a development version at home based on an RPi 3, and it behaves similarly but not at the same time. In my investigations it appears to lose the ability to resolve DNS names. In the logs for the supervisor I see:
20-02-23 21:40:05 INFO (MainThread) [hassio.store.git] Update add-on https://github.com/hassio-addons/repository repository
20-02-23 21:40:10 ERROR (MainThread) [hassio.store.git] Can't update https://github.com/home-assistant/hassio-addons repo: Cmd('git') failed due to: exit code(128)
cmdline: git fetch --depth=1 --update-shallow -v origin
stderr: 'fatal: unable to access 'https://github.com/home-assistant/hassio-addons/': Could not resolve host: github.com'.

I have not found out how to resolve this but these errors appear when I can’t access the system.
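To confirm whether DNS resolution really drops out at the same moments, something like the following Python sketch could be run periodically on the same machine. It is only an illustration: `check_dns` and `format_result` are hypothetical helper names, and it uses the standard `socket.getaddrinfo` call rather than anything specific to the supervisor.

```python
import socket
import time


def check_dns(hostname, resolver=socket.getaddrinfo):
    """Try to resolve hostname; return (ok, detail) without raising."""
    try:
        results = resolver(hostname, 443)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)
    # Each result is (family, type, proto, canonname, sockaddr); keep unique IPs.
    addresses = sorted({entry[4][0] for entry in results})
    return True, ", ".join(addresses)


def format_result(hostname, ok, detail, now=None):
    """Build one timestamped log line for a lookup result."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    status = "OK" if ok else "FAIL"
    return f"{stamp} DNS {status} {hostname}: {detail}"
```

Appending `format_result("github.com", *check_dns("github.com"))` to a file once a minute would give a timeline of FAIL entries to compare against the supervisor errors above.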

Jamie_Best · April 16, 2020, 2:06am

Following as mine is doing the same: RPi 3B+ with SSD running HassOS. I have started to track CPU and memory, and they go nuts just before the unresponsiveness happens.

It runs well most of the time, but then I get this spike, memory fills, and things become unresponsive. I can still navigate most Lovelace pages, but can’t load Supervisor or History (things like that). It seems to settle when the OS gets restarted. I have read that Loop Energy can be the cause, so I will comment it out and see; it doesn’t work anyway.
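For others who want to track CPU load and memory the same way, a rough Python sketch follows. It assumes a Linux host where `/proc/meminfo` and `/proc/loadavg` are readable and where Python can be run (for example over SSH); the function names are made up for illustration, and this is not how Home Assistant itself records these values.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of {field: kilobytes}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_used_pct(meminfo):
    """Percentage of RAM in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def sample_line(meminfo_text, loadavg_text, now=None):
    """Build one timestamped line with the 1-minute load average and memory use."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    load_1min = float(loadavg_text.split()[0])
    used = memory_used_pct(parse_meminfo(meminfo_text))
    return f"{stamp} load1={load_1min:.2f} mem_used={used:.1f}%"


def append_sample(log_path):
    """Read /proc (Linux only) and append one sample to log_path."""
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    with open("/proc/loadavg") as f:
        loadavg_text = f.read()
    with open(log_path, "a") as out:
        out.write(sample_line(meminfo_text, loadavg_text) + "\n")
```

Calling `append_sample` every minute with a path on storage that survives a reboot should leave a record of the last readings before a freeze.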

Sventhebrit · April 16, 2020, 5:53am

I am seeing the identical issue with my setup: HASSIO running on an RPi 3B. HASSIO stops responding intermittently and is unavailable both via the app and via the web UI. I was originally pulling the power to recover it, but recently realised that it does recover on its own and then works fine; normally, after I try to access it via the iOS app, it becomes responsive after around 20 seconds.

This could be totally unrelated, but the problem literally started the day after COVID-19 lockdown in the UK. I have Life 360 connected to it and I assumed that regular updates from Life360 may have been keeping it awake - since no one is leaving the house and moving around it won’t be receiving any updates and may have been going into some sort of sleep mode.

At the same time, I did get a new router from my broadband supplier - I upgraded from BT HomeHub to HomeHub 2. I did swap back to the old hub for a couple of days and problem still persisted so I don’t believe it is related to this.

Any thoughts?

Thanks
Steve

ashfaaaa · April 24, 2020, 12:17pm

I’m facing the same problem: within 24 hours of replugging the RPi, it goes offline. This started after I updated to 0.108. I can’t even access the config folder from Samba. Has anyone found out why it happens, or a fix?

RobertoCarlos · April 24, 2020, 6:22pm

Hi,
I’m facing the same issue. The web GUI stops responding, and Samba and SSH are unavailable too. When I connect an external monitor, nothing is displayed. It happened just now for the 3rd or 4th time within the last couple of days/week. Any idea where to start? I can’t see anything in the logs either; it seems they are cleared after a restart. Or I’m not checking correctly.

I’m using a Raspberry Pi 3B with an external SSD on USB.

Cheers
Roberto

netizen24601 · April 24, 2020, 7:03pm

Are you by chance using the MyQ integration for the garage door?

In the past, when I enabled that, Home Assistant would lock up after about 24 hours or so. I suspected a memory leak, but just disabled the integration and the lock-ups went away.

RobertoCarlos · April 24, 2020, 7:11pm

No, I don’t use that. I have only a couple of things installed. Recently I switched to the built-in MQTT broker. Apart from that I have only a couple of integrations.

Interestingly, I observed high CPU spikes and the unit started to heat up significantly. I added a fan and changed the case to allow more airflow.

Where can I look for system-level logs? The ones available via the web GUI say nothing.

Oh, and this time it recovered without a power restart. I also recently changed the power supply; no change.

doxyl · April 24, 2020, 7:16pm

I am using the MyQ integration. Memory leak does sound likely if daily restarting has fixed the issue.

RobertoCarlos · April 24, 2020, 7:32pm

OK. What about the logs?
The one in /config/home-assistant.log says exactly nothing.
Now it shows only records from the last hour, and the system has been up for a day… there has to be something else.
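Since the log only covers the current run, one possible workaround is to copy it aside on a schedule so the lead-up to a freeze survives a restart. The Python sketch below is only an illustration under that assumption: `archive_log` and `seconds_since_last_write` are hypothetical helpers, and the archive directory is whatever persistent location you choose.

```python
import os
import shutil
import time


def archive_log(log_path, archive_dir, now=None):
    """Copy log_path into archive_dir under a timestamped name; return the new path."""
    os.makedirs(archive_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    target = os.path.join(archive_dir, f"{os.path.basename(log_path)}.{stamp}")
    shutil.copy2(log_path, target)  # copy2 also preserves the modification time
    return target


def seconds_since_last_write(log_path, now=None):
    """How long ago the log was last modified, in seconds."""
    current = time.time() if now is None else now
    return current - os.path.getmtime(log_path)
```

Running `archive_log("/config/home-assistant.log", ...)` every few minutes keeps older copies around, and a large value from `seconds_since_last_write` may hint at roughly when the instance stalled, although a quiet log on its own does not prove a hang.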

netizen24601 · April 25, 2020, 1:25am

I can’t find my notes, but I seem to remember MyQ errors in the logs, though nothing that indicated a crash. Essentially, after a while the whole Home Assistant instance would just stop responding and the log would just stop writing. After turning off the MyQ integration, everything worked again without daily restarts or lock-ups. I’ve been meaning to turn it back on, but now you’ve got me worried the issue is still there.

bastero · April 28, 2020, 3:58am

I’m getting the same issue as everyone above, i.e. HA becomes unresponsive. Initially I thought it was because I’d invoked TasmoAdmin; however, the issue persists and, as was mentioned above, it sometimes recovers by itself. I do have the MyQ integration and noticed today, after opening and closing my garage door, that the state was stuck on ‘closing’ and never changed to ‘closed’. Only by resetting HA did the cover state reset to ‘closed’. I still haven’t resolved the instability of HA. Thoughts, anyone?

I’m running on an RPi 3+ with an SSD drive, with 0.108.8, HassOS 3.13 & Supervisor 219. To the best of my knowledge the instability started within the last few upgrades.

francisp · April 28, 2020, 4:14am

Maybe that is the cause: coredns going nuts.

Dinocam · April 28, 2020, 11:58pm

I have the same issue as above: HA often becomes unresponsive for minutes, sometimes hours, at a time and then comes back to life. There doesn’t appear to be anything in the logs to indicate what the issue is. I’m not running MQTT (as some have mentioned here). I’m running on a Raspberry Pi 3 with the latest versions of Core and Supervisor.

RobertoCarlos · April 29, 2020, 8:33am

Hi, are you using an external HDD or an SD card? Someone suggested that might be a root cause. I’ve been testing my setup on an SD card for a few days now: Raspbian with HA on Docker. So far, so good. If everything stays fine I’ll move over to that setup. Before, I had Hass.io set up “directly” on the SSD.

RobertoCarlos · April 29, 2020, 8:35am

Agree, it was OK for weeks/months and since last system update it went crazy…

Jamie_Best · May 10, 2020, 3:20pm

Just an update from me, I haven’t had any problems for a while, I removed the loop energy sensor, and kept on top of updates.

The problems seemed to go away when I rebooted the host system (ssd install of hassos).

Back to smoothness for me.

I’m now tasmotizing anything that I can to get rid of Tuya.

RobertoCarlos · May 10, 2020, 9:32pm

Hmm. So now that I have spent a lot of time migrating from Hass.io directly on the SSD to Hass.io on Raspbian with Docker, it seems that it works, PLUS this method got deprecated recently. How nice.

Any idea what path should be PROPER now? Bit frustrating I would say…