Hass.io installation randomly stopped and now won't restart?

I have Home Assistant 101.3 running as a Docker installation on a NUC running Ubuntu 18.04. I got home from work today and my automations weren’t working. I tried to log into Home Assistant and got an “unable to connect” message. I tried restarting the NUC; same issue. Logging into the NUC, Ubuntu is running fine and Docker shows Home Assistant running away, but nothing is connecting. I pulled open the log and it looks like it just randomly stopped at about 5:21pm today (I can tell because a repeating MQTT error that shows up about every minute simply ends there), and I don’t see anything else obvious in the log. Restarting the NUC doesn’t add anything to the log beyond those last entries at 5:21pm. I’m too much of a noob to know where to start fiddling with Docker. Does anybody have any thoughts on what to try, or on what in the world is going on? Is this just a wipe-and-reinstall situation? (Thank goodness for backups!)

docker logs home-assistant

(or whatever your container is called)
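If you're not sure what the container is actually named, listing the running containers will show it. The container name below is just an example; substitute whatever yours is called:

docker ps --format '{{.Names}}\t{{.Image}}\t{{.Status}}'   # list running containers and their names
docker logs --tail 200 -f home-assistant                   # last 200 lines, then follow new output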

So I looked into the Docker logs and found this:

2019-11-14 17:22:44 INFO (MainThread) [homeassistant.components.websocket_api.http.connection.140052748539728] Connection closed by client
2019-11-14 17:22:44 INFO (MainThread) [homeassistant.components.stream] Stopped stream workers.
2019-11-14 17:22:44 INFO (MainThread) [engineio.client] Sending packet MESSAGE data 1
2019-11-14 17:22:44 INFO (MainThread) [engineio.client] Sending packet CLOSE data None
2019-11-14 17:22:44 INFO (MainThread) [socketio.client] Engine.IO connection dropped
2019-11-14 17:22:44 INFO (SyncWorker_18) [homeassistant.components.zwave] Stopping Z-Wave network
2019-11-14 17:22:44 INFO (SyncWorker_18) [openzwave] Stop Openzwave network.

I’m not sure what to make of this… everything before it was routine entries that didn’t seem relevant (as far as I could tell, at least).

THEN I got a disk-full notification, which is weird because the disk shouldn’t be anywhere close to full. Looking around, I found that /var/log/syslog.1 was ~180 GB in size! It seemed to be a normal log until I found the error below, repeated over and over, literally thousands of times a second:

Nov 14 17:23:11 Beneciabrain docker.dockerd[9512]: time="2019-11-14T17:23:11.781535339-08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: permission denied\"" module=libcontainerd namespace=plugins.moby

I’m assuming that error makes up most of the log; I didn’t sort through it for more than a few minutes because my system would hang whenever I tried to load it. I deleted the log since it was so large I couldn’t do anything with it and my system kept slowing to a crawl. Hopefully there isn’t something important buried further in there somewhere. Does anybody have any insight into what is going on here?
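In case it helps anyone narrow it down, this is roughly what I’ve been poking at around that containerd socket error. I’m guessing at the relevant checks here, so tell me if I’m looking in the wrong places:

ls -l /run/containerd/containerd.sock            # who owns the socket, and what are its permissions?
sudo systemctl status containerd docker          # are both daemons actually up?
sudo journalctl -u containerd --since "17:00"    # containerd's own log around the time things died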

(BTW, I tried restarting the system after deleting the log in case it was a low disk space issue, but no dice.)

How full is the filesystem now?
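If you want to double-check, something like this should show it (standard Ubuntu paths; adjust if yours differ):

df -h /                                          # overall filesystem usage
sudo du -xh /var/log | sort -h | tail -n 20      # the biggest offenders under /var/log
journalctl --disk-usage                          # how much the systemd journal is holding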

I recall something like this happening to me a while back. Something was writing excessively to the hass DB (I was using the MariaDB docker). It would go on until it ate all the space and made the unit inoperative.
If you can still run commands from the command line, stop Docker and delete the oversized container and image (a rough sketch of the commands follows below).
If the unit is unresponsive, I hope you have a backup, because the only solution that worked for me at that point was to wipe and reinstall.
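Something along these lines, from memory; the container and image names are placeholders, so substitute whatever docker ps shows on your box:

docker ps -a --size                                        # container sizes, including the writable layer
docker system df                                           # disk use by images, containers and volumes
docker inspect --format '{{.LogPath}}' home-assistant      # where this container's JSON log file lives
docker stop home-assistant && docker rm home-assistant     # remove the bloated container (your config survives if it's bind-mounted)
docker image rm homeassistant/home-assistant:0.101.3       # and the image, if you want a clean re-pull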

It dropped down to ~20 GB, but I had to leave to do some other things, and now the log for today is ~120 GB. Home Assistant is still furiously logging something. I’ve stopped Docker at this point and it doesn’t seem to be logging any more. I’m thinking I’m going to try to wipe it and start fresh from a backup, but I haven’t been able to fiddle my way through that yet.
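If anyone can sanity-check it, this is the rough plan I’ve pieced together for recreating the container against my restored config. The image tag, config path and the log-size limits are all guesses on my part, not something I’ve confirmed:

sudo systemctl start docker
docker pull homeassistant/home-assistant:0.101.3
docker run -d --name home-assistant --restart unless-stopped --net=host \
  -v /home/me/hass-config:/config \
  -e TZ=America/Los_Angeles \
  --log-opt max-size=10m --log-opt max-file=3 \
  homeassistant/home-assistant:0.101.3

The --log-opt limits are mostly there so that a repeat of this can’t fill the disk again.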