2 Servers 2 Locations, both running HA, both quit working after recent server update

arretx · December 9, 2020, 6:36am

2 Servers, one is 20.04 and one is 18.04.

apt-get update
apt-get upgrade

A common problem I’ve seen is that every time there’s a Docker update, HA throws the “unable to connect” page until the update is completed, then everything is back to normal.

This time, both servers, both operating nearly identically, quit working. The supervisor service isn’t auto starting, and even when manually started, I get the “retry” error connection page for HA.

The systemctl service shows this on one:

Dec 08 23:31:03 server1.newvalleyresources.com hassio-supervisor[34323]: 20-12-09 06:31:03 WARNING (MainThread) [supervisor.misc.tasks] Watchdog found a problem with CoreDNS plugin!

Dec 08 23:31:03 server1.newvalleyresources.com hassio-supervisor[34323]: 20-12-09 06:31:03 ERROR (MainThread) [supervisor.plugins.cli] Can't start cli plugin

Dec 08 23:31:03 server1.newvalleyresources.com hassio-supervisor[34323]: 20-12-09 06:31:03 ERROR (MainThread) [supervisor.misc.tasks] CLI watchdog reanimation failed!

Dec 08 23:31:03 server1.newvalleyresources.com hassio-supervisor[34323]: 20-12-09 06:31:03 ERROR (SyncWorker_0) [supervisor.docker] Image homeassistant/amd64-hassio-multicast not exists for hassio_multicast

Dec 08 23:31:03 server1.newvalleyresources.com hassio-supervisor[34323]: 20-12-09 06:31:03 INFO (MainThread) [supervisor.plugins.dns] Starting CoreDNS plugin

Dec 08 23:31:03 server1.newvalleyresources.com hassio-supervisor[34323]: 20-12-09 06:31:03 ERROR (MainThread) [supervisor.plugins.multicast] Can't start Multicast plugin

Dec 08 23:31:03 server1.newvalleyresources.com hassio-supervisor[34323]: 20-12-09 06:31:03 ERROR (MainThread) [supervisor.misc.tasks] Multicast watchdog reanimation failed!

Dec 08 23:31:03 server1.newvalleyresources.com hassio-supervisor[34323]: 20-12-09 06:31:03 ERROR (SyncWorker_3) [supervisor.docker] Image homeassistant/amd64-hassio-dns not exists for hassio_dns

Dec 08 23:31:03 server1.newvalleyresources.com hassio-supervisor[34323]: 20-12-09 06:31:03 ERROR (MainThread) [supervisor.plugins.dns] Can't start CoreDNS plugin

Dec 08 23:31:03 server1.newvalleyresources.com hassio-supervisor[34323]: 20-12-09 06:31:03 ERROR (MainThread) [supervisor.misc.tasks] CoreDNS watchdog reanimation failed!

And this on the other:

Dec 08 23:32:04 server2 hassio-supervisor[6064]: 20-12-09 06:32:04 ERROR (SyncWorker_4) [supervisor.docker] Image homeassistant/amd64-hassio-audio not exists for hassio_audio

Dec 08 23:32:04 server2 hassio-supervisor[6064]: 20-12-09 06:32:04 ERROR (MainThread) [supervisor.plugins.multicast] Can't start Multicast plugin

Dec 08 23:32:04 server2 hassio-supervisor[6064]: 20-12-09 06:32:04 ERROR (MainThread) [supervisor.misc.tasks] Multicast watchdog reanimation failed!

Dec 08 23:32:04 server2 hassio-supervisor[6064]: 20-12-09 06:32:04 ERROR (MainThread) [supervisor.plugins.audio] Can't start Audio plugin

Dec 08 23:32:04 server2 hassio-supervisor[6064]: 20-12-09 06:32:04 ERROR (MainThread) [supervisor.misc.tasks] PulseAudio watchdog reanimation failed!

Dec 08 23:32:04 server2 hassio-supervisor[6064]: 20-12-09 06:32:04 WARNING (MainThread) [supervisor.misc.tasks] Watchdog found a problem with CoreDNS plugin!

Dec 08 23:32:04 server2 hassio-supervisor[6064]: 20-12-09 06:32:04 INFO (MainThread) [supervisor.plugins.dns] Starting CoreDNS plugin

Dec 08 23:32:04 server2 hassio-supervisor[6064]: 20-12-09 06:32:04 ERROR (SyncWorker_5) [supervisor.docker] Image homeassistant/amd64-hassio-dns not exists for hassio_dns

Dec 08 23:32:04 server2 hassio-supervisor[6064]: 20-12-09 06:32:04 ERROR (MainThread) [supervisor.plugins.dns] Can't start CoreDNS plugin

Dec 08 23:32:04 server2 hassio-supervisor[6064]: 20-12-09 06:32:04 ERROR (MainThread) [supervisor.misc.tasks] CoreDNS watchdog reanimation failed!

No idea how to tackle this problem.

Ewout · December 9, 2020, 12:24pm

Same problem here

It seems to have something to do with a Docker update…

MandM78 · December 9, 2020, 2:09pm

Same here. 18.04. Apparently my HA instance stopped working at 2am this morning and nothing I’ve tried has gotten it to come back yet. Just the “Retry” page…

rented_mule · December 9, 2020, 2:17pm

Here as well. Running openmediavault/Debian. Took a Docker update this morning and now it's all but dead.

gdschut · December 9, 2020, 2:40pm

Same for me.
Would getting back to a previous docker version help? Any idea which version to revert to?

Ewout · December 9, 2020, 3:13pm

Yes, reverting to Docker 19.xx works. After that you’ll probably need to restore a snapshot, but that’s how I got things running again.

gdschut · December 9, 2020, 3:22pm

Reading my logfiles I saw that my upgrade included:

Unpacking docker-ce-cli (5:20.10.0~3-0~raspbian-buster) over (5:19.03.12~3-0~raspbian-buster)
Unpacking docker-ce (5:20.10.0~3-0~raspbian-buster) over (5:19.03.12~3-0~raspbian-buster)
Unpacking containerd.io (1.4.3-1) over (1.2.13-2)

(and more, but seems not related)

Do you know if I also have to revert containerd.io ?
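To see at a glance which packages an upgrade replaced, the "Unpacking ... over ..." lines quoted above can be condensed into old/new pairs. This is only a sketch: `summarise_upgrades` is a made-up helper name, and the log path in the usage comment is an assumption about where apt keeps its terminal log on Debian-based systems.

```shell
#!/usr/bin/env bash
# Turn lines of the form
#   Unpacking <pkg> (<new version>) over (<old version>)
# into
#   <pkg>: <old version> -> <new version>
# Lines that do not match are dropped.
summarise_upgrades() {
    sed -n 's/^Unpacking \([^ ]*\) (\([^)]*\)) over (\([^)]*\)).*/\1: \3 -> \2/p'
}

# Example usage (path is an assumption, adjust to your system):
#   summarise_upgrades < /var/log/apt/term.log | grep -E 'docker|containerd'
```

The old version on the right-hand side of "over" is the one you would need if you decide to roll a package back.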

arretx · December 9, 2020, 3:56pm

Assuming we don’t know how to revert to a previous version of Docker, and assuming we have only the /usr/share/hassio/* folder structure preserved (don’t have snapshot???)…

…how would one go about doing this?

gdschut · December 9, 2020, 3:59pm

I just downgraded to the latest 19.03 release of docker-ce (package version 5:19.03.x) using aptitude, which I installed for this, following this guide: https://askubuntu.com/questions/900536/how-do-i-use-aptitude-to-downgrade-force-version-of-a-package.
Right now everything is starting up. At least my HA containers are there again, but not everything is working yet…

update:
All docker images of hassio addons were gone. After clicking ‘start’ for each addon in Supervisor the exact name of the missing image was displayed, which I manually downloaded with
docker image pull <name>

Ewout · December 9, 2020, 4:16pm

~~I got things working on the latest version of docker-ce now.~~

~~What I did is restart hass a couple of times using # hassio-supervisor start / stop / status. After a few attempts hass started. Then I restored a snapshot.~~

~~It takes some waiting in between, but after about half an hour things are running again~~

Never mind, this didn’t survive a reboot…

MandM78 · December 9, 2020, 4:39pm

Bummer. I was hoping for a simple fix like this.

gdschut · December 9, 2020, 4:51pm

Downgrading the docker-ce package to 19.03 (package version 5:19.03.x) solved the problem for me.
After that I only had to re-download the Docker images of the hassio add-ons I use.

Ewout · December 9, 2020, 4:53pm

I did this:

sudo apt-get install docker-ce=5:19.03.14~3-0~ubuntu-focal docker-ce-cli=5:19.03.14~3-0~ubuntu-focal containerd.io

Then restored the latest snapshot.

That seems to do the trick. Simple enough, although I’m stuck with an old version of Docker for now as it seems.

patagona · December 9, 2020, 5:42pm

Had the same issue in Debian.
Installing the old version of docker-ce via

sudo apt-get install docker-ce=5:19.03.13~3-0~debian-buster

and manually pulling the plugin docker images via docker pull got it up and running again.
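The supervisor log lines quoted at the top of the thread name each missing plugin image ("Image ... not exists for ..."), so the list of images to pull can be read straight from the journal. A rough sketch: `missing_images` is a hypothetical helper, and the unit name `hassio-supervisor` in the usage comment is an assumption based on the log prefix shown in this thread. The log does not show which tag the supervisor expects, so this only lists image names.

```shell
#!/usr/bin/env bash
# Read supervisor log lines on stdin and print each image reported as
# missing ("Image <name> not exists for <container>"), once per image.
missing_images() {
    sed -n 's/.*Image \([^ ]*\) not exists for .*/\1/p' | sort -u
}

# Example usage (unit name is an assumption):
#   journalctl -u hassio-supervisor --no-pager | missing_images
```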

MandM78 · December 9, 2020, 6:33pm

~~I tried your method and it’s throwing an error saying that:~~

~~E: Version ‘5:19.03.14~3-0~ubuntu-focal’ for ‘docker-ce’ was not found~~
~~E: Version ‘5:19.03.14~3-0~ubuntu-focal’ for ‘docker-ce-cli’ was not found~~

~~Any idea where I messed up?~~

I’m a dope. I’m not running focal… I’m running bionic. Thanks!
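The version strings posted in this thread all follow the same shape, `5:<version>~3-0~<distro>-<codename>`, which is why a focal string fails on bionic. The sketch below builds that string from its parts. `docker_pin` is a made-up helper name, and the pattern is only inferred from the examples above, so check the result against what `apt-cache madison docker-ce` actually lists before installing.

```shell
#!/usr/bin/env bash
# Build an apt version string of the form seen in this thread:
#   5:<version>~3-0~<distro>-<codename>
# Arguments: Docker version, distro id, release codename.
docker_pin() {
    local version="$1" distro="$2" codename="$3"
    printf '5:%s~3-0~%s-%s\n' "$version" "$distro" "$codename"
}

# Example usage (not executed here):
#   . /etc/os-release                      # provides ID and VERSION_CODENAME
#   pin="$(docker_pin 19.03.14 "$ID" "$VERSION_CODENAME")"
#   apt-cache madison docker-ce            # confirm that $pin is really offered
#   sudo apt-get install "docker-ce=$pin" "docker-ce-cli=$pin"
```

Not every 19.03.x point release is necessarily published for every distro release, so treat the `apt-cache madison` output as the source of truth.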

arretx · December 10, 2020, 1:45am

Ok, welp…so far so good.

Using Aptitude, I installed the previous version of docker-ce and docker-ce-cli and moments later I was able to get to the HA dashboard.

Under ‘Supervisor’, the add-ons will appear as though they are there, but you’ll need to pull each one of them to get them back online:

sudo docker image pull <imagename>

To get the image name, try starting one of the add-ons. You’ll get a popup error showing you the name of the image, which will be something like:

("No such image: hassioaddons/grafana-amd64:5.3.6")

so this one would be:

sudo docker image pull hassioaddons/grafana-amd64:5.3.6
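If you have several add-ons to restore, copying the name out of each popup gets tedious. Here is a small sketch that pulls the image reference out of a "No such image" message; `image_from_error` is a hypothetical helper, and it assumes the message looks like the example above.

```shell
#!/usr/bin/env bash
# Extract the image reference from an error message containing
# "No such image: <name>", stopping at a quote, bracket or whitespace.
image_from_error() {
    printf '%s\n' "$1" | sed -n 's/.*No such image: \([^")[:space:]]*\).*/\1/p'
}

# Example usage (not executed here):
#   msg='("No such image: hassioaddons/grafana-amd64:5.3.6")'
#   sudo docker image pull "$(image_from_error "$msg")"
```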

After that, I used Aptitude to put a hold on docker-ce and docker-ce-cli upgrades.
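The same hold can also be set without Aptitude, using `apt-mark`. This is a sketch rather than what was actually run here: the `run`, `hold_docker` and `release_docker` names are made up, and by default the helpers only print the command so you can check it first.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1 (the default in this sketch).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Keep apt from upgrading the Docker packages.
hold_docker() { run sudo apt-mark hold docker-ce docker-ce-cli; }

# Allow upgrades again once a fixed release is available.
release_docker() { run sudo apt-mark unhold docker-ce docker-ce-cli; }

# Example usage (not executed here):
#   DRY_RUN=0 hold_docker
#   apt-mark showhold          # lists the packages currently on hold
```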

Now, if only I knew why this happened and how to prevent it…and when I’ll be able to release these so they’ll upgrade.

febalci · December 10, 2020, 11:49am

I have been pulling my hair out since last midnight… I kept blaming the new supervisor requirements, since I was on Ubuntu 18.04 on an Intel NUC. I even updated to Ubuntu 20.04. Then I found this, went back to Docker 19.03 (package version 5:19.03.x), and everything is fine. Thanks a lot guys, you are the best…

gadric · December 10, 2020, 12:17pm

You guys are really great. I've had these problems on Debian buster since yesterday and did not know how to fix them. But the downgrade to Docker 19.03 (package version 5:19.03.x) worked.

Really curious what they did in version 20.10 (package version 5:20.10.x)…

modem · December 10, 2020, 2:28pm

Same thing happened to me, but in my case the home-assistant_v2.db database got corrupted somehow, so I’ve lost all my history.

pabloluisperez · December 10, 2020, 2:55pm

Same issue here and same solution.
Ubuntu 20 in my case.
Just rolled back to 19.03 (package version 5:19.03.x) and everything started as before.
No need to restore a snapshot.

Thank you!