Internal server error 500 causing HASS updates to fail

I run HASS under Docker in Debian. It has been running (and updating fine) for about six months. But this week it gagged due to an update for SSH & Web Terminal. Don’t know why, as it’s been OK in the past. Flagged install as unhealthy, refused to update anything more.

Excerpt from log as follows:

23-03-08 17:19:34 INFO (SyncWorker_0) [supervisor.docker.interface] Downloading docker image ghcr.io/home-assistant/amd64-hassio-supervisor with tag 2023.03.1.
23-03-08 17:19:34 ERROR (SyncWorker_0) [supervisor.docker.interface] Can't install ghcr.io/home-assistant/amd64-hassio-supervisor:2023.03.1: 500 Server Error for http+docker://localhost/v1.42/images/create?tag=2023.03.1&fromImage=ghcr.io%2Fhome-assistant%2Famd64-hassio-supervisor&platform=linux%2Famd64: Internal Server Error ("Get "https://ghcr.io/v2/": dial tcp: lookup ghcr.io on 192.168.20.1:53: no such host")
23-03-08 17:19:34 ERROR (MainThread) [supervisor.supervisor] Update of Supervisor failed: Can't install ghcr.io/home-assistant/amd64-hassio-supervisor:2023.03.1: 500 Server Error for http+docker://localhost/v1.42/images/create?tag=2023.03.1&fromImage=ghcr.io%2Fhome-assistant%2Famd64-hassio-supervisor&platform=linux%2Famd64: Internal Server Error ("Get "https://ghcr.io/v2/": dial tcp: lookup ghcr.io on 192.168.20.1:53: no such host")
23-03-08 17:19:37 ERROR (MainThread) [asyncio] Task exception was never retrieved
future: <Task finished name='Task-975' coro=<Addon.watchdog_container() done, defined at /usr/src/supervisor/supervisor/addons/addon.py:989> exception=AddonsJobError('Rate limit exceeded, more then 10 calls in 0:30:00')>

I’m a noob in terms of Debian and Docker so my only resource (apart from 40 years in mainframe IT) was google and the community. But there was a post about home routers causing DNS issues which caused troubles in HASS. So that was where I started looking.

Found my Network Connection was called “Profile 1” so looked for DNS info in the connection profile:

root@9010:/home/userid# sudo nmcli connection  show "Profile 1" |grep DNS
IP4.DNS[1]:                             192.168.20.1
IP4.DNS[2]:                             8.8.8.8

My Netcomm VDSL router is at 192.168.20.1 but DNS was working fine, even on the system hosting HASS (except not in the HASS update process). Nevertheless, the hint about routers and DNS made me try the easiest debugging approach – simply replace my local router ip # with Google’s ip # (so two identical ips in the connection profile). Then I rebooted the HASS host.

I still don’t believe it, but that was all it took to sort out the HA updates. The updates completed successfully in less than a minute. No more 500 errors. Sometime I should go back and tidy up by removing the duplicate, but right now I’m just happy that it’s working.

Someone with more Debian/Docker experience may be able to explain why this change made a difference. I only know it worked, but not why. Hopefully sharing this will save someone some heartache down the track!

Keywords: router, DNS, Docker, 500, “no such host”, “Internal Server Error”, ghcr.io, updates

So what I did appeared to be a misleading temporary improvement/intermittent. Although the SSH update worked that day, the HASS 2023.3.4 update still didn’t work. Decided to have a look at the router configs closely…

Discovered to my embarrassment a possible cause… The IP4 DNS config in the VDSL router had its own LAN address as the first DNS address (fail…!!) and 8.8.8.8 as the secondary DNS address. Maybe that’s why DNS was not reliable!? Just odd that everything else was working fine.

Changed the first to 8.8.8.8 and retried the HASS 2023.3.4 update. This time the HASS update worked. :-}

I’ll keep an eye on the updates from now on to be sure this is not another temporary improvement.