HAOS update failing

Should I be doing something different, not just hitting the “Install” link for the update offered on the Settings page?

Hi other Nick: is that trailing colon only in the error or does this mark a problem with the update system?
I’m about to post a bug report; should I go ahead?

Hi,
I’m going to suggest either DNS, or some other comms issue where the HAOS device is behaving differently than other devices on your LAN.

I’d install a web shell like the Terminal & SSH Add-on and try some commands…

# can HASS resolve an IP address for the site hosting the update?
# look for an 'ANSWER section'
dig github.com

# look for 'has address'
host github.com

# look for a hop-by-hop network trace end to end
traceroute github.com

The next step would be to download the update URL directly via wget <URL>.
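
The checks above can be wrapped in a small helper so each one reports a clear pass or fail. This is only a sketch: `run_check` is a made-up name, and it judges success purely by exit status, which is why the example calls use `host` and `wget` rather than `dig` (dig can exit 0 even when the answer is empty).

```shell
#!/bin/sh
# Run one connectivity check and report PASS/FAIL from its exit status.
# Usage: run_check LABEL COMMAND [ARGS...]
# (run_check is a hypothetical helper, not part of Home Assistant.)
run_check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $label"
    else
        echo "FAIL: $label"
        return 1
    fi
}

# Example calls, to be run from the Terminal & SSH add-on:
#   run_check "DNS lookup"  host github.com
#   run_check "HTTPS fetch" wget -q -O /dev/null https://github.com
```

A FAIL on the DNS line points at name resolution; a PASS on DNS with a FAIL on the fetch points further along the path.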

PS The trailing : colon is very likely log formatting.

If this helps, :heart: this post!

Hi James

Yes, it often is.

Yeah, thought so.

Good suggestion to tackle this! :+1:

dig and host look good. The traceroute is plausible, if GitHub is a CDN farm with endpoints in London:

8 lonap01.msn.net (5.57.81.17) 19.089 ms 28.478 ms 21.427 ms
9 ae29-0.icr03.lon22.ntwk.msn.net (104.44.55.206) 18.991 ms ae25-0.icr03.lon24.ntwk.msn.net (104.44.50.168) 19.988 ms *
10 * * *
(no useful info beyond there)

A manual wget works fine.

Hmm! (scratches head).

My traceroute sees a load of similar UK-based Internet exchange nodes, but I assumed that was due to a UK-based ISP peering locally.

A manual wget works fine.

That’s got me stumped - if a shell on the HASS container can connect and transfer fine, then that leaves only oddities with headers or Python libraries, running out of disk space, or a short-term problem that’s gone away.

I’ve seen a few similar reports of OTA updates failing from others, but I assumed those were network issues getting to a local CDN node, not something client-side (i.e. Cannot connect to host).

I guess the host HAOS might have a different set of libraries or a cut-down network stack compared with the HASS container itself, but if this persists, some grepping in the source code and a GitHub issue might be worth it.

If it is any consolation, 12.4 has a few intermittent USB issues so you might want to hold back.

As you’re on 12.1, how about trying 12.2 (to avoid 12.3 / 12.4)?

ha os update --version 12.2

(My Yellow is sticking on 12.2 until there’s more data to help fix the USB driver issues, although there are workarounds using kernel options.)

The manual 12.2 update fails the same way.

Can I tell it to update using a wget’d local file?
If not, I guess I just have to keep trying until it magically works :unamused:

Hi,

As you seem to know one end of a command line from another, let’s dig a bit deeper (the best way to learn how stuff works…) :man_mage: :grin:

The HAOS GitHub has a few logged issues with debug commands that might be worth a read:

The Supervisor updates HASS and the HAOS image in one of the two OS boot slots. Looking at the Supervisor code confirms where your error message is coming from (complete with trailing :)

This error message seems to be thrown from a TimeoutError - slow network or DNS resolver timeout? But why is the Supervisor different from a manual HASS CLI dig or wget? Dunno.
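
One way to tell a slow response apart from a hard failure is to repeat the manual download under a time limit. This is a rough sketch, not how the Supervisor does it: `classify_attempt` is a made-up helper, 124 is the status GNU `timeout` reports when the limit is hit, and 143 is included as an assumption for builds that report the SIGTERM kill instead.

```shell
#!/bin/sh
# Run a command under a time limit and say whether it succeeded,
# failed, or timed out.
# Usage: classify_attempt SECONDS COMMAND [ARGS...]
# (classify_attempt is a hypothetical helper.)
classify_attempt() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    case $status in
        0)       echo "ok" ;;
        # 124: GNU timeout hit the limit; 143: assumed SIGTERM kill status
        124|143) echo "timed out after ${limit}s" ;;
        *)       echo "failed (exit $status)" ;;
    esac
}

# Example call from the Terminal & SSH add-on:
#   classify_attempt 10 wget -q -O /dev/null https://github.com
```

If the manual fetch is fine with a generous limit but times out with a short one, that would at least be consistent with the timeout theory.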

Based on this dev comment, I wonder if host logs -t rauc or host logs -t OTA might show something more:


If not, I guess I just have to keep trying until it magically works :unamused:

The simple option would be “turn it off and on again” a few times to see if something DNS / IP / moon phase changes (bet you’ve done that…).

Can I tell it to update using a wget’d local file?

Sort-of - the docs reference creating a specially named CONFIG USB drive with the kernel *.raucb:

Only done “restart” so far :slight_smile: so an actual power-cycle is next.

Nothing at all from those log commands.

Power cycle: no help.

A new supervisor version has just been released so it might be worth an update and try again.

Hello, I’m having the same issue with the 12.4 → 13.0 update.

I tried all the network commands and they work. I can even ping GitHub from the Terminal add-on:

[image]

Yet I still cannot update in any way:

I’ve found a GitHub issue reported for this (yes, my laptop on the same network also connects successfully :wink:), but the devs responded that it’s a network error and they cannot do anything about it. Given that I can access GitHub from the HA machine, that’s probably not the case though. I posted a comment there; let’s see how it unfolds.

GitHub had a fail today. Try again.

Nope, failed again.

I don’t think GitHub failed when I tried: I ran wget and an update attempt one after another. wget started downloading the file, and the update attempt failed to connect.


I guess we are back to this then

By the way, this is what I was referring to. Maybe lingering effects: “GitHub rolls back database change after breaking itself” (The Register)

Hi guys,
I’m quite new to Home Assistant, so maybe I’m missing something. My Home Assistant is running on a Raspberry Pi 5.

Earlier I got notifications to update Core or the OS and that went fine, so I think my setup should basically be correct.
Now when trying to install OS 13.1, I get a notification “Failed to perform the action update/install. Error updating Home Assistant Operating System: Unknown error, see supervisor”. I got the same error when trying to update to 13.0 (from 12.3), but eventually skipped that install. This has been going on for a few days now.

When looking at the Supervisor output, I get this line:

2024-08-26 23:00:52.509 ERROR (MainThread) [supervisor.os.manager] Home Assistant Operating System update failed with: Installation error: Failed updating slot boot.0: failed to run slot hook: Child process exited with code 1
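
Lines like the one above are easy to lose in a long log. As a small sketch, the filter below keeps only ERROR lines from the `supervisor.os.manager` logger seen in that line; `filter_os_update_errors` is a made-up name, and feeding it from `ha supervisor logs` is an assumption about where the log is read from.

```shell
#!/bin/sh
# Keep only ERROR lines logged by supervisor.os.manager
# (the logger name that appears in the error line above).
# (filter_os_update_errors is a hypothetical helper.)
filter_os_update_errors() {
    grep -E 'ERROR .*\[supervisor\.os\.manager\]'
}

# Example call:
#   ha supervisor logs | filter_os_update_errors
```

Note that this particular message ("failed to run slot hook: Child process exited with code 1") reads differently from the "Cannot connect to host" failures earlier in the thread, so it may well be a separate problem.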

I tried rebooting, unplugging, and rebooting the whole network. None of that helped.
The advice above was to try these over SSH:

  • dig github.com

Response:

; <<>> DiG 9.18.27 <<>> github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 540
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: d6bb6887ef217901 (echoed)
;; QUESTION SECTION:
;github.com.                    IN      A

;; ANSWER SECTION:
github.com.             55      IN      A       140.82.121.3

;; Query time: 8 msec
;; SERVER: 172.30.32.3#53(172.30.32.3) (UDP)
;; WHEN: Mon Aug 26 23:14:52 CEST 2024
;; MSG SIZE  rcvd: 77
  • host github.com
    Response:
github.com has address 140.82.121.3
github.com mail is handled by 5 alt2.aspmx.l.google.com.
github.com mail is handled by 10 alt4.aspmx.l.google.com.
github.com mail is handled by 10 alt3.aspmx.l.google.com.
github.com mail is handled by 1 aspmx.l.google.com.
github.com mail is handled by 5 alt1.aspmx.l.google.com.
  • traceroute github.com
    Response:
traceroute to github.com (140.82.121.4), 30 hops max, 46 byte packets
 1  router.domain_not_set.invalid (192.168.1.254)  0.617 ms  1.048 ms  0.607 ms
 2  *  *  *
 3  *  *  *
 4  100.64.0.18 (100.64.0.18)  3.902 ms  3.769 ms  3.706 ms
 5  *  *  *
 6  62.45.255.114 (62.45.255.114)  4.743 ms  4.936 ms  4.649 ms
 7  er1.ams1.nl.above.net (80.249.208.122)  5.268 ms  4.827 ms  5.103 ms
 8  *  *  *
 9  *  *  *
10  ae1.mcs1.fra6.de.eth.zayo.com (64.125.29.57)  9.993 ms  9.977 ms  10.097 ms
11  82.98.193.31.IPYX-270403-001-ZYO.zip.zayo.com (82.98.193.31)  9.486 ms  9.842 ms  14.318 ms
12  *  *  *
13  *  *  *
14  *  *  *
15  *  *  *
16  *  *  *
17  *  *  *
18  *  *  *
19  *  *  *
20  *  *  *
21  *  *  *
22  *  *  *
23  *  *  *
24  *  *  *
25  *  *  *
26  *  *  *
27  *  *  *
28  *  *  *
29  *  *  *
30  *  *  *
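
Hops that show only `* * *` did not answer the probes, which on its own does not prove the path is broken, since many routers simply do not reply. To see quickly where the replies stop, a sketch like this pulls the last responding hop out of saved traceroute output (`last_responding_hop` is a made-up helper name):

```shell
#!/bin/sh
# Print the number and name of the last hop that answered,
# reading traceroute output on stdin.
# (last_responding_hop is a hypothetical helper.)
last_responding_hop() {
    awk '
        # Hop lines start with a hop number; skip the header line
        # and any hop whose first answer column is "*".
        $1 ~ /^[0-9]+$/ && $2 != "*" { hop = $1; host = $2 }
        END { if (hop != "") print hop, host }
    '
}

# Example call:
#   traceroute github.com | last_responding_hop
```

For the trace above this would report hop 11, the last zayo.com router before the run of unanswered hops.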

I think that my Home Assistant is still connecting to GitHub (also because a Core update yesterday worked flawlessly).

Can anyone help me find out why the OS updates are no longer working?

If more info is necessary, please ask.
Any advice would be appreciated.

With kind regards,
Danny

P.S. Some more info about my setup:

  • Board: Raspberry Pi 5
  • Core: 2024.8.3
  • Supervisor: 2024.08.0
  • Operating system: 12.3
  • Frontend: 20240809.0
  • Storage: 2% used (separate SSD)
  • Wired connection to network, using DHCP

I’m on an RPi4, but I see exactly the same behaviour. It is not DNS and not disk space related. Nothing has helped so far.

What about the ghcr.io address?

It seems there’s some DNS or connectivity problem for the Supervisor container only. Could you try attaching to the Supervisor container using docker exec -ti hassio_supervisor bash (either directly on the host/VM after typing login, or through the Advanced SSH terminal; the standard SSH/web terminal won’t work) and check what fails there? I would start with dig A github.com and ping dns and/or ping 172.30.32.3 (as DNS should be resolved through the CoreDNS plugin).
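
If an interactive shell in the container is awkward, the same checks can be run one at a time from the host. This is only a sketch: `in_supervisor` and the `DOCKER_CMD` override are made-up names for illustration, and it assumes the tools mentioned above (dig, ping) are available inside the container.

```shell
#!/bin/sh
# Run a single command inside the Supervisor container, without
# an interactive shell.
# (in_supervisor and DOCKER_CMD are hypothetical; DOCKER_CMD only
# exists so the command can be swapped out for a dry run.)
in_supervisor() {
    ${DOCKER_CMD:-docker} exec hassio_supervisor "$@"
}

# Example calls, run on the host or through the Advanced SSH terminal:
#   in_supervisor dig A github.com
#   in_supervisor ping -c 3 dns
#   in_supervisor ping -c 3 172.30.32.3
```

Comparing these results with the same commands run from the normal terminal add-on should show whether only the Supervisor container is affected.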