HAOS update failing

jgh · June 25, 2024, 10:01am

On a “Green”, 12.1 → 12.4
This has persisted across attempts on several days.
An update of the “Core” just ran ok, so it’s not all external comms being down.

Relevant line from the log:

homeassistant.components.hassio.handler.HassioAPIError: Can’t fetch OTA update from https://github.com/home-assistant/operating-system/releases/download/12.4/haos_green-12.4.raucb: Cannot connect to host github.com:443 ssl:default [None]

Any clues would be welcome.

Nick4 · June 25, 2024, 10:23am

Hi, strange because that link is OK.
What happens if you visit https://github.com/home-assistant/operating-system/releases/ ?

jgh · June 25, 2024, 10:24am

From my laptop, on the same house net, that’s fine

nickrout · June 25, 2024, 10:26am

That link fails because of the trailing “:”.

It downloads without it.

jgh · June 25, 2024, 10:32am

Should I be doing something different, not just hitting the “Install” link for the update offered on the Settings page?

Nick4 · June 25, 2024, 11:45am

Hi other Nick: is that trailing colon only in the error or does this mark a problem with the update system?
I’m about to post a bug report; should I go ahead?

FloatingBoater · June 25, 2024, 11:53am

Hi,
I’m going to suggest either DNS, or some other comms issue where the HAOS device is behaving differently than other devices on your LAN.

I’d install a web shell like the Terminal & SSH Add-on and try some commands…

# can HASS resolve an IP address for the site containing the update..
# look for an 'ANSWER section'
dig github.com

# look for 'has address'
host github.com

# look for a hop-by-hop network trace end to end
traceroute github.com

The next step would be to try downloading the link URL directly via wget <URL>.
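
For anyone who prefers a script, here is a minimal Python sketch of the same idea as the dig / host checks, plus a plain TCP connection attempt to port 443. It assumes a Python 3 interpreter is available wherever you run it (not guaranteed inside every add-on). The function names resolve, can_connect and main are made up for illustration; only the standard socket module is used.

```python
import socket


def resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses that host resolves to."""
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def can_connect(address, port=443, timeout=5.0, connector=socket.create_connection):
    """Attempt a plain TCP connection; return (ok, error_text)."""
    try:
        conn = connector((address, port), timeout=timeout)
    except OSError as err:  # includes timeouts and refused connections
        return False, str(err)
    conn.close()
    return True, ""


def main(host="github.com"):
    """Print one line per resolved address (this touches the network)."""
    for address in resolve(host):
        ok, error = can_connect(address)
        print(address, "ok" if ok else f"FAILED: {error}")
```

Calling main() prints one line per resolved address, so a host that resolves fine but cannot be reached on some of its addresses stands out. It does not exercise TLS or HTTP, so wget <URL> is still the fuller test.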

PS The trailing : colon is very likely log formatting.

If this helps, ❤️ this post!

Nick4 · June 25, 2024, 12:43pm

Hi James

Yes, it often is.

Yeah, thought so.

Good suggestion to tackle this!

jgh · June 25, 2024, 12:51pm

dig and host look good. The traceroute is plausible, if github is a CDN-farm with endpoints in London:

8 lonap01.msn.net (5.57.81.17) 19.089 ms 28.478 ms 21.427 ms
9 ae29-0.icr03.lon22.ntwk.msn.net (104.44.55.206) 18.991 ms ae25-0.icr03.lon24.ntwk.msn.net (104.44.50.168) 19.988 ms *
10 * * *
(no useful info beyond there)

A manual wget works fine.

FloatingBoater · June 25, 2024, 1:14pm

Hmm! (scratches head).

My traceroute sees a load of similar UK-based Internet exchange nodes, but I assumed that was due to a UK-based ISP peering locally.

jgh: “A manual wget works fine.”

That’s got me stumped - if a shell on the HASS container can connect and xfer fine, then there’s only oddities with headers / Python libraries / out of disk space or a short-term problem that’s gone away.

I’ve seen a few similar reports of OTA updates failing from others but assumed network issues getting to a local CDN node, not something client-side (i.e. Cannot connect to host).

I guess the host HAOS might have a different set of libraries / cut-down network stack from the HASS container itself, but if this persists, some grepping in the source code and a GitHub issue might be worth it.

If it is any consolation, 12.4 has a few intermittent USB issues so you might want to hold back.

As you’re on 12.1, how about trying 12.2 (to avoid 12.3 / 12.4)?

ha os update --version 12.2

(My Yellow is sticking on 12.2 until there’s more data to help fix the USB driver issues, although there are workarounds using kernel options.)

jgh · June 25, 2024, 1:27pm

The manual 12.2 update fails the same way.

Can I tell it to update using a wget’d local file?
If not, I guess I just have to keep trying until it magically works

FloatingBoater · June 25, 2024, 1:57pm

Hi,

As you seem to know one end of a command line from another, let’s dig a bit deeper (the best way to learn how stuff works…)

The HAOS GitHub has a few logged issues with debug commands that might be worth a read:

The Supervisor updates HASS and the HAOS image in one of the two OS boot slots. Looking at the Supervisor code confirms where your error message is coming from (complete with trailing :)

https://github.com/home-assistant/supervisor/blob/ffb4e2d6d767c6f4c1eef108da4952ca5e38c59b/supervisor/os/manager.py#L231

                    chunk = await request.content.read(1_048_576)
                    if not chunk:
                        break
                    ota_file.write(chunk)

        _LOGGER.info("Completed download of OTA update file %s", raucb)

    except (aiohttp.ClientError, TimeoutError) as err:
        self.sys_supervisor.connectivity = False
        raise HassOSUpdateError(
            f"Can't fetch OTA update from {url}: {err!s}", _LOGGER.error
        ) from err

    except OSError as err:
        if err.errno == errno.EBADMSG:
            self.sys_resolution.unhealthy = UnhealthyReason.OSERROR_BAD_MESSAGE
        raise HassOSUpdateError(
            f"Can't write OTA file: {err!s}", _LOGGER.error
        ) from err

@Job(name="os_manager_reload", conditions=[JobCondition.HAOS], internal=True)

This error message seems to come from the first except branch, which catches aiohttp.ClientError and TimeoutError - slow network or DNS resolver timeout? But why is the supervisor different from a manual HASS CLI dig or wget? Dunno.
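
To see why the colon after the URL is just formatting, here is a small sketch that reuses the f-string from the excerpt above. The helper name format_fetch_error is hypothetical, and a plain Exception stands in for whatever aiohttp actually raised; the URL and error text are copied from the log line earlier in the thread.

```python
def format_fetch_error(url: str, err: Exception) -> str:
    """Build the message with the same f-string shape as the Supervisor excerpt."""
    return f"Can't fetch OTA update from {url}: {err!s}"


# URL and error text taken from the log line in the first post.
url = (
    "https://github.com/home-assistant/operating-system/"
    "releases/download/12.4/haos_green-12.4.raucb"
)
# Stand-in exception (an assumption): the real one would come from aiohttp.
err = Exception("Cannot connect to host github.com:443 ssl:default [None]")

message = format_fetch_error(url, err)
print(message)
```

The URL itself has no trailing colon; the ": " is only the separator between the URL and the text of the underlying exception.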

Based on this dev comment, I wonder if host logs -t rauc or host logs -t OTA might show something more:

github.com/home-assistant/operating-system

Failed to install 12.4

opened 12:23PM - 19 Jun 24 UTC

stain3565

bug board/raspberrypi

### Describe the issue you are experiencing

current os version 12.3 Attempted …to install 12.4 from updates "Failed to call service update/install" Have restared supervisor and restarted home assistant. No success. Logs:
`2024-06-19 13:10:50.436 INFO (MainThread) [supervisor.os.manager] Fetch OTA update from https://github.com/home-assistant/operating-system/releases/download/12.4/haos_rpi4-64-12.4.raucb
2024-06-19 13:10:55.306 INFO (MainThread) [supervisor.os.manager] Completed download of OTA update file /data/tmp/hassos-12.4.raucb
2024-06-19 13:10:55.715 ERROR (MainThread) [supervisor.os.manager] Home Assistant Operating System update failed with: Installation error: Failed updating slot boot.0: failed to run slot hook: Child process exited with code 1`

### What operating system image do you use?

rpi4-64 (Raspberry Pi 4/400 64-bit OS)

### What version of Home Assistant Operating System is installed?

6.6

### Did the problem occur after upgrading the Operating System?

No

### Hardware details

Just pi4

### Steps to reproduce the issue

1. Simply click install fir version 12.4
2.
3.
...

### Anything in the Supervisor logs that might be useful for us?

```txt
2024-06-19 13:10:50.436 INFO (MainThread) [supervisor.os.manager] Fetch OTA update from https://github.com/home-assistant/operating-system/releases/download/12.4/haos_rpi4-64-12.4.raucb
2024-06-19 13:10:55.306 INFO (MainThread) [supervisor.os.manager] Completed download of OTA update file /data/tmp/hassos-12.4.raucb
2024-06-19 13:10:55.715 ERROR (MainThread) [supervisor.os.manager] Home Assistant Operating System update failed with: Installation error: Failed updating slot boot.0: failed to run slot hook: Child process exited with code 1
```

### Anything in the Host logs that might be useful for us?

```txt
Nothing useful
```

### System information

## System Information

version | core-2024.6.3
-- | --
installation_type | Home Assistant OS
dev | false
hassio | true
docker | true
user | root
virtualenv | false
python_version | 3.12.2
os_name | Linux
os_version | 6.6.28-haos-raspi
arch | aarch64
timezone | Europe/London
config_dir | /config

<details><summary>Home Assistant Community Store</summary>

GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 5000
Installed Version | 1.34.0
Stage | running
Available Repositories | 1404
Downloaded Repositories | 60

</details>

<details><summary>AccuWeather</summary>

can_reach_server | ok
-- | --
remaining_requests | 27

</details>

<details><summary>Home Assistant Cloud</summary>

logged_in | true
-- | --
subscription_expiration | 29 September 2024 at 01:00
relayer_connected | true
relayer_region | eu-central-1
remote_enabled | true
remote_connected | true
alexa_enabled | true
google_enabled | true
remote_server | eu-central-1-9.ui.nabu.casa
certificate_status | ready
instance_id | 2334dad093c247fa8b69b2c522f4a8b0
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

</details>

<details><summary>Home Assistant Supervisor</summary>

host_os | Home Assistant OS 12.3
-- | --
update_channel | stable
supervisor_version | supervisor-2024.06.0
agent_version | 1.6.0
docker_version | 25.0.5
disk_total | 109.3 GB
disk_used | 76.8 GB
healthy | true
supported | true
host_connectivity | true
supervisor_connectivity | true
ntp_synchronized | true
virtualization |
board | rpi4-64
supervisor_api | ok
version_api | ok
installed_addons | Terminal & SSH (9.14.0), Home Assistant Google Drive Backup (0.112.1), MariaDB (2.7.1), Z-Wave JS UI (3.8.0), Studio Code Server (5.15.0), Mosquitto broker (6.4.1), Samba share (12.3.1), Grafana (10.0.0), InfluxDB (5.0.0), AppDaemon (0.16.6), Duck DNS (1.17.0), File editor (5.8.0), SQLite Web (4.1.2), Z-Wave JS (0.6.0), Zigbee2MQTT (1.38.0-1), TileBoard (1.3.8), Ring-MQTT with Video Streaming (5.6.7), Samba NAS (12.2.0-nas2), eWeLink Smart Home (1.4.3), TasmoAdmin (0.30.3), Ring Livestream (1.35), Samba Backup (5.2.0), Node-RED (17.0.13), ESPHome (2024.5.5), Piper (1.5.0), Whisper (2.1.0), openWakeWord (1.10.0), Git pull (7.14.1)

</details>

<details><summary>Dashboards</summary>

dashboards | 7
-- | --
resources | 32
views | 38
mode | storage

</details>

<details><summary>Recorder</summary>

oldest_recorder_run | 15 June 2024 at 22:34
-- | --
current_recorder_run | 19 June 2024 at 11:19
estimated_db_size | 1301.17 MiB
database_engine | mysql
database_version | 10.11.6

</details>

<details><summary>Sonoff</summary>

version | 3.7.3 (e240aaf)
-- | --
cloud_online | 1 / 1
local_online | 1 / 1

</details>

### Additional information

_No response_

jgh: “If not, I guess I just have to keep trying until it magically works”

The simple option would be “turn it off and on again” a few times to see if something (DNS, IP, moon phase) changes (bet you’ve done that…).

jgh: “Can I tell it to update using a wget’d local file?”

Sort of - the docs describe creating a specially named CONFIG USB drive holding the *.raucb update file:

https://github.com/home-assistant/operating-system/blob/7298ffc13fde018d80fc0787932e2020168b6200/Documentation/configuration.md

# Configuration

## Automatic

You can use an USB drive with HassOS to configure network options, SSH access to the host and to install updates.
Format a USB stick with FAT32/EXT4/NTFS and name it `CONFIG` (in all capitals). Alternative you can create a `CONFIG` folder inside the `boot` partition. Use the following directory structure within the USB drive:

```text
network/
modules/
modprobe/
udev/
authorized_keys
timesyncd.conf
hassos-xy.raucb
```

- The `network` folder can contain any kind of NetworkManager connection files. For more information see [Network][network.md].
- The `modules` folder is for modules-load configuration files.
- The `modprobe` folder is for modules configuration files (/etc/modprobe.d)

This file has been truncated.

jgh · June 25, 2024, 2:33pm

Only done “restart” so far so an actual power-cycle is next.

Nothing at all from those log commands.

jgh · June 25, 2024, 8:15pm

Power cycle: no help.

FloatingBoater · June 25, 2024, 10:24pm

A new supervisor version has just been released so it might be worth an update and try again.

mipioter · August 15, 2024, 9:06am

Hello, I’m just having the same issue with 12.4 → 13.0 update.

I tried all the network commands and they work. I can even ping github from the terminal addon:

[image]

Yet I still cannot update in any way:

I’ve found a GitHub issue reported for this (yes, my laptop on the same network also connects successfully), but the devs responded that it’s a network error and they cannot do anything about it. Given that I can access GitHub from the HA machine, that’s probably not the case though. I posted a comment there; let’s see how it unfolds.

nickrout · August 15, 2024, 11:50am

GitHub had a fail today. Try again.

mipioter · August 15, 2024, 4:44pm

Nope, failed again.

I don’t think github failed when I tried: I ran wget and the update attempt one after another. wget started downloading the file, and the update attempt failed to connect.

nickrout · August 15, 2024, 8:41pm

I guess we are back to this then

nickrout · August 16, 2024, 12:55am

By the way, this is what I was referring to. Maybe lingering effects: “GitHub rolls back database change after breaking itself” (The Register)