Update silently fails on HAOS

hahuzar · April 4, 2024, 5:50pm

In Settings, a popup told me that an update is available: version 2024.4.0, while I am currently running 2024.3.3. If I click Install, a progress bar becomes active, then stops, and nothing further happens. After a machine reboot, the versions are still:

Core 2024.3.3
Supervisor 2024.03.1
Operating System 12.1
Frontend 20240307.0
This is from haos_generic-aarch64-12.1.img.xz which I downloaded 2024-03-31 9am and is running well in a VM on a NanoPi-R6C.
The same version running on a 2 year old Intel VM updates fine, e.g. it also auto reboots, unattended.

I also tried the CLI, with similar behavior: 2 minutes or so of ‘Processing’, then ‘Done’, and nothing changes.

It looks to me like something fails under the hood, but although this HAOS is Debian 12.1, I feel rather limited in what logs I can see compared to other systems I administer. I have examined the ‘fresh’ image; I normally use Btrfs subvolumes/snapshots, and this squashfs and overlayfs setup is quite different.

Maybe someone can give some hints where to look for OS logs or so?

eftimg · April 4, 2024, 5:55pm

I had a similar scenario with previous versions, caused by add-ons and RAM. Maybe check your add-ons and turn off some of them, or the third-party ones, just to do the update. Then you can re-enable them.

francisp · April 4, 2024, 5:58pm

Settings → System → Logs → pick what you want to see

hahuzar · April 4, 2024, 6:02pm

It also failed when I had no add-ons. I added the SSH add-on today to have somewhat easier CLI access, as the serial console in the VM manager is not working (it works on the x86_64 variant).
I allocated 2 CPUs and 2G RAM; I can allow all 8 CPUs and 4G RAM just to try. Or maybe I will also try it on my RPi4-8GB, with 2G RAM, as the rest is occupied by other VMs.

hahuzar · April 4, 2024, 6:12pm

Aha, I was blind to something in the upper-right corner of my screen! Now that I see it, this is what I find under Supervisor after the update failed again:

Error for http+docker://localhost/v1.43/images/ghcr.io/home-assistant/qemuarm-64-homeassistant:2024.4.0/json: Not Found (“No such image: Package qemuarm-64-homeassistant · GitHub”)
2024-04-04 20:06:37.772 WARNING (MainThread) [supervisor.homeassistant.core] Updating Home Assistant image failed
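
For anyone who prefers the CLI over the web UI, here is a minimal sketch of filtering the same Supervisor log for problems. It assumes the `ha` CLI is available (for example in the SSH add-on) and that log lines carry the level as a separate word, as in the line above. `find_update_errors` is a hypothetical helper name, not part of the `ha` CLI.

```sh
# Print only WARNING/ERROR/CRITICAL lines from a Supervisor-style log on stdin.
# find_update_errors is a hypothetical helper, not an ha subcommand.
find_update_errors() {
    # "|| true" keeps the exit status clean when nothing matches
    grep -E ' (WARNING|ERROR|CRITICAL) ' || true
}

# Intended use on the HAOS side (not run here):
#   ha supervisor logs | find_update_errors
```

This only narrows down what to read; the full log is still the place to look for the context around a failure.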

NOTE/EDIT: It seems I need code tags or so, as the forum refuses to post links/URLs and sees one in the error messages.
But it is clear to me that something is not found; I will see if I can understand it.

MaxK · April 4, 2024, 7:06pm

How much disk space do you have allocated?

hahuzar · April 4, 2024, 7:32pm

The default, which is whatever this image provides:
https://github.com/home-assistant/operating-system/releases/download/12.1/haos_generic-aarch64-12.1.img.xz

xzcat haos_generic-aarch64-12.1.img.xz > haos.img.fresh

losetup --show -P -f haos.img.fresh

/dev/loop0

gdisk -l /dev/loop0

Number Start (sector) End (sector) Size Code Name
1 2048 67583 32.0 MiB EF00 hassos-boot
2 67584 116735 24.0 MiB 8300 hassos-kernel0
3 116736 641023 256.0 MiB 8300 hassos-system0
4 641024 690175 24.0 MiB 8300 hassos-kernel1
5 690176 1214463 256.0 MiB 8300 hassos-system1
6 1214464 1230847 8.0 MiB 8300 hassos-bootstate
7 1230848 1427455 96.0 MiB 8300 hassos-overlay
8 1427456 4048895 1.3 GiB 8300 hassos-data

The backup .tar is 68M, and 1.3G free was/is reported (the installation is only 4 days old and is only doing Energy/Solar at the moment).
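
As a cross-check of the table above, the partition sizes follow from the start and end sectors. This is a small sketch assuming 512-byte logical sectors (gdisk prints the actual sector size in its header, which is not shown here); `part_mib` is just an illustrative helper name.

```sh
# Size of a partition in MiB from its first and last sector,
# assuming 512-byte logical sectors.
part_mib() {
    start=$1
    end=$2
    echo $(( (end - start + 1) * 512 / 1048576 ))
}

part_mib 2048 67583        # hassos-boot: prints 32
part_mib 1427456 4048895   # hassos-data: prints 1280
```

So the data partition in the fresh image is 1280 MiB, i.e. 1.25 GiB, which gdisk rounds to 1.3 GiB.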

As a test (or workaround?), I took a backup, stopped the VM, renamed haos.img to haos.img.fails, and renamed haos.img.fresh to haos.img.
Then I started the VM again; the web interface did some updating, then I restored my backup, and now everything is running again:
Core 2024.4.0
Supervisor 2024.03.1
Operating System 12.1
Frontend 20240403.1

The image itself is on Btrfs on NVMe, with a Linux 6.8.1 kernel, snapshotted hourly; that is a proven method for me. I just ran a scrub to be sure: no issues.
So I don’t know why it failed. My current thinking is that something has gone wrong in the filesystem(s) inside the image, but I am guessing.

MaxK · April 4, 2024, 7:47pm

A similar problem was reported, and the solution was to increase disk space. HA needs space to make copies, decompress images, etc. The recommended minimum disk space on an RPi, for example (I know you are on a VM), is 32GB.

hahuzar · April 4, 2024, 8:08pm

Hm, I see another issue: the last partition is apparently expanded at first run. It is now 5.3GB, and in the web UI I find 61.5% used (3.2G/5.2GB), so I am quite surprised that it needs/eats so much space, although I think I can understand it looking at all the features/graphics. My x86_64 VM indeed was/is 32GB; I kept it like that as I saw it was mostly sparse and not been running for 2 years. Now I actually want to replace the Intel box with an ARM box. The NVMe is 500G, so I can do something like truncate -s32G haos.img and see/hope it auto-expands partition 8, otherwise do it manually.
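
A minimal sketch of that resize step, assuming GNU coreutils (`stat -c`, `truncate`) on the VM host and a raw image file. `grow_image` is a hypothetical helper that refuses to shrink the image, since truncating to a smaller size would cut off data. It should only be run while the VM is stopped.

```sh
# grow_image is a hypothetical helper: enlarge a raw disk image to a new size
# in bytes, refusing to shrink it. Run it only while the VM is stopped.
grow_image() {
    img=$1
    new_bytes=$2
    cur_bytes=$(stat -c %s "$img") || return 1
    if [ "$new_bytes" -le "$cur_bytes" ]; then
        echo "refusing to shrink $img ($cur_bytes -> $new_bytes bytes)" >&2
        return 1
    fi
    # On filesystems with sparse-file support, the added range is a hole
    # and takes no disk space until the guest writes to it.
    truncate -s "$new_bytes" "$img"
}

# Example for 32 GiB (not run here):
#   grow_image haos.img $((32 * 1024 * 1024 * 1024))
```

This only enlarges the file on the host; the partition table and filesystem inside the image still have to be adapted, either by the guest on its next boot or manually.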

hahuzar · April 4, 2024, 8:38pm

OK, the truncate trick worked and the GPT was adapted, so now 3.5GB is used of a total of 30.5GB. I also realize that what was ‘orange’ earlier is now ‘green’. But I am used to tens of my own colors in InfluxDB for various data, so colors here and there don’t get my attention anymore.
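
For reference, the usage percentages before and after the resize can be recomputed from the numbers reported in the web UI. This is only a sketch with an illustrative helper name, `pct_used`; it says nothing about where the UI switches colors.

```sh
# Percentage of disk used, rounded to one decimal place.
pct_used() {
    awk -v used="$1" -v total="$2" 'BEGIN { printf "%.1f\n", used / total * 100 }'
}

pct_used 3.2 5.2     # before the resize: prints 61.5
pct_used 3.5 30.5    # after the resize: prints 11.5
```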

Conclusion: it is just a storage problem. There is enough free space on the NVMe, but the question arises: why do I need 3.5G? My InfluxDB has 1 year of samples at 10s granularity from 2 solar inverters and 3-phase power+delivery, and is about 2G. I am worried about HA just collecting data, and I do not want to spend time on figuring out how to ‘downsample’ etc. My VM that runs Influx has a 14G filesystem, which is Btrfs zstd compressed, so I can keep it small, and it runs fine on a 2G RAM VM on an RPi4. But I’ll see how it develops. I keep 4 systems running in parallel at the moment, all handling the same power measurement streams.

hahuzar · April 5, 2024, 10:17am

FYI, I cloned an existing Debian 12 aarch64 6G VM image and installed HA Supervised on it. It has a working serial console; actually the original VM has no keyboard/mouse/tablet/video/graphics/audio, so it is like a ‘deeply embedded ARM board’.
The rootfs was/is 5G Btrfs with the compress-force=zstd mount option. A quick calculation is that the HA stuff takes 873MB besides the 1700MB of the Debian 12 installation, including some snapshots. I more or less see this as a challenge: what will happen when upgrades come, etc. If I get a storage failure again, I can add Btrfs storage live/ad hoc without a reboot or so.