Home Assistant OS 10 update has broken my Pi 4b 4GB

Exactly the same thing happened to me, also with a RPi4 & SSD.

I only had local backups on the SSD itself. Is it possible to retrieve them?

Hi there
Today I tried the update from OS 9.5 to 10.1.
Unfortunately without success :wink:
After around one hour I did a hard shutdown on my Raspi 4b/8GB.
I then connected a keyboard and monitor and can see that OS itself is coming up to a certain point.
I then tried: os update --version 9.5
Processing… Done.
Error: 'OSManager.update' blocked from execution, system is not running - CoreState.STARTUP

Update: 28. April 2023 21:33
I did a few reboots for whatever reason, and boom it worked again.

Update: 08. May 2023 17:57
In the end I went back to OS 9.5.
The system ran out of memory and was hanging.
A really bad experience.
I wasn’t able to do the downgrade directly at my Raspi. (I know that sounds strange.)
What I did instead was connect via SSH.
Then the command “ha core stop”
Then the command “ha os update --version 9.5”
I am now making some preparations so that I have a solid backup.
Then I will wait at least for OS 10.3 before I give it another try.
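
The SSH downgrade described above boils down to two commands run in order. A minimal sketch, assuming the `ha` CLI is available in the SSH session; `rollback_os`, `run`, and `DRY_RUN` are illustrative names, not part of Home Assistant:

```shell
#!/bin/sh
# Sketch only: wraps the two commands quoted in the post above.
# DRY_RUN defaults to 1, so nothing is executed until you opt in.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # In dry-run mode print the command, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rollback_os() {
    target="$1"   # OS version to go back to, e.g. 9.5
    if [ -z "$target" ]; then
        echo "usage: rollback_os <version>" >&2
        return 2
    fi
    # Stop Core first, then ask the Supervisor to install the older OS.
    run ha core stop || return 1
    run ha os update --version "$target"
}
```

Calling `rollback_os 9.5` prints the two commands; with `DRY_RUN=0` set they are actually run. Take a backup and copy it off the device before trying this.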

Not sure if it is associated with the OS 10 upgrade, but I am running Proxmox 7.4/Hass.OS 10 on new hardware (Hunsn RJ02 / 500GB SSD / 32GB RAM) installed last week, and HA stopped responding a couple of times per day (and only that VM; the other containers (WireGuard/Icinga2) and VM (pfSense) on Proxmox were running smoothly)… I upgraded to 10.1 and now it seems to have been stable for 18 hours… If the problem persists, I will go back to OS 9.5… Lesson learned: do not change too many things at the same time, as the problem could come from Proxmox 7.4 (was running 7.3 before), the new hardware (moved from NUC to Hunsn), or the new OS (upgraded from 9.5 to 10.0 and now to 10.1)…

Stay tuned… :slight_smile:

Update: HA has now been running on OS 10.1 for 1 day and 14h without any problems… Just to complete my answer, I followed what @mickandkez did: I completely stopped the VM running Home Assistant (but kept the other VM and containers running, as well as Proxmox), waited a couple of seconds, and started the virtual machine again… Stable since then… The VM is now using around 4% of CPU.

Update 2: after a week without problems, I had two crashes in a couple of hours (seemingly due to a kernel problem in Linux (asm_exec_page_fault), based on the console logs)… I downgraded to Hass.OS 9.5 and will see if the problem persists… If it persists, I will reinstall the VM from scratch and download a backup… More to come…

Fortunately I have no problems with the Yellow Box, but I know that anything can go wrong any time.
I make back-ups every two days, both on the Yellow Box and on my Google Drive, and also every time I make a significant change in my HA configuration.

I still have my old RPi 3 but keep it offline. After reading about your issues, I decided to temporarily disconnect the Yellow Box and reconnect the RPi.

At first it took 30 to 45 minutes to update HA and then another 30 minutes to restore the latest backup (made by the Yellow Box). None of my Zwave devices work, but that makes perfect sense as I did not move the Zooz 800 dongle to the RPi3, which has a Zooz 700 dongle. However, everything else works on the RPi. I assume that should the Yellow Box fail at any time, I can move the Zooz 800 dongle and Zwave should then come back.

This gives me some assurance as there is no technical support with HA, like the superb support I have with the Universal Devices IoX (Polisy/eisy).

Did you go back to 9.5?
I tried to update to 10.1 today and it didn’t work with a RPi4 + SSD

Yesterday I had some “fun” myself as well, upgrading my pi4-4gb+sd+ssd from 9.5 to 10.1. After the normal update failed I didn’t give up easy. I tried updating the eeprom, ditching the sdcard, and booting a fresh install of 10.1 from ssd. That worked fine from scratch, but as soon as I restore my snapshot/backup it has the same issue many others have described. It works for a second, then goes away, then randomly can be pinged but not accessed… until hard reset, then rinse and repeat.

I have been through assorted update/upgrade issues over the years with HA. I fell back to 9.5 and will wait and see if this gets fixed later on… maybe 10.2 or whatever. At least it’s not like an esphome OTA all fail, where you have to access every device to connect usb for recovery. The problem I am now having though, is I select “Skip” to 10.1, and it keeps popping back up. It’s like it really, really wants me to brick it again, lol.

You may be able to pull the drive, hook it up to another box running Linux, and see if they can be recovered. It’s best to habitually download those backups and put them somewhere safe for this exact situation. They can take forever to complete, and I am sometimes asleep before they’re ready to download. Making a habit of checking the backups page whenever I’m logged in helps me not forget to download them.
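
A rough sketch of that recovery on the other Linux box. It assumes the HAOS data partition carries the label `hassos-data` and that backups sit under `supervisor/backup/` as `.tar` files; check both on your own drive. `copy_backups` is just an illustrative helper name:

```shell
#!/bin/sh
# Sketch only. Assumed layout: <data partition>/supervisor/backup/*.tar

copy_backups() {
    src="$1"    # mount point of the data partition
    dest="$2"   # directory that receives the copies
    backup_dir="$src/supervisor/backup"
    if [ ! -d "$backup_dir" ]; then
        echo "no backup directory at $backup_dir" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    count=0
    for f in "$backup_dir"/*.tar; do
        [ -e "$f" ] || continue       # glob matched nothing
        cp -p "$f" "$dest"/ || return 1
        count=$((count + 1))
    done
    echo "$count"                     # number of archives copied
}

# Example, run as root (the label path is an assumption):
#   mkdir -p /mnt/hassos-data
#   mount -o ro /dev/disk/by-label/hassos-data /mnt/hassos-data
#   copy_backups /mnt/hassos-data "$HOME/ha-backups"
#   umount /mnt/hassos-data
```

Mounting read-only keeps the possibly damaged filesystem untouched while you copy the archives off.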

Hi there,

I just want to add to this. I have a:
Pi 4b 8GB
USB Boot to ORICO SSD Portable External 128GB Mini M.2 NVME

I updated from HA OS 9.5 to 10.0, the day it was released and it has been a nightmare since. I read that some people were not even able to boot when they updated with a similar NVME SSD Pi4 hardware configuration.

Luckily mine did, it just kept crashing every 5 hours or so. I connected the HDMI and saw that it was the SQUASHFS becoming read only and journald errors.



I have since changed the power supply from a 20W 4 Ampere one to a MacBook USB-C charger and updated to HA OS 10.1, which brought some stability improvement. But it still crashed, by then about every other day.

Today I rolled back to HA OS 9.5:
ha os update --version 9.5
ha core update --version=2023.1.7

and it’s currently migrating my DB back:

Database is about to upgrade from schema version: 41 to: 30

so it’s still very busy. It remains to be seen whether I get back my old regular uptime of a month or more without crashes. I really hope so.

This is not OKAY!

I suspect it has something to do with the following ‘features’, from release notes:

  • zswap instead of swap in zram is used. This should allow to use Home Assistant OS on systems with lower amounts of RAM with the trade-off of slightly higher storage wear.

For reference, these are the release notes of HA OS 10.1, so something was really not okay with the 10.0 Release, and it’s quite annoying



Hi there,
I have a similar problem on my Odroid XU4 booting from eMMC.
After the update to HAOS 10 it no longer boots and I get no connection to display via HDMI…
Currently trying to recover my 9.5 backup.

FWIW, those who had issues with booting from a usb attached ssd (not nvme) after upgrading from 9.5 to 10.0 or 10.1…

It may be helpful to post some info about your adapter and drive here:

The devs are working on a potential fix, and it seems some info from users who had a failure may help.


Hi,

I just wanted to update on this issue. I have had no crashes since I downgraded to HA OS 9.5, uptime is now since 9 May 2023 at 18:17, which was a normal host reboot. I also updated to core 2023.5.2 again, that does not seem to cause any problems.

I tried again recently to update. Some things I discovered:

Add-ons that utilize CIFS are what is making my system continuously reboot the host, like a boot loop. Examples are radarr, sonarr, transmission, immich, Nextcloud, etc.

From what I can tell, it seems like SMBv1/CIFS is deprecated entirely and causes the system to crash when trying to mount.

If you have add-ons using SMBv1/CIFS and can plug directly into the host, use the CLI, even if it’s only for 20-30 seconds before the system crashes and reboots. Try to disable the add-ons utilizing CIFS; the system should then boot, or at least not crash for long enough to bring up the main HA container and the web UI, at which point you can disable start on boot for those add-ons.

Need to submit a bug in the alexbelgium repo to update the SMB/CIFS method used in the containers.

Since I’ve disabled the add-ons that utilize CIFS, HAOS 10 and 10.1 run fine.


My add-ons actually try all SMB versions starting from 3 and only use SMBv1 if it is the protocol used by the user. I could disable SMBv1 completely, but some routers such as the Fritzbox only use SMBv1. So it seems to be more on the user side here… On my own system I use the CIFS method and never had issues. I use the Samba NAS add-on, which publishes all disks over SMB.

Hi, actually I’ve adapted my code : it was testing first smb 3, then switched to smb 1. I’ve now implemented that it tests first smb 3 (any version), 3.0 specifically, 2.1, 2.0, and 1.0 only if nothing else works with a big warning saying this is not recommended. It will be implemented in the different addons according to their next update schedules - let me know if you want me to push updates earlier for some specific addons
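
To illustrate the order described above, here is a minimal sketch of such a fallback loop. It is not the add-on's actual code; `mount_with_fallback` and the `MOUNT_CMD` hook are made-up names, and only the `vers=` option of `mount -t cifs` is relied on:

```shell
#!/bin/sh
# Sketch only: try SMB dialects from newest to oldest, SMBv1 last.
# MOUNT_CMD is a hypothetical hook so the loop can be tried without a share.
MOUNT_CMD="${MOUNT_CMD:-mount}"

mount_with_fallback() {
    share="$1"    # e.g. //nas/media
    target="$2"   # local mount point
    opts="$3"     # optional extra options, e.g. username=...,password=...
    for vers in 3 3.0 2.1 2.0 1.0; do
        if "$MOUNT_CMD" -t cifs -o "vers=$vers${opts:+,$opts}" \
                "$share" "$target" 2>/dev/null; then
            if [ "$vers" = "1.0" ]; then
                echo "WARNING: mounted with SMBv1, which is not recommended" >&2
            fi
            echo "$vers"   # report the dialect that worked
            return 0
        fi
    done
    echo "could not mount $share with any SMB version" >&2
    return 1
}
```

With `vers=3` the kernel negotiates any SMB 3.x dialect, so the explicit `3.0` attempt only matters for servers that fail that negotiation.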


Hi,
I have this same issue. Upgrading to any OS 10 version caused disk to go read-only after a few hours, reverting to 9.5 corrects the issue.
I am running straight HASSIO OS 9.5 on a RP4 8G booting from a USB drive 128GB (no SD card).

Are you putting any notes on each add on about the change? I need it for radarr, sonarr, and nzbget.

This is implemented in a template manner: if the add-on was updated since that date (which it was for all of those), then it has the new SMB code.

I have removed the only CIFS addon I use i.e. SAMBA, but I still have the same issue, though it only occurs after a few days.

To be honest, the bus used for the storage medium is unreliable to begin with. If you think nothing can go wrong with storage connected via USB in the middle of an update, then you’re naive.

Who knows, maybe in the middle of the update the USB controller decided to fall over, which is nothing new. After a reset it can get back to working again, but something definitely gets corrupted between the controller being reset and recovering.

Mine seems to be fixed by blacklisting my USB drive controller.
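
The post above does not say how the controller was blacklisted. One common approach on a Raspberry Pi, which may or may not be what was done here, is to make the kernel ignore UAS for the USB adapter through a `usb-storage.quirks` parameter. A small sketch that builds that parameter; `build_uas_quirk` is a made-up helper and `152d:0578` is only an example ID:

```shell
#!/bin/sh
# Sketch only: build the kernel parameter that disables UAS for one adapter.

build_uas_quirk() {
    id="$1"   # vendor:product as printed by lsusb, e.g. 152d:0578
    # Accept exactly four hex digits, a colon, four hex digits.
    if ! printf '%s\n' "$id" | grep -Eq '^[0-9A-Fa-f]{4}:[0-9A-Fa-f]{4}$'; then
        echo "expected VID:PID such as 152d:0578" >&2
        return 2
    fi
    # The trailing "u" flag tells usb-storage to ignore UAS for this device.
    printf 'usb-storage.quirks=%s:u\n' "$id"
}
```

The resulting string would be appended to the single line in `cmdline.txt` on the boot partition, followed by a reboot. Use the ID that `lsusb` reports for your own adapter.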