RPi 4 with SSD won’t boot after updating HA

Hi guys, I have a RPi which has been running HA non-stop with no issues for years. I ran the latest update and now it refuses to boot.

Things I’ve tried

  • new SSD: installed a fresh HA and that boots up... until I restore an old backup, then it refuses to boot

  • new USB stick - same port and process as above - BOOTS FINE!! Backup installs and runs fine

  • different cable (hdd caddy actually) - fresh install boots fine but once I install an old backup, no joy

  • did the old apt upgrades etc, checked bootloader, all ok…

I’m so confused, the older backup was from my install that ran JUST fine!?

Any ideas would be hugely appreciated - I really don’t want to run from a USB stick…

Thanks

There are several details missing from the picture which are important:

  • What type of HASS install is this? (“Apt upgrades” suggests this is not a HAOS appliance)
  • Which RPi? RPi3a (officially a potato!), RPi3b+, RPi4?
  • What was the original hardware? (you mention new SSD, new USB drive, not the original)

Assuming the RPi is fine (suggested test below…), my guess is the previous install was suffering from hidden disk issues, and the update pushed it over the edge. The old backup might contain a bad config somewhere, a bad database file, or you’ve found an unknown breaking change. :frowning:

The other thing to check is the PSU - external storage might pull more than the supply can handle, causing the CPU to brown out. That might manifest as a lockup when attempting a restore - high sustained load to CPU and disk. Database restores are usually SLOW (some large installs have reported tens of hours).

I’d also check a different RPi, and/or install RPiOS on a spare card and test the existing hardware with some intensive desktop / browsing / downloading to disk. The RPiOS might expose hardware issues in the syslog, and on-screen power / temperature warnings. You can also try fsck on other media. RPiOS is also likely to update the RPi firmware if this has not already happened.
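If you do boot RPiOS for testing, the firmware reports power and thermal events through `vcgencmd get_throttled`, which prints a bitmask such as `throttled=0x50005`. Here's a minimal sketch for decoding it. `decode_throttled` and `check_bit` are names I made up for the example; the bit meanings are the ones listed in the Raspberry Pi documentation.

```shell
#!/bin/sh
# Decode the bitmask from `vcgencmd get_throttled` (Raspberry Pi OS).
# check_bit and decode_throttled are hypothetical helper names.

# check_bit VALUE BIT MESSAGE: print MESSAGE if bit BIT is set in VALUE
check_bit() {
    if [ $(( ($1 >> $2) & 1 )) -ne 0 ]; then
        echo "$3"
    fi
}

# decode_throttled VALUE: VALUE may be hex, e.g. 0x50005
decode_throttled() {
    value=$(( $1 ))
    if [ "$value" -eq 0 ]; then
        echo "no power or thermal flags set"
        return 0
    fi
    check_bit "$value" 0  "under-voltage detected (now)"
    check_bit "$value" 1  "ARM frequency capped (now)"
    check_bit "$value" 2  "currently throttled"
    check_bit "$value" 3  "soft temperature limit active"
    check_bit "$value" 16 "under-voltage has occurred"
    check_bit "$value" 17 "ARM frequency capping has occurred"
    check_bit "$value" 18 "throttling has occurred"
    check_bit "$value" 19 "soft temperature limit has occurred"
}

# Only query the firmware when vcgencmd is actually available.
if command -v vcgencmd >/dev/null 2>&1; then
    decode_throttled "$(vcgencmd get_throttled | cut -d= -f2)"
fi
```

If any of the "has occurred" flags show up after hammering the disk, I'd suspect the PSU or the caddy before the backup itself.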

If it’s a RPi4, the storage is connected to one of the blue USB3 ports isn’t it? :wink:

The good news is a HASS backup is just a tar archive of component .tar.gz files, so you can manually ‘unpack’ the old backup, look at the contents down to individual file level and restore in stages (e.g. unpack, edit, repack, test restore).
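A rough sketch of that unpack / edit / repack cycle, assuming the backup is not password protected and you have GNU or BusyBox tar. The function names are made up for illustration, and I haven't verified that HA accepts every hand-edited archive, so work on a copy and keep the original file.

```shell
#!/bin/sh
# Hypothetical helpers for editing a COPY of a backup. Use absolute paths.

# unpack_backup BACKUP.tar WORKDIR: expand the outer tar into WORKDIR
unpack_backup() {
    mkdir -p "$2"
    tar -xf "$1" -C "$2"
}

# unpack_part WORKDIR NAME: expand WORKDIR/NAME.tar.gz into WORKDIR/NAME/
unpack_part() {
    mkdir -p "$1/$2"
    tar -xzf "$1/$2.tar.gz" -C "$1/$2"
}

# repack_part WORKDIR NAME: rebuild NAME.tar.gz from WORKDIR/NAME/,
# then remove the expanded directory so it is not packed into the outer tar
repack_part() {
    tar -czf "$1/$2.tar.gz" -C "$1/$2" .
    rm -rf "${1:?}/${2:?}"
}

# repack_backup WORKDIR NEW.tar: rebuild the outer tar from WORKDIR
repack_backup() {
    tar -cf "$2" -C "$1" .
}

# Example session (paths are placeholders):
#   unpack_backup /root/backup/old.tar /tmp/work
#   unpack_part   /tmp/work homeassistant
#   ... inspect or edit files under /tmp/work/homeassistant ...
#   repack_part   /tmp/work homeassistant
#   repack_backup /tmp/work /root/backup/edited.tar
```

Restoring in stages then means repacking with one suspect piece changed or removed at a time and testing the restore after each change.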

YAML config is fairly easy to recover, but database config can be hard - you need to get log messages and chase them down one by one.

If this helps, :heart: this post!

Hi James, thank you so much for your response:

  • Type of HASS :: This is installed using the HA version from the RPi Imager software (see image below). As I mentioned, I bought a brand new SSD for this.
  • RPi version :: As per the title, this is RPi4 - I read somewhere that you had to boot the RPi with the standard RPi OS to do an EEPROM update, so I flashed the stock OS on a USB stick and did all that.
  • Original Hardware :: exactly the same as now, SSD (Crucial, same as the new).

I have tested with stock RPi install, all works fine, also the stock HA install works fine. I have the SSD now in a powered CADDY and that works fine for the stock install. What I found is if I use the stock HA installation from the RPi Installer menu and then log into HA and I DON’T use the option to load from a backup (I have several backups going back to March) - it all works fine. Boots up from the CADDY etc.

If I then go into the HA menu and restore from a backup partially, this works except all my NODE RED scripts are gone…

Not sure how to get them back… if I do a full restore onto a USB stick then EVERYTHING works as before, except it runs off a USB stick which I don’t want. It looks like I might have to do that and export my Node Red scripts one by one (I have about 40-odd) and then import them one by one…

I have tried USB3 and USB2 - Way back when I moved to SSD (years ago) there was a thread that said not to use USB3…

If you can point me to where to look to get all my NodeRed scripts back without loading them manually… I trawled through the TAR files but could not see anything that looked like it might work…

Thanks again

Ah - missed the title, reading the text!

I can’t explain the HAOS install failing on immediate restore, but working if restored after the new user setup. There has been a lot of work on the “onboarding process”, so perhaps something has broken. Finishing up new user setup, and THEN restoring, is being committed to memory as a workaround!

NodeRed is an add-on, but isn’t something I’ve used beyond some experiments years ago. If you install one of the Terminal & SSH add-ons, you can try working with your backup files.

Restoring to a USB stick, manually copying off the files to another device, then switching back to SSD and copying back might work, as might just backing up NodeRed on its own.

What’s your level of *nix command line knowledge like?

Here’s an example session on the HASS web terminal listing out the contents of a FULL backup. My guess is your backup contains something like addons_nodered.tar.gz which might be an easier way to restore the files manually in the right place.

  • Install from scratch
  • restore from a full backup (likely doesn’t include NodeRed)
  • Extract the NodeRed backup from the full backup manually
  • Try restoring the NodeRed backup from the GUI
  • Worst case - extract the NodeRed files and import manually.

(Add-Ons are in separate containers from HASS, so I’m not sure how to get a shell to restore their files.)

cd /root/backup
ls -latr      # list backups in time order, newest last

[core-ssh backup]$ ls -latr
  # stuff deleted for brevity #
-rw-r--r--    1 root     root         10240 Nov  3 05:55 9de72011.tar
-rw-r--r--    1 root     root      93972480 Nov  6 14:26 a4cc3307.tar
drwxr-xr-x    2 root     root          4096 Nov  6 14:26 .

# the filenames don't match the GUI name, but dates help identify which is which

# backup files are a tar archive containing TGZ archives
# use tar to list the contents of a FULL backup...
[core-ssh backup]$ tar tvf a4cc3307.tar     
drwx------ root/root         0 2023-11-06 14:26:29 ./
-rw-r--r-- root/root      1600 2023-11-06 14:23:47 ./44d7b954_logviewer.tar.gz
-rw-r--r-- root/root       184 2023-11-06 14:26:22 ./addons_local.tar.gz
-rw------- root/root      1889 2023-11-06 14:26:29 ./backup.json
-rw-r--r-- root/root      1644 2023-11-06 14:23:47 ./core_configurator.tar.gz
-rw-r--r-- root/root     20712 2023-11-06 14:23:52 ./core_matter_server.tar.gz
-rw-r--r-- root/root      4999 2023-11-06 14:23:52 ./core_mosquitto.tar.gz
-rw-r--r-- root/root      1765 2023-11-06 14:23:47 ./core_samba.tar.gz
-rw-r--r-- root/root      8682 2023-11-06 14:23:47 ./core_ssh.tar.gz
-rw-r--r-- root/root    889904 2023-11-06 14:23:51 ./core_zwave_js.tar.gz
-rw-r--r-- root/root  68026410 2023-11-06 14:26:22 ./homeassistant.tar.gz
-rw-r--r-- root/root  24397303 2023-11-06 14:26:29 ./media.tar.gz
-rw-r--r-- root/root       182 2023-11-06 14:26:22 ./share.tar.gz
-rw-r--r-- root/root       176 2023-11-06 14:26:22 ./ssl.tar.gz

# extract one add-on backup from a FULL backup file as an example
# Guessing, NodeRed probably has a filename like addons_nodered.tar.gz
[core-ssh backup]$ tar xvf a4cc3307.tar ./core_ssh.tar.gz    
./core_ssh.tar.gz

[core-ssh backup]$ ll
total 873124
  # stuff deleted for brevity #
-rw-r--r--    1 root     root          8682 Nov  6 14:23 core_ssh.tar.gz

# list the contents of the add-on backup extracted from a full backup
[core-ssh backup]$ tar tvfz core_ssh.tar.gz 
drwx------ root/root         0 2023-11-06 14:23:47 ./
-rw------- root/root      4017 2023-11-06 14:23:47 ./addon.json
drwxr-xr-x root/root         0 2023-10-26 22:53:04 data/
drwx------ root/root         0 2023-10-26 22:53:08 data/.ssh/
-rw------- root/root       702 2023-10-26 22:53:08 data/.ssh/authorized_keys
-rw------- root/root       263 2022-09-01 13:33:02 data/.ssh/known_hosts
-rw-r--r-- root/root        44 2022-08-04 17:36:12 data/.ssh/known_hosts.old
  # stuff deleted for brevity #

# expand the add-on backup as an example
$ mkdir test
$ cd test
$ tar xvfz ../core_ssh.tar.gz 
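
For the Node-RED hunt specifically, here's a sketch that searches every inner archive of a full backup for a filename pattern without unpacking anything to disk. `find_in_backup` is a made-up helper, and `flows` is only my guess at what the Node-RED flow file is called; depending on the add-on version it might sit inside the add-on's own archive or inside homeassistant.tar.gz. It also assumes the backup isn't password protected.

```shell
#!/bin/sh
# find_in_backup BACKUP.tar PATTERN   (hypothetical helper)
# Prints "inner-archive: path" for every path matching PATTERN
# (an awk regular expression) inside any inner .tar.gz of BACKUP.tar.
find_in_backup() {
    backup=$1
    pattern=$2
    # List the outer tar and keep only the inner .tar.gz members
    tar -tf "$backup" | awk '/\.tar\.gz$/' | while read -r inner; do
        # Stream the inner archive to stdout and list it straight from the pipe
        tar -xOf "$backup" "$inner" | tar -tzf - |
            awk -v a="$inner" -v p="$pattern" '$0 ~ p { print a ": " $0 }'
    done
}

# Example, using the backup file name from the listing above.
# The pattern is a guess at the file name, not a confirmed path:
#   find_in_backup /root/backup/a4cc3307.tar 'flows.*[.]json$'
```

Once you know which inner archive holds the file, extract just that archive as in the session above and copy the file out; a flows JSON file can be loaded through the Import option in the Node-RED editor menu.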

I got it working by deleting the database file… not sure if that only stores logs or what? But it doesn’t seem to affect things too much once removed…. An update / restore is mad slow (like 30 hours) which I’m sure was not the case previously. I might have to consider a beefier Pi or NUC at some stage. Just glad it’s working now.

Thanks for the replies!