Power Loss, HA Will Not Start

smac3265 · February 8, 2024, 8:08pm

Hi all -
I lost power today, and while I was trying to gracefully shut down the VM, the UPS died right in the middle of the shutdown sequence. Now the VM refuses to restart at all. Both slots have tried 3 times (according to the grub menu) and now HA automatically goes into the rescue shell.

Now, once in there, I can run the journalctl -xb command and review the log. In there, I can see there are 8 entries for an invalid checksum for a backup superblock.

Keep in mind, this is my first time working with the rescue shell, and I’m not super strong with Linux commands, so I’m stumbling here.

Anyway, from what I’ve read so far, it seems like perhaps my file system is corrupted. But if I try to run the e2fsck command, it tells me /dev/sda8 is already mounted, so that won’t run. And, there’s no way to unmount sda8 (that I know of).

Once in the Rescue Shell, if I tell it to just reboot, it will immediately skip over both of the normal boot slots (I guess because it’s tried 3 times in each?) and drop right back into the Rescue Shell. If I manually intervene in the grub menu and select a normal boot entry, it will try to run, but it will fail, and go back into Rescue Shell. Some dependency is indeed failing, but I can’t tell what it is before it scrolls off the screen.

I’m hoping someone out there might be able to guide me in the right direction to get this resolved. Please.

Sean

timj.pdx · February 8, 2024, 11:54pm

Let this be a warning to all who choose to run without a UPS: Linux is not immune to damage from a power loss.
The Linux command you are looking for is ‘fsck - check and repair a Linux filesystem’. I am not remotely close to an expert with it. It cannot be run on a mounted file system. The errors in the log should tell you which partition is having problems. For more info try ‘man fsck’.
Basically you just run it and hope it is able to find and repair all the problems. If it fails, you are pretty much left with restoring the partition from backups, unless you are a REAL expert. Chances are pretty good it will succeed and no files/dirs will be lost.
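
A minimal sketch of "run it and hope", assuming an ext filesystem, an unmounted partition, and that e2fsck is available in whatever shell you are using. `repair_fs` is a hypothetical helper name, not an existing tool. It reports first without changing anything, and only then lets e2fsck apply the fixes it considers safe.

```shell
# repair_fs DEVICE: report first, then attempt only the automatic "safe" fixes.
# DEVICE must NOT be mounted. Example device from this thread: /dev/sda8.
repair_fs() {
    dev="$1"

    # -f: check even if the filesystem is marked clean
    # -n: open read-only and answer "no" to every question (report only)
    if e2fsck -f -n "$dev"; then
        echo "no errors reported on $dev"
        return 0
    fi

    # -p: automatically fix what can be fixed safely, stop otherwise
    status=0
    e2fsck -f -p "$dev" || status=$?

    # Per the e2fsck man page: 0 = no errors, 1 = errors corrected,
    # 2 = errors corrected and a reboot is advised, 4 and above = problems remain.
    if [ "$status" -lt 4 ]; then
        echo "automatic repair finished on $dev (exit status $status)"
    else
        echo "e2fsck could not repair $dev automatically (exit status $status)" >&2
    fi
    return "$status"
}

# Example (as root, partition unmounted):
#   repair_fs /dev/sda8
```

If the automatic pass gives up, the next step would be an interactive `e2fsck -f` on the same device, answering each prompt yourself, ideally after copying off anything you can still read.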

smac3265 · February 8, 2024, 11:58pm

Appreciate the response Tim! Indeed, good advice about the UPS - and even when you DO have a UPS, if you don’t get the shutdown completed before the UPS dies, you’re still up the creek (which is what happened to me).

So I’ve tried the fsck command, and you’re right, it won’t run on a mounted drive. But running in the Rescue Shell, I can’t seem to get it to unmount. At least, the command I believe I’m supposed to be using to unmount the device won’t execute because it doesn’t exist.

Sean

timj.pdx · February 9, 2024, 12:18am

Not exactly sure which partition your rescue shell is mounted on. You need to run ‘umount /dev/xxxx’. If your shell is mounted on that partition, that will fail, and you would then have to boot Linux from a DVD/USB stick and run fsck on the bad partition from there. Is this an SSD or an HDD?
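
A quick way to tell whether a partition is currently mounted, sketched under the assumption that the shell provides awk and a readable /proc/mounts. `is_mounted` is a hypothetical helper name; its optional second argument exists only so the check can be tried against a sample file.

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds (exit status 0) if DEVICE appears as a mount source.
# MOUNTS_FILE defaults to /proc/mounts.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Example:
#   if is_mounted /dev/sda8; then
#       echo "/dev/sda8 is mounted, fsck will refuse to run"
#   fi
```

Note that the kernel may list the device under a different name (a label or UUID path, for example), so a "not mounted" answer here is a hint rather than proof.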

smac3265 · February 9, 2024, 12:38am

It’s an HDD, and it’s in a VM on a Windows PC.

timj.pdx · February 9, 2024, 1:28am

That changes everything… I know nothing about Windoze or how you can proceed from here, sorry.

couch67 · February 9, 2024, 3:41am

Hi, sorry to hear about your trouble. Do you have a recent backup of your HA config? If so, it should be relatively straightforward to recreate the VM and reinstall your backup.

smac3265 · February 9, 2024, 3:55am

There should be a backup, but it’s stuck on the drive that won’t load. It never occurred to me, nor did I investigate, the possibility of dumping the backup to an alternate location. I am sick with the knowledge that I’m potentially facing hours and hours of rebuilding what took hours and hours to build in the first place. If I could just get this to boot one time, or figure out how to at least get into the partition in the rescue shell, perhaps there’s hope.

francisp · February 9, 2024, 4:18am

What VM software are you using? If you are using VirtualBox, you can attach the VDI to a Debian or Ubuntu VM, mount it there, and proceed from there.
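
A rough sketch of the "proceed from there" part, run as root inside the Debian or Ubuntu VM once the old VDI is attached as a second disk. Everything specific here is an assumption to verify on your own system: the device name (/dev/sdb8 is a guess, check `lsblk`), the mount point, and the idea that backups are .tar files under supervisor/backup on the data partition. `copy_backups` is a hypothetical helper name.

```shell
# copy_backups DATA_ROOT DEST
# Copies backup archives from a mounted Home Assistant data partition.
# ASSUMPTION: backups are .tar files under DATA_ROOT/supervisor/backup.
copy_backups() {
    src="$1/supervisor/backup"
    dest="$2"

    if [ ! -d "$src" ]; then
        echo "no backup directory at $src" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1

    for f in "$src"/*.tar; do
        [ -e "$f" ] || continue        # the glob matched nothing
        cp -p "$f" "$dest"/ || return 1
        echo "copied $(basename "$f")"
    done
}

# Example (device name is a guess; confirm it with lsblk first):
#   mkdir -p /mnt/hassos-data
#   mount -o ro /dev/sdb8 /mnt/hassos-data
#   copy_backups /mnt/hassos-data /root/ha-backups
#   umount /mnt/hassos-data
```

Mounting read-only keeps the damaged filesystem from being written to while you copy. If the mount is refused because of the corruption, running e2fsck on the unmounted partition from this VM first may be necessary.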

smac3265 · February 9, 2024, 8:34pm

Hi Francis - thanks for the reply! I am indeed using VirtualBox. So, what I’ve done so far is to install another HA VM using the latest .vdi image. That works fine. It starts HA as expected, and all is good that way. I tend to think that the overall image of the original HA is not corrupted because it actually does try to load. It steps through MOST of the process and fails at only one step (which I can no longer catch, because it scrolls by so quickly).

A friend who runs HA on a Pi made the suggestion about loading another HA instance in VB and then pointing the drive in that new instance to the original one. I’m going to try that (once I figure out how to do that!) and see how that works.

While I really want to get a backup of the entire setup, I’d be somewhat satisfied with getting my config.yaml and whatever other .yaml files I can grab.

Yes, this is a lesson learned, and I’ll be making changes to my everyday processes once I get past this. But right now, the focus is on recovery, as best as I can.

Sean

smac3265 · February 10, 2024, 1:42am

Final update on this…

I installed Debian in another VM and was able to grab the backups from the corrupted installation. I hadn’t done a FULL backup in a while, but it got me to a place where I didn’t have to spend hours and hours rebuilding a lot of what I had already done.

I’ve already connected the new HA VM to my NAS and created a full backup this evening. There will be regular full backups made along the way going forward, and a new UPS will be ordered for the PC that runs HA.

Lots learned here.

Sean

couch67 · February 12, 2024, 2:56am

Glad to hear you were able to retrieve your backup.

To automate backups and ensure they are stored safely, I run an add-on called ‘Home Assistant Google Drive Backup’ which, as you would guess, makes periodic backups and saves them to your Google Drive folder. The backup frequency and number of backups to keep are configurable.