HA Disaster, Then Restored, Can't Update

Adam_Zilko · July 12, 2020, 3:21am

I’m running HA in VirtualBox on a Win 10 server. The server shut down while the VM was running. When I tried to start the VM back up after the server restarted, it wouldn’t start, rebooted a few times, and went back to an older backup from last month.

The version it restored to is running 0.110.4, and the OS is 3.13. I can’t get either HA or the OS to update. I don’t really know anything about Linux, and I’m not good at troubleshooting this. However, VirtualBox is showing an EXT4-fs error, with something about a bad block checksum.

I’m completely at a loss on what I can do to troubleshoot / fix this. I’m afraid that if I can’t fix this, I’ll be hanging up HA for good, as it’s simply too difficult for me to troubleshoot and takes too much time. Starting over seems like much too large a task as well.

I’m hoping someone can help point me in the right direction here. I appreciate any help. If you tell me how to get you the info needed to troubleshoot this, I’m very happy to do so.

Adam_Zilko · July 12, 2020, 3:42am

francisp · July 12, 2020, 3:46am

The normal procedure to fix ext4 errors on a root fs is to attach the HDD to another Linux machine, or to boot up with a Linux live CD, and run e2fsck. I don’t know VirtualBox, but maybe you can spin up a new VM with Debian and mount your HA VM disk in there to run e2fsck.
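
As a rough sketch of that procedure, a read-only check followed by an optional repair could be scripted like this. It assumes the HA disk shows up in the rescue system as /dev/sdb and that the partition to check is /dev/sdb8; both are guesses, so confirm the real names with lsblk first and make sure the partition is not mounted. The helper names are made up for illustration.

```python
import subprocess

def build_e2fsck_cmd(device, repair=False):
    """Return the e2fsck command line for an unmounted ext4 partition.

    -f forces a check even if the filesystem is marked clean.
    -n opens it read-only and answers "no" to every prompt (dry run).
    -y answers "yes" to every prompt, i.e. attempts the repairs.
    """
    return ["e2fsck", "-f", "-y" if repair else "-n", device]

def check_partition(device, repair=False):
    """Run e2fsck (needs root) and return its exit code.

    0 means no errors were found, 1 or 2 mean errors were corrected,
    and 4 means errors were left uncorrected.
    """
    return subprocess.run(build_e2fsck_cmd(device, repair)).returncode

# Hypothetical usage, dry run first, then the real repair:
#   check_partition("/dev/sdb8")
#   check_partition("/dev/sdb8", repair=True)
```

Doing the dry run first shows how bad the damage is before anything on the disk gets changed.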

BTW: I would never run HA on Windows. As you discovered, there is no way to stop Windows from rebooting.

Adam_Zilko · July 12, 2020, 4:27am

Somehow, through multiple retries, I was able to get the Core to update. However, for the OS, I’m getting this:

20-07-12 04:25:42 INFO (MainThread) [supervisor.hassos] Fetch OTA update from https://github.com/home-assistant/operating-system/releases/download/4.11/hassos_ova-4.11.raucb
20-07-12 04:25:53 INFO (MainThread) [supervisor.hassos] OTA update is downloaded on /data/tmp/hassos-4.11.raucb
20-07-12 04:25:53 INFO (MainThread) [supervisor.utils.gdbus] Call de.pengutronix.rauc.Installer.Install on /
20-07-12 04:25:53 INFO (MainThread) [supervisor.utils.gdbus] Start dbus monitor on de.pengutronix.rauc
20-07-12 04:25:53 INFO (MainThread) [supervisor.utils.gdbus] Stop dbus monitor on de.pengutronix.rauc
20-07-12 04:25:53 INFO (MainThread) [supervisor.utils.gdbus] Call org.freedesktop.DBus.Properties.GetAll on /
20-07-12 04:25:53 ERROR (MainThread) [supervisor.hassos] HassOS update fails with: signature verification failed: error:2E09A09E:CMS routines:CMS_SignerInfo_verify_content:verification failure

craigcurtin · July 12, 2020, 4:43am

Yeah, you really need to check and fix the filesystem.

Spin up another Linux VM on VirtualBox (Rescatux is a live distro that is pretty good and has lots of tools in it) and then, with Hass shut down, mount the HA disk onto the rescue VM.

From there you will be able to run a filesystem check.
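
A minimal sketch of the attach step, using VirtualBox's VBoxManage storageattach command. The VM name "rescue", the controller name "SATA", the port number and the disk path are all assumptions to swap for whatever your own setup uses, and the function names are only illustrative. Both VMs should be powered off when you run it.

```python
import subprocess

def build_attach_cmd(rescue_vm, disk_path, controller="SATA", port=1):
    """Build the VBoxManage call that attaches an existing disk image
    to a second VM as an extra hard disk."""
    return [
        "VBoxManage", "storageattach", rescue_vm,
        "--storagectl", controller,  # must match the controller name in the rescue VM
        "--port", str(port),         # port 0 is assumed to hold the rescue VM's own boot medium
        "--device", "0",
        "--type", "hdd",
        "--medium", disk_path,       # path to the HA VM's virtual disk file
    ]

def attach_disk(rescue_vm, disk_path, **kwargs):
    """Run the attach command and raise if VBoxManage reports a failure."""
    subprocess.run(build_attach_cmd(rescue_vm, disk_path, **kwargs), check=True)

# Hypothetical usage:
#   attach_disk("rescue", r"C:\VMs\hassos\hassos_ova.vdi")
```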

If you intend to continue to run on Windows, then I would suggest you install WUMT Wrapper on Windows (Google it) - it lets you turn off the Windows automatic updates that force reboots when you do not want them.

Craig

Adam_Zilko · July 13, 2020, 1:34am

Thanks, I’ll give that a look. However, I’m now getting this error. Any idea on what might be the issue here?

20-07-12 04:20:31 ERROR (MainThread) [supervisor.hassos] HassOS update fails with: signature verification failed: error:2E09A09E:CMS routines:CMS_SignerInfo_verify_content:verification failure

craigcurtin · July 13, 2020, 1:53am

That's why you need to run a disk check - I am guessing there is corruption on the disk, and when you download the updates they are not passing verification.

The first place to start is with a disk check.
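
If you want a rough way to test that guess as well, you can hash the downloaded update file and compare it against the hash of the same file downloaded on a healthy machine. This is only a sketch: it does not replace the signature check the updater does, the function names are made up, and it assumes you can get at the file at all (the log shows it landing in /data/tmp/hassos-4.11.raucb).

```python
import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large update image does not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_reference(path, reference_hex):
    """Compare the file's hash with one computed from a known-good copy."""
    return sha256_of(path) == reference_hex.strip().lower()
```

If the two hashes differ, the copy on the HA disk is damaged, which would fit the corruption theory. If they match, the download itself is probably fine and the problem lies elsewhere.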

Craig

Adam_Zilko · July 13, 2020, 2:41am

Okay, thank you a ton.

So, here’s another thing to complicate everything. I had installed Docker separately to set up Pi-hole on this server as well, after it restarted. I don’t have the VM set to auto start, as the server seldom restarts (it’ll typically stay up for 2-3 months at a time). I’m thinking I may have brought much of this on myself, as I’m wondering if there’s a conflict. I hadn’t thought there would be, but I started doing some Googling on this, and apparently there can be a conflict. Unfortunately, the server had restarted, and I wasn’t paying attention to HA not starting when I proceeded to install Docker.

Thoughts on this?

craigcurtin · July 13, 2020, 3:03am

It shouldn’t be causing what you are seeing - I assume you have backed out the Pi-hole config, i.e. you have not redirected all the DNS to the Pi-hole server if it is not running?

It sounds like you may be a little new to networking and Linux etc. - as such I would recommend not using Docker in the same VM as your HA. It would be easier to spin up a lightweight Linux VM and then run Docker in there for other projects you want to play with.

Craig

Adam_Zilko · July 13, 2020, 4:25pm

While Pi-hole is running, I’ve only configured two network devices (a laptop and an iPhone) to use it, as I’m still in a testing phase. So, HA isn’t using it whatsoever. However, it’s running in Docker on the same server that VirtualBox is on.

I will likely uninstall Docker first to troubleshoot this, then move on to your suggestion of checking the file system. I find it odd that older backups of the VM are problematic as well. This would indicate to me that there’s something else going on here. While I can start and run an older backup, I can’t restart it without an issue. And it won’t reboot at all. So it must be opening in a state that has bypassed some startup dependencies.

craigcurtin · July 13, 2020, 4:33pm

When you say you went to an older backup - what sort of backup? A backup made within HA, or a backup of the virtual machine using external software?

The issue is that unless you are restoring a complete copy of the VM, i.e. a replacement VM, then you are just restoring backups into a corrupted filesystem, so they themselves will be corrupted.

In my case, before doing any major upgrades of a VM, I clone it to a new VM so I have a complete machine to go back to - it does not sound like you have done this step though.
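
For what it's worth, that cloning step can be done from the command line with VBoxManage clonevm. A small sketch, assuming the VM is called "HomeAssistant" (a placeholder name) and is powered off while it is cloned; the naming scheme and function names are just illustrations.

```python
import datetime
import subprocess

def clone_name_for(vm_name, day=None):
    """Name the clone after the source VM plus the date it was taken."""
    day = day or datetime.date.today()
    return f"{vm_name}-pre-upgrade-{day:%Y%m%d}"

def build_clone_cmd(vm_name, clone_name):
    """Build the VBoxManage call for a full clone that is registered
    in VirtualBox so it shows up in the VM list."""
    return ["VBoxManage", "clonevm", vm_name, "--name", clone_name, "--register"]

def clone_vm(vm_name):
    """Clone the VM and return the name of the new copy."""
    clone_name = clone_name_for(vm_name)
    subprocess.run(build_clone_cmd(vm_name, clone_name), check=True)
    return clone_name

# Hypothetical usage before an upgrade:
#   clone_vm("HomeAssistant")
```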

Craig