Backup / Restore procedure for HAOS

SimonFili · January 8, 2024, 1:12pm

Hi all,

Last week, I wanted to write a procedure for a full HAOS restore of my home HA.

So my starting point is a full backup of HAOS running in a VM. Quite big at around 4 gig. My home HA for the source backup is quite big, with lots of add-ons, plugins, data, automations etc. The usual after a few years of tinkering.

For the test, I used a small physical PC, booted Ubuntu from USB and imaged HAOS onto the local disk. After rebooting, I pointed it at the latest full backup of my home HA. It took a while! Maybe 5–6 hours?

When that was done, I paused the source HA VM.
Then, from the attached console, I renamed the HA I had just restored to avoid a conflict on the .local name, since they are on the same network.

Then I connected to the test HA. When I started poking around, I saw that many add-ons were not running. The only one that seemed to be running was InfluxDB. Grafana, Node-RED, SQLite, Studio Code Server and Terminal did not want to start. When I clicked the start button, it simply kept spinning forever. I looked at some logs and debugged for around half a day, but could not find anything obvious.

So finally, the question!
When doing a full restore like this, what is the correct procedure?
Do we install base HAOS, then the add-ons, then restore?
What am I missing?

Thanks! Simon

os.habitats.tech · January 8, 2024, 1:52pm

You do not state which VM product you are using.

SimonFili · January 8, 2024, 2:00pm

The source HA is HAOS on VirtualBox. That should not be important.

I rebooted the testHA.local machine this morning, after shutting it down last week following a day of working on it.

Looking around the testHA.local that I restored, I see that the add-ons are now running.

So, here’s my observation so far:

Installed a fresh HAOS on an empty machine
Restored a full backup from another HAOS instance.
Renamed the machine to testha.local
ADD-ONS are not started
Did a couple of “Restart Home Assistant” from the GUI, that did not solve the issue
Complete shutdown of the machine
Boot up of the machine, add-ons work now.

Is this expected?

WallyR · January 8, 2024, 2:05pm

Was your HA fully updated when you made the backup?
A restore will install HA’s config library, but the HAOS is the one you installed when you prepared to restore the backup.
That HAOS might not be compatible with the add-on versions being restored, and the add-ons therefore need to be redownloaded.

os.habitats.tech · January 8, 2024, 2:22pm

A complete reboot is required if your HAOS is on DHCP, but even on static IP, you need to ensure your other instance is not running. There are gotchas with networking and how it is configured so you need to pay attention.

When a backup restore has taken place I always suggest a fresh cold start of the system to ensure all changes are effective.

indeeed · January 8, 2024, 2:27pm

Yes, when doing a quick search in the forum…

Just the first 5 search results…

You could also have shared the logs with us, as described here:

SimonFili · January 8, 2024, 2:30pm

I did a full backup of HAOS last week and everything was updated. Then I installed HAOS from a fresh image of the same week. So maybe there were some differences, but mostly everything was updated and very close to the same version.

This morning, on the test restored machine, everything seems started, and HA is telling me to update a bunch of add-ons, core, and OS.

My main question remains. What is the best practice to restore?

My scenario is the following:

I have a production system, that I keep updated as much as possible. There is potentially a delay of 1-2 weeks on some updates.
I keep a full backup after a full update, and smaller per-add-on backups as recommended when doing an update.
I want to write a procedure on how to reload HAOS from scratch if I lose a physical server.

So far, it seems to work after a complete shutdown of the physical machine and reboot.

I will run the updates on the testHA restored machine to see if there are some surprises.

The data, automation, sensors, and configurations are important to restore after a hardware failure.
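
Since the whole procedure depends on that full backup being readable, a quick sanity check of the downloaded file before it is needed can save time. This is only a minimal sketch, assuming the downloaded backup is an ordinary tar archive; the function name `summarize_backup` is made up for illustration, and the check says nothing about whether the contents will restore cleanly.

```python
import tarfile


def summarize_backup(archive_path):
    """Return (member name, size in bytes) pairs for the files in a tar archive.

    tarfile.open raises tarfile.ReadError when the file cannot be read as a
    tar archive, which is a cheap early warning that a download is damaged.
    """
    # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz) itself.
    with tarfile.open(archive_path, "r:*") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]
```

Calling `summarize_backup("full_backup.tar")` (a hypothetical file name) and printing the pairs gives a rough picture of what the archive holds and how large each part is.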

os.habitats.tech · January 8, 2024, 2:35pm

Let’s get some facts out of the way.

No system is considered production worthy when it is based on VirtualBox or running under Windows with any virtualisation tech.
A production worthy system relies on a full VM backup, not the HAOS backup.

For your needs it does not matter how you restore. All that matters is that you start the new system from cold once the restore has been successful, and that your old instance is not on the same network as the new instance.

SimonFili · January 8, 2024, 2:45pm

I agree.

I’m doing testing right now, not trying to restore a full production system.
I’m using my home HA instance for testing. I’m aware that, running on the same network, many things that depend on a fixed IP will not work.
The “production” system I’m running outside my home network is not running in a VM, but on a physical box with an SSD drive.
If I get into a situation where I need to rebuild a failed physical box, the IP and network would be the same as before.
I would first install HAOS by using Ubuntu to image HAOS onto the local SSD drive, then use a full backup to reload everything.

I hope this helps clarify what I’m trying to accomplish. And thanks for participating, it would be boring to discuss alone.

os.habitats.tech · January 8, 2024, 2:50pm

I now understand. It is a BM restore you require. The procedure is as you described.

Install HAOS on a new BM system.
Restore backup
Once the restore is successful, power down the system
Power up the new system, ensuring your old system is not on the LAN

Congrats you are up and running.

mterry63 · January 8, 2024, 5:52pm

Personally, I used VirtualBox for my production and development Home Assistant systems for the first 18 months I used HA. I ran it on top of Windows 10 Pro and both were extremely reliable and ran months without issue. I was running version 6.x of VB and never migrated to 7.x. My only complaint with VirtualBox would be USB passthrough, which could occasionally be buggy on startup. I moved all my USB passthrough devices to TCP connections so that became a non-issue for me.

For one-off, operational recovery support, VirtualBox supports snapshot backups, which I would highly recommend using before any upgrades or changes to your system. It allows you to roll back to the pre-upgrade state without issues. For longer term DR backups I’d recommend an unattended scripted backup like niro1987/VirtualBox-Backup on GitHub (an automated backup for Oracle VirtualBox VMs in Windows), coupled with an offline copy of the backup files. I ran this strategy in Production and it never failed me.

VM level backups would be my first choice. I still run backups in Home Assistant coupled with Google Drive Backup for offsite storage but I’ve never had to rely on restoring one. I have had to roll-back a snapshot as well as recover from a copied backup with the VB strategy above. Never had a failure.

For the Windows OS I rely on System Image backups built into Windows. I’ve had to recover those as well, never had a problem.

That said, I’m now on Proxmox for all my VMs. It is also very capable. I no longer maintain Windows 10 Pro (It was originally installed to support HomeSeer, before they had a Linux option).

Plan ahead and make sure you understand and exercise the plan and you should be fine.

Jens_Wymeersch · October 14, 2024, 2:40pm

My case is a bit different…
So I’m running HAOS on Proxmox. I have 3 machines running in High Availability. Now it looks like part of the disk on which HAOS is running is corrupted, with the result that my HAOS is no longer highly available.
My idea was to load a new HAOS instance and restore the backup (which is approx. 2.2 GB). There is no progress indicator to show whether the restore is working or not. Any suggestions besides looking at the storage, memory and CPU consumption?
Secondly, you indicate that you would run the restore while the old machine is still running. Correct? Only when the restore is finished would you stop the old machine and boot the new one. Correct? If so, did you try this yourself?

os.habitats.tech · October 16, 2024, 12:13am

What is critical when restoring a backup is that your old system is down, so that there are no potential IPv4 conflicts when the new system restarts.

You could use the following:

Take a full backup of the system, then download the backup
Shutdown old system
Start new system and restore downloaded backup
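
Before starting the new system, it may be worth confirming that the old one really has gone from the network. The following is a rough sketch, assuming the old instance serves its web interface on Home Assistant's default port 8123; the function name `is_reachable` and the address used in the note below are placeholders, not anything from this thread.

```python
import socket


def is_reachable(host, port=8123, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and names that do not resolve.
        return False
```

If `is_reachable("192.168.1.50")` (a made-up address for the old instance) still returns True, something is still answering there and should be shut down first. A False result only means nothing accepted a connection on that port, so it is a hint rather than proof that the address is free.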

Of utmost importance is that the first thing you do on the new system is restore the backup. DO NOT attempt to create a login on the new system then restore.

On Proxmox, are you not using Proxmox generational backups and just restoring the VM using Proxmox? Say, going back a few days, weeks or months.

Jens_Wymeersch · October 16, 2024, 2:11am

What you describe is exactly what I tried a few days ago. The problem that came up was that I couldn’t determine whether the restore had finished or not.
You don’t see any progress. Any ideas what you can do?
Bringing the system down for hours obviously creates challenges, as a lot of things in the house don’t work anymore.
That’s why I hoped I could do it in parallel, which you debunked.
As for the Proxmox backups, the hardware error is in those backups too. I have to check whether I have ones going back multiple months, but I don’t think I do.
Once this is sorted I will change that policy.
Because then I could take an older Proxmox backup and restore the Home Assistant backup on top of it, which would most probably be much faster. Correct?

os.habitats.tech · October 16, 2024, 4:14am

Once the restore is complete you should be able to log in. You can also refresh the page periodically. On a modern mini PC (VM or BM) the restore process takes minutes.

If your system is slow give it an hour and it should complete. If you do not get the login screen I assume either something has gone wrong or the restore has not completed.
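
The periodic page refresh can be automated. This is a minimal sketch rather than an official method: it simply polls a URL until something answers over HTTP. The function names and the URL in the note below are assumptions, and a response only shows that the web server is up, not that the restore completed correctly.

```python
import time
import urllib.error
import urllib.request


def http_responds(url, timeout=5.0):
    """Return True if the URL answers with any HTTP status at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server is up, even though it returned an error status
    except OSError:
        return False  # refused, timed out, or host name not resolved


def wait_for_ui(url, attempts=60, delay=60.0, probe=http_responds, sleep=time.sleep):
    """Probe the URL up to `attempts` times, sleeping `delay` seconds between tries.

    Returns the 1-based number of the attempt that succeeded, or None if none did.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

With the defaults, `wait_for_ui("http://homeassistant.local:8123")` checks once a minute for about an hour, which matches the "give it an hour" advice above; the host name is the usual default and may differ on your network.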

Jens_Wymeersch · October 16, 2024, 6:26am

Thank you for your quick reply. Is there any value in going with Supervised vs OS?
Secondly, do you see any problem in running Home Assistant in an LXC?

os.habitats.tech · October 18, 2024, 1:19am

Supervised has no benefits when running under Proxmox.

For best long term uptime, compatibility and stability you should run HAOS in a VM. Everything else is a hack IMHO. If you like hacks and are not worried about instability you can try LXC and/or Supervised. However, if you want a system you can rely upon it should be HAOS in a VM or on BM (bare metal).

stevemann · November 7, 2024, 1:11pm

I agree with everything except the VM. I run HAOS on an Intel NUC i3. Bare metal. No VM, no container- just the HAOS image.

Uptime is measured in years.

os.habitats.tech · November 8, 2024, 5:13am

I agree with your assertion and acknowledge my omission. HAOS on BM is also a first class experience. However, it is not as flexible as a VM, in the sense that you cannot run, for example, an Ubuntu or Windows desktop system or other servers (e.g. Plex, TrueNAS, etc.) on the same server box. On BM you also need to run 3rd party services as add-ons instead of separate VMs/containers, and to access the HAOS server on BM you need to set up SSH, while on a VM there is nothing to do to maintain the server side.

If your environment is simple, HAOS on BM is an excellent and solid option. However if your environment is more complex HAOS on VM is the most flexible option IMHO.

fleskefjes · November 8, 2024, 5:26am

This is a good thing how? If you have uptime measured in years that just means you are not maintaining it properly. There’s zero difficulty achieving the same uptime on a virtualized platform. To never have tried virtualization you sure think you know a lot about how it works.