So last week, I wanted to write a restore procedure for a full HAOS restore of my home HA.
My starting point is a full backup of HAOS running in a VM, which is quite big at around 4 GB. The source HA is my home system, with lots of add-ons, plugins, data, automations, etc.: the usual after a few years of tinkering.
For the test, I used a small physical PC, booted Ubuntu from USB and imaged HAOS onto the local disk. After rebooting, I pointed it at the latest full backup of my home HA. It took a while! Maybe 5–6 hours?
When that was done, I paused the source HA VM.
Then, from the attached console, I renamed the restored HA to avoid a conflict on the .local name, since both are on the same network.
Then I connected to the test HA. When I started poking around, I saw that many add-ons were not running. The only one that seemed to be running was InfluxDB. Grafana, Node-RED, SQLite, Studio Code Server and Terminal would not start: when I clicked the Start button, the spinner just kept turning forever. I looked at some logs and debugged for around half a day, but could not find anything obvious.
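To see at a glance which add-ons did not come back after a restore like this, you can filter the add-on list by state. This is a minimal sketch, assuming you have already fetched the Supervisor's add-on list as JSON and that it has the shape `{"data": {"addons": [{"name": ..., "state": ...}]}}`; treat those field names as an assumption to verify against the Supervisor API docs. The function name `addons_not_started` is hypothetical.

```python
from typing import Any


def addons_not_started(payload: dict[str, Any]) -> list[str]:
    """Return the names of add-ons whose state is anything other than "started".

    Assumed payload shape (verify against your Supervisor version):
    {"data": {"addons": [{"name": ..., "slug": ..., "state": ...}, ...]}}
    """
    addons = payload.get("data", {}).get("addons", [])
    # Fall back to the slug, then "?", if an entry has no display name.
    return [a.get("name", a.get("slug", "?")) for a in addons if a.get("state") != "started"]


if __name__ == "__main__":
    # Hypothetical sample data, not real output from a restored system.
    sample = {
        "data": {
            "addons": [
                {"name": "InfluxDB", "state": "started"},
                {"name": "Grafana", "state": "stopped"},
                {"name": "Node-RED", "state": "stopped"},
            ]
        }
    }
    print(addons_not_started(sample))  # ['Grafana', 'Node-RED']
```

Running it again after a cold start shows whether the stuck add-ons eventually came up.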
So finally, the question!
When doing a full restore like this, what is the procedure to do so?
Do we install the base HAOS, then the add-ons, then restore?
What am I missing?
Was your HA fully updated when you made the backup?
A restore brings back HA's config, but the HAOS version is the one you installed when you prepared to restore the backup.
That HAOS might not be compatible with the add-on versions being restored, so the add-ons may need to be re-downloaded.
A complete reboot is required if your HAOS uses DHCP, but even with a static IP you need to make sure your other instance is not running. There are gotchas with networking and how it is configured, so pay attention.
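Before cold-starting the restored machine, a quick check that the old instance is no longer answering can save some confusion. This sketch assumes the default HA address `homeassistant.local` on port 8123; adjust both to your setup. The function name `ha_is_answering` is hypothetical. A `False` result only means nothing accepted a TCP connection within the timeout; it does not prove the old machine is powered off.

```python
import socket


def ha_is_answering(host: str = "homeassistant.local", port: int = 8123, timeout: float = 3.0) -> bool:
    """Return True if something accepts a TCP connection on host:port within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts.
        return False
```

For example, run `ha_is_answering()` from another machine on the network; if it returns `True`, shut down or disconnect the old instance before powering up the restored one.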
When a backup restore has taken place I always suggest a fresh cold start of the system to ensure all changes are effective.
I did a full backup of HAOS last week with everything updated. Then I installed HAOS from a fresh image from the same week. So there may have been some differences, but mostly everything was up to date and very close to the same version.
This morning, on the restored test machine, everything seems to have started, and HA is telling me to update a bunch of add-ons, Core, and the OS.
My main question remains: what is the best practice for restoring?
My scenario is the following:
I have a production system that I keep as up to date as possible, though some updates may lag by 1–2 weeks.
I keep a full backup after each full update, plus smaller per-add-on backups when updating, as recommended.
I want to write a procedure for reloading HAOS from scratch if I lose a physical server.
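One way to keep such a procedure is as a checklist you can print and tick off. This is a minimal sketch whose steps are distilled only from this thread (fresh HAOS image, full restore, old instance off the network, optional rename, cold start, then updates); the `Step` type, `RESTORE_STEPS` and `render` are hypothetical names, and the order is a suggestion to adapt, not an official procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    why: str


# Steps distilled from this thread; adjust to your own setup.
RESTORE_STEPS = [
    Step("Image the target disk with a current HAOS release",
         "the OS is not in the backup; it is whatever you install here"),
    Step("Restore the latest full backup during onboarding",
         "brings back config, automations, add-ons and their data"),
    Step("Shut down the old instance or take it off the network",
         "avoids .local name and IP conflicts"),
    Step("Rename the restored host if both must coexist",
         "only needed when the old instance stays on the same network"),
    Step("Cold start the restored machine (full power off, then on)",
         "makes sure networking and add-ons come up cleanly after the restore"),
    Step("Check add-ons, then apply pending Core, OS and add-on updates",
         "add-on versions from the backup may not match the fresh HAOS"),
]


def render(steps: list[Step]) -> str:
    """Format the steps as a numbered checklist, one step per line."""
    return "\n".join(f"{i}. {s.action} ({s.why})" for i, s in enumerate(steps, start=1))


if __name__ == "__main__":
    print(render(RESTORE_STEPS))
```

Keeping the "why" next to each step makes it easier to judge later whether a step still applies after HA changes.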
So far, it seems to work after a complete shutdown of the physical machine and reboot.
I will run the updates on the restored testHA machine to see if there are any surprises.
The data, automation, sensors, and configurations are important to restore after a hardware failure.
No system should be considered production-worthy when it is based on VirtualBox or runs under Windows with any virtualisation tech.
A production-worthy system relies on a full VM backup, not the HAOS backup.
For your needs, it does not matter how you restore. All that matters is that you start the new system cold once the restore has succeeded, and that your old instance is not on the same network as the new one.
Personally, I used VirtualBox for my production and development Home Assistant systems for the first 18 months I used HA. I ran it on top of Windows 10 Pro and both were extremely reliable and ran months without issue. I was running version 6.x of VB and never migrated to 7.x. My only complaint with VirtualBox would be USB passthrough, which could occasionally be buggy on startup. I moved all my USB passthrough devices to TCP connections so that became a non-issue for me.
For one-off operational recovery, VirtualBox supports snapshots, which I would highly recommend taking before any upgrades or changes to your system. They let you roll back to the pre-upgrade state without issues. For longer-term DR backups, I'd recommend an unattended scripted backup such as niro1987/VirtualBox-Backup on GitHub (an automated backup for Oracle VirtualBox VMs on Windows), coupled with an offline copy of the backup files. I ran this strategy in production and it never failed me.
VM-level backups would be my first choice. I still run backups in Home Assistant, coupled with Google Drive Backup for offsite storage, but I've never had to rely on restoring one. I have had to roll back a snapshot, as well as recover from a copied backup using the VB strategy above. Never had a failure.
For the Windows OS, I rely on the System Image backups built into Windows. I've had to recover from those as well and never had a problem.
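If you want the pre-upgrade snapshot to be a habit rather than a manual click, it can be scripted around `VBoxManage snapshot <vm> take <name>`. This is a minimal sketch, not the niro1987 script mentioned above: the VM name `HAOS`, the label format and the helper names `snapshot_command` and `take_snapshot` are hypothetical, and it defaults to a dry run that only returns the command. Check the `VBoxManage snapshot` section of the VirtualBox manual for your version before running it for real.

```python
import subprocess
from datetime import datetime


def snapshot_command(vm: str, label: str, when: datetime) -> list[str]:
    """Build a VBoxManage command that takes a timestamped snapshot of a VM."""
    name = f"{label}-{when:%Y%m%d-%H%M}"
    return ["VBoxManage", "snapshot", vm, "take", name, f"--description=Taken before {label}"]


def take_snapshot(vm: str, label: str, dry_run: bool = True) -> list[str]:
    """Return the command; only execute it when dry_run is False."""
    cmd = snapshot_command(vm, label, datetime.now())
    if not dry_run:
        # Raises CalledProcessError if VBoxManage reports a failure.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: prints the command without touching any VM.
    print(take_snapshot("HAOS", "upgrade"))
```

Naming snapshots after the change they guard ("upgrade", "addon-update") makes it obvious which one to roll back to.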
That said, I’m now on Proxmox for all my VMs. It is also very capable. I no longer maintain Windows 10 Pro (It was originally installed to support HomeSeer, before they had a Linux option).
Plan ahead and make sure you understand and exercise the plan and you should be fine.