Last week I wanted to write a procedure for a full HAOS restore of my home HA install.
My starting point is a full backup of HAOS running in a VM, quite big at around 4 GB. The source instance has lots of add-ons, plugins, data, automations etc. The usual after a few years of tinkering.
For the test, I used a small physical PC, booted Ubuntu from USB and imaged HAOS onto the local disk. After rebooting, I pointed it at the latest full backup of my home HA. It took a while! Maybe 5–6 hours?
When that was done, I paused the source HA VM.
Then, from the attached console, I renamed the HA I had just restored to avoid a conflict on the .local name, since both are on the same network.
Then I connected to the test HA. When I started poking around, I saw that many add-ons were not running. The only one that seemed to be running was InfluxDB. Grafana, Node-RED, SQLite, Studio Code Server and Terminal would not start. When I clicked the start button, the spinner simply turned forever. I looked at some logs and debugged for around half a day, but could not find anything obvious.
So finally, the question!
When doing a full restore like this, what is the correct procedure?
Do we install base HAOS, then the add-ons, then restore?
What am I missing?
Was your HA fully updated when you made the backup?
A restore brings back HA’s configuration, but the HAOS version stays the one you installed when you prepared for the restore.
That HAOS might not be compatible with the add-on versions being restored, in which case the add-ons need to be re-downloaded.
A complete reboot is required if your HAOS is on DHCP, but even with a static IP you need to ensure your other instance is not running. There are gotchas with networking and how it is configured, so you need to pay attention.
After a backup restore I always suggest a fresh cold start of the system to ensure all changes take effect.
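If you want to see which versions a backup was taken with before restoring it onto a fresh HAOS, one option is to look at the metadata inside the archive. A minimal sketch, assuming the backup is an ordinary, unencrypted tar file with a JSON metadata file named backup.json in it (that file name is my assumption, so check it against your own backup; the function name is made up for illustration):

```python
import json
import tarfile


def read_backup_metadata(path, member_name="backup.json"):
    """Return the parsed metadata JSON from a backup tar, or None if not found.

    member_name is an assumption about how the metadata file is called;
    pass a different name if your archive uses one.
    """
    with tarfile.open(path, "r:*") as tar:
        for member in tar.getmembers():
            # Member names may carry a leading "./", so compare the basename.
            if member.isfile() and member.name.split("/")[-1] == member_name:
                with tar.extractfile(member) as fh:
                    return json.load(fh)
    return None
```

Calling read_backup_metadata("my_full_backup.tar") and printing the result should show whatever the backup recorded about itself, which you can then compare by eye with the versions on the freshly imaged machine.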
I did a full backup of HAOS last week and everything was updated. Then I installed HAOS from a fresh image of the same week. So maybe there were some differences, but mostly everything was updated and very close to the same version.
This morning, on the restored test machine, everything seems to have started, and HA is telling me to update a bunch of add-ons, core, and OS.
My main question remains. What is the best practice to restore?
My scenario is the following:
I have a production system that I keep as up to date as possible. Some updates may lag by 1-2 weeks.
I keep a full backup after a full update, plus smaller per-add-on backups, as recommended when doing an update.
I want to write a procedure on how to reload HAOS from scratch if I lose a physical server.
So far, it seems to work after a complete shutdown of the physical machine and reboot.
I will run the updates on the restored test machine to see if there are any surprises.
The data, automation, sensors, and configurations are important to restore after a hardware failure.
No system is considered production-worthy when it is based on VirtualBox or running under Windows with any virtualisation tech.
A production-worthy system relies on a full VM backup, not the HAOS backup.
For your needs it does not matter how you restore. All that matters is that you cold-start the new system once the restore has succeeded, and that your old instance is not on the same network as the new one.
I’m doing testing right now, not trying to restore a full production system.
I’m using my home HA instance for the testing. I’m aware that, with both running on the same network, many things that depend on a fixed IP will not work.
The “production” system, which I run outside my home network, is not in a VM but on a physical box with an SSD.
If I ever need to rebuild a failed physical box, the IP and network would be the same as before.
I would first install HAOS by using Ubuntu to image it onto the local SSD, then use a full backup to reload everything.
I hope this helps clarify what I’m trying to accomplish. And thanks for participating, it would be boring to discuss alone.
Personally, I used VirtualBox for my production and development Home Assistant systems for the first 18 months I used HA. I ran it on top of Windows 10 Pro and both were extremely reliable and ran months without issue. I was running version 6.x of VB and never migrated to 7.x. My only complaint with VirtualBox would be USB passthrough, which could occasionally be buggy on startup. I moved all my USB passthrough devices to TCP connections so that became a non-issue for me.
For one-off, operational recovery support, VirtualBox supports snapshots, which I would highly recommend taking before any upgrades or changes to your system. They let you roll back to the pre-upgrade state without issues. For longer-term DR backups I’d recommend an unattended scripted backup like niro1987/VirtualBox-Backup on GitHub (an automated backup for Oracle VirtualBox VMs in Windows), coupled with an offline copy of the backup files. I ran this strategy in production and it never failed me.
VM-level backups would be my first choice. I still run backups in Home Assistant coupled with Google Drive Backup for offsite storage, but I’ve never had to rely on restoring one. I have had to roll back a snapshot as well as recover from a copied backup with the VB strategy above. Never had a failure.
For the Windows OS I rely on System Image backups built into Windows. I’ve had to recover those as well, never had a problem.
That said, I’m now on Proxmox for all my VMs. It is also very capable. I no longer maintain Windows 10 Pro (it was originally installed to support HomeSeer, before they had a Linux option).
Plan ahead and make sure you understand and exercise the plan and you should be fine.
My case is a bit different…
I’m running HAOS on Proxmox, with 3 machines in High Availability. It now looks like part of the disk HAOS runs on is corrupted, so my HAOS is no longer highly available.
My idea was to load a new HAOS instance and restore the backup (which is approx. 2.2 GB). There is no progress indication to tell whether the restore is working or not. Any suggestions besides watching storage, memory and CPU consumption?
Secondly, you indicate that you would run the restore while the old machine is still running. Correct? Only when the restore is finished would you stop the old machine and boot the new one. Correct? If so, did you try this yourself?
What is critical when restoring a backup is that your old system is down, so that there are no IPv4 conflicts when the new system restarts.
You could use the following:
Take a full backup of the old system, then download it
Shut down the old system
Start the new system and restore the downloaded backup
Of utmost importance is that the first thing you do on the new system is restore the backup. DO NOT attempt to create a login on the new system then restore.
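Before starting the new system, it can help to confirm that the old one really is off the network. A minimal sketch, assuming the old instance served its web UI on the default port 8123; the function name and the IP address in the usage note are placeholders, not anything from this thread:

```python
import socket


def is_port_open(host, port=8123, timeout=3.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: nothing is answering there.
        return False
```

For example, is_port_open("192.168.1.50") returning False after shutting the old box down suggests the address is free for the new instance. It only checks that one port, so it is a sanity check rather than proof that nothing else holds the IP.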
On Proxmox, are you not using Proxmox generational backups and just restoring the VM with Proxmox? Say, going back a few days, weeks or months.
What you describe is exactly what I tried a few days ago. The problem that came up was that I couldn’t tell whether the restore had finished or not.
You don’t see any progress. Any ideas what I can do?
Bringing the system down for hours obviously creates challenges, as a lot of things in the house stop working.
That’s why I hoped I could do it in parallel, which you debunked.
As for the Proxmox backups, the error caused by the hardware is in those backups as well. I have to check whether I have any going back multiple months, but I don’t think I do.
Once this is sorted I will change that policy.
Because then I could take an older Proxmox backup and restore the Home Assistant backup on top of it, which most probably is much faster. Correct?
Once the restore is complete you should be able to log in. You can also refresh the page periodically. On a modern mini PC (VM or BM) the restore process takes minutes.
If your system is slow, give it an hour and it should complete. If you do not get the login screen by then, I assume either something has gone wrong or the restore has not completed.
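Instead of refreshing the page by hand, the waiting can be scripted. A rough sketch, assuming the UI is reachable over plain HTTP at a URL you supply; the function names, the 30-second interval and the example URL are my own choices, and the one-hour cap just mirrors the "give it an hour" rule of thumb:

```python
import http.client
import time
import urllib.error
import urllib.request


def ui_responds(url, timeout=5.0):
    """Return True if an HTTP server answers at url, whatever the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is up.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_for_ui(url, max_wait_s=3600, interval_s=30,
                probe=ui_responds, sleep=time.sleep):
    """Probe url every interval_s seconds; give up after max_wait_s seconds."""
    waited = 0
    while waited <= max_wait_s:
        if probe(url):
            return True
        sleep(interval_s)
        waited += interval_s
    return False
```

Something like wait_for_ui("http://homeassistant.local:8123") returns True as soon as the web server answers and False if an hour passes without a reply. A reply only means the web server is back, not that the restore succeeded, so you still need to log in and check.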
Thank you for your quick reply. Is there any value in going with Supervised instead of HAOS?
Secondly, do you see any problem in running Home Assistant in an LXC?
Supervised has no benefits when running under Proxmox.
For best long-term uptime, compatibility and stability you should run HAOS in a VM. Everything else is a hack IMHO. If you like hacks and are not worried about instability you can try LXC and/or Supervised. However, if you want a system you can rely upon, it should be HAOS in a VM or on BM (bare metal).
I also agree with your assertion and my omission. HAOS on BM is also a first-class experience, but it is not as flexible as a VM: you cannot run, for example, an Ubuntu or Windows desktop system or other servers (e.g. Plex, TrueNAS, etc.) on the same box. On BM you also need to run third-party services as add-ons instead of separate VMs/containers, and to access the HAOS server you need to set up SSH, while on a VM there is nothing to do to maintain the server side.
If your environment is simple, HAOS on BM is an excellent and solid option. However, if your environment is more complex, HAOS in a VM is the most flexible option IMHO.
This is a good thing how? If you have uptime measured in years, that just means you are not maintaining it properly. There’s zero difficulty achieving the same uptime on a virtualized platform. For someone who has never tried virtualization, you sure think you know a lot about how it works.