Preface: I completely love HA. I’m running an RPi 4 (4GB) with an SSD, Z-Wave and Zigbee sticks, and HASSOS.
I wanted to share my somewhat horrific experience with backups - it’s not been fun. I’ve used backups (snapshots) for some time along with the google drive addon which makes me feel secure. My experience of late has made me feel the opposite.
So my SSD seemed to be dying and I needed to swap it out - for any other OS I’d just image the drive. Of course this is Linux, so the tooling is janky and it becomes not so fun. Clonezilla (as I found out) doesn’t like restoring onto a smaller drive, i.e. resizing partitions down, even if they aren’t full and the data would otherwise fit. I can’t stress this enough - if you’re planning to image this way, don’t. Go instead for GParted and just copy the partitions. It’s the only thing that really worked for me. Don’t be fooled into thinking a GParted resize followed by Clonezilla will work - you will waste hours.
Why shrink at all? Well, my SSD was dying and the only spare I had waiting around was a larger drive. So I moved to that, and of course ended up with a very large data partition which wouldn’t fit onto the smaller replacement (new) SSD when it arrived.
But I’m getting ahead of myself - at this point you’re all wondering why I don’t use the Raspberry Pi Imager and then just restore a snapshot (backup) - well, of course I did.
But it doesn’t go well.
In theory, backups restore everything - just flash a fresh install with the Raspberry Pi Imager, log in and restore a full backup. Same hardware, same software and same config - should just work. Should.
So what went wrong (some samples)?
- Restoring backups hangs. This happened a lot. Hours pass, things get stuck - you’re spending ages SSH-ing in to see if things are progressing or even alive. There’s no progress indicator, no real heartbeat to monitor. Stress increases, relationships break down, hair is lost. Please improve this. Even via the CLI you just sit looking at some spinning ASCII - it doesn’t mean it’s doing anything. I gave up a few times. I couldn’t tell if things worked. I ended up using the CLI as I stopped trusting the frontend.
- I found at one point it was a good idea to wax my DB - i.e. SSH in, stop core and nuke the DB. This seemed to improve stability and restores worked a bit better.
- I’m using Z-Wave JS UI but none of my Z-Wave devices restore. They’re unreachable. I spend ages trying to figure out why and eventually notice my Z-Wave JS integration is busted. Why? Because for some reason I need to remove and re-add it so I can put the Z-Wave JS UI URL back in during its initial setup. One problem solved.
- All my HACS frontend stuff is missing and not loaded. Fix is to redownload them one by one. Why? No idea.
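For anyone wanting to try the DB step, here’s a rough sketch of what I mean, written so it renames the database instead of deleting it. Assumptions on my part: you’re using the default SQLite recorder database (`home-assistant_v2.db` plus its `-wal`/`-shm` sidecar files) in `/config`, your SSH environment has Python and the `ha` CLI, and the function names are just ones I made up for illustration. Treat it as a sketch, not a supported procedure.

```python
import subprocess
import time
from pathlib import Path

# Assumption: default SQLite recorder database and its sidecar files.
DB_NAMES = (
    "home-assistant_v2.db",
    "home-assistant_v2.db-wal",
    "home-assistant_v2.db-shm",
)


def set_aside_recorder_db(config_dir, stamp=None):
    """Rename the recorder DB files so Core starts with a fresh database.

    Nothing is deleted; the old files keep their contents under a new name.
    Returns the list of renamed paths.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in DB_NAMES:
        src = Path(config_dir) / name
        if src.exists():
            dst = src.with_name(f"{name}.{stamp}.bak")
            src.rename(dst)
            moved.append(dst)
    return moved


def main(config_dir="/config"):
    # Stop Core first so nothing is writing to the database.
    subprocess.run(["ha", "core", "stop"], check=True)
    try:
        for path in set_aside_recorder_db(config_dir):
            print(f"moved aside: {path}")
    finally:
        # Always try to bring Core back, even if the rename failed.
        subprocess.run(["ha", "core", "start"], check=True)
```

Call `main()` from the SSH session. You lose history in the UI (it’s still in the `.bak` files), and once you’re happy everything is stable you can delete those files to get the space back.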
So you’re gonna say: well you’re using community stuff etc etc. Fact is the process is not pretty and whilst I see the backups as being essential I still feel the need to image the drive. I can restore this (now I know how to do this) with drive imaging - perfectly. It works right away. Nothing needs to be fixed. This is how backup restoration should work.
HA / HASSOS would be strongly augmented by a drive imaging solution. Even better if this could push the images to a network share - this is essentially how everything else in the house works and it’s super reliable and super fast at getting me back to a running state compared to everything else. Even in my now working state a restore takes 30 mins+ and then I’ve got n mins/hours of figuring out what’s broken to contend with.
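To make the imaging idea a bit more concrete, here’s a minimal sketch of the kind of thing I have in mind: a byte-for-byte copy of a drive to an image file, with a running progress line and a checksum at the end. This is not anything HA provides. It assumes the drive is unplugged from the Pi and attached to a Linux machine (imaging a live, mounted disk can give you an inconsistent image), that you run it as root, and that the share is already mounted. The function name, `/dev/sdX` and `/mnt/share/...` are placeholders I’ve made up.

```python
import hashlib
import os


def image_drive(source, destination, chunk_size=4 * 1024 * 1024, report=print):
    """Copy `source` byte for byte to `destination`.

    Calls `report` after every chunk so there is always a visible heartbeat.
    Returns (bytes_copied, sha256_hex) so the image can be verified later.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of device/file
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
            report(f"{copied / 2**20:.0f} MiB copied")
        # Make sure the data has actually left the write cache.
        dst.flush()
        os.fsync(dst.fileno())
    return copied, digest.hexdigest()
```

Usage would be something like `image_drive("/dev/sdX", "/mnt/share/ha-backup.img")`. Restoring is the same call with source and destination swapped, which only works onto a drive at least as big as the original - exactly the shrinking problem I hit above.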
BTW if you know of a reliable Windows imaging tool that works for HASSOS backups please let me know. By this I mean I unplug the drive and stick it on a Windows system to back it up. I’ve not tested Macrium or Acronis yet but I’ve moved to Macrium for all my Windows systems (as Acronis is now sadly bloatware).
Sorry if this came across as a whinge but my experience has been that posting stuff like this has helped me find my solutions to my problems so in some part it may help someone else. I am up and running now despite dreading upgrading to the latest core version as my next step. I’m on 2023.4.6 and about to try 2023.5.2 today. Wish me well…