New backups don't work since they changed with 2025.1.1

skorpioskorpio · January 12, 2025, 10:08pm

I was happy to see that regular automatic backups were now a thing, until I wasn’t. It took me about 20 attempts to get a backup to work, it’s not very reliable and certainly not something I trust to be doing it’s thing in the background unattended.

Also, while I understand forcing encryption if you are backing up to cloud, it should not be forced for local disk or NAS storage. I just had to reconstruct an HA instance from backup less than a month ago, many times before I figured out which integration update bricked it. Literally a couple days before the entire backup system was changed. This would have been way more difficult than it already was if I’d had to provide a restore key, many many times. And quite honestly I’m not even 100% certain that had I had to go back to a backup created prior to 2025.1.1 (from the HAOS console CLI) that it even would have been possible, given that you’d be trying to restore an unencrypted backup to a system that no longer understands that concept, and there doesn’t seem to be any sort of flag that identifies encrypted vs unencrypted backups.

And while I’m in a mood, consistency of naming would be good. My HA instance when it choked wouldn’t start the UI, so all the screwing around I did had to be done from the CLI, and luckily I had enabled SSH on the machine or I would have had no idea which backup was which, as the CLI doesn’t use the “name” of the backup it uses some slug code identifier of the backup, and you have to go through the backup directory, sort by date, and kinda guess which is the newest full backup (which is the only thing the HAOS console CLI can do), even though you “named” the backups as “Full” or not. the backups are not stored on the disk by what you named them only by the slug, which is also the only thing the CLI knows how to restore.

Sorry to gripe, but before you overhaul the backup backend, make it more intuitive to restore from and not just in the UI but anywhere backups can be restored. Even at my best guess I ended up restoring from a backup that was quite a bit older than I expected. The creation date was a month earlier, but there were integrations I added 3 months earlier that were not in it. Had I not had SSH access, and no UI, I’m not sure how I could have restored a backup. Backups need to be rock solid, that is their purpose.

So,

encryption should be optional on local or mounted disk,
what you name the backup should be what the name of tar file is, and that should be at least an option of what the HAOS console CLI can restore
what the type of backup a file is should be obvious, and what triggered that backup. Maybe simply creating sub directories under the /root/backup directory like Friendly_Name, Full, Partial, etc and create symbolic links to the slug code tar file so that it doesn’t break anything pre-existing.
the HAOS console CLI should be able to list a simple list of the backups more intuitively than it does (name, date, type, maybe slug, but again I don’t see why you would have the ability to name a backup that you can’t restore by name), what it can do is list some very verbose info, most of which is not necessary, and to a “display” that will probably not have the ability to do scrollback.

Given that this list may contain backups done as part of the process of installing updates, the installation of which is not the unlikely cause of why you are tying to do this in the first place, you may never be able to see the “slug” for the last full backup. which leads to…

add “more” to the HAOS console CLI or make more like behavior the default to any command output that is more than 25 lines. A VGA text console, or a VNC console on a VM, a IPMI interface none of these can scroll back, so any command whose output typically exceeds 25 lines is simply lost. you will only EVER see the last 25 lines.

OK, I’m done, carry on.

ddaniel · January 12, 2025, 10:13pm

If you think that there is a someone who read all that, well good luck.
Backup problems?

Solution

Set up your own backup. You have a ton of documentation online, you can use ai, whatever. Don’t relay on someone else provided solution that doesnt work for you.

MattB314 · January 12, 2025, 10:36pm

I read it all and I agree with it.

skorpioskorpio · January 13, 2025, 11:21am

Well if that was printed on paper, in an easily readable font, it’d be less than one page, one side, but I agree perhaps more information in one go than the evolved modern human mind can absorb. I knew I should have broke it up with a few inspirational cat memes.

Yes, I could create my own backup method, my HAOS instance is a VM on an array of servers running ZFS. But that isn’t the point now is it? This is, in my opinion, one of HAs biggest issues, that it constantly tries to aspire to be an integrated, mainstream product, but when things don’t work, the solutions are DIY hacks that will likely break something sometime in the future.

What bricked my install was indeed some interesting but trivial HACS integration that used depreciated methods, and what that cost me was the loss of the unnecessarily complex configuration of monitoring and controlling my stand-by generator, my uninterruptible power supply and my electrical panel. Stuff that is, well, not so trivial.

skorpioskorpio · January 13, 2025, 11:23am

Thanks Matt, I appreciate that.

ddaniel · January 13, 2025, 11:31am

I didn’t ment to be rude, but it does look like that.
It is all matter of perspective. I don’t know will this be changed in the future, rolled back or it will be something completely different.
But take this as an opportunity for creating your own solution. Maybe that will be better for you, maybe it will be worst, maybe it will be the same.
But do something about it.

I can’t tell you what to do, but I can briefly share my experience.

I started using supervised on debian. As i have a doorbell camera and didn’t have a dedicated gpu or coral frigate was using cpu. One day my server crashed due to high cpu usage. By then I found out that this is using some contaners and that you can actually limit cpu and mem usage per container.

As I didn’t know how to do that in supervised installation I decided to go with a docker. Bit by bit I added a lot of things to it, like media server, local cloud, printer server and who knows what more.

I set up my own backup solution that will backup not only home assistant then everything else I find important to backup.
And my solution just works for me. And I’m glad I done that.

skorpioskorpio · January 13, 2025, 1:02pm

I hear you, and yes if I’d known where the events of less than a month ago would have ended up, in retrospect I should have done ZFS snapshots instead as a roll back mechanism. I mean it’s really one of the reasons my HAOS is on a VM to begin with, so I could have it sitting on RAIDed disks. But the fact that I ran my original HAOS implementation into the ground, performance wise, is why I was ultimately motivated to go to a VM. That was 9 months ago and there are still things that worked before that I’ve yet to get reworking.

But that said, every update does (or can do) a native backup, so if things go south that is supposed to be your safety net, but if it doesn’t work than it isn’t. And that’s obviously a problem. I shouldn’t have to add the step to snapshot the hypervisor host every time I update something for fears something may go badly and that the built-in piece of mind is not to be trusted.

My goal is to eventually have multiple copies of HAOS homed on different VM hosts, so there is always a stand-by instance waiting in the wings, and maybe I should update that goal to include a few “last known good” instances.

I have migrated all my Zigbee to be a (wired) network based gateway, and same with Zwave, so no more USB dongles. But that doesn’t make instances completely interchangeable. A lot of things are “stuck” to the instance they are paired to. I mean I do have a fairly elaborate configuration, about 700 or so devices. It’s also why I really don’t like doing hacky alternatives to something that in my opinion should just work, or should have been considered. Every time I have to configure some backend YAML heavy integration, I ask myself, is this going to come back to bite me?

I am a disaster recovery and business continuity architect of computer infrastructures by trade. But given how many fidly bits are part of a complicated Home Assistant deployment, there is only so much you can do to make this a survivable system when there hasn’t been much effort to make it that way to begin with. I have a long laundry list of issues I have with Home Assistant, and consumer IoT in general for that matter. It just isn’t designed to be robust, and that needs to change if there is any possibility that this is where we are going. But a failure of a primary safeguard is a real problem, especially when it worked as intended a month ago, and now it’s been “improved” and doesn’t really.

Sir_Goodenough · January 15, 2025, 3:33am

Hi skorpioskorpio,

Here are some of the alternatives that i know about should the new stuff not work for you. Basically all the stuff available last month is still available .

skorpioskorpio · January 16, 2025, 5:05pm

Hmm, the HA Supervisor Full Backup Action certainly seems handy and easy enough to just do a daily, so fair enough. I didn’t know about that, lets give it a go…

OK, I’ve already written an automation using it, and seems to work fine.

But…

It doesn’t change the fact that the more or less advertised method of doing backups recently introduced is unreliable and doesn’t really work as it should.

For example: I have it configured to take daily backups and retain all copies on the local disk (as path of least resistance). This has been configured since the 2025.1.2 update was install on January 2nd. It has successfully, automatically created a backup only once since then. I have pushed a ‘backup now’ maybe 30-40 times as both ‘do an auto backup now’ and 'do a manual backup (using effectively the same criteria as the auto should), that has successfully completed 3 times and failed 90% of other times. How many of those 4 backups are present in the archive? One. What happened to the others? Don’t know. Why did all those backups fail? Also don’t know, I don’t see anything meaningful in the logs.

How many backups do I have total in /root/backup? About 100, all but one from prior to the 2025.1.2 update, including about 20 from the last week of December 2024 when I was trying to narrow down what update was preventing the UI from starting and 2 created as a test of the automation created at the beginning of this post.

So is the problem resolved? Well sort of, more correct would be to say that you’ve recommended a workaround that had been there all along and had I been aware of it I probably would have used it all along. Did it solve the problem for anyone outside the confines of this thread that may have the same issues? No.

petro · January 16, 2025, 5:37pm

That info will be in your supervisor logs.

stevemann · January 16, 2025, 5:45pm

Where do you get this idea? Home Assistant is unlikely to ever be a mainstream consumer product. Most of Home Assistant is DIY.

skorpioskorpio · January 16, 2025, 5:47pm

Supervisor log ERRORs reference Core logs which contain nothing. There are Supervisor Freeze/Thaw WARNINGs but no actionable details. Like I said nothing meaningful.

petro · January 16, 2025, 5:49pm

Every time my backups have failed, I have an error in either HA logs and/or supervisor logs. So I’m not sure what to tell you.

skorpioskorpio · January 16, 2025, 5:53pm

This is what is in the Supervisor logs (dozens of times since 2025.1.2) and there is no corosponding errors in the Core logs.

2025-01-16 08:15:37.011 WARNING (MainThread) [supervisor.jobs] 'BackupManager.freeze_all' blocked from execution, system is not running - freeze
2025-01-16 08:15:42.193 WARNING (MainThread) [supervisor.jobs] 'BackupManager.freeze_all' blocked from execution, system is not running - freeze
2025-01-16 08:16:14.666 ERROR (MainThread) [supervisor.homeassistant.module] Preparing backup of Home Assistant Core failed. Check HA Core logs.
2025-01-16 08:25:01.606 WARNING (MainThread) [supervisor.backups.manager] Timeout waiting for signal to thaw after manual freeze, beginning thaw now

petro · January 16, 2025, 5:56pm

and what do the core logs say at that time?

WHat’s your log level set to?

skorpioskorpio · January 16, 2025, 6:13pm

Oh I would have agreed with this statement a couple years ago, but not now.

With off the shelf HA appliances, HA logos on 3rd party device box tops, and practically every Home Automation influencer dumping whatever platform they were notable for, in favor of HA. You are getting pretty hard pressed to find any real informational alternative to HA for anything more complicated than a few lights and switches. Sure there are other environments out there, but at a certain complexity and device count, it’s all HA. You might could argue that Smartthings gave it a run for it’s money at one point, but when Samsung dumped it and WebCore died, that was pretty the end of that.

skorpioskorpio · January 16, 2025, 6:20pm

Core logs? Nothing at all.

Unchanged from default, is there more info to be had? Maybe, but if I have to change the log level to determine that, it just reiterates my point that it’s broken and not ready for release. Having Supervisor logs refer to Core logs that don’t exist is in an of itself broken behavior. No?

petro · January 16, 2025, 6:23pm

My guy, I have no idea why you’re so upset, I’m just trying to help.

skorpioskorpio · January 16, 2025, 7:08pm

Honestly at this point, I’m not sure that I really am, it went from being a blocker to being just an annoyance now. The problem was that it prevented any backups through the UI from reliably being performed, not just that Automatic ones didn’t happen. And this comes on the heels of having to do a rebuild after a failed update that resulted in loosing the configurations of several YAML heavy integrations that took me a long time to get right, that now I have to do all over again.

I am now backing up the system through multiple other means. At storage level (ZFS), at VM level (Proxmox) and through an automation using the backup full action. So is MY particular problem resolved, well yea, multiple times over, and to a safer degree then it was, certainly. Is the problem I originally described in this thread resolved? No, it’s just made the “official” way of doing backups dead to me.

petro · January 16, 2025, 7:10pm

I also don’t use the official backup method. Is that really that big of a deal? As long as the options exist, use whatever works best.