I read it all and I agree with it.
Well, if that was printed on paper in an easily readable font, it’d be less than one page, one side. But I agree, it is perhaps more information in one go than the evolved modern human mind can absorb. I knew I should have broken it up with a few inspirational cat memes.
Yes, I could create my own backup method; my HAOS instance is a VM on an array of servers running ZFS. But that isn’t the point now, is it? This is, in my opinion, one of HA’s biggest issues: it constantly aspires to be an integrated, mainstream product, but when things don’t work, the solutions are DIY hacks that will likely break something sometime in the future.
What bricked my install was indeed some interesting but trivial HACS integration that used deprecated methods, and what that cost me was the loss of the unnecessarily complex configuration for monitoring and controlling my stand-by generator, my uninterruptible power supply and my electrical panel. Stuff that is, well, not so trivial.
Thanks Matt, I appreciate that.
I didn’t mean to be rude, but it does look like that.
It is all a matter of perspective. I don’t know whether this will be changed in the future, rolled back, or turned into something completely different.
But take this as an opportunity to create your own solution. Maybe that will be better for you, maybe it will be worse, maybe it will be the same.
But do something about it.
I can’t tell you what to do, but I can briefly share my experience.
I started using Supervised on Debian. As I have a doorbell camera and didn’t have a dedicated GPU or a Coral, Frigate was running on the CPU. One day my server crashed due to high CPU usage. By then I had found out that this all runs in containers, and that you can actually limit CPU and memory usage per container.
As I didn’t know how to do that in a Supervised installation, I decided to go with Docker. Bit by bit I added a lot of things to it, like a media server, local cloud, print server and who knows what more.
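As a sketch of the per-container limits mentioned above, here is a hypothetical Docker Compose fragment (the image name and the limit values are illustrative, not my actual config):

```yaml
# docker-compose.yml sketch: cap a hungry container so a runaway
# process (e.g. Frigate doing CPU-only detection) can't starve the host.
services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable   # illustrative image
    cpus: "2.0"          # limit the container to 2 CPU cores
    mem_limit: 1g        # limit the container to 1 GiB of RAM
    restart: unless-stopped
```

With limits like these, the container gets throttled or OOM-killed on its own instead of taking the whole server down with it.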
I set up my own backup solution that backs up not only Home Assistant but everything else I find important.
And my solution just works for me. And I’m glad I did it.
I hear you, and yes, if I’d known where the events of less than a month ago would end up, in retrospect I should have used ZFS snapshots as a rollback mechanism instead. I mean, it’s really one of the reasons my HAOS is on a VM to begin with, so I could have it sitting on RAIDed disks. But the fact that I ran my original HAOS implementation into the ground, performance-wise, is why I was ultimately motivated to move to a VM. That was 9 months ago, and there are still things that worked before that I’ve yet to get working again.
But that said, every update does (or can do) a native backup, so if things go south that is supposed to be your safety net; if it doesn’t work, then it isn’t. And that’s obviously a problem. I shouldn’t have to add a step to snapshot the hypervisor host every time I update something, for fear something may go badly, because the built-in peace of mind is not to be trusted.
My goal is to eventually have multiple copies of HAOS homed on different VM hosts, so there is always a stand-by instance waiting in the wings, and maybe I should update that goal to include a few “last known good” instances.
I have migrated all my Zigbee to a (wired) network-based gateway, and the same with Z-Wave, so no more USB dongles. But that doesn’t make instances completely interchangeable. A lot of things are “stuck” to the instance they are paired to. I mean, I do have a fairly elaborate configuration, about 700 or so devices. It’s also why I really don’t like doing hacky alternatives to something that, in my opinion, should just work, or should have been considered. Every time I have to configure some backend YAML-heavy integration, I ask myself: is this going to come back to bite me?
I am a disaster recovery and business continuity architect for computer infrastructure by trade. But given how many fiddly bits are part of a complicated Home Assistant deployment, there is only so much you can do to make this a survivable system when there hasn’t been much effort to make it that way to begin with. I have a long laundry list of issues with Home Assistant, and with consumer IoT in general for that matter. It just isn’t designed to be robust, and that needs to change if there is any possibility that this is where we are going. But a failure of a primary safeguard is a real problem, especially when it worked as intended a month ago, and now it’s been “improved” and doesn’t really.
Hi skorpioskorpio,
Here are some of the alternatives that I know about, should the new stuff not work for you. Basically all the stuff available last month is still available.
Hmm, the HA Supervisor Full Backup Action certainly seems handy and easy enough to just run daily, so fair enough. I didn’t know about that, let’s give it a go…
OK, I’ve already written an automation using it, and it seems to work fine.
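For reference, a minimal sketch of what such an automation can look like, using the Supervisor full-backup action (`hassio.backup_full`). The trigger time, backup name template, and `compressed` flag here are my assumptions, not the exact automation from this thread:

```yaml
# Daily full backup via the Supervisor (sketch, adjust to taste).
automation:
  - alias: "Daily full backup"
    trigger:
      - platform: time
        at: "03:00:00"          # run once a night, assumed time
    action:
      - service: hassio.backup_full
        data:
          # Date-stamped name so backups are easy to tell apart.
          name: "auto_{{ now().strftime('%Y-%m-%d') }}"
          compressed: true
```

Note this only creates the backups; pruning old ones and copying them off the local disk still has to be handled separately.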
But…
It doesn’t change the fact that the more or less advertised method of doing backups recently introduced is unreliable and doesn’t really work as it should.
For example: I have it configured to take daily backups and retain all copies on the local disk (as the path of least resistance). This has been configured since the 2025.1.2 update was installed on January 2nd. It has successfully, automatically created a backup only once since then. I have pushed a ‘backup now’ maybe 30–40 times, as both ‘do an auto backup now’ and ‘do a manual backup’ (using effectively the same criteria as the auto should); that has successfully completed 3 times and failed the other 90% of the time. How many of those 4 backups are present in the archive? One. What happened to the others? Don’t know. Why did all those backups fail? Also don’t know; I don’t see anything meaningful in the logs.
How many backups do I have in total in /root/backup? About 100, all but one from prior to the 2025.1.2 update, including about 20 from the last week of December 2024, when I was trying to narrow down which update was preventing the UI from starting, and 2 created as a test of the automation described at the beginning of this post.
So is the problem resolved? Well, sort of; it would be more correct to say that you’ve recommended a workaround that had been there all along, and had I been aware of it I probably would have used it all along. Did it solve the problem for anyone outside the confines of this thread who may have the same issues? No.
That info will be in your supervisor logs.
Where do you get this idea? Home Assistant is unlikely to ever be a mainstream consumer product. Most of Home Assistant is DIY.
Supervisor log ERRORs reference Core logs, which contain nothing. There are Supervisor Freeze/Thaw WARNINGs but no actionable details. Like I said, nothing meaningful.
Every time my backups have failed, I have an error in either HA logs and/or supervisor logs. So I’m not sure what to tell you.
This is what is in the Supervisor logs (dozens of times since 2025.1.2), and there are no corresponding errors in the Core logs.
2025-01-16 08:15:37.011 WARNING (MainThread) [supervisor.jobs] 'BackupManager.freeze_all' blocked from execution, system is not running - freeze
2025-01-16 08:15:42.193 WARNING (MainThread) [supervisor.jobs] 'BackupManager.freeze_all' blocked from execution, system is not running - freeze
2025-01-16 08:16:14.666 ERROR (MainThread) [supervisor.homeassistant.module] Preparing backup of Home Assistant Core failed. Check HA Core logs.
2025-01-16 08:25:01.606 WARNING (MainThread) [supervisor.backups.manager] Timeout waiting for signal to thaw after manual freeze, beginning thaw now
and what do the core logs say at that time?
What’s your log level set to?
Oh I would have agreed with this statement a couple years ago, but not now.
With off-the-shelf HA appliances, HA logos on third-party device box tops, and practically every home automation influencer dumping whatever platform they were notable for in favor of HA, you are pretty hard pressed to find any real informational alternative to HA for anything more complicated than a few lights and switches. Sure, there are other environments out there, but at a certain complexity and device count, it’s all HA. You could argue that SmartThings gave it a run for its money at one point, but when Samsung dumped it and WebCoRE died, that was pretty much the end of that.
Core logs? Nothing at all.
Unchanged from the default. Is there more info to be had? Maybe, but if I have to change the log level to determine that, it just reiterates my point that it’s broken and not ready for release. Having Supervisor logs refer to Core logs that don’t exist is in and of itself broken behavior. No?
My guy, I have no idea why you’re so upset, I’m just trying to help.
Honestly, at this point I’m not sure that I really am; it went from being a blocker to being just an annoyance. The problem was that it prevented any backups through the UI from reliably being performed, not just that automatic ones didn’t happen. And this comes on the heels of having to do a rebuild after a failed update that resulted in losing the configurations of several YAML-heavy integrations that took me a long time to get right, and that I now have to do all over again.
I am now backing up the system through multiple other means: at the storage level (ZFS), at the VM level (Proxmox), and through an automation using the backup full action. So is MY particular problem resolved? Well, yes, multiple times over, and to a safer degree than it was, certainly. Is the problem I originally described in this thread resolved? No, it’s just made the “official” way of doing backups dead to me.
I also don’t use the official backup method. Is that really that big of a deal? As long as the options exist, use whatever works best.
It is not a big deal and it never was. Anyone can find a backup solution that works best for them and their setup if they want to.
But it does seem that something else is…
Oh, I get it, and would I have been in the situation I found myself in a month ago if I’d been less trusting and more cynical? Obviously not. But then again, this issue was not why I was in that situation either, as that happened before the change; the change just made it more likely that the issue would happen again, perhaps worse.
This is why the community exists, and why it’s essential: so that others learn from the lessons of their peers, right? But you don’t look for a solution to a problem you’ve yet to experience. Community resolutions are largely postmortem, perhaps almost literally so when the solution to an issue is one that could set you back to the beginning, and could sour your whole opinion of a product.