The title says it all.
It was a big step to inform users in the UI about what broke.
For quite a while now, a warning sometimes gets issued when a breaking change is foreseeable, including the version in which it will break existing setups.
If the goal is to have an HA that is always up and running, it would be perfect if an update could only be installed once the broken functionality is either resolved or actively dismissed.
That information is only tracked in the blog posts; the code has no idea what's breaking or not. I doubt this will ever get added beyond the current repair system, i.e. the code only knows something is broken after the update occurs.
I interpreted the WTH more as a kind of brainstorming: what would be really needed, really awesome.
However, sure, it isn't trivial, and I can't provide solid solutions.
But, ignoring that it might take quite some CPU time, couldn't all the YAML configuration just be scanned for statements indicating a problem? Couldn't versions of add-ons etc. be checked for compatibility?
Or could breaking changes be simulated: if the new call works it's fine; if it fails but the old one works, that indicates a breaking change?
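To make that a bit more concrete, here is a rough sketch of what "looking through the YAML for problem statements" could mean in practice. Everything in it is hypothetical: the list of flagged platforms would have to be shipped with each release, and this is not anything HA actually does today.

```python
# Sketch only: walk the config directory and flag YAML entries that match a
# (hypothetical) list of platforms known to break in the next release.
from pathlib import Path

# Hypothetical data; in this idea it would be shipped with the target release.
BREAKING_PLATFORMS = {"darksky": "platform removed, switch to another weather integration"}

def flag_breaking(config_dir: str) -> list[str]:
    findings = []
    for path in Path(config_dir).rglob("*.yaml"):
        try:
            text = path.read_text(encoding="utf-8")
        except OSError:
            continue
        # A plain text match keeps this simple and sidesteps HA's custom YAML
        # tags (!include, !secret), which a normal YAML parser would choke on.
        for platform, reason in BREAKING_PLATFORMS.items():
            if f"platform: {platform}" in text:
                findings.append(f"{path}: '{platform}' - {reason}")
    return findings

if __name__ == "__main__":
    for finding in flag_breaking("/config"):
        print(finding)
```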
We chose to go to the Moon. Not because it’s easy, but because it’s hard
It's not a CPU issue, it's a management issue. Who's going to track all these changes, how will the data be gathered, and where will it be gathered from? Right now it takes one person a week to tabulate the breaking changes (and we often get it wrong), right before the release is created. You're essentially asking this person to double their work within an already limited time frame.
I am not asking anybody who already lacks time to have even less of it.
I'm describing what I consider a big problem, or, put positively, what might be a solution.
If we only think about time, then everything in Home Assistant has consumed time.
Maybe a breaking-change tracker (basically a list) would be an improvement? It has to be noted somewhere anyhow.
Why not a checklist where all possible breaking changes have to be noted, as a mandatory check to pass a PR?
And if we already know what is breaking, couldn't that be fed into that list?
Well that's the thing, you are. Even if the process is automated and a required step in the PR, people still have to know it's a breaking change and add it to a list. Secondly, PRs are cherry-picked into builds, and someone manages that. One large list that is built before the build is made will not satisfy your WTH, because the breaking change isn't linked to a version. So making it a requirement for the PR doesn't help, because someone has to go through after the fact and mark what version it gets merged into.
If a PR has a breaking change, require that it also list the change in some kind of pre-update-check system.
Automation can pull those together for the build based on what PRs are included.
Before updating, HA runs the pre-update checks listed in the target build. If any fail, abort the update unless the failure is explicitly dismissed by the user.
The check isn't present in just one build; it's present in all future builds for some time period (a year?) to ensure that if a user skips this build they still run the check.
Once the metadata is in the PR, there is no manual effort for anyone to collect breaking changes for a release; it should be 100% automated.
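As a minimal sketch of what that could look like, assuming a per-release JSON manifest of checks and a runner in the updater (the field names, the `installed_integrations` helper, and the version scheme are all made up for illustration):

```python
# Sketch of a pre-update check runner; nothing here is existing HA code.
import json

# Example of the metadata a breaking-change PR might contribute (hypothetical):
CHECKS_JSON = """
[
  {"id": "pr-12345", "integration": "example_weather",
   "message": "YAML configuration removed; migrate to the UI config flow",
   "applies_until": "2026.1"}
]
"""

def installed_integrations() -> set[str]:
    # Placeholder: a real implementation would read HA's config entries.
    return {"example_weather", "mqtt"}

def _ver(v: str) -> tuple[int, ...]:
    return tuple(int(part) for part in v.split("."))

def run_pre_update_checks(target_version: str) -> list[str]:
    """Return messages for failing checks; an empty list means safe to update."""
    failures = []
    for check in json.loads(CHECKS_JSON):
        if _ver(target_version) > _ver(check["applies_until"]):
            continue  # the check has aged out (the "keep it around for a year" idea)
        if check["integration"] in installed_integrations():
            failures.append(f"{check['id']}: {check['message']}")
    return failures

if __name__ == "__main__":
    problems = run_pre_update_checks("2025.6")
    if problems:
        print("Update blocked unless explicitly dismissed:")
        for problem in problems:
            print(" -", problem)
    else:
        print("No breaking-change checks failed.")
```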
However, that's a lot of work to set up a new pre-update-check system. It's difficult to enforce that all PRs with breaking changes comply. And it can be confusing to new users if their update is blocked by a pre-update check.
The upside is that diligent users like me don’t need to read the “Breaking Changes” section and compare it to our own memory of what we use. The system could handle it for us. It would also make me personally feel safer setting up automatic updates. But implementation is a large undertaking.
I think the problem is that a breaking change in HA causes issues in third-party integrations, and many of those might be unknown to the HA devs.
Even if a third-party integration uses the HA code area with the breaking change, it might still not use the exact code that the breaking change affects.
For core stuff, or known breaking changes, I think the MVP and reasonably achievable level of functionality here would be that the update gives a confirmation pop-up that lists the in-use integrations that have breaking changes, and asks the user to confirm before upgrading or back out to resolve the breaking changes.
I don't think it would be reasonable for HA to have to determine whether any of the specific breaking changes will impact dashboards, automations, etc.
It wouldn’t be a guarantee that the breaking changes would affect the user’s instance, just that they’re using things that do have breaking changes. It can be easy to miss something when scanning the release notes.
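Purely as illustration, here is what that confirm-or-back-out gate could look like on the client side, assuming the updater already has a list of (integration, summary) pairs for the target release; none of the names here are real HA code.

```python
# Tiny sketch of the confirmation gate described above (hypothetical data).

def confirm_update(breaking_changes: list[tuple[str, str]], in_use: set[str]) -> bool:
    relevant = [(name, summary) for name, summary in breaking_changes if name in in_use]
    if not relevant:
        return True  # nothing the user actually relies on is affected
    print("These integrations you use have breaking changes in this release:")
    for name, summary in relevant:
        print(f"  - {name}: {summary}")
    return input("Upgrade anyway? [y/N] ").strip().lower() == "y"

if __name__ == "__main__":
    changes = [("example_zwave", "entity IDs renamed"), ("example_cam", "MJPEG stream dropped")]
    if confirm_update(changes, in_use={"example_zwave", "mqtt"}):
        print("Proceeding with the update...")
    else:
        print("Update cancelled; resolve the listed changes first.")
```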
I like your approach! Besides being constructive, you show the balance between investment and outcome.
So, I understand it might be a lot of work (maybe a hell of a lot) to accomplish, and it will never be 100%, if only because of unofficial add-ons.
From my perspective as a user:
The more failsafe a system is, the better.
The more a user trusts their system, the sooner an update gets installed.
The more up to date the system is, the safer it is.
Sure, at some point we need to talk about how high the 'price' is, both as a one-time investment and as a recurring one, and how much the system as a whole benefits from it.
Maybe the investment is too high.
But maybe, in the end, it doesn't take a week each release but only two days (since it is semi-automated) to do the manual quality checks. And, at least for the core and official system, changes that break the system would be 90% a thing of the past, as they couldn't get installed.
Having said that I am no dev but just an average user, this might be a really stupid idea, but I'd like to share it anyway.
What if a second instance of the same HA were running (at lowest priority), logging a fresh boot, then installing the update and logging again?
Then those two logs could be compared. We could identify problems by only showing the differences, while the main system would still be up and running.
That would mean the checking process alone would probably take, let's say, 12 hours. Personally, I would be happy to accept that if, at the end, I know it's safe, or at least where I should be looking.
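As a rough illustration of just the log-comparison part of that idea (the file names are placeholders, and real HA logs would need more normalization than stripping timestamps before a diff becomes readable):

```python
# Sketch: diff two boot logs and show only what changed between them.
import difflib
import re

# Strip leading timestamps so identical messages logged at different times
# don't show up as differences. Format assumed: "2025-01-01 12:34:56.789 ...".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s*")

def normalized(path: str) -> list[str]:
    with open(path, encoding="utf-8") as f:
        return [TIMESTAMP.sub("", line) for line in f]

before = normalized("home-assistant.log.before")  # placeholder path
after = normalized("home-assistant.log.after")    # placeholder path

for line in difflib.unified_diff(before, after, "before-update", "after-update", n=0):
    # Print only added/removed lines; skip the file headers and hunk markers.
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
        print(line, end="")
```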
It's nice to know I'm not alone on this planet with my feature request; I maintain that this is along similar lines. And it's not about testing the 2000 HACS integrations, but maybe about installing the 30 most-used ones and then installing the next update to see if HA survives.
So the Alexa Media Player doesn't work again. Luckily I don't use the thing, but it's going in the same direction again: "HA breaks the player", when it's not HA that's the problem but the third-party add-on.
The first half of what you described is very common in the software industry for things like web apps: start a new instance (receiving no traffic), make sure it starts up okay, then cut over traffic to the new instance and shut down the old.
Unfortunately, software has to be written in a specific way to make this possible. In most cases (and definitely with HA), running multiple instances together will cause problems for both of the instances. They’ll be trying to do the same things at the same time and will conflict with each other. Refactoring the code to deal with this would be a very very large undertaking. Maybe even a ground-up rewrite.
But it's linked to a pull request, right? Compiling a list of PRs tagged with "breaking changes" seems like exactly the kind of thing that could be automated via a GitHub Actions job added to the release process. And maybe that's already happening?
One plausible path to implementing something like this could be:
When building each release, something in CI compiles a list of PRs in this release which were tagged with breaking changes, and stores a list of [integration tagged in the PR, PR number] pairs somewhere like a JSON file that becomes one of the GitHub release artifacts. I suspect something very close to this is already automated, but I don't know the details.
Whatever it is that shows available updates in HAOS (I use the Docker container, so I don't have it) can compare that file (and the files for intermediate versions that are being skipped) to the list of integrations the user has installed, and present a list of "possibly incompatible" changes to the user in some sort of appropriate UI. If you don't have the listed integration installed, then don't bother showing it.
This would not be perfect, but it would be an easy way to get most of the way there, in a “here’s what you might want to watch out for” kind of way.
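To sketch what the CI half could look like: a small script that queries the GitHub search API for merged PRs carrying a breaking-change label in a given milestone and writes the pairs out as JSON. The label names, the milestone value, and the output format are assumptions; I don't know what HA's actual release tooling does.

```python
# Sketch of a release-time job that compiles [integration, PR number] pairs
# from merged PRs with a breaking-change label. Label/milestone names are assumed.
import json
import requests

SEARCH_URL = "https://api.github.com/search/issues"
QUERY = 'repo:home-assistant/core is:pr is:merged label:"breaking-change" milestone:2025.6.0'

def breaking_changes() -> list[dict]:
    results, page = [], 1
    while True:
        resp = requests.get(
            SEARCH_URL,
            params={"q": QUERY, "per_page": 100, "page": page},
            headers={"Accept": "application/vnd.github+json"},
            timeout=30,
        )
        resp.raise_for_status()
        items = resp.json().get("items", [])
        if not items:
            break
        for pr in items:
            # Assumed labeling convention: "integration: <name>" on each PR.
            integrations = [
                label["name"].removeprefix("integration: ")
                for label in pr["labels"]
                if label["name"].startswith("integration: ")
            ]
            results.append({"pr": pr["number"], "integrations": integrations})
        page += 1
    return results

if __name__ == "__main__":
    # In CI this would be written to a release artifact, e.g. breaking_changes.json.
    print(json.dumps(breaking_changes(), indent=2))
```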
The reason the "canary" solution @smartin mentioned would be so much work is that Home Assistant is not hermetically sealed. It wouldn't be very useful if it were! It interacts with many resources like its database, a Z-Wave/Zigbee stick, an MQTT broker, all your other add-ons, various cloud services, etc. None of those things are designed to support multiple copies of the same HA at the same time. And there are so many different ways to configure Home Assistant that it doesn't seem reasonable to try to simulate them all.