HA stability - experiences. Came here from competitor

I evaluated both HA and OH (and a couple others) when I first started. I don’t regret going with HA. The core itself has been rock-solid stable for me.

But… It’s not for the faint of heart. As mentioned, there are frequent core updates. These updates often require hands-on work to fix breaking changes, which are fairly well documented.

The biggest problem, in my experience, is that updates to integrations are rolled into the core updates. Integrations are notoriously hard for any one developer to test. They can only test in their own environment. Yours is probably different. Bugs are common, and not documented anywhere. It’s possible to update the core and then run into a bug in an integration, even if you diligently read the breaking changes list. The only non-invasive way to back out one of these integration updates is to back out the entire core update. In fairness, the developers usually do a great job of jumping on these bugs, but by the time one is fixed (in the next release) another could be introduced in a different integration you need. Once you get a couple of versions behind, catching up takes a lot of effort.

Right now the best a user can do is to minimize the number of integrations they depend on. I did put in a feature request to decouple the integration updates from the core updates, but it didn’t get much traction.

This shouldn’t be a show-stopper. Just go in with eyes open and understand that HA, and probably any such system you’re actively tinkering with, is going to require a commitment of your time to maintain. I just finished digging myself out of a three-releases-back hole and I’m still a fan.

I’ve been using HA for over 5 years now and can only confirm @frits1980’s statement. Rock solid; every issue/instability I had was caused by what sits between the chair and the screen :slight_smile:

I consider HA to be a particularly unstable platform, for this very reason. Frequent updates have breaking changes that, unless you have time to read all the documentation about the update, will break an integration, script or automation.

If you have the ability to run it as a container, then you can pin a particular version until you have time to read about the upgrade and sit down to iron out the breaking changes. This is what I usually do.
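For anyone scripting this, here is a minimal sketch of the pinning idea. It assumes core versions follow the `YEAR.MONTH.PATCH` pattern (like the 2021.10 release mentioned in this thread) and that you pull the `ghcr.io/home-assistant/home-assistant` image; the function names are made up for illustration. It refuses floating tags and tells you how many monthly releases you have fallen behind.

```python
import re
from typing import Tuple

# Assumption: core versions look like "YEAR.MONTH.PATCH", e.g. "2021.10.6".
VERSION_RE = re.compile(r"^(\d{4})\.(\d{1,2})\.(\d+)$")

# Assumption: the container image you pull; change it if yours differs.
IMAGE = "ghcr.io/home-assistant/home-assistant"


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a pinned version string into (year, month, patch)."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a pinned version: {version!r}")
    year, month, patch = (int(part) for part in match.groups())
    return year, month, patch


def pinned_image(version: str) -> str:
    """Return an image reference with an explicit version tag."""
    parse_version(version)  # rejects floating tags such as "latest" or "stable"
    return f"{IMAGE}:{version}"


def monthly_releases_behind(current: str, target: str) -> int:
    """Count the monthly releases between two versions (patch level ignored)."""
    current_year, current_month, _ = parse_version(current)
    target_year, target_month, _ = parse_version(target)
    return (target_year - current_year) * 12 + (target_month - current_month)


if __name__ == "__main__":
    print(pinned_image("2021.9.7"))
    print(monthly_releases_behind("2021.7.4", "2021.10.6"))
```

A gap of more than one or two releases is a hint that there are several sets of release notes to read before moving the pin.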

If you want an example of the havoc caused by breaking changes, read up on the Z-Wave and Tuya issues after the last major update (2021.10).

I would compare HA to Arch Linux more so than Debian.

No version updates itself so anyone can stay at whatever version they desire, and update after they have read the docs.

HA has been great for me, and so far I have no complaints. Of course, people have to keep the breaking changes in mind and read the release notes. Most people have problems because they just upgrade without knowing what they are getting into (it happened to me during my early adoption).

If you use the search button above you’ll find plenty of info and opinions on openHAB vs HA. This one in particular

I have news for you. You ARE on the dark side.

Move into the light, where developers actually care about user input. Documentation is actually useful because it is written by the developers who create the code. openHAB expects users to write the documentation for the core. For addons, though, OH requires the developers to provide documentation.

The OH leader’s words, many times, do not agree with his actions. I know for a fact that the “stable” OH 2.5.0 was released with a bug that had been reported over 6 months earlier. They refused to actually fix the problem in the core code but worked around the issue in 2.5.1 without changing the core. New users who installed OH and the addons package were unable to configure their system. I know this because I discussed it at length with the lead developer and pushed for a resolution.

The thing with stability is IMHO:

A: Configure your recorder properly, and then you can leave HA alone; it will run on its own.
B: If you have new ideas (and you will have ideas), you have to choose:
=> B1: Realize those ideas in your home system => You will cause unexpected downtime that affects other people in your house
=> B2: Realize those ideas on a sandbox system => You will not cause unexpected downtime, but you will have to spend more time maintaining your system

The problem with B2 is (in my experience): you do not have two homes, one live home and one sandbox home :slight_smile: .

I have done this on occasion to test breaking updates to integrations. If you have the capacity to run docker containers, it’s trivial to set up a new, test instance of HA.

My underlying storage is ZFS, so I usually clone my data and spin up a new updated container to upgrade. If all goes well, I get rid of the old container. If not, I get rid of the new one and wait for the problems to be ironed out.
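Roughly, the clone-and-test workflow looks like the sketch below. It only builds the command lines and prints them; nothing is executed. The dataset name `tank/ha`, the container name `ha-test`, the host port 8124 and the default ZFS mountpoint layout are all assumptions for illustration, so adjust them to your own setup before running anything by hand.

```python
import shlex
from typing import List

# Assumption: the container image in use; change it if yours differs.
IMAGE = "ghcr.io/home-assistant/home-assistant"


def staging_commands(dataset: str, version: str, name: str = "ha-test") -> List[List[str]]:
    """Build (but do not run) the commands for a throwaway test instance."""
    snapshot = f"{dataset}@pre-{version}"
    clone = f"{dataset}-{name}"
    # Assumption: the clone is mounted at the default path derived from its name.
    mountpoint = f"/{clone}"
    return [
        # Freeze the current config data.
        ["zfs", "snapshot", snapshot],
        # Writable copy of that snapshot for the test container.
        ["zfs", "clone", snapshot, clone],
        # Start the new version against the clone, on a different host port
        # so it does not collide with the live instance on 8123.
        ["docker", "run", "-d", "--name", name,
         "-p", "8124:8123",
         "-v", f"{mountpoint}:/config",
         f"{IMAGE}:{version}"],
    ]


if __name__ == "__main__":
    for command in staging_commands("tank/ha", "2021.10.6"):
        print(shlex.join(command))
```

If the test instance misbehaves, removing the container and destroying the clone leaves the live data untouched, which is the whole point of doing it this way.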

In the last 13 months I have not had a single stability issue, though I am not running on an RPi… that seems to be the source of 90+% of stability issues judging by forum posts.

Really? I’ve had integrations break a few times in the last year.

Were these breakages mentioned in the release notes and you did not fix them before updating?
Were those core integrations? Can you name some examples?

Also for me an integration breaking is not instability unless it took your whole system down, but that’s just my personal opinion.

Yes, it’s not unusual for core integrations not mentioned in the changelog to suddenly stop working after an update (or at least for part of their functionality to break).

I guess someone who is considering a move to a new platform is interested in overall reliability, not stability per se.

I could say HA is reliable if you are lucky or don’t use the system under the specific conditions which might cause problems. The few issues I’m aware of:

  1. Supervisor may fail during an update, crashing the whole of HA. Unfortunately, as we know, Supervisor updates itself without user control. It happens to people in waves, roughly once every year or two.
  2. Communication with Nabu Casa breaks for no apparent reason. Probably the snitun module fails and does not recover. This happens about once a year.
  3. The transition from DST to standard time breaks half of HA’s functionality and causes high CPU load; some installations stopped working altogether (see today’s reports).
  4. Core may DDoS itself and your local network because of a DNS issue.
  5. Last but not least: all the issues of varying severity related to updates, including ones affecting components that are not listed in the changelogs. This matters because all components must be updated at once (with the whole of HA), which increases the risk of breaking one component while updating to fix another.

  1. One of the reasons why I don’t use Supervisor. But still, a failure only once every year or two is not that much in my view. There are commercial systems that fail more often.
  2. Not using Nabu Casa anymore. Again, an online system that fails once a year is actually pretty solid; Amazon/Google services fail more often.
  3. No issues at all here. From what I could see, it was mostly caused by time pattern triggers. While I agree that this shouldn’t be the case, I still haven’t seen a single automation that actually needed a time pattern trigger where other triggers would not have been more efficient.
  4. I agree this is an issue; it is discussed heavily in another topic and should not be the case. Again, I don’t use Supervisor, so I can’t comment on that.
  5. I have run HA for 5 years now and my system never went haywire due to an update. I also never had the system restart itself, not even once.

I get the feeling that people expect a military-grade reliable system for free. In maybe 95% of the cases I saw here on the forum (and I read quite a lot of them) the issue was not with the system but with the user, and more often than not it was mentioned in the release notes, but people would rather ignore them and whine afterwards than take 15 minutes to read and 10 minutes to adjust their system before updating.

The “breaking changes” section of the release notes does not list changes which break things (bugs.) It is intended to only list intentional changes which will force some or all users to make configuration changes to keep things working correctly.

For a long time I didn’t know this and thought I’d be OK if I just read the release notes and fixed any breaking changes which applied to me. This is not the case. You also have to monitor these forums and GitHub to scan for any reported bugs with the specific integrations you use. Or just cross your fingers, update, test everything, and be prepared to back out if necessary.

Exactly. Bundling all integration changes with the core update, then not documenting bugs in one central location is setting users up for failure.

I agree there’s a lot of “pilot error.” I’ve certainly made my share, and folks here have always been kind and helpful in setting me straight.

I do question the 95% figure, but even so, it’s hard to call it whining if someone does read the release notes but is impacted by a bug in an integration which was known but not documented, especially since there’s no way for the average user to back out an integration update without backing out the whole core update.

I assure you that it happened to installations which are not using pattern triggers.
BTW, you are not using the most recommended installation type. Consider the high percentage of issues happening to users who opted for an HAOS deployment.

It would be pretty bad if 95% of issues were caused by the programmers. But I bet the remaining 5% matters more to the OP.
In addition, a system which requires extensive maintenance and care from users month after month can hardly be called “reliable”.

I never had to do this and it’s certainly not the most common thing.

This you should do anyway and always when updating a production system in my opinion.

I was talking about the whining when people update and have an issue/breaking change that was documented in the release notes. For other bugs I agree, it’s certainly not whining. I have been here for quite some time and I helped out quite a lot of people and most of the time it is a documented change.

Yes, because I consider myself an experienced/advanced user who wants to have control over his system as much as possible :slight_smile:

Extensive maintenance is an exaggeration. I have updated every month (in the past twice a month) since somewhere around 0.4x and I never had to invest more than 30 minutes, including reading the relevant breaking changes. If you are not willing to invest this time, then HA is not for you. It is still a system for tinkerers and people passionate about IoT; it’s not set and forget, and given the thousands of integrations it probably never will be.

One thing not mentioned here in terms of stability is that every Home Assistant installation comes with a built in ticking time bomb: the database.

Unfortunately the HA database design is, to not mince words, the worst possible way to store data. The “states” table is treated as if it were a flat file. Every state is stored as varchar(255), every domain is varchar(64), every entity_id is varchar(255) and, to top it off, the attributes column is of text type. Every state change adds a record containing mostly character data, and attributes is actually stored as a JSON-encoded string containing all the attributes of the state change.

Anyone who knows the basics of database design is facepalming right now after reading this.

This is why the HA database uses 10GB to store 500MB worth of data, why the database is slow, why there are so many writes, why history graphs spike the CPU when they are generated, and so on.
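To make the duplication concrete, here is a toy comparison using Python’s built-in `sqlite3`. This is not the actual HA schema: the table names, the trimmed-down columns and the deduplicated layout are assumptions made purely for illustration. It writes the same state changes into a flat table that repeats the attributes JSON on every row, and into a layout that stores each distinct attributes string once.

```python
import json
import sqlite3
from typing import Tuple


def create_tables(conn: sqlite3.Connection) -> None:
    """Create a flat table and a deduplicated pair (both hypothetical)."""
    conn.executescript(
        """
        CREATE TABLE states_flat (
            state_id   INTEGER PRIMARY KEY,
            entity_id  VARCHAR(255),
            state      VARCHAR(255),
            attributes TEXT               -- full JSON repeated on every row
        );
        CREATE TABLE attrs (
            attrs_id     INTEGER PRIMARY KEY,
            shared_attrs TEXT UNIQUE      -- each distinct JSON stored once
        );
        CREATE TABLE states_dedup (
            state_id  INTEGER PRIMARY KEY,
            entity_id VARCHAR(255),
            state     VARCHAR(255),
            attrs_id  INTEGER REFERENCES attrs(attrs_id)
        );
        """
    )


def record_state(conn: sqlite3.Connection, entity_id: str, state: str, attributes: dict) -> None:
    """Write one state change into both layouts."""
    blob = json.dumps(attributes, sort_keys=True)
    conn.execute(
        "INSERT INTO states_flat (entity_id, state, attributes) VALUES (?, ?, ?)",
        (entity_id, state, blob),
    )
    conn.execute("INSERT OR IGNORE INTO attrs (shared_attrs) VALUES (?)", (blob,))
    attrs_id = conn.execute(
        "SELECT attrs_id FROM attrs WHERE shared_attrs = ?", (blob,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO states_dedup (entity_id, state, attrs_id) VALUES (?, ?, ?)",
        (entity_id, state, attrs_id),
    )


def attribute_chars(conn: sqlite3.Connection) -> Tuple[int, int]:
    """Return the characters of attribute JSON stored as (flat, deduplicated)."""
    flat = conn.execute(
        "SELECT COALESCE(SUM(LENGTH(attributes)), 0) FROM states_flat"
    ).fetchone()[0]
    dedup = conn.execute(
        "SELECT COALESCE(SUM(LENGTH(shared_attrs)), 0) FROM attrs"
    ).fetchone()[0]
    return flat, dedup


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    attrs = {"friendly_name": "Living room temperature", "device_class": "temperature"}
    for reading in range(1000):
        record_state(conn, "sensor.living_room", str(20 + reading % 5), attrs)
    print(attribute_chars(conn))
```

For a sensor whose attributes never change, the flat layout stores the attributes string once per reading, while the other layout stores it once in total. That per-row repetition is the kind of overhead described above.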

And left untouched with a default install, this DB runs under SQLite, which will happily grow to fill the disk, at which point HA will stall and the DB will become corrupt. And if you are running on an SD card, by then the card is already burned out by writes.
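A crude way to catch this before the disk fills is to compare the database file size against the free space on the same filesystem. The sketch below uses only the standard library; the path `/config/home-assistant_v2.db` and the 50% threshold are assumptions, so point it at wherever your SQLite file actually lives and pick a threshold you are comfortable with.

```python
import os
import shutil

# Assumption: location of the SQLite file; adjust for your install.
DEFAULT_DB_PATH = "/config/home-assistant_v2.db"


def db_disk_report(db_path: str, warn_fraction: float = 0.5) -> dict:
    """Compare the database size with the free space on its filesystem.

    "warn" is set when the database is larger than warn_fraction of the
    space still free, which is an arbitrary rule of thumb.
    """
    db_bytes = os.path.getsize(db_path)
    usage = shutil.disk_usage(os.path.dirname(os.path.abspath(db_path)))
    return {
        "db_bytes": db_bytes,
        "free_bytes": usage.free,
        "warn": db_bytes > warn_fraction * usage.free,
    }


if __name__ == "__main__":
    if os.path.exists(DEFAULT_DB_PATH):
        print(db_disk_report(DEFAULT_DB_PATH))
    else:
        print(f"no database found at {DEFAULT_DB_PATH}")
```

Run from cron or a similar scheduler, something like this gives you a warning while there is still room to purge or move the database.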

HA will never be a reliable platform out of the box until the database is redesigned.

My experience with OH is that developers rush to get features and changes integrated just before the release of a Milestone or “stable” version, without proper testing in a previous snapshot release. Many times this results in breakage in OH releases. With HA, you are free to keep running a stable system. You can wait until others have found the issues and patches are out before upgrading.