WTH why is there no Active-Active or Active-Passive high availability setup?

WTH why is there no Active-Active or Active-Passive high availability setup? I think this would keep the impact of reboots down, or at least provide an easy way to use one instance as a test bed.

I can’t speak for the devs here, but it’s probably due to the following:

First, doing clustering, particularly 2-node clustering, can be challenging: two nodes can't form a quorum majority on their own, so you need some kind of tiebreaker to avoid split-brain.

Second, much of the IoT tech that folks use (particularly Zigbee and Z-Wave) does not support multiple controller nodes. That means if you're using that sort of tech, the mesh will collapse when one of the nodes restarts, because you can't migrate the controller. Matter (which finally released today!) should help with that, but it's all new tech that isn't available on the market yet.

4 Likes

Yep, I'm sure you're correct that it's not a normal setup and would be challenging, but it would be nice to have.

Your second point is why I have Zigbee and Z-Wave on a second device; I was getting annoyed that my Zigbee and Z-Wave networks would die on every reboot.

2 Likes

Oh, I totally get that; that's why I do my updates/reboots when the family isn't up yet (thank goodness I'm an early bird!).

As one of the (likely very, very small group of) people who run HA in a Nomad cluster, I've definitely been interested in doing something like this for a while. My MQTT server, database, Zigbee2MQTT, and a half dozen other utility jobs all run in my cluster alongside HA. While HA is reliably one of the most robust and stable things running on my cluster, the hosts themselves are not, and shuffling HA between them has a non-zero amount of transition time. It's slightly frustrating to lose those metrics in the shuffle, and it's part of what's keeping me from migrating more things out of my InfluxDB server and into HA natively.

Having some kind of clustering, failover, or otherwise would be beneficial. I say this with little expectation for it happening as this opens quite the can of worms in any codebase not designed with it in mind from the ground up.

1 Like

Achieving a full high availability setup may be hard or impossible (within reason) at this time, but that should not be a barrier to implementing what is possible or to making progress in that direction. Those who argue that since you can only have a single controller, you are toast if it goes down anyway, are being short-sighted… In my case, re-pairing all the devices (200 spread across Zigbee and Z-Wave) is the least of my worries…

It would be awesome to have two instances of HA running where one is passive and simply pulls in, on a schedule (nightly?), all the changes made on the primary instance. If something goes wrong, it should easily inherit whatever settings are needed to take the place of the primary system. And if, for the time being, this requires the radios to be moved, I don't see that as a showstopper. Anyone who wants to minimize that issue can use Ethernet-based Zigbee/Z-Wave controllers. Keeping a spare around would also help reduce downtime, thanks to the backup/restore options that are now more widely available, having been integrated into HA.
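For the scheduled-sync idea, here is a minimal sketch of what feeding the standby could look like, assuming the primary stores its backups under /usr/share/hassio/backup and the passive box is reachable over SSH as standby-host (both placeholders, adjust for your install type):

```bash
#!/bin/bash
# sync-standby.sh -- push the newest HA backup archive to the passive node.
# Paths and hostname are placeholders for this sketch.
set -euo pipefail

BACKUP_DIR="/usr/share/hassio/backup"   # where HA writes its backup .tar files
STANDBY="standby-host"                  # SSH alias of the passive machine

# Pick the most recent backup archive.
LATEST=$(ls -t "${BACKUP_DIR}"/*.tar | head -n 1)

# Copy it to the same location on the standby, where it can be restored
# (manually or via a script) if the primary dies.
rsync -av "${LATEST}" "${STANDBY}:${BACKUP_DIR}/"
```

Run it from cron each night (e.g. `0 3 * * * /usr/local/bin/sync-standby.sh`) and the standby is never more than a day behind.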

A barrier is a poor excuse not to make progress.

My comments are not really addressed to posters in this thread. There was another one similar to this that got closed with multiple comments more or less saying “what is the point since you can’t do X anyway”. Not trying to stir up a debate, just sharing an opposing view.

4 Likes

Stumbled upon this thread while searching the forum for high availability.
My current setup is 2 NUCs with Proxmox/ZFS. HA runs on one host and is replicated to the other every minute. The HAOS container has been added as a High Availability (HA) resource. When node A fails, it takes about a minute for HA to reappear. Downtime for a manual migration is next to none.

1 Like

@DBM - I too have 2x Lenovo i7 TinyPCs, both running Proxmox. I need to decipher what you explained and see if I can do it… but I don't understand how you can replicate HA "every minute". If Proxmox is doing it, then I assume it replicates the whole logical drive, which in my case is 128 GB dedicated to HA. Also, how do you deal with HA changing IP, etc.?

I have a 3-node Proxmox cluster (2x NUC and 1 container on my NAS for quorum).
Prox1 runs HA as an LXC container (Proxmox Helper Scripts | Scripts for Streamlining Your Homelab with Proxmox), and I have put the following under the config of HA:
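Roughly, the CLI equivalent of that replication config, assuming the LXC has ID 100 and the second node is named prox2 (both placeholder values), would be:

```bash
# Create a storage replication job for the HA container to the second node,
# running every minute. This requires local ZFS storage on both nodes; after
# the first full copy, ZFS replication only sends incremental snapshot
# deltas, so a 128 GB disk is not re-sent every minute.
pvesr create-local-job 100-0 prox2 --schedule "*/1"

# Check that the job runs and when it last synced.
pvesr status
```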

This replicates HA to Prox2 every minute. Next, put HA as a container under high availability:
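In CLI terms, again with the placeholder container ID 100, that step would look something like:

```bash
# Register the container as a Proxmox HA resource, so the cluster restarts
# it on another node when its current node fails.
ha-manager add ct:100 --state started

# Confirm the resource is tracked by the HA stack.
ha-manager status
```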

HA does not change its IP, as it is set to a fixed IP address. The IP settings and such are all part of the replicated container, so nothing changes…

6 Likes

I have also been searching for a way to do this. The second post by @DBM explains it much better, but it would be amazing if he wrote a step-by-step guide for us poor humans :wink:

I personally think this is overkill, and definitely not in the scope of Home Assistant. I have seen one of the core devs talk about it, and they said pretty much the same thing. KISS: you don't need high availability on Home Assistant.

@langestefan If you only have a few sensors and run a few automations, I agree it's overkill.
However, if you become "dependent" on HA, if your entire family is already used to a series of automations and there are sensors that guarantee the security of the house, then high availability becomes a mandatory requirement.

4 Likes

I definitely don't agree, and neither do the devs. This is a blog post from frenck, one of the core devs: https://frenck.dev/the-enterprise-smart-home-syndrome/

I think he is 100% right. I have been using HA for almost 7 years now and it controls everything in my house. In the beginning, failures used to be common, but for the past 3 years it hasn't crashed once. So what exactly would high availability solve?

Past success is no guarantee of future success. Just because you haven't had problems in 3 years doesn't mean you never will again.

As PIOTR said in the comments: "Can I accept failure? For how long? For how long can my smart home remain in a non-operational state?" In other words, if you truly can accept spending hours without your sensors and automations, then high availability really won't do anything for you.

2 Likes

In this case I am very confident that the quality of HA will only go up. You have to try really hard to make it crash; it can handle any kind of component failure without going down. That used to be completely different in the past.

Most people who talk about high availability are sysadmins who use it in their job and therefore think it's the greatest thing ever. It's a bias that is difficult to get rid of.

If your home automation server can’t be down for an hour or two without the house burning down you have bigger issues you need to solve first.

I just want to make sure you are aware that none of the HA devs will support this. You can build it, but don't go asking for support, because you will not get it.

Having an active/passive cluster can also be nice when you’re doing upgrades or maintenance.

Honestly, if you can flash Tasmota or ESPHome on an IoT device and properly configure the integrations, setting up high availability isn’t that much more involved.

If you have a home lab with a Kubernetes cluster, a Proxmox cluster, etc., then by all means just make a Home Assistant virtual machine highly available with shared storage.

If you have commodity hardware such as a couple of NUCs lying around, you can deploy Home Assistant in an active/passive high availability cluster with existing open source components.
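As a rough sketch of that active/passive idea with plain open-source pieces, here is a watchdog that could run on the passive NUC. The address, port, and container name are placeholders, and a real deployment would still want a floating IP (e.g. keepalived) plus some fencing to avoid split-brain:

```bash
#!/bin/bash
# failover-watchdog.sh -- runs on the passive node and starts a local
# Home Assistant container if the active node stops answering.
set -u

PRIMARY_URL="http://192.168.1.10:8123/"  # active node; 8123 is HA's default port
FAILS=0

while sleep 10; do
  if curl -fsS --max-time 5 "${PRIMARY_URL}" > /dev/null; then
    FAILS=0
    continue
  fi
  FAILS=$((FAILS + 1))
  # Require three consecutive misses before taking over, so a short network
  # blip or a quick reboot doesn't trigger a false failover.
  if [ "${FAILS}" -ge 3 ]; then
    docker start homeassistant
    break
  fi
done
```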

1 Like

I don't think the issue is that HA absolutely can't ever be down; in my case, it's my family that gets the bad experience.

Formerly I only used Google Home and everything went well, but it came with limited possibilities (with HA, your only limitation is how far your imagination reaches) and, of course, the biggest downside of being in the cloud.

And there's the catch: while it was all Google, it never went down, so that was the user experience my family knew. But as soon as I moved to Home Assistant, they experienced downtime. Not only because of updates to Home Assistant itself and its add-ons, but also for each OS update.

The result: the family complains about Home Assistant with quotes like "Google was reliable" and "can't we switch back?"

In the end, I would also like to have a native active/passive solution so I can update at any time.
Since I guess the USB hardware is the biggest struggle, I think this should start with active/passive on the same node/host (so it's not failover, just active/passive on a single host).
That offers the possibility of zero downtime as long as the node/host itself stays alive.
But again, this should be natively supported, and it would add another big benefit over other locally running systems.
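Even without native support, the same-host update pain can be reduced today. This is not true active/passive (two instances can't safely share one configuration and database), but pre-pulling the new container image before restarting cuts the gap to a few seconds. The image tag is the usual official one; the container name and config path are placeholders:

```bash
#!/bin/bash
# Minimize update downtime on a single Docker host by pulling the new
# image while the old instance keeps running, then doing a quick swap.
set -euo pipefail

IMAGE="ghcr.io/home-assistant/home-assistant:stable"
CONFIG="/opt/homeassistant/config"   # placeholder path to your config dir

# The slow part (downloading the image) happens with HA still up.
docker pull "${IMAGE}"

# The actual outage is just this stop/start window.
docker stop homeassistant
docker rm homeassistant
docker run -d --name homeassistant --restart unless-stopped \
  --network host \
  -v "${CONFIG}:/config" \
  "${IMAGE}"
```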

I also strongly believe that HA is becoming more and more important, to the point that it can't be missed at all in the future (e.g. for disabled people who strongly rely on automations that open doors, etc.).

When you update is under your own control. You don't have to follow the HA release cadence; for example, I update once every 2 months or so. My HA hasn't had any downtime in the past 3 years, so I wonder what you are doing?

If you can make it work for yourself and your specific situation, by all means try it. But I think making it a generic "works for all people" option, which it would have to be if it were part of HA, would be very difficult and time-consuming to build, and it would only be used by a very small number of people (mostly sysadmins). So I don't see that ever happening.

Don't use HA for healthcare; that's not what it's meant for.