WTH why is there no Active-Active or Active-Passive high availability setup?

WTH why is there no Active-Active or Active-Passive high availability setup? I think this would keep the impact of reboots down, or at least provide an easy way to use one as a test bed.

I can’t speak for the devs here, but it’s probably due to the following:

First, doing clustering, particularly 2-node clustering, can be challenging.

Second, several of the IoT technologies that folks use (particularly Zigbee and Z-Wave) do not support multiple control nodes. That means if you’re using that sort of tech, the mesh will collapse when one of the nodes restarts, because you can’t migrate the controller. Matter (which finally released today!) should help with that, but that’s all new tech that isn’t available on the market yet.

4 Likes

Yep, I’m sure you’re correct that it’s not a normal setup and is challenging, but it would be nice to have.

Your second point is why I have Zigbee and Z-Wave on a second device; I was getting annoyed that my Zigbee and Z-Wave networks would die on every reboot.

2 Likes

Oh, I totally get that, that’s why I do my updates / reboots when the family isn’t up yet (thank goodness I’m an early bird!)

As one of the (likely very, very small group of) people who runs HA in a Nomad cluster, I’ve definitely been interested in doing something like this for a while. My MQTT server, database, Zigbee2MQTT, and a half dozen other utility jobs all run in my cluster alongside HA. While HA is reliably one of the most robust and stable things running on my cluster, the hosts themselves are not, and shuffling HA between them has a non-zero amount of transition time. It’s slightly frustrating to lose those metrics in the shuffle, and it’s part of what’s keeping me from migrating more things out of my InfluxDB server and into HA natively.

Having some kind of clustering, failover, or similar would be beneficial. I say this with little expectation of it happening, as it opens quite the can of worms in any codebase not designed with it in mind from the ground up.

1 Like

Achieving a full high availability setup may be hard or impossible (within reason) at this time, but that shouldn’t be a barrier to implementing what is possible or to making progress in that direction. All those who argue that since you can only have a single controller, you’re toast anyway if it goes down, are being short-sighted… In my case, re-pairing all the devices (200 spread across Zigbee and Z-Wave) is the least of my worries…

It would be awesome to have two instances of HA running where one is inactive and just sucking in, on a schedule (nightly?), all the changes made on the primary instance. If something goes wrong, it should easily inherit whatever settings are needed to take the place of the primary system. And if for the time being this requires the radios to be physically moved, I don’t see that as a showstopper. Anyone who wants to minimize that issue can use Ethernet-based Zigbee/Z-Wave controllers. Keeping a spare around would also help reduce downtime, thanks to the backup/restore options that are now more widely available, having been integrated into HA.
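To make that concrete, here is a minimal sketch of the nightly sync idea, assuming two Linux boxes named ha-primary and ha-standby, root SSH keys between them, and the HA configuration living at /path/to/ha-config (the hostnames and paths are placeholders, not anything HA ships with):

```sh
# /etc/cron.d/ha-standby-sync (on ha-primary)
# Push the Home Assistant configuration to the standby box
# every night at 03:00. Requires root SSH keys to ha-standby.
0 3 * * * root rsync -a --delete /path/to/ha-config/ ha-standby:/path/to/ha-config/
```

The standby’s HA service should stay stopped until you actually fail over; otherwise both instances will fight over the same devices and integrations.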

A barrier is a poor excuse not to make progress.

My comments are not really addressed to posters in this thread. There was another one similar to this that got closed with multiple comments more or less saying “what is the point since you can’t do X anyway”. Not trying to stir up a debate, just sharing an opposing view.

4 Likes

Stumbled upon this thread when researching the forum for high availability.
My current setup is 2 NUCs with Proxmox/ZFS. HA is running on one host and replicated to the other every minute. The container with HAOS has been added as a High Availability (HA) resource. When node A fails, it takes about a minute for HA to reappear. Manual migration downtime is next to none.

1 Like

@DBM - I too have 2x Lenovo i7 TinyPCs, both running Proxmox. I need to decipher what you explained and see if I can do it… but I don’t understand how you can replicate HA “every minute”… if Proxmox is doing it, then I assume it replicates the whole logical drive, which in my case is 128 GB dedicated to HA. Also, how do you deal with HA changing IP, etc.?

I have a 3-node Proxmox cluster (2x NUC and 1 container on my NAS for quorum).
Prox1 is running HA as an LXC container (Proxmox Helper Scripts | Scripts for Streamlining Your Homelab with Proxmox) and I have put the following under the config of HA:
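On the Proxmox shell this boils down to something like the following sketch, assuming the HA container has VMID 100 and the second node is named prox2 (adjust both to your setup):

```sh
# Create a storage replication job for guest 100, targeting node
# prox2, on Proxmox's calendar-event schedule "*/1" (every minute).
pvesr create-local-job 100-0 prox2 --schedule "*/1"

# Verify the job and its last sync time:
pvesr status
```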

This replicates HA to Prox2 every minute. The replication is incremental (ZFS snapshots), so only the changes since the last run are sent over the wire, not the whole disk. Next, put HA as a container under high availability:
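Again as a sketch with the same assumed VMID (the GUI equivalent lives under Datacenter → HA → Resources):

```sh
# Register the container as a Proxmox HA resource so the cluster
# restarts or relocates it automatically when its node fails.
ha-manager add ct:100 --state started

# Check what the HA manager is doing:
ha-manager status
```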

HA does not change its IP, as it is set to a fixed IP address. The IP settings and such are all part of the replicated container, so nothing changes…

6 Likes

I have also been searching for a method of doing this. The second post by @DBM is much better, but it would be amazing if he wrote a step-by-step guide for us poor humans :wink:

I personally think this is overkill, and definitely not in the scope of Home Assistant. I have seen one of the core devs talk about it, and they said pretty much the same thing. KISS: you don’t need high availability on Home Assistant.

@langestefan If you only have a few sensors and are running a few automations, I agree it’s overkill.
However, if you become “dependent” on HA, if your entire family is already used to a series of automations, and you have sensors that guarantee the security of the house, high availability becomes a mandatory requirement.

4 Likes

I definitely don’t agree, and neither do the devs. This is a blog post from frenck, one of the core devs: https://frenck.dev/the-enterprise-smart-home-syndrome/

I think he is 100% right. I have been using HA for almost 7 years now and it controls everything in my house. In the beginning failures used to be common. But for the past 3 years it hasn’t crashed once. So what would high availability solve exactly?

Past success is no guarantee of future success. Just because you haven’t had any problems in 3 years doesn’t mean you never will again.

As PIOTR said in the comments: “Can I accept failure? For how long? For how long can my smart home remain in a non-operational state?” That is, if you can truly accept spending hours without your sensors and automations, then high availability really won’t do anything for you.

2 Likes

In this case I am very confident that the quality of HA will only go up. You have to try really hard to make it crash; it can handle any kind of component failure without going down. That used to be completely different in the past.

Most people who talk about high availability are sysadmins who use it in their job and therefore think it’s the greatest thing ever. It’s a bias that is difficult to get rid of.

If your home automation server can’t be down for an hour or two without the house burning down you have bigger issues you need to solve first.

I just want to make sure you are aware that none of the HA devs will support this. So you can build it, but don’t go asking for support, because you will not get it.

Having an active/passive cluster can also be nice when you’re doing upgrades or maintenance.

Honestly, if you can flash Tasmota or ESPHome on an IoT device and properly configure the integrations, setting up high availability isn’t that much more involved.

If you have a home lab with a Kubernetes cluster, a Proxmox cluster, etc., then by all means just make a Home Assistant virtual machine highly available with shared storage.

If you have commodity hardware such as a couple of NUCs lying around, you can deploy Home Assistant in an active/passive high availability cluster with existing open source components.
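For example, here is a minimal sketch of the floating-IP half of such a cluster using keepalived; the interface name, addresses, and priorities are assumptions you would adapt to your network:

```sh
# Both NUCs run Home Assistant, but clients use a virtual IP that
# follows whichever node is currently MASTER.
cat > /etc/keepalived/keepalived.conf <<'EOF'
vrrp_instance HA_VIP {
    state MASTER            # BACKUP on the second node
    interface eth0
    virtual_router_id 51
    priority 150            # use a lower value (e.g. 100) on the second node
    advert_int 1
    virtual_ipaddress {
        192.168.1.50/24
    }
}
EOF
systemctl restart keepalived
```

keepalived only moves the IP; keeping the two instances’ state in sync (and the radios, as discussed above) is still on you.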

1 Like