High Availability

A second installation of HA that pulls the same configuration and acts as a backup in case the main instance is offline. When the main instance is back online, the backup can update the database on the main to fill the gap.

This concept breaks down once you start having any Z-Wave or Zigbee devices. You can have a radio stick plugged into both machines, but you can’t have them both be the master on the same network, so you would have to physically move your zigbee/zwave stick from the master machine into the slave for the slave system to control those devices.

Respectful suggestion that you change the wording to be Leader/Follower or Primary/Backup etc. Remember the point isn’t what connotation you take from the language used, it’s about how such systemic language impacts others.

High Availability Standby is what you are describing.

Only one can be in operation at a time, but both need to be aware of the current state of all things.

Complexity is added when you start pulling in integrations that have hardware, such as the Zigbee and Z-Wave sticks mentioned above. I know that with Z-Wave you can have a secondary controller, but that requires some planning and added controls to handle it. Not sure about Zigbee and other integrations.

I use a Z-Wave stick and a Zigbee stick with two HA instances. They both run at the same time.

But I use it to test new things on one instance without messing with the instance that runs the home.
General question.
Do you guys have so much downtime, or critical devices that don’t allow downtime? I have run my production instance for more than 2 years, and the only downtimes were during HA updates and when I moved HA from a Pi to a NUC. I do weekly backups of the VM (I did the same for the Pi’s SD card back then), so I would be up and running again in less than 5 min in case HA fails. I even tried this once just for fun (funny, I know :rofl:) with the Pi.

I would love to have the option to run HASS in Kubernetes.
The Zigbee stick is a valid argument, but if I use a network hub (Hue, Aqara, IKEA, etc.) for my Zigbee network, this isn’t an issue.

It’s actually one of the things I dislike about ZHA, the Single Point Of Failure. Hass has become too important to have downtime.

I love the challenge of making HASS highly available and I do think it is possible. The question I keep asking myself is: is it worth it? Risk/reward calculation plus the “because I can” addition. :slight_smile:

From the beginning of my home automation journey I always worked on the simple principle that if the system does not work things continue to work as expected. Example, light switches still turn off and on the intended light if HASS is offline. For the most part this is true, there are some situations that break this principle even in my setup, but having this principle for the system being unobtrusive is important to me, my family and my guests.

With Zigbee and Z-Wave moving/moved to an external system, I feel this helps to make the HASS HA story easier, but at the expense of now placing HA requirements on the respective external systems. Even with that, I feel this mitigates the complications of having these within HASS.

One concern I have about HASS, and this has gotten way better in recent versions (thank you devs!), is stability. There are still some situations I have observed where I would consider HASS to be in a degraded state, where it should hand off to the backup, but how to detect these situations still needs to be figured out. :slight_smile:
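
One rough way to approach the detection part is to probe the primary on a timer and only hand off after several failures in a row, so a single slow response does not trigger a takeover. The sketch below is only an illustration: `HealthMonitor`, `make_http_probe`, the threshold of 3, and the example URL are all hypothetical names and values. The HTTP probe assumes the primary answers on Home Assistant's REST API root (`/api/`) with a long-lived access token, and an answering API does not prove that every integration is healthy.

```python
import urllib.request
from dataclasses import dataclass
from typing import Callable


def make_http_probe(base_url: str, token: str, timeout: float = 5.0) -> Callable[[], bool]:
    """Build a probe that returns True when GET <base_url>/api/ answers with HTTP 200.

    Assumption: the primary exposes the REST API and `token` is a valid
    long-lived access token. Both values are placeholders supplied by the caller.
    """
    def probe() -> bool:
        req = urllib.request.Request(
            base_url.rstrip("/") + "/api/",
            headers={"Authorization": "Bearer " + token},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    return probe


@dataclass
class HealthMonitor:
    """Counts consecutive failed probes of the primary instance."""
    probe: Callable[[], bool]  # returns True when the primary looks healthy
    max_failures: int = 3      # assumed threshold, not a recommended value
    failures: int = 0

    def check(self) -> bool:
        """Run one probe; return True when the backup should take over."""
        try:
            healthy = self.probe()
        except Exception:
            # Timeouts, refused connections and HTTP errors all count as a failure.
            healthy = False
        self.failures = 0 if healthy else self.failures + 1
        return self.failures >= self.max_failures
```

A backup could call `HealthMonitor(make_http_probe("http://primary.local:8123", "<token>")).check()` every 30 seconds or so and start its takeover when it returns `True`. Catching a truly degraded state would need a richer probe, for example checking that a known sensor has updated recently.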

I would like to see this function although it has some challenges to overcome.
This is something integrators might like to see as well.

I run my HA instance on a high-powered VM due to the size of my HA setup, but I would like a backup that is on something like a Raspberry Pi. The backup will have increased lag in my instance but still function.
I currently have a script copying the latest snapshot from my VM to a basically configured Pi HA install. If my VM is going down for a long period I restore the snapshot to my Pi.
I have stopped short of automating the restoring of a backup snapshot.
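
For anyone wanting to do the same, a minimal sketch of the copy step could look like this. It is not my actual script: it assumes the snapshots are `.tar` files in a directory the script can read (for example a mounted share) and that the Pi's backup folder is reachable as a normal path. `copy_latest_snapshot` and both directory arguments are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional


def copy_latest_snapshot(src_dir: Path, dest_dir: Path, pattern: str = "*.tar") -> Optional[Path]:
    """Copy the most recently modified snapshot from src_dir to dest_dir.

    Returns the path of the copy, or None when src_dir holds no snapshot.
    """
    snapshots = [p for p in src_dir.glob(pattern) if p.is_file()]
    if not snapshots:
        return None

    # "Latest" here means newest modification time, not anything parsed from the name.
    latest = max(snapshots, key=lambda p: p.stat().st_mtime)

    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / latest.name

    # Skip the copy when a file of the same name and size is already there.
    if target.exists() and target.stat().st_size == latest.stat().st_size:
        return target

    # Copy to a temporary name first so a half-written file is never
    # mistaken for a complete snapshot, then rename into place.
    tmp = target.with_name(target.name + ".part")
    shutil.copy2(latest, tmp)
    tmp.replace(target)
    return target
```

Run from cron or a systemd timer, this keeps the Pi holding the newest snapshot. The restore itself stays manual, as described above.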

Some of the challenges that would need to be thought about are:

  • Local IP will change. This could impact anything that is initiating connections internally. High availability local IPs that don’t change are out of reach for most home networks.
  • physically connected hardware. I have kept my hardware separate and network connected to give me freedom. E.g. deconz in a separate pi rather than in an addon.
  • remote connections. I use duckdns so would need to redirect my remote connection. One way for this to be managed would be to use Nabu Casa remote and make it backup aware and redirect accordingly.
  • How do you manage things when the primary becomes available again? Having two instances could be an issue. Some of my integrations only allow a certain number of connections and a second instance would exceed that. Also, data in InfluxDB could be messed up, for example, as the same data could come from multiple sources.

Some of these challenges could be solved by the backup being a limited version of the primary. A bit like safe mode but on separate hardware rather than true high availability.

  • Disabling things like certain addons and external databases, possibly user configurable on what can be enabled/disabled.
  • Nabu Casa becoming backup aware. And redirecting accordingly.
  • On detecting the primary again, the backup goes back to a dormant state of copying the primary config without running. Immediately afterwards, the primary would need to restart to ensure it gets back all the connections that the backup might have been using.

At least under this scenario most of the HA would function.
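
The handover described in that list can be sketched as a small state machine on the backup. This is only an illustration of the idea: `Role`, `BackupController`, and the action strings are made-up names, and the actions are returned as plain strings so that whatever actually starts, stops, or restarts an instance can be plugged in separately. It reacts to every probe result immediately, so any debouncing of flaky checks would have to happen before `step` is called.

```python
from enum import Enum
from typing import List


class Role(Enum):
    DORMANT = "dormant"  # copying the primary's config, not running
    ACTIVE = "active"    # limited mode, standing in for the primary


class BackupController:
    """Decides what the backup should do each time the primary is probed."""

    def __init__(self) -> None:
        self.role = Role.DORMANT

    def step(self, primary_up: bool) -> List[str]:
        """Feed in one probe result; return the actions to carry out, in order."""
        if self.role is Role.DORMANT and not primary_up:
            # Primary disappeared: take over with a reduced feature set.
            self.role = Role.ACTIVE
            return ["start_limited_mode"]
        if self.role is Role.ACTIVE and primary_up:
            # Primary is back: step down, then ask it to restart so it
            # reclaims any connections the backup was holding.
            self.role = Role.DORMANT
            return ["stop_limited_mode", "resume_config_sync", "request_primary_restart"]
        return []  # no change of role, nothing to do
```

Which add-ons and external databases stay disabled in limited mode would be part of whatever handles `start_limited_mode`, ideally user configurable as suggested above.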

There is high availability home assistant (HAHA) on the forum.

I checked out HAHA and it looks like it is built pretty well using existing systems for something that could be true High Availability.
From what I have read it is Home Assistant Core, so no supervisor and written for Raspberry Pi only. That would mean giving up addons and snapshots.
HAHA github
Community post

I cannot speak for the community but from my experience I think for most people what they probably want is a backup server as opposed to true High Availability as most home setups could likely handle a short handover period. Preferably automated backup and handover.

That is only an issue if you use a direct-attached USB adapter for Zigbee and Z-Wave.

High Availability failover to a secondary HA instance should still be able to work fine if your Zigbee and/or Z-Wave controller is an external networked bridge/gateway/hub on your local network, such as Sonoff ZBBridge for ZHA, Samsung SmartThings, Tellstick Znet Gateway, IKEA Tradfri Gateway, etc.

ZHA works with Sonoff ZBBridge hacked with Tasmota.

You can also make your own ser2net server (serial bridge server) with an ESP8266 or a Raspberry Pi.

I second the master/slave idea. To be truly useful, it would have to be cross platform. Biggest challenge may be to redirect mqtt type devices.

+1 for sure.

Or just run zigbee2mqtt on the remote Pi. I’ve been using a Pi 1 for mine for the last month, as I needed to move the Zigbee controller closer to my sensors without moving the HA server.

Yes but what is your high availability solution for zigbee then?

I don’t have one. You also need redundancy for the MQTT broker.
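
On the client side, broker redundancy could be as simple as trying a list of brokers in order and using the first one that accepts a TCP connection. The sketch below only shows that selection step. `first_reachable` is a hypothetical helper, the broker addresses are placeholders, and an open port does not prove the broker is actually healthy. Devices with a single hard-coded broker address would still need a different answer, such as an address that moves with the active broker.

```python
import socket
from typing import Iterable, Optional, Tuple

Broker = Tuple[str, int]  # (host, port)


def first_reachable(brokers: Iterable[Broker], timeout: float = 2.0) -> Optional[Broker]:
    """Return the first broker that accepts a TCP connection, or None."""
    for host, port in brokers:
        try:
            # Open and immediately close a connection just to test reachability.
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Refused, timed out, or unresolvable: try the next candidate.
            continue
    return None
```

For example, `first_reachable([("broker-a.local", 1883), ("broker-b.local", 1883)])` prefers the first broker and falls back to the second. The MQTT client would then connect to whichever address comes back.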

Yeah, it is hard to make anything redundant. With Zigbee and Z-Wave there is (I think) a connection based on a hardware key between your IoT device and the USB stick.

If you have 2 identical zigbee2mqtt sticks, one can take over from the other. Can’t have them both online at the same time though.

+1 that would make life a lot easier. But I also understand getting there is not easy

I think that the hardware piece is going to be the most difficult. I agree with using a Pi as a backup, because worst case I have it connected to a large battery and I can make decisions about whether I shut down some, part, or none of my system.