High Availability

I use a Z-Wave stick and a Zigbee stick with two HA instances. They both run at the same time.

But I use the second instance to test new things without messing with the instance that runs the home.
General question.
Do you guys really have so much downtime, or critical devices that can't tolerate any downtime? I've run my production instance for more than 2 years and the only downtimes were during HA updates and when I moved HA from a Pi to a NUC. I do weekly backups of the VM (I did the same for the Pi's SD card back then), so I would be up and running again in less than 5 minutes if HA failed. I even tried this once just for fun (funny, I know :rofl:) with the Pi.

I would love to have the option to run HASS in Kubernetes.
The Zigbee stick is a valid argument, but if I use a network hub (Hue, Aqara, IKEA, etc.) for my network, this isn't an issue.

It's actually one of the things I dislike about ZHA: the single point of failure. HASS has become too important to have downtime.

3 Likes

I love the challenge of making HASS highly available, and I do think it is possible. The question I keep asking myself is: is it worth it? A risk/reward calculation, plus the "because I can" factor. :slight_smile:

From the beginning of my home automation journey I have always worked on the simple principle that if the system goes down, things continue to work as expected. For example, light switches still turn the intended light on and off if HASS is offline. For the most part this holds; there are some situations that break this principle even in my setup, but keeping the system unobtrusive this way is important to me, my family and my guests.

With Zigbee and Z-Wave moving (or moved) to external systems, I feel this makes the HASS high-availability story easier, but at the expense of placing the availability requirement on those external systems instead. Even so, I feel this mitigates the complications of having these inside HASS.

One concern I have about HASS, and this has gotten way better in recent versions (thank you devs!), is stability. There are still situations I have observed where I would consider HASS to be in a degraded state that should trigger a handover to the backup, but detecting those situations still needs to be figured out. :slight_smile:

1 Like

I would like to see this feature, although it has some challenges to overcome.
This is something integrators might like to see as well.

I run my HA instance on a high-powered VM due to the size of my HA setup, but I would like a backup on something like a Raspberry Pi. The backup would add lag to my instance but would still function.
I currently have a script that copies the latest snapshot from my VM to a minimally configured HA install on a Pi. If my VM is going down for a long period, I restore the snapshot to my Pi.
I have stopped short of automating the restore of a backup snapshot.

Some of the challenges that would need to be thought about are:

  • Local IP will change. This could impact anything that initiates connections internally. Highly available local IPs that don't change are out of reach for most home networks.
  • Physically connected hardware. I have kept my hardware separate and network-connected to give me freedom, e.g. deCONZ on a separate Pi rather than in an add-on.
  • Remote connections. I use DuckDNS, so I would need to redirect my remote connection. One way to manage this would be to use Nabu Casa remote access and make it backup-aware so it redirects accordingly.
  • How do you manage things when the primary becomes available again? Having two instances running could be an issue: some of my integrations only allow a certain number of connections, and a second instance would exceed that. Data sent to InfluxDB could also get messed up, for example, since the same data could come from multiple sources.
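Several of these challenges hinge on reliably detecting that the primary is down before the backup takes over. A minimal sketch of that decision logic (the threshold value and the idea of polling a health endpoint are my assumptions, not an existing HA feature):

```python
"""Failover decision sketch: only hand over to the backup after
several consecutive failed health checks, so a single missed poll
does not trigger a spurious switch. Threshold is an assumption."""

class FailoverMonitor:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # consecutive failures before failing over
        self.failures = 0

    def record(self, primary_healthy: bool) -> bool:
        """Feed in one health-check result; return True when the
        backup should take over."""
        if primary_healthy:
            self.failures = 0  # any success resets the counter
            return False
        self.failures += 1
        return self.failures >= self.threshold
```

In practice the health check itself could be anything from an ICMP ping to polling the HA web interface; the hard part, as noted above, is that a degraded-but-responding instance looks healthy to simple probes.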

Some of these challenges could be solved by the backup being a limited version of the primary. A bit like safe mode but on separate hardware rather than true high availability.

  • Disabling things like certain add-ons and external databases, possibly user-configurable as to what can be enabled/disabled.
  • Nabu Casa becoming backup-aware and redirecting accordingly.
  • On detecting the primary again, the backup goes back to a dormant state, copying the primary's config without running. Immediately afterwards the primary would need to restart to ensure it gets back all the connections the backup might have been using.

At least under this scenario, most of HA would still function.

1 Like

There is a High Availability Home Assistant (HAHA) project on the forum.

I checked out HAHA and it looks like it is built pretty well, using existing systems, for something that could be true high availability.
From what I have read it is Home Assistant Core, so no Supervisor, and it is written for Raspberry Pi only. That would mean giving up add-ons and snapshots.
HAHA github
Community post

I cannot speak for the community, but from my experience I think what most people probably want is a backup server as opposed to true high availability, as most home setups could likely handle a short handover period. Preferably with automated backup and handover.

That is only an issue if you use a direct-attached USB adapter for Zigbee and Z-Wave.

High Availability failover to a secondary HA instance should still be able to work fine if your Zigbee and/or Z-Wave controller is an external networked bridge/gateway/hub on your local network, such as Sonoff ZBBridge for ZHA, Samsung SmartThings, Tellstick Znet Gateway, IKEA Tradfri Gateway, etc.

1 Like

ZHA works with Sonoff ZBBridge hacked with Tasmota.

You can also make your own ser2net server (serial bridge server) with an ESP8266 or a Raspberry Pi.
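As a rough sketch, a classic ser2net (v3-style) configuration line that exposes a USB Zigbee stick over TCP might look like this (the TCP port, device path, and baud rate are assumptions for a typical coordinator):

```
# /etc/ser2net.conf — expose /dev/ttyUSB0 on TCP port 3333 in raw mode,
# with a 600-second idle timeout
3333:raw:600:/dev/ttyUSB0:115200 8DATABITS NONE 1STOPBIT
```

HA (or Zigbee2MQTT) can then typically be pointed at `socket://<bridge-ip>:3333` instead of a local device path, which decouples the radio from the machine running HA.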

I second the master/slave idea. To be truly useful, it would have to be cross-platform. The biggest challenge may be redirecting MQTT-type devices.

+1 for sure.

1 Like

Or just run Zigbee2MQTT on the remote Pi. I've been using a Pi 1 for mine for the last month, as I needed to move the Zigbee controller closer to my sensors without moving the HA server.

Yes, but what is your high-availability solution for Zigbee then?

I don’t have one. You also need redundancy for the MQTT broker.

1 Like

Yeah, it is hard to make anything redundant. With Zigbee and Z-Wave there is (I think) a hardware-key-based pairing between your IoT device and the USB stick.

If you have 2 identical Zigbee2MQTT sticks, one can take over from the other. You can't have them both online at the same time, though.

1 Like

+1, that would make life a lot easier. But I also understand that getting there is not easy.

I think the hardware piece is going to be the most difficult. I agree with using a Pi as a backup, because worst case I have it connected to a large battery and can decide whether to shut down some, part, or none of my system.

There was a recent post from someone setting out their Kubernetes-based install.

EDIT: here K8s deployment

Thanks @nickrout

Not entirely true.

I run 2 Proxmox servers. HA runs in a VM on my main server and the VM is backed up each hour onto the second node. I also have a third Proxmox 'node', which is just an RPi whose role is to confirm that the main node is dead, so that Proxmox management can switch the VM from the main node to the backup one.

I have 2 USB sticks on each server: one Aeotec Z-Stick Gen5 for Z-Wave and one Zigbee controller for Zigbee2MQTT.
The 2 Aeotec Z-Stick Gen5s are identical (same firmware; back up one using the Aeotec backup tool, then import the backup onto the second one).
The 2 Zigbee2MQTT sticks are also identical.

When my main server goes down, HA starts in about one minute on the backup node (so there is an HA downtime of about one minute, which is totally acceptable from my point of view). And both the Z-Wave and Zigbee networks work just fine on the second physical node.
My 2 nodes are about 10 meters apart.

10 Likes

I'm a bit late to this thread, but my setup is similar to @Shaad's.

I have my primary instance of HA running as a Hyper-V VM, with replication to another physical host.

Then I have 2 RPis with Aeotec Z-Sticks (not the same version or firmware, FWIW; I initially bought the second stick so I could keep an up-to-date spare as my dependency on Z-Wave grew). The backup Pi is powered off, but the main HA instance can power it on if the primary Pi stops responding or fails (with rules to make sure only 1 of the 2 sockets is powered at any one time). I periodically take a backup of the Z-Stick and a hassio snapshot and load them onto the backup Z-Stick/Pi.
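The "main instance powers on the backup Pi" step could be scripted against Home Assistant's REST API (`POST /api/services/switch/turn_on` with a long-lived access token); the host, token, and entity_id below are placeholders:

```python
"""Sketch: power on the backup Pi's smart socket via the Home
Assistant REST API. Host, token, and entity_id are placeholders."""
import json
import urllib.request

def build_service_call(base_url: str, token: str, entity_id: str):
    """Build the pieces of a switch.turn_on service call."""
    url = f"{base_url}/api/services/switch/turn_on"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"entity_id": entity_id}).encode()
    return url, headers, body

def power_on_backup_pi() -> None:
    url, headers, body = build_service_call(
        "http://homeassistant.local:8123",  # assumed HA host
        "LONG_LIVED_TOKEN",                 # replace with a real token
        "switch.backup_pi_socket",          # hypothetical smart-socket entity
    )
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    urllib.request.urlopen(req, timeout=10)
```

In a real setup this would normally live inside an HA automation rather than an external script, but the same service call is what the automation performs under the hood.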

home-assistant-remote is at the heart of it all: everything on the Pi shows up on the main HA instance, and because the Pis use the same IP and snapshot, the long-lived token is the same. The result is that you can power one Pi off, power the other on, and the main instance happily goes on talking to what it thinks is the same device.

1 Like