High Availability (HA)

Very nice and simple. Can it be used to talk to other devices in the same way over MQTT?

Currently, the main focus is HA. Yes, the mechanism could be applied to other services, but they would have to implement the heartbeat mechanism to send messages to the keeper. Personally, I think this project should, for now, focus on Home Assistant, which is the principal point of failure. That said, this only performs service restarts, but I'm open to suggestions such as backups (or other features), which are another piece of high availability.
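To make the heartbeat idea concrete, here is a minimal sketch of the keeper-side logic only (the class and parameter names are mine, not Keeper's actual API): a service is declared dead once it misses heartbeats for longer than a timeout, at which point the keeper would trigger a restart.

```python
import time

# Hypothetical keeper-side watchdog. A monitored service is considered
# dead once no heartbeat has arrived for more than `timeout` seconds.
class HeartbeatWatchdog:
    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable clock, handy for testing
        self.last_beat = clock()

    def beat(self):
        """Call whenever a heartbeat message arrives (e.g. via MQTT)."""
        self.last_beat = self.clock()

    def is_alive(self):
        """True while the last heartbeat is within the timeout window."""
        return (self.clock() - self.last_beat) <= self.timeout
```

The MQTT subscriber would simply call `beat()` on every heartbeat message and restart the service whenever `is_alive()` goes false.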

nragon, very nice indeed.
Digging into HAProxy, could we have multiple instances of this one? What I mean is that "entry points" should always respond. I saw some pictures showing only one entry point, the load balancer, but what if that craps out?

Historically, load balancers are by default a single device/service and thus subject to being a SPOF, but ordinarily the load balancer is the only software on the device and is therefore very stable.

Generally, you can't foolproof everything, but ensuring failover or redundancy for the most complicated pieces greatly reduces risk.

My goal is to have redundant access points, a redundant HA hub, a redundant database, redundant MQTT and backup power for everything. Outside of that, having replacements for the rest (network switches, routers, cables, etc.) and a documented restore plan is about as good as you really need. Ultimately, making sure that your home automation system fails dumb instead of stupid is your overall best possible scenario.

I wrote up my goals and assumptions as someone who is designing my system for a person who has a disability (my wife) in this article if interested.


I think load balancing is not the main issue here; availability is. Do any of you get 100% uptime on HA?
@anon3881519, Keeper can totally evolve into an active/passive approach deployed across multiple machines. Currently, the downtime is the time needed for HA to start.


So, I was reading up on Docker, and stumbled upon Docker Swarm.
It seems that with its setup of 'manager' and 'worker' nodes, high availability can be achieved.

Nodes can be either physical or virtual.
So with 'Docker on Docker' we could take Raspberry Pis or other hardware that runs Docker and add them to a cluster.
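For anyone wanting to try this, bootstrapping a swarm is only a couple of commands. A minimal sketch, assuming a manager at 192.168.1.10 (the address is a placeholder, and the join token is printed by `init`):

```shell
# On the machine that will be the first manager (assumed IP 192.168.1.10):
docker swarm init --advertise-addr 192.168.1.10

# `init` prints a join command with a token; run it on each Raspberry Pi / worker:
docker swarm join --token <worker-token> 192.168.1.10:2377

# Back on the manager, confirm all nodes joined:
docker node ls
```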

Found a very nice tutorial here:

Official documentation:
https://docs.docker.com/get-started/part4/

Found out about swarms when I stumbled upon this: (4 Pis, 2 managers and 2 workers)

Cheers.


These are some great resources. Thanks for the links!

I wonder how "swarm aware" HA would need to be to pull this off effectively.

Read on here; it seems that it'll work. User quasar66 has a pretty decent setup running!

There's been a fair bit of work recently on high availability configurations of kubernetes, and also on using Raspberry Pi clusters for kubernetes. Assuming you have a high availability kubernetes backplane (i.e. multi-master), storage and multiple cluster nodes, high availability for home assistant (excluding Z-Wave) should be possible through:

  1. Configuring HA to run in a kubernetes pod or pods
  2. Deploying a single instance of HA (i.e. your configured pod(s))
  3. Allowing kubernetes to detect failure and restart the failed pod(s) on another cluster node

The kubernetes backplane takes care of monitoring, failover and routing. Failover probably wouldn't be instant, but tuning could make it acceptably fast.
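As a rough illustration of those three steps, a single-replica Deployment with a liveness probe is enough for kubernetes to restart or reschedule a failed instance. This is a minimal sketch rather than a tested manifest; the image tag, port 8123 and probe timings are assumptions you would tune:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: home-assistant
spec:
  replicas: 1                  # single instance; kubernetes reschedules it on node failure
  selector:
    matchLabels:
      app: home-assistant
  template:
    metadata:
      labels:
        app: home-assistant
    spec:
      containers:
      - name: home-assistant
        image: ghcr.io/home-assistant/home-assistant:stable
        ports:
        - containerPort: 8123
        livenessProbe:         # lets kubernetes detect a hung instance and restart it
          httpGet:
            path: /
            port: 8123
          initialDelaySeconds: 60
          periodSeconds: 10
```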

k3s (https://k3s.io) looks particularly promising, but doesnā€™t have a high availability config yet.

I run home assistant in a kubernetes cluster using k3s. Even in a non-HA cluster you can survive node outages as long as the master stays alive. If the master is broken, your workers will just continue with their work. A master node can also act as a worker node at the same time. This means that in a two-node setup you will survive the outage of either node. With a three-node setup you might survive two simultaneous node outages.
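For reference, a small k3s setup like that is just two commands; the server IP and node token below are placeholders (the token is printed to /var/lib/rancher/k3s/server/node-token on the server):

```shell
# On the master/server node:
curl -sfL https://get.k3s.io | sh -

# On each worker/agent node, pointing at the server:
curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<node-token> sh -
```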

K3s will provide an HA mode in the near future.


Would be interested in more detail, if you have time to write.

Hello everyone :hugs:,

I have recently created a project called HAHA - Highly Available Home Assistant, which creates a highly available cluster running Home Assistant.

It runs using Docker Swarm and includes a MariaDB Galera cluster and a Mosquitto MQTT broker. It uses GlusterFS to synchronize Home Assistant's files, so that in case of a failure, all of the state is transferred to the other nodes.
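For readers wondering what such a stack roughly looks like, here is a much-simplified sketch of a swarm stack file in that spirit. It is not HAHA's actual config: the image names, GlusterFS mount paths and placeholder password are my assumptions, and a real Galera cluster needs more configuration than a single MariaDB service.

```yaml
version: "3.7"
services:
  homeassistant:
    image: ghcr.io/home-assistant/home-assistant:stable
    ports:
      - "8123:8123"
    volumes:
      - /mnt/gluster/hass-config:/config     # GlusterFS mount replicated across nodes
    deploy:
      replicas: 1
      restart_policy:
        condition: any                       # swarm restarts it on another node on failure
  mosquitto:
    image: eclipse-mosquitto
    ports:
      - "1883:1883"
  mariadb:
    image: mariadb:10.5
    environment:
      MYSQL_ROOT_PASSWORD: changeme          # placeholder only
    volumes:
      - /mnt/gluster/mysql:/var/lib/mysql    # shared storage so state survives failover
```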

More details about the project in this thread: HAHA - Highly Available Home Assistant


Hi, would you like to share your config? I tried to use the Helm config but can only get it to work with hostNetwork=true on k3s.
Are you running it with hostNetwork?

I would like to avoid hostNetwork and instead use MetalLB as a load balancer for a fixed IP address for inbound connections, with Traefik as a reverse proxy. Has anyone got that working and is able to share their deployment manifest?
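Not a tested manifest, but the MetalLB side of this is typically just a Service of type LoadBalancer. A minimal sketch; the pool name, fixed address and selector label are my assumptions:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: home-assistant
  annotations:
    metallb.universe.tf/address-pool: default   # assumes a MetalLB pool named "default"
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.1.50                  # hypothetical fixed IP from the MetalLB range
  selector:
    app: home-assistant
  ports:
    - port: 8123
      targetPort: 8123
```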

Edit: OK, as always, after posting I find my errors… It seems to work now, although not yet with runAsUser, which I would like. If anyone is interested, I can post the deployment I currently use.

Edit 2: saving the Lovelace UI fails. Needs some work…

Regards

Yes, I use the host network, because otherwise some plugins which use UPnP don't work.

Actually, such a problem could be avoided altogether. Think of MS SQL Always On. It is set up as a cluster: each individual device has an assigned IP, and there is a "shared" IP for the cluster (only one node is actually active at a time). In the case of MS SQL this is true for write operations: only one of the instances is in "write" mode, while the other is strictly in read mode. For read operations, it wouldn't really matter which node is being looked at.

In this way, delays can be minimized. This would also offer redundancy, in that there are actually two complete copies of the DB. The biggest thing to work out is syncing data across both instances. This could be applied to services as well.
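On Linux, the "shared IP, one active node" part of this pattern is commonly done with keepalived/VRRP. A minimal sketch, assuming eth0 and a made-up virtual IP; the passive node runs the same block with state BACKUP and a lower priority:

```conf
vrrp_instance HA_CLUSTER {
    state MASTER            # the passive node uses BACKUP instead
    interface eth0
    virtual_router_id 51
    priority 100            # passive node gets a lower value, e.g. 90
    advert_int 1
    virtual_ipaddress {
        192.168.1.100       # the shared "cluster" IP clients connect to
    }
}
```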

I have had an instance of HA fail for no apparent reason (running on an rPi). I re-flashed the SD card, restored my config, and everything was up and running again. If I had a cluster, though, there would have been no outage, and perhaps even a restore could be achieved faster, since a clustered setup would already keep the configurations synced.

Just a thought. A native HA "cluster"/"load-balanced" config would be very cool.
This could be processor-intensive, so depending on your load it might require more than an rPi3, maybe an rPi4 minimum.

Just my two cents.

Come to think of it, clustering in Linux already exists. If someone could sync the HA DB and the configurations (rsync), then this should be possible by running HA under Linux on a couple of small machines.
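As a sketch of the rsync half of that idea (the paths and the standby hostname are assumptions; you would run this from cron or a systemd timer on the active node):

```shell
# One-way sync of the HA config directory to the standby machine:
rsync -az --delete \
    /home/homeassistant/.homeassistant/ \
    standby-pi:/home/homeassistant/.homeassistant/
```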

+1 here. I would also like to see this natively supported.
