Very nice and simple. Can it be used to talk to other devices in the same way over MQTT?
Currently, the main focus is HA. Yes, the mechanism could be applied to other services, but they would have to implement the heartbeat mechanism to send messages to keeper. Personally, I think this project should, for now, focus on Home Assistant, which is the principal point of failure. That said, this only performs service restarts, but I'm open to suggestions such as backups (or others), which are another piece of high availability.
nragon, very nice indeed.
Digging into HAProxy, could we have multiple instances on this one? What I mean is that "entry points" should always respond. I saw some pictures with only one entry point, the load balancer, but what if that craps out?
Historically, load balancers are by default a single device/service and therefore subject to being a SPOF, but ordinarily the load balancer is the only software on the device and so is very stable.
Generally, you can't foolproof everything, but ensuring failover or redundancy on the most complicated pieces will greatly reduce risk.
My goal is to have redundant access points, redundant HA hub, redundant database, redundant MQTT and backup power for everything. Outside of that, having replacements of the rest (like network switches, routers, cables, etc.) and a restore plan that is documented, is about as good as you really need. Ultimately, making sure that your home automation system fails dumb instead of stupid, is your overall best possible scenario.
I wrote up my goals and assumptions as someone who is designing my system for a person who has a disability (my wife) in this article if interested.
I think load balancing is not the main issue here. Availability is. Do any of you get 100% uptime on HA?
@anon3881519, keeper can totally evolve into an active/passive approach deployed across multiple machines. Currently, the downtime is the time HA needs to start.
So, I was reading up on Docker, and stumbled upon Docker Swarm.
Seems that with its setup of "manager" and "worker" nodes, high availability can be achieved.
Nodes can be either physical or virtual.
So with "Docker on Docker" we could take Raspberry Pis or other hardware that runs Docker and add them to a cluster.
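For anyone curious what that would look like in practice, here's a minimal sketch of a stack file for `docker stack deploy` (the image and port are the stock Home Assistant ones; the replica/restart settings and volume layout are my assumptions, not a tested setup):

```yaml
# docker-compose.yml sketch for Docker Swarm (illustrative only)
version: "3.7"
services:
  homeassistant:
    image: homeassistant/home-assistant:stable
    ports:
      - "8123:8123"
    volumes:
      - ha_config:/config      # must live on shared storage (e.g. NFS/GlusterFS)
    deploy:
      replicas: 1              # a single active instance
      restart_policy:
        condition: any         # swarm reschedules it if the node dies
volumes:
  ha_config:
```

If the node running the service fails, swarm would reschedule it onto a surviving node — but note the /config volume has to live on storage every node can reach, or the rescheduled instance starts with no state.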
Found a very nice tutorial here:
Official documentation:
https://docs.docker.com/get-started/part4/
Found out about swarms when I stumbled upon this: (4 Pis, 2 managers and 2 workers)
Cheers.
These are some great resources. Thanks for the links!
I wonder how "swarm aware" HA would need to be to pull this off effectively.
Read on here; it seems that it'll work. User quasar66 has a pretty decent setup running!
There's been a fair bit of work recently on high-availability configurations of kubernetes, and also on using Raspberry Pi clusters for kubernetes. Assuming you have a high-availability kubernetes backplane (i.e. multi-master), storage, and multiple cluster nodes, high availability for Home Assistant (excluding Z-Wave) should be possible through:
- Configuring HA to run in a kubernetes pod or pods
- Deploying a single instance of HA (i.e. your configured pod(s))
- Allowing kubernetes to detect failure and restart the failed pod(s) on another cluster node
The kubernetes backplane takes care of monitoring, failover, and routing. Failover probably wouldn't be instant, but tuning could make it acceptably fast.
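To make that concrete, here's a rough sketch of what the Deployment could look like (all names, paths, and probe numbers here are my assumptions, not from anyone's real setup):

```yaml
# Sketch: single-replica Home Assistant Deployment with failure detection
apiVersion: apps/v1
kind: Deployment
metadata:
  name: home-assistant
spec:
  replicas: 1                       # one active instance, as described above
  selector:
    matchLabels:
      app: home-assistant
  template:
    metadata:
      labels:
        app: home-assistant
    spec:
      containers:
        - name: home-assistant
          image: homeassistant/home-assistant:stable
          ports:
            - containerPort: 8123
          livenessProbe:            # lets kubernetes detect a dead instance
            httpGet:
              path: /
              port: 8123
            periodSeconds: 10       # tune these two to trade detection
            failureThreshold: 3     # speed against false positives
          volumeMounts:
            - name: config
              mountPath: /config
      volumes:
        - name: config
          persistentVolumeClaim:
            claimName: ha-config    # claim must be on storage all nodes can reach
```

The livenessProbe is what lets kubernetes notice a hung instance and restart it; periodSeconds and failureThreshold are the knobs you'd tune to make failover acceptably fast.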
k3s (https://k3s.io) looks particularly promising, but doesn't have a high-availability config yet.
I run Home Assistant in a kubernetes cluster using k3s. Even in a non-HA cluster you can survive node outages as long as the master stays alive. If the master is broken, your workers will just continue with their work. A master node can also act as a worker node at the same time. This means that in a two-node setup you will survive the outage of either node. With a three-node setup you might survive two simultaneous node outages.
K3s will provide an HA mode in the near future.
Would be interested in more detail, if you have time to write.
Hello everyone,
I have recently created a project called HAHA - Highly Available Home Assistant, which creates a highly available cluster that runs Home Assistant.
It runs using Docker Swarm and includes a MariaDB Galera cluster and a Mosquitto MQTT broker. It uses GlusterFS to synchronize Home Assistant's files, so that in case of a failure, all of the state is transferred to other nodes.
More details about the project in this thread: HAHA - Highly Available Home Assistant
Hi, would you like to share your config? I tried to use the Helm config but can only get it to work with hostnetwork=true on k3s.
Are you running it with hostnetwork?
I would like to avoid hostnetwork and instead use MetalLB as a load balancer for a fixed IP address to connect inwards, and Traefik as a reverse proxy. Has anyone got that working who is able to share their deployment manifest?
Edit: ok, like always, after posting I find my errors… It seems to work now, although not yet with runAsUser, which I would like. If anyone is interested, I can post the deployment I currently use.
Edit 2: saving to the Lovelace UI fails. Needs some work…
Regards
Yes, I use the host network because otherwise some plugins which use UPnP don't work.
Actually, such a problem could be avoided altogether. Think of MS SQL Always-On. This is set up as a cluster. Each individual device has an assigned IP, and then there is a "shared" IP for the cluster (only one node would actually be active at a time). In the case of MS SQL, this is true for write functions: only one of the instances is in write mode, while the other is strictly in read mode. For read operations, it wouldn't really matter which node is being looked at.
In this way, delays can be minimized. This would also offer redundancy in that there are actually two complete copies of the DB. The biggest thing to work out is syncing data across both instances. This could be applied to services as well.
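For the "shared IP" part specifically, the usual tool on Linux is keepalived (VRRP). A rough sketch of its config, with the interface name and addresses as pure placeholders:

```
# keepalived.conf sketch: one floating "shared" IP that follows
# whichever node is currently MASTER
vrrp_instance HA_VIP {
    state MASTER                  # set to BACKUP on the standby node
    interface eth0                # placeholder interface name
    virtual_router_id 51
    priority 100                  # give the standby a lower priority
    advert_int 1
    virtual_ipaddress {
        192.168.1.50/24           # the shared address clients connect to
    }
}
```

Clients always connect to the virtual address; when the master stops sending VRRP advertisements, the backup takes the IP over within a few seconds.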
I have had an instance of HA fail for no apparent reason (running on an rPi). I re-flashed the SD card, restored my config, and everything is up and running again. If I had a cluster, though, there would have been no outage, and perhaps even a restore could be achieved faster, since a clustered setup would already have the configurations synced.
Just a thought. A native HA "cluster"/"load-balanced" config would be very cool.
This could be processor-intensive, so depending on your load, it might require more than an rPi3 — maybe an rPi4 minimum.
Just my two cents.
Come to think of it, clustering in Linux already exists. If someone could sync the HA DB and the configurations (rsync), then this should be possible running HA under Linux on a couple of small machines.
+1 here. I would also like to see this natively supported