HAHA - Highly Available Home Assistant

Hi everyone :relaxed:,

I wanted to find a solution for running Home Assistant with high availability - with a backup failover in case something breaks, like when the SD card in a Raspberry Pi decides to die.

After searching, I quickly found out that no complete ready-made solution to this problem exists yet; the ones that come close either do not handle state transfer or are complicated to set up.

Inspired by user quasar66 and his setup described here, I set out to build a simple-to-run solution for creating a redundant cluster running Home Assistant.

Well, after a lot of trial and error, I managed to create a project called HAHA (which stands for Highly Available Home Assistant), and today I would like to share it with you.

Its features are:

  • Runs on Docker Swarm
  • Easy setup using Ansible playbooks
  • Preconfigured MariaDB Galera Cluster for the recorder component (thanks to Colin Mollenhour)
  • Included Mosquitto broker
  • Uses GlusterFS for synchronizing Home Assistant logs and files and Mosquitto retained messages

It is made to be run on three or more Raspberry Pi devices, but can technically run on just two. More details can be found in the GitHub repo. Please try it out and let me know in the issues if you encounter any problems.

Link to the GitHub repository: https://github.com/cvb941/HAHA
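
To give a feel for how the pieces fit together, here is a rough, simplified sketch of a Swarm stack along these lines. This is not the actual HAHA stack file (see the repo for the real configuration); image names and volume paths here are assumptions:

version: "3.7"
services:
  homeassistant:
    image: homeassistant/home-assistant:stable
    volumes:
      - /mnt/gluster/hass-config:/config   # shared config on the GlusterFS volume
    ports:
      - "8123:8123"
    deploy:
      replicas: 1   # a single active instance; Swarm restarts it on another node on failure
  mqtt:
    image: eclipse-mosquitto
    deploy:
      replicas: 1
  recorder-db:
    image: colinmollenhour/mariadb-galera-swarm   # assumed image for the Galera cluster
    deploy:
      replicas: 3   # Galera wants an odd number of nodes for quorum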


Nice! I will take a look.

To have highly available Z-Wave using Gen 5 Aeotec sticks, you can do this:

For sticks plugged into a PC that’s in the Docker cluster:

  • back up the stick after everything is done and it works
  • restore the backup to another 1/2/x sticks
  • plug the sticks into different nodes
  • label the nodes that have sticks with has_zwave=1 or something
  • make sure Home Assistant starts on those nodes (see the sketch below)
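
For the labeling and pinning steps, a minimal sketch (service and node names are made up, and the exact syntax depends on your stack file):

# On a manager node, label each host that has a stick:
#   docker node update --label-add has_zwave=1 <node-name>
# Then pin the Home Assistant service to those hosts in the stack file:
services:
  homeassistant:
    image: homeassistant/home-assistant:stable
    deploy:
      replicas: 1                        # only one active instance at a time
      placement:
        constraints:
          - node.labels.has_zwave == 1   # schedule only where a stick is plugged in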

For “remote” sticks (socat/ser2net, USB-over-IP) it does not matter on which node Home Assistant runs.

If you want “true” high availability for Z-Wave:

  • one primary master stick
  • one secondary master stick
  • one dedicated Home Assistant for each, integrated with a single MQTT broker
  • the main Home Assistant should send commands to that MQTT broker, and whichever instance is currently active will read and run the commands (sketched below)
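
A hedged sketch of that relay, with made-up topic and entity names (the two snippets live on different instances, which is why both start with an automation: key):

# On the main Home Assistant: publish the desired state to a shared topic.
automation:
  - alias: Relay living room light to the Z-Wave instances
    trigger:
      - platform: state
        entity_id: light.living_room
    action:
      - service: mqtt.publish
        data:
          topic: zwave/commands/living_room
          payload_template: "{{ trigger.to_state.state }}"

# On the Z-Wave Home Assistant instances: read the topic and run the command.
# Only the instance whose stick is currently active should have this enabled.
automation:
  - alias: Execute relayed living room light command
    trigger:
      - platform: mqtt
        topic: zwave/commands/living_room
    action:
      - service_template: "{% if trigger.payload == 'on' %}light.turn_on{% else %}light.turn_off{% endif %}"
        entity_id: light.living_room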

Interesting - thanks! I’ve been wondering about this since HA started to be more reliable than the hardware I was running it on :slight_smile:

I also hadn’t come across Gluster, which could be useful in other contexts.

Nice! How does it handle automations, specifically preventing triggering the same thing on all nodes?

Only one instance of Home Assistant is running at a time. If it goes down, it starts on a different host.

Another approach to consider is running Home Assistant in Kubernetes (there is already a Helm chart for it) and letting Kubernetes handle the scheduling, as this solution aims to do.
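
For illustration, the heart of such a setup is a single-replica Deployment that Kubernetes reschedules when a node fails. This is a minimal hand-written sketch, not the chart’s actual output; the PVC name is an assumption and would need to sit on shared storage:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: home-assistant
spec:
  replicas: 1                  # exactly one active instance
  strategy:
    type: Recreate             # stop the old pod before starting a replacement
  selector:
    matchLabels:
      app: home-assistant
  template:
    metadata:
      labels:
        app: home-assistant
    spec:
      containers:
        - name: home-assistant
          image: homeassistant/home-assistant:stable
          ports:
            - containerPort: 8123
          volumeMounts:
            - name: config
              mountPath: /config
      volumes:
        - name: config
          persistentVolumeClaim:
            claimName: home-assistant-config   # assumed PVC on network storage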


It seems this is the freshest thread on this redundancy aspect. I’ve come to share my take on the matter, which seems to work quite well.

What I’m doing:

  1. I have configured everything on my HA so that it can be copied straight to another server/instance, and once the main HA server fails, an automation activates all alerts, other automations etc. on the backup HA server
  2. Three times a week I transfer the whole main HA to the backup HA and restore the latest snapshot, so that the backup HA stays up to date automatically and I won’t need to update the same things twice…

How I’m doing it; several components are involved:

  • The Samba share add-on to easily open a route for file transfer to the (backup) server.

  • The Samba backup add-on to back up everything to the backup HA instance

  • A command line sensor to check which servers are online:

sensor:
  - platform: command_line
    name: main
    # ON if the main instance answers on port 8123 within 3 seconds
    command: curl -m 3 -s http://main.ip.add.ress:8123/ > /dev/null && echo ON || echo OFF
    scan_interval: 60

  - platform: command_line
    name: backup
    # ON if the backup instance answers on port 8123 within 3 seconds
    command: curl -m 3 -s http://backup.ip.add.ress:8123/ > /dev/null && echo ON || echo OFF
    scan_interval: 60
  • The local_ip integration in configuration.yaml, providing a sensor that automations can use to detect which instance HA is running on
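
For reference, enabling it is a one-liner in configuration.yaml:

# creates sensor.local_ip holding the IP address of the host this instance runs on
local_ip: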

  • An automation built from the local_ip sensor together with the main server sensor, so that automations and other unwanted overlapping processes are turned off on the backup server whenever the main server has been on (for 2 minutes)

  • Another automation, vice versa, to turn everything on at the backup server if the main server has been off for 60 minutes (a sketch follows below).
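
A hedged sketch of what that second automation might look like (entity and alias names are assumptions; the command line sensors above report ON/OFF):

automation:
  - alias: Activate everything on the backup server
    trigger:
      - platform: state
        entity_id: sensor.main
        to: 'OFF'
        for: '01:00:00'              # main server unreachable for 60 minutes
    condition:
      - condition: state
        entity_id: sensor.local_ip
        state: 'backup.ip.add.ress'  # only act when running on the backup instance
    action:
      - service: automation.turn_on  # a curated group of automations would be safer than all
        entity_id: all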

  • And one automation for restoring the up-to-date snapshot on the backup server 3 times a week with a shell command

  • The shell command parses the latest snapshot from “ha snapshots list” (I hope so; I’m not sure this works yet, as the list command seems to give them in random order). In my first test, the “ha snapshots list” command listed the latest snapshot first, so this command takes the first “slug” in the list and restores it:

shell_command:
  restore_latest: variable=`ha snapshots list | grep 'slug'|cut -f 2 -d ":"|head -1` | ha snapshots restore $variable

EDIT 26.11.2020: The shell_command is not working this way. It does not accept piped commands, if I understood the documentation correctly. I have tried the following without success so far:

  • Created restore.sh with the same command and ran it with the shell_command “bash restore.sh”.
  • Tried to create a command_line sensor to grep the latest snapshot slug to be used in an automation’s data, which would launch the HA snapshot restore service. I’m unable to get this to work either.
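
For what it’s worth, a hedged and untested guess: the original one-liner also has a plain shell problem, because piping the variable assignment never sets $variable for the second command. Command substitution would avoid the pipe between the two ha calls entirely:

shell_command:
  # untested: restore whichever slug "ha snapshots list" returns first
  restore_latest: ha snapshots restore $(ha snapshots list | grep 'slug' | cut -f 2 -d ":" | head -1)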

So what is important to understand: I’m running exactly the same copy of HA on two different instances, but with different static IP addresses. This way I’m able to build automations that make it possible to turn services on/off depending on which server the HA is on, which server is online, etc.

With this methodology one can build quite a complicated system. I’m at an early stage of testing. Fingers crossed!

Edit 27.11.2020: I’ve now given up on the automatic backup restoration on the backup server. Reasoning in this post: Update notifications! Core, HACS, Supervisor and Addons


I just read the GitHub readme… it is such a great project, and I was looking forward to using it, but you mention this isn’t for x86 hardware. Maybe I can adapt it, thanks for sharing! :vulcan_salute:


While this is an older thread, I do have a high availability solution that works on x86_64 as well as ARM64/AArch64. Note that at this time you will have to compile a few packages from source when using ARM64/AArch64.

Like many of you out there, I have recently found myself more and more reliant on Home Assistant. After looking into Home Assistant high availability I found there aren’t really any options supported by Home Assistant.

So I went ahead and built a solution using existing open source high availability components such as DRBD and Pacemaker. Home Assistant can be made highly available fairly easily using existing tools that have been around for well over a decade. These same tools are used in countless enterprise-grade production deployments. Full disclosure: I am also employed by LINBIT, but I’m not here to sell you anything. These tools are just as free to use as Home Assistant and are part of the open source ecosystem. The required software components can easily be installed through our PPA.

I am personally a big fan of running Home Assistant as a Docker container. My blog post walks you through the process of making Home Assistant highly available using containers. With this solution, Home Assistant’s data is mirrored over the network in real-time (by DRBD, think network-based RAID-1) while Pacemaker controls which node is currently active (replicated filesystem mounted, virtual IP active, and Docker instance(s) running, etc). Failover to a secondary node takes mere seconds once the primary goes down.

I hope to make this topic into an ongoing series with more and more content.


@ryan-ronnander, before I read your website: does this really work if you have radio antennas, e.g. ZigBee? I have lots of devices on ZigBee and HomeMatic.

While I don’t personally use ZigBee and have a very reliable WiFi network, it should work. Conceptually it would function similarly to how a virtual IP address is passed around in a high availability cluster depending upon which host is currently “active”.

Hi gentlemen, very nice and interesting topic on HAHA. My question is the following: do you think that HAHA could be done using the HA operating system on different physical machines? From the reading I understood that somebody did it on 3 Raspberry Pis, but I did not find many details. Currently I’m running HA on an x86 machine (a second-hand thin client computer).
Any feedback will be welcome