Is it possible to configure several HomeAssistant Docker containers to act as a Docker Swarm?

Maybe I’ll give it a shot. Would be nice if it were possible to share sensors/inputs across devices in the swarm, making it sort of an integrated HA hive-mind :laughing:

Sharing physical connections and sensors is impossible.

A Docker swarm is just a bunch of Docker hosts that can pass services around between them; it isn’t x number of services all running together.

Okay, I want to understand this correctly. Let’s say that I have 3-4 Pis around the house; one has a Z-Wave dongle, the others have various sensors and so on attached. I then turn all of these Pis into a Docker swarm with HA running global (one instance on every Pi). Do all these different sensors, from different Pis, appear to HA as if it’s just one big Pi?

No. Not at all.

That is not how it works.

A Docker swarm is a collection of Docker hosts. FULL STOP.

A service running on a swarm can migrate from host to host.

You cannot have 3-4 Pis all acting as one big Pi with different configurations and sensors on each one. You can install Home Assistant on each one, configure MQTT Eventstream and use a ‘main pi’ to get all the data to one Home Assistant instance. This configuration has nothing to do with Docker Swarm.
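For anyone wanting to try that, here’s a minimal sketch of what the MQTT Eventstream wiring could look like (the topic names and /config path are made up for illustration, and it assumes the mqtt integration is already pointed at one shared broker on every instance):

# On each satellite Pi, append to that instance's configuration.yaml
# (topic name is an example):
cat >> /config/configuration.yaml <<'EOF'
mqtt_eventstream:
  publish_topic: ha/satellite/pi1   # events flow out to the shared broker
EOF

# On the 'main pi' that aggregates everything:
cat >> /config/configuration.yaml <<'EOF'
mqtt_eventstream:
  subscribe_topic: ha/satellite/#   # pick up events from all satellites
EOF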

Okay, thanks for the reply, that was also my understanding sadly.

What I’m getting at here is just whether it would make the most sense to have HA within or external to the swarm, and external seems like the way to go here (for me).

You could always run a separate Pi with Hass, have it act as a Z-Wave/physical interfacing bridge, and connect to it from the swarm-hosted main instance through MQTT.

Is anybody still on this? I’m reviving a reasonably old topic it seems, but I’m also actively working on this. I’m looking for knowledge and experience from others who have deployed in Docker Swarm, and I might have some experience to share myself :).

My main motivation is increasing the number of available resources (because I do believe it’ll be difficult for a single Raspberry Pi to run HA + a lot of add-ons), but I’d like to increase the availability (reliability, uptime) of the system as well. Basically, I want to build the system such that the probability of it failing is as low as possible, ideally at both the hardware level (running on a single Raspberry Pi is not a redundant architecture) and the software level (easy config, rollbacks in case of failures, etc.).

So… is this still on top of anybody else’s agenda? :slight_smile:

I’ve been running this for the last couple of months. I have a couple of Pis with read-only filesystems and Z-Wave sticks (and other serial adapters) that are exposed on an internal network as TCP servers with ser2net.

The Home Assistant container runs socat to connect to the “slave” Z-Wave stick and create a serial device inside the Docker container. The configuration is stored in git, with a caching proxy. When a Home Assistant container starts, it does a git pull, runs socat to have the Z-Wave device available locally, then runs hass…
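Roughly, the ser2net/socat pairing looks like this (the port, IP and device paths here are placeholders, not necessarily the real values):

# On the read-only Pi that holds the Z-Wave stick: a classic ser2net config
# line exposing the serial device as a raw TCP server.
echo '3333:raw:0:/dev/ttyACM0:115200 8DATABITS NONE 1STOPBIT' >> /etc/ser2net.conf
systemctl restart ser2net

# Inside the Home Assistant container: socat recreates a local serial device
# backed by that TCP stream, which the Z-Wave integration can then open.
socat pty,link=/dev/ttyUSB-zwave,raw,echo=0 tcp:192.168.1.50:3333 &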

This has been running on a 3-node Docker swarm without any issues. For example, if I re-plug the Z-Wave stick, then ser2net is restarted, socat is restarted and Home Assistant is restarted, in this order…

For the Z-Wave stick you can use something like an MR2030 (a very small DD-WRT/OpenWrt router) with ser2net on it, or a Pi, or anything else…

I’m running now:

  • 3 Z-Wave sticks on read-only Pis
  • 1 heating container using Homegear and a Max Cube device, writing to MQTT
  • 1 MQTT container
  • 3 Home Assistant containers that translate the Z-Wave data to MQTT
  • 8 Home Assistant containers with various stuff (TV, multimedia, presence detection); basically one purpose for each instance, and they write to MQTT
  • 1 Node-RED for core data transformation from MQTT to MQTT
  • 1 Node-RED for automations
  • 1 “main” Home Assistant that reads from and writes to MQTT
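As a rough idea of how pieces like these end up as swarm services (image names and paths below are just for illustration):

# One broker; the swarm decides where it runs.
docker service create --name mqtt --replicas 1 -p 1883:1883 eclipse-mosquitto

# A single-purpose Home Assistant instance; config lives on shared storage
# so the service can land on any node.
docker service create --name ha-presence --replicas 1 \
  --mount type=bind,source=/mnt/shared/ha-presence,target=/config \
  homeassistant/home-assistant:stable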

I can share the Docker container build files with Home Assistant + socat (but not the git setup, that’s private).

For Z-Wave devices you can run 2 sticks on the same network - the second stick won’t be able to include/exclude and some other stuff, but at least some control / monitoring will work even if the first fails… I haven’t done that since it seems a bit overkill… maybe in the future…

That sounds great! Do I understand correctly that you have a single Z-Wave stick on one device, and then use ser2net to ‘mount’ that serial device on another physical device of your choosing, depending on where the Docker container runs that needs that device?

I don’t actually have a Z-Wave network, but I’m very interested in this part - mapping physical devices on one host to any other (Docker) host that you have. Did you also, for example, set up redundant storage to make the Docker volumes (where the state of all the services is persisted, if configured correctly) redundant as well? And a redundant / distributed database that stores the Home Assistant data?

I’m starting now with the redundant Docker volumes (trying Minio first, then switching to a NAS if that’s not performant enough).

I use 3 nodes that are “equal”:

  • in Docker, all are managers and workers
  • storage is mirrored between the 3 of them using glusterfs
  • all devices are either on MQTT or use another type of network detection, so it does not matter where the device is
  • MySQL as DB storage was set up as master/master running on 2 different nodes, with the 3rd acting as a backup every hour (and uploading an encrypted backup to a remote server) - a rough sketch is below
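The master/master part follows the usual MySQL recipe: unique server IDs, binary logging, and interleaved auto-increments so the two writers never hand out the same key. A sketch, with placeholder hostnames and credentials:

# Node 1; node 2 mirrors this with server-id = 2 and auto_increment_offset = 2.
cat > /etc/mysql/conf.d/replication.cnf <<'EOF'
[mysqld]
server-id                = 1
log_bin                  = mysql-bin
auto_increment_increment = 2
auto_increment_offset    = 1
EOF

# Then point each node at the other (file/position taken from
# SHOW MASTER STATUS on the opposite node):
mysql -e "CHANGE MASTER TO MASTER_HOST='node2', MASTER_USER='repl', \
  MASTER_PASSWORD='secret', MASTER_LOG_FILE='mysql-bin.000001', \
  MASTER_LOG_POS=4; START SLAVE;"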

Any of the 3 nodes can fail, and almost everything can run on one server if needed. I have a big UPS on the 3 nodes (J1900 quad-core nodes with SSDs, about 10 W power usage each), and if the UPS goes under 2 hours of runtime, one node will shut down to preserve as much runtime as possible. When runtime goes under 30 minutes, all non-security stuff is stopped, and when it goes under 10 minutes, all security stuff moves to 2 Raspberry Pis that have a 48-hour battery.
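The staged shutdown boils down to something like this (a sketch assuming a NUT-managed UPS; the UPS name, node hostname and the migration script are placeholders):

runtime=$(upsc myups@localhost battery.runtime 2>/dev/null)  # seconds of runtime left

if [ "${runtime:-0}" -lt 7200 ]; then
  ssh node3 poweroff                   # under 2 hours: drop one node to stretch runtime
fi
if [ "${runtime:-0}" -lt 1800 ]; then
  docker stack rm multimedia presence  # under 30 minutes: stop all non-security stuff
fi
if [ "${runtime:-0}" -lt 600 ]; then
  /usr/local/bin/move-security-to-pis.sh  # under 10 minutes: hand off to the battery-backed Pis
fi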

If you want to use Docker and gluster you need to do this:

  • set up Docker and the swarm
  • disable Docker auto start
  • stop Docker on all nodes
  • set up glusterfs
  • create a cronjob that checks if glusterfs is mounted and running - if yes, start Docker; if not, stop Docker (commands sketched below)
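In commands, those steps look roughly like this on the first node (hostnames, the brick path and the volume name are assumptions; “force” is needed because the bricks are plain folders on the root filesystem):

systemctl disable docker          # don't let docker start before gluster

gluster peer probe node2          # form the trusted pool from node1
gluster peer probe node3
gluster volume create shared replica 3 \
  node1:/data/brick/shared node2:/data/brick/shared node3:/data/brick/shared force
gluster volume start shared
mount -t glusterfs localhost:/shared /mnt/shared

# cron runs the mount check (script shown further down) every minute
echo '* * * * * root /usr/local/bin/check-gluster-docker.sh' > /etc/cron.d/gluster-docker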

If you start docker before gluster, bad things will generally happen…

I installed everything on basic Ubuntu 18.04, on Celeron CPUs (as I said above), with SSDs and 16 GB of RAM each, and 2 Intel network cards (one for storage/inter-cluster traffic, one for the rest of the network). They use about 15 W at “idle”, 30 W at full load…

I tried Pis, but when you add up the Pi, the power supply and everything else needed to make it stable and power on / shut down automatically, it’s not much cheaper than an ASRock J1900 board (or similar), just more complicated… Now the Pis are running as various radio serial-to-network adapters and Bluetooth scanners (and some other small, read-only jobs).


Thanks once again, great advice. I’m not familiar with the ASRock boards, but you’re right, they’re not that much more expensive than the Pis… However I do already have an RPi cluster with 4 nodes running, so I’ll try that first and see how it behaves. Will also check out glusterfs. I can imagine that Pis don’t work well if you want to add a UPS underneath, but I haven’t started looking at the power supply yet.

Good to know at least that it works! I’m also interested in any deployment scripting (Docker Compose files, Ansible playbooks, whatever you’ve been using) that you might be able to share?

Most of my scripts are highly integrated, I’ll see what I can share… Most of them are pretty simple bash scripts.

My Docker swarm uses Portainer as the main interface, and I have a personal git repository, so I just schedule pulling from there automatically. That takes care of rollbacks too… I’m in the middle of setting up automated testing once a new set of git versions is pushed, so I can save the whole list of versions, and if no action is taken within a certain amount of time, the versions are rolled back to the latest stable one. Having stuff split into the smallest possible chunks, plus a bit of creative coding, ensures that if something breaks it only affects a minimal part of the system…
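The scheduled pull amounts to something like this (the repo path, stack name and compose file are placeholders for the private bits):

cd /mnt/shared/ha-config || exit 1

old=$(git rev-parse HEAD)
git pull --ff-only
new=$(git rev-parse HEAD)

# Only redeploy when something changed; 'docker stack deploy' is idempotent,
# so the swarm rolls only the services whose definition actually moved.
if [ "$old" != "$new" ]; then
  docker stack deploy -c docker-stack.yml home
fi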

For example, this is what I use to check if something is mounted and start/stop the docker service accordingly. Instead of start/stop you could consider putting the node in drain mode (a variant is sketched after the script).

#!/bin/sh
# Start docker only while the glusterfs share is healthy; stop it otherwise.

mount="/mnt/shared"
# The /proc/mounts line we expect for a healthy, writable gluster mount
mounttype="/mnt/shared fuse.glusterfs rw"

if grep -qs "$mount" /proc/mounts; then
  echo "It's mounted."
else
  echo "It's not mounted."
  # Try to mount it (relies on an /etc/fstab entry for /mnt/shared)
  if mount "$mount"; then
    echo "Mount success!"
  else
    echo "Something went wrong with the mount..."
  fi
fi

mountok=0

if grep -qs "$mounttype" /proc/mounts; then
  mountok=1
fi

if [ "$mountok" -eq 1 ]; then
  echo "Making sure docker is started"
  if pgrep -x "dockerd" > /dev/null; then
    echo ".. running"
  else
    echo ".. starting .."
    systemctl start docker
    echo ".. done"
  fi
else
  echo "Making sure docker is stopped"
  if pgrep -x "dockerd" > /dev/null; then
    echo ".. running .."
    systemctl stop docker
    echo ".. done"
  else
    echo ".. stopped"
  fi
fi
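For the drain-mode variant, the start/stop branches above could be swapped for something like this (it assumes the swarm node name matches the hostname and that the script runs on a manager):

if [ "$mountok" -eq 1 ]; then
  docker node update --availability active "$(hostname)"
else
  docker node update --availability drain "$(hostname)"   # move services elsewhere
fi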

I will take a look and try to make a GitHub repository with the scripts.

If you start with the Pis and need more power, you can always add x64 nodes and do a gradual roll-out…

As a side-note, be careful of split-brains: if a swarm node seems unable to rejoin the swarm, leave it for a couple of hours (4-6) before trying to manually fix it… most of the time it fixes itself, and if you do anything, the whole cluster may end up broken…

I’m not completely satisfied with Docker: when the swarm breaks (without touching it at all for weeks at a time) you kind of need to re-build it. So the best thing you can do is have a script to bootstrap all the stacks / containers / services / etc., otherwise you will end up very angry one late night (because it never breaks when you have time to fix it). Being able to destroy the swarm, create it again, join the other nodes and run the bash/perl/whatever script to set up everything is worth its weight in platinum, even if it’s tedious to do at first… by the third time the swarm craps itself, you’ll thank yourself.
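A bare-bones bootstrap along those lines (the address and repo layout are illustrative):

# On the first node: create a fresh swarm.
docker swarm init --advertise-addr 192.168.1.10

# Print the join command to run on the other nodes.
docker swarm join-token manager

# Once all nodes are back in, redeploy every stack from the config repo.
for stack in /mnt/shared/ha-config/stacks/*.yml; do
  docker stack deploy -c "$stack" "$(basename "$stack" .yml)"
done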

I’m considering trying another small cluster as a proof of concept, but I’ll probably use Banana Pis (or similar) with a SATA controller for better speed and reliability - I would not recommend keeping too big a database on the SD cards…

For glusterfs this is a good tutorial - just use a folder instead of a brick device

When I had a Pi cluster I used good USB drives for glusterfs - and, if you can afford it, get a 5th Pi.

If you don’t have a UPS or any other kind of battery backing the Pis, the SD cards will become corrupt more easily - even a good 2 A phone battery that does not need a button press to turn on might be enough to save your files a couple of times… I’ve had good luck with Samsung 10,000 mAh batteries and Anker ones so far…

Hi everyone :slightly_smiling_face:,

Building on quasar66’s description of his setup, I have created a similar solution, which is set up easily using Ansible playbooks. It runs on Docker swarm and creates a stack running Home Assistant, MariaDB in a Galera cluster, and a Mosquitto broker.

The project is called HAHA - Highly Available Home Assistant, you can check it out in this thread: HAHA - Highly Available Home Assistant

Thanks to quasar66, I hope the project helps you out!


Thank you for the description of your design @quasar66. Can you please explain how “all devices are either on MQTT or use another type of network detection, so it does not matter where the device is” works? If a Z-Wave based sensor is connected to node n1 and n1 fails, how would you get events from the sensor?

You wouldn’t. What they mean is that the devices in use are controlled over the network, and as such can be accessed from any node, in contrast to a Z-Wave stick or other directly connected hardware, which can only be attached to a single host.

devices = the computers HA runs on? or the Z-Wave devices?

Hi, I was wondering if any of you guys had this working not with Docker, but running the OS version of Home Assistant… I asked for a feature request:
https://community.home-assistant.io/t/home-assistant-cluster-feature-request/377975/4

As I run HA Supervised OS in a VM under Unraid.

My current solution: I set it up… shut it down, copy it to my 2nd Unraid box… power up the VM, change its name to backupHome… and my main is HomeAssistant.

So I have a failover…

Have you guys been able to tackle it in the OS version? If you create a dashboard, does it instantly update the dashboards on the backup Home Assistant instances? Does that Docker approach work there for the OS version?

So I could have, like, 3 identical KVM VMs of the OS version running… as the Docker version of HA in Unraid is not as good as the VM.