I am curious if there are plans to add high availability options to Home Assistant such as the ability to fail over to a second instance?
This is currently not actively pursued.
Well, to some extent, with a couple of simple scripts and an external database it would be possible already. Perhaps even with a shared SQLite DB and a central location for the configuration, if the environment allows it.
Ok, thank you for the update.
I would also like this feature to be available natively.
Another who would love to see native support for this!
I think zwave and zigbee will be issues, as they don’t do multi-controller. Using an external gateway is still a SPOF.
Perhaps HomeKit allows this?
Actually, planning for zwave is exactly what made me think of failover problems.
If you do have a Universal Devices ISY-994i with Z-Wave, what about Virtual Router Redundancy Protocol? VRRP is for high availability. If one Home Assistant instance goes down, another instance of Home Assistant could pick it up. To do VRRP, you would install keepalived on both of your Home Assistant servers.
But what if the master server is still running while its Home Assistant instance has stopped, and the backup server is still running Home Assistant?
Could the backup instance of Home Assistant take over the virtual IP and do the job until the master server is back up and running? Once the master server is running again and its instance of Home Assistant is up, the backup instance could copy its database of events back to the master so that history is kept up to date.
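For what it’s worth, a hypothetical keepalived.conf sketch of the VRRP idea above might look like this (the interface name, addresses, priorities, and health-check command are all assumptions, not a tested setup). The `vrrp_script` part is what handles the “Home Assistant process dies but the server stays up” case: it drops the node’s priority so the backup wins the VRRP election.

```
# Hypothetical /etc/keepalived/keepalived.conf on the primary server.
vrrp_script chk_hass {
    script "/usr/bin/pgrep -f home-assistant"   # assumed process name
    interval 2
    weight -60            # drop priority below the backup's when HASS is down
}

vrrp_instance HASS {
    state MASTER          # use BACKUP on the second server
    interface eth0        # adjust to your NIC
    virtual_router_id 51
    priority 150          # e.g. 100 on the backup server
    advert_int 1
    virtual_ipaddress {
        192.168.1.100/24  # the shared VIP the ISY-994i would talk to
    }
    track_script {
        chk_hass
    }
}
```

Devices would then be pointed at the virtual IP only, never at either server directly.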
I think Home Assistant may need a “High Availability” component.
But that assumes no USB or RS232 devices are attached to the master server. It makes sense to use the ISY-994i instead. If you have a home theater receiver with RS232, I’d use RJ45 if the receiver needs to notify Home Assistant of states such as volume or source selection. The ISY-994i and the home theater receiver would talk to a virtual IP address instead of the Home Assistant master or backup server directly.
Another question: How is the Home Assistant front-end going to know that the master server is down and backup server has taken over?
And yes, a tweet is what made me search for “SPOF” in the forum, and that’s how I came across this thread.
Well, what do you know, someone already started a thread about high availability!
I’m also interested in this subject. This could be achieved using USB-over-IP for Zigbee and Z-Wave connectivity. Of course this wouldn’t be true “high availability”, because an RPi with Zigbee/Z-Wave would still be a single point of failure, but it would be much better than nothing. Right now I’ve got Hass hosted on a Dell PowerEdge R210II machine sitting in my server rack. Home Assistant is controlling all the fire alarms, sirens, etc. in my house. The problem is that this 24/7 server, in a rack with other equipment, is also the most fire-vulnerable place in my house.
+1 here. I would also like to see this natively supported. I’m pretty new to Home Assistant, so I was surprised this was not a standard feature by default. If you’re going to run your house, garage door and alarm systems on this, would it not make sense to make it redundant? Hardware is prone to break down from time to time.
+1 from me too… I just started researching this as I had a failure on my home server, which means the box is down currently… so all of my HA is dead, which is not great when all the lights etc. are controlled from it!
All of the applications I manage at work implement ‘high availability’ in one way or another, but generally the most effective is an “active, active” scenario where both instances are up and aware of requests. The instances share the request load and maintain synchronization through a dedicated communications link between the two. They maintain journaling in a database, but also use local files to store temp transactions/logs/etc. in case the database connection fails. In this case the “front door” or central connection is a load balancer of some kind, acting as the front end for the devices and the interface requests.
Granted, this scenario would be a significant change to the home assistant architecture. I’m sure there are other ways to do this.
I would be interested in this too, but then purely from the technical perspective. Since I migrated to a NUC I haven’t had a single second of downtime (as opposed to running on a Pi, which needed some love and attention every few days).
The problem with active/active and synchronizing the data is the overhead that it generates. For home automation, speed is a primary concern: I wouldn’t want to wait any longer for a light to turn on just because two HASS instances first have to agree on who handles the trigger. So in the case of HASS, I think a master/slave setup should be preferred for performance reasons.
A duplicate HA instance that isn’t running,
with AppDaemon installed on the same machine.
If the original HA instance isn’t working for whatever reason, the duplicate HA gets started by AppDaemon (AD).
The HA config gets regularly backed up to the duplicate system.
This gives the option to maintain the original instance while the backup is running.
As soon as AD detects the original again, the backup is shut down.
I have tested that with 2 RPis.
I will set it up again like that in a short while, but on a Beebox.
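The watchdog logic described above could be sketched in plain Python (the primary’s URL, the polling approach, and the action names below are my assumptions, not the original poster’s actual AppDaemon setup):

```python
# Minimal sketch of a failover watchdog: poll the primary HA instance's
# API and decide whether the backup instance should be started or stopped.
# PRIMARY_API is a placeholder address.
import urllib.error
import urllib.request

PRIMARY_API = "http://192.168.1.10:8123/"  # assumed primary address


def primary_alive(url: str = PRIMARY_API, timeout: float = 5.0) -> bool:
    """Return True if the primary Home Assistant answers HTTP at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except (urllib.error.URLError, OSError):
        return False


def next_action(primary_up: bool, backup_running: bool) -> str:
    """Decide what the watchdog should do on this polling cycle."""
    if primary_up and backup_running:
        return "stop-backup"       # primary is back: hand control over
    if not primary_up and not backup_running:
        return "start-backup"      # primary gone: bring up the duplicate
    return "noop"                  # steady state, nothing to do
```

In a real setup, `"start-backup"` / `"stop-backup"` would map to something like `systemctl start home-assistant` on the duplicate machine, run from a loop that calls `primary_alive()` every few seconds.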
Perhaps digging up an old post here, but I’m very interested in this topic.
I think it all depends on what you’re running on a ‘box’ and ‘what’ fails if a box goes down.
Currently running HASS on a QNAP NAS, but I’d like to ‘failover’ to an RPi if possible.
Keep in mind that in my setup I run an external Mosquitto broker in a different Docker container on the same machine(s).
What is involved? Basically, storage and connectivity.
So, that got me thinking.
- Shared Storage: The NAS is just fine; even if a Docker container craps out, the most rudimentary service of the NAS will still run: serving files… and if that doesn’t work, my house has burned down.
- Connectivity: Preferably I’d like to connect everything through DNS as I have a redundant DNS service running at home.
Assume that my QNAP has x.x.x.1 and RPI has x.x.x.2
In both my DNSs you’d register:
- home.local (for easy GUI access for human interaction) resolving to primary x.x.x.1 and secondary x.x.x.2
- mqtt.broker (to configure my smart devices’ MQTT broker, currently only Tasmota) resolving to x.x.x.1 and x.x.x.2
- On both active instances, configure mqtt.broker for the MQTT setup instead of an IP address; even if the MQTT container on the same box craps out, you still have an MQTT service running on your network, and vice versa.
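The dual-A-record idea above boils down to: resolve the name, try each address in order, and use the first one that answers. A minimal Python sketch of that client-side logic (the hostnames are placeholders; the probe is injected so the selection policy stays testable without a network):

```python
# Client-side failover over multiple DNS A records: resolve a name to all
# of its candidate addresses, then pick the first one a probe accepts.
import socket
from typing import Callable, Iterable, Optional


def candidate_addresses(hostname: str, port: int) -> list[str]:
    """Resolve a name (e.g. 'mqtt.broker') to all of its addresses, in order."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    seen: list[str] = []
    for info in infos:
        addr = info[4][0]
        if addr not in seen:      # keep resolver order, drop duplicates
            seen.append(addr)
    return seen


def first_reachable(addresses: Iterable[str],
                    probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first address the probe accepts, or None if none respond."""
    for addr in addresses:
        if probe(addr):
            return addr
    return None
```

In practice the probe would be a short-timeout TCP connect to port 8123 (HA) or 1883 (MQTT); if the primary fails the check, the client silently falls through to the secondary.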
What am I overlooking? Configuring the storage location/access type/connection string to the HASS database?
In the above I assume that you don’t have any I/O devices connected directly to the HASS box; all communication must be done through IP.
So in High Availability / Failover / Disaster Recovery Management it all depends on how far you want to go…
- basic is HASS / MQTT / Shared Storage for DB
but outside HASS domain:
- DNS (happen to have 2 dns services running in my network)
- Power (my RPi runs on mains power, but can also fail over to battery and solar panel)
- Location (I happen to have an identical QNAP NAS at 2+ km, connected through fiber)
- Dual network connectivity (if a network controller, switch or endpoint craps out, NAS has dual network, RPi has not)
So, a lot of thoughts… I’m only scratching the surface of HASS (only 6 weeks of experience here), but as it is so powerful and can control anything, it must be there all the time…
(Perhaps you sense a bit of OCD or paranoia in my story above, but in my professional work I need to design architecture for highly automated production systems (ISA95, PLC, etc). As this platform, with its plethora of cheap devices, can do (almost) the same for 1% of the cost, I’m VERY VERY interested, to say the least. That’s why I’m ‘upgrading’ the home with HASS… I installed 8 Tasmota devices and HASS just to see what it is; I’ve seen enough, and 50+ more devices/sensors are on their way from China.)
Anyway, I hope to see this discussion going, as Always On, Always There, Never Fails, just like a hardware switch, is very important for (my) WAF (Wife Acceptance Factor).
I’ve been picking away at this a little myself and you’re right, it entirely depends on what you’re running. Towards that, I’ve tried to limit the number of communication protocols used. I am working really hard to limit my primary protocol to MQTT, because it can be completely independent of the HA controller. Also, even though MQTT doesn’t natively support load balancing or failover, some folks have developed basic MQTT servers using the protocol that do.
From there, it is entirely possible to have two instances of HA running on two different Pis, as long as they share an external recorder database (in my case, using MySQL) and one of the two has all of its automations turned off.
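Pointing both instances at the same external recorder is a matter of giving them the same `db_url`; a sketch, assuming a MySQL server at 192.168.1.20 (the host, credentials, and database name are placeholders):

```yaml
# configuration.yaml — identical on both HA instances (assumed values)
recorder:
  db_url: mysql://hassuser:hasspass@192.168.1.20/hass_db?charset=utf8mb4
```

The MySQL box itself then becomes the shared state, which is why its own availability matters (see the question further down).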
I have been running two HAs for a few months so I can see what kind of issues arise and I’m happy to say that very few issues have come up as long as automations are off on the secondary instance.
I haven’t tested it yet but HAProxy should work for handling the URL routing.
My bigger issue is with z-wave. I’m thinking that an external gateway would be the most practical solution and have it run through HAProxy as well but that is just speculative at this point.
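I haven’t verified this against a live pair either, but a minimal haproxy.cfg sketch along those lines might look like the following (addresses and ports are assumptions; the `backup` keyword keeps the secondary out of rotation until the primary’s health check fails):

```
# Hypothetical HAProxy fragment fronting two HA instances.
frontend hass_front
    bind *:8123
    mode http
    default_backend hass_back

backend hass_back
    mode http
    option httpchk GET /
    server primary   192.168.1.10:8123 check
    server secondary 192.168.1.11:8123 check backup
```

Clients and the front end would then talk only to the HAProxy address, never to either Pi directly.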
That link carries a lot of value-added input for me, AndrewHoover, thanks!
Question though… does your MySQL DB have HA/failover?
Well, currently it only does the monitoring part of HA systems, but it’s a start: https://github.com/nragon/keeper
No, not yet. Honestly, I haven’t gotten that far yet.
I’m glad the link was useful. That project introduced me to HAProxy which was a very useful find.