I’ve finally gotten around to writing up my High-Availability Home-Assistant setup. A reasonably detailed description is available on my wiki and I have my config up on GitHub.
Create a sensor which indicates which instance is active
Enable/Disable automations based on which instance is active
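To make those two steps a bit more concrete, here is a rough Python sketch of the decision logic. This isn't the config itself, just the idea. It assumes each instance publishes a heartbeat timestamp that the other can see, and the `Instance` class, the priorities and the 30-second timeout are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    name: str          # hypothetical instance name
    priority: int      # lower number = preferred instance
    last_seen: float   # time of the last heartbeat, in seconds


def active_instance(instances: List[Instance], now: float,
                    timeout: float = 30.0) -> Optional[str]:
    """Value of the 'active instance' sensor: the preferred instance
    whose heartbeat is still fresh, or None if every heartbeat is stale."""
    alive = [i for i in instances if now - i.last_seen <= timeout]
    if not alive:
        return None
    return min(alive, key=lambda i: i.priority).name


def automations_enabled(my_name: str, instances: List[Instance],
                        now: float, timeout: float = 30.0) -> bool:
    """An instance runs its automations only while it is the active one."""
    return active_instance(instances, now, timeout) == my_name
```

With a primary last seen at t=100 and a standby last seen at t=118, the primary is still active at t=120, and the standby takes over at t=140 once the primary's heartbeat is more than 30 seconds old.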
There are probably a lot of edge cases and integrations which this won’t work well with, but I’ve not had any issues with my modest system.
With a little modification this could be made to do load-balancing as well; i.e. half your automations run on one instance and the other half on the other instance, with complete fail-over if an instance goes offline.
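I haven't built the load-balancing variant, so treat the following as a sketch of one way it could work rather than something tested. It assumes both instances know the same list of automation names and agree on who is online; the `assign` function and all the names in it are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def assign(automations: Iterable[str], instances: List[str],
           online: Set[str]) -> Dict[str, str]:
    """Split automations round-robin across the instances that are online.

    Sorting first means every instance computes the same split on its own.
    If an instance drops out, the survivors pick up all of its automations.
    """
    live = [i for i in instances if i in online]
    if not live:
        return {}
    return {name: live[n % len(live)]
            for n, name in enumerate(sorted(automations))}
```

With two instances online, four automations are split two and two. If one instance goes offline, all four land on the other. Each instance would then enable only the automations assigned to its own name.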
Interested to hear how other people have gone about doing this, or if you have any suggestions.
However, I think that this is, as you wrote yourself, not applicable to a lot of the setups here.
Many use ZigBee or ZWave networks and other external devices.
Another issue would be with addons like Node-RED.
But I like the progress and ideas you and others come up with!
Definitely worth looking into as a Home Assistant outage can be pretty bad if your devices are only controllable through Home Assistant.
If that’s the case then that system is poorly designed. At a minimum every “critical” device (lights, outlets, HVAC) needs to be able to still be manually controlled if your HA is down.
Provided you have some method of replicating state data read from ZigBee/ZWave across to the other instance, and similarly injecting commands from each instance back into the ZigBee/ZWave network, then this shouldn’t be insurmountable.
Additionally, Node-RED should be able to use a similar arrangement of tracking the state of a sensor to know whether its automations should fire or not. It would require adding this condition to each Node-RED script though, which could be tedious. Mind you, to get this far you’ve already had to create a whole separate instance of HA, so maybe not such a big deal.
A less involved, but also less responsive, method could be to start/stop HA. This avoids needing to suppress automations on the standby instance, but does have significant hand-over latency.
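A minimal sketch of what such a start/stop watchdog might look like, just to show where the latency comes from. Everything here is an assumption: how you probe the primary and how you actually start or stop the standby (systemd, Docker, whatever) is left to the callbacks you pass in, and the function and parameter names are invented.

```python
import time
from typing import Callable, Optional


def watchdog(primary_is_up: Callable[[], bool],
             start_standby: Callable[[], None],
             stop_standby: Callable[[], None],
             interval: float = 10.0,
             max_failures: int = 3,
             cycles: Optional[int] = None,
             sleep: Callable[[float], None] = time.sleep) -> None:
    """Start the standby after max_failures consecutive failed probes of
    the primary, and stop it again once the primary answers."""
    failures = 0
    standby_running = False
    done = 0
    while cycles is None or done < cycles:
        done += 1
        if primary_is_up():
            failures = 0
            if standby_running:
                stop_standby()
                standby_running = False
        else:
            failures += 1
            if failures >= max_failures and not standby_running:
                start_standby()
                standby_running = True
        sleep(interval)  # wait before the next probe
```

With these defaults the standby is only started after three failed probes ten seconds apart, and HA's own startup time comes on top of that, which is the hand-over latency mentioned above.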