How to design a Home Assistant architecture for failover?

I use HA for automations (light automation in particular), so I need a failover solution in case HA crashes or is otherwise unavailable.

First of all: if there are suggested or recommended setups for this kind of solution, I would be grateful for a pointer.

Otherwise my current setup is the following, probably a fairly straightforward one:

  • wall RF433 switches …
  • … whose signal is detected by a Sonoff RF Bridge
  • … which sends it to an MQTT broker
  • HA listens to the topic, makes decisions based on automations and
  • … sends a message to MQTT …
  • … which is picked up by Sonoff Basic power switches

There are several points of failure:

  1. the Sonoff RF Bridge

I will have two of them, sending messages to rfbridge-1/... and rfbridge-2/.... I am currently listening (in HA via MQTT) to only one of them, and I am thinking about how to listen to rfbridge-?/… while ensuring that a duplicate message is acted on by HA only once. The solution will probably be to add a delay at the end of the automation, so that the second copy of the message arrives while the automation is still running.
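For what it's worth, the "act on a duplicate only once" idea can also be sketched outside HA as a small time-window filter. This is only an illustration in Python, not something HA provides: the class name `RfDeduplicator`, the one-second window and the example code `"A1B2C3"` are all hypothetical, and it assumes both bridges report the same code string for the same button press.

```python
import time


class RfDeduplicator:
    """Accept an RF code once, then ignore repeats of it for a short window.

    Hypothetical helper: the name and the default window are assumptions.
    """

    def __init__(self, window_seconds=1.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._last_accepted = {}  # RF code -> time it was last accepted

    def is_new(self, code):
        """Return True if this code should be acted on, False if it is a duplicate."""
        now = self.clock()
        last = self._last_accepted.get(code)
        if last is not None and (now - last) <= self.window_seconds:
            return False  # same code seen recently, probably the other bridge
        self._last_accepted[code] = now
        return True


if __name__ == "__main__":
    dedup = RfDeduplicator(window_seconds=1.0)
    # Pretend both bridges reported the same (made-up) code almost simultaneously.
    print(dedup.is_new("A1B2C3"))
    print(dedup.is_new("A1B2C3"))
```

The window only has to be longer than the gap between the two bridges' messages, and shorter than the quickest deliberate double press of a switch.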

  1. the MQTT broker

No solution so far. Ideas:

  • maybe some clustering solution, because all my devices accept only one MQTT broker in their configuration.
  • since this is a zero-maintenance system, maybe put it on my EdgeRouter

  1. Home Assistant

No solution so far. A built-in active-passive solution would be ideal. Lacking one, I will probably write a script which connects to HA via websockets and, when the connection is lost, starts a new instance somewhere else.
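A rough sketch of such a watchdog, in Python, might look like the following. It is an assumption-laden illustration rather than a tested failover setup: it uses a plain TCP connect to HA's default port 8123 as a stand-in for the websocket connection, the names `is_reachable`, `FailoverWatchdog` and `start_standby`, the host `ha-primary.local` and the threshold of three failures are hypothetical, and it does not deal with failing back once the primary returns.

```python
import socket
import time


def is_reachable(host, port=8123, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverWatchdog:
    """Start the standby once the primary has failed several checks in a row."""

    def __init__(self, probe, start_standby, max_failures=3):
        self.probe = probe                  # callable returning True when the primary is up
        self.start_standby = start_standby  # callable that launches the passive instance
        self.max_failures = max_failures
        self.failures = 0
        self.standby_active = False

    def tick(self):
        """Run one health check; trigger the standby at most once."""
        if self.probe():
            self.failures = 0  # a single success resets the counter
            return
        self.failures += 1
        if self.failures >= self.max_failures and not self.standby_active:
            self.start_standby()
            self.standby_active = True


if __name__ == "__main__":
    # Hypothetical host name; replace the print with whatever starts the second instance.
    watchdog = FailoverWatchdog(
        probe=lambda: is_reachable("ha-primary.local"),
        start_standby=lambda: print("starting standby HA instance"),
    )
    while True:
        watchdog.tick()
        time.sleep(10)
```

Requiring several consecutive failures avoids starting a second instance because of a single dropped connection, which matters because two active instances would both react to the same MQTT messages.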

  1. Sonoff Switches (actually the WiFi they are on)

That one is outside the focus of this forum, but there is no solution for it either. They can hop to another WiFi when losing one, but they then stay on it and do not try to get back to the first one. I will probably also write a script which reboots them once the primary WiFi is back up.

Any comments, input or solutions are welcome.

How unstable is your HA? Mine never crashes. MQTT runs in docker and never goes down either.

Perhaps you need to look at the hardware you’re running on instead.

It is very stable - runs in a container on my main server.

But shit happens: any component of the server can fail, the WiFi AP can fail, and the one thing I would like to avoid is having no lights (my family would simply kill me, especially since it will certainly happen when I am traveling or something).

Other sensors and whatnot can vanish, but not the lights.

This is why I am planning ahead for failover.

Hardware doesn’t necessarily always mean the computer. If your lights depend on WiFi and HA, you have a design flaw. Everything should be physically controllable, without software, in case your software has an outage.

Ultimately it is - the lights are behind Sonoff Basic switches which have a hardware button. So it is in principle possible to switch them on and off but it gets particularly cumbersome when one has to crawl under beds or behind sofas.

I could certainly rewire my house to use KNX, Z-Wave or something like that, but it means a lot of work in the house. Hence the failover mechanism, which should cover most of the needs when set up properly.

EDIT: I understand your concerns, but I am looking for a software failover (active-passive, probably) to account for failures which are not life-critical (unlike, for instance, the ability to get in or out of the house) but which would be, let’s put it this way, emotionally costly :slight_smile:

That’s definitely “ultimately” … but I think the spirit of the question was whether it is “practically” controllable, and the answer is no. :slight_smile: So I understand why you want to make the HA server so bombproof.

FWIW, when I started automating our home (over a decade ago) I chose the strategy of loosely-coupled systems. Lighting, HVAC, and security run independently of the HA system which only serves to glue them together for synergy (and automation, of course). None of these three pillars (lighting, HVAC, security) depend upon the HA system, or a cloud service, to do their job.

Should the HA system fail (happened once when the server’s power-supply failed after several years of service), the house still gets heated/cooled, the security system still monitors the house, and all lighting can still be manually controlled (and conveniently).