High Availability, or getting rid of the most critical SPOF

The one question that keeps me from sleeping really well is what happens if my Raspberry Pi or its NVMe (greetings to @frenck@fosstodon.org) fails.

I would love to have a second server running in passive mode, silently synchronizing with the active server. I would not mind losing a few minutes of stats as long as the latest config change is copied ASAP.
Then, if the active server goes down, the passive one takes over.

Basically it would behave like continuous backup with almost instant restore.

In a WiFi/Thread environment, no further action would be needed.
I don’t know Zigbee or Z-Wave well enough to say whether you could convince an Ethernet-connected coordinator to transparently communicate with another server, or what it would take to do the trick.

I think you can use Proxmox to do this.

Some WiFi devices still only allow one control connection, but if your devices allow more, then there should be nothing holding you back from achieving your wish.

The issue is generally third-party stuff, like Z-Wave and Zigbee networks only allowing one coordinator, or a web weather service banning your IP if too many requests are made from it.

Of course, you will also have to make your own system for failover detection. There are many methods, so making a general-purpose one is not really possible.
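
As one illustration of such a detector (a minimal sketch, not something Home Assistant ships): the passive node polls the active node and only declares it dead after several consecutive failed probes, so a single dropped request does not trigger a takeover. The threshold, the interval, the IP address and the `promote_self` callback below are assumptions for illustration; port 8123 is simply Home Assistant’s default web port.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure or HTTP 4xx/5xx all count as a miss.
        return False


class FailureDetector:
    """Declare the active node down only after `threshold` consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], threshold: int = 3):
        self.probe = probe
        self.threshold = threshold
        self.misses = 0

    def check(self) -> bool:
        """Run one probe; return True once the active node is considered down."""
        if self.probe():
            self.misses = 0  # any success resets the counter
        else:
            self.misses += 1
        return self.misses >= self.threshold


def watch(detector: FailureDetector, interval: float, on_failover: Callable[[], None]) -> None:
    """Probe every `interval` seconds and call `on_failover` once the node is deemed down."""
    while not detector.check():
        time.sleep(interval)
    on_failover()
```

Usage would look something like `watch(FailureDetector(lambda: http_probe("http://192.168.1.10:8123/")), 10.0, promote_self)`, where the address and `promote_self` (whatever actually brings the passive instance up) are hypothetical. Requiring several misses in a row trades a slower takeover for fewer false alarms.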

Also, if you set up your home automation system so that everything can still be controlled manually, and you have automated storing your backups off-system, then the 30 minutes (maximum) it takes to restore a backup on new hardware isn’t really an issue.

hehe thanks for the shout out :rofl:

Although, that machine wasn’t running my Home Assistant. I use Proxmox for my development environments (which gives me the ability to spin up machines for testing quickly as well).

About HAHA (no joke™) I personally always refer to an old blog article of mine:

Oh yes, a big problem there is. I spend most of my time in Spain, 1400 km away from our home, which is inhabited by three people.

I spend 15 months at a time 5000km from home and my house sitters are often luddites.

If my HA server fails they can still manually operate the lights, blinds, heating/cooling and lock and unlock the doors.

The only thing that does not work is the automations.

Not sure how long it would take me to remotely spin up a VM on another machine and restore a backup but it wouldn’t be more than an hour.

My Zigbee coordinator is PoE network-connected, so that is no issue.

Completely agree with the KISS principle, so: a dedicated Pi with HA OS, no fancy VLANs, only a single SSID.

Despite all its other shortcomings, Apple managed to implement high availability transparently by allowing multiple hubs and by using Thread with multiple border routers. So from my perspective, the gain in flexibility and functionality from migrating to HA also brought me a downgrade in availability.

I explicitly don’t say reliability, as HomeKit, while having no SPOF, still had an endless repertoire of ways to fail me: “not responding” for no apparent reason, or automations and scenes not executing correctly because of ghost devices (deleted devices still being referenced and leading to timeouts).

From the Open Home perspective, I am also convinced that for the smart home to become truly mainstream, it mustn’t need an expert ready to jump in 24/7 and work their magic to get everything up and running again.

I used to have KNX for several years and it is still my reference as far as robustness is concerned. The comparison is unfair, I know, as KNX uses a wired bus and relies heavily on device-to-device communication without needing a server for basic operation. This leads me to Matter over Thread: do you happen to know if Matter could one day allow Home Assistant to configure direct device-to-device communication, or does that have to be built in, like Eve Thermo with their room controller?

How do you manually control Zigbee devices when HA is down?

I like the KISS concept here, which in reality means not using home automation at all, simply walking to the blind and pulling it up :slight_smile:

But on the other hand, I would rather stand in front of the CIO of a bank when 2 million customers cannot log in to my internet banking software than in front of my wife when there is no hot water or heating in the house… :slight_smile: So which is more important, enterprise or home? :slight_smile:

To be serious: I have a normal low-cost, low-power x86 PC with enterprise SSDs, nothing fancy, around 8 years old, drawing around 10 W. HA runs in an LXC container, with a remote backup daily. And here comes my HAHA: I have an identical machine which is turned off, but I can wake it up via WOL and fire up all my containers on it from backup in minutes. It is also my lab server if I want to test or hack something.
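
For anyone wanting to script the wake-up step: a Wake-on-LAN magic packet is simply 6 bytes of 0xFF followed by the target machine’s MAC address repeated 16 times, usually sent as a UDP broadcast. A minimal Python sketch; the MAC address in the note below is a placeholder, and UDP port 9 is only the common convention, not a requirement.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a WoL magic packet: 6 x 0xFF, then the 6-byte MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

Calling `wake("aa:bb:cc:dd:ee:ff")` from any always-on box on the same LAN would wake the standby machine, provided WOL is enabled in its BIOS/NIC settings.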

Using the push button the light dimmer or switch is connected to. Physical remote for the blinds. Keypad for locks. Wall mounted thermostat controls for climate.

HAHA is something that I kind of miss as well.

High Availability, or “business continuity”, is not the same as having backups, as I’m sure everyone here is aware.

I do agree that having backups and some preparation done is a good solution for people who know what they’re doing. I can set up something like that myself (and will in the coming year).

But there are some other considerations:
Home Assistant is breaking through into parts of the population that are very new to it. At least that’s my feeling; I’m sure the HA people here can confirm or debunk it.
Additionally, I feel that many of these people are a bit less technical, since they don’t have to be anymore. More and more things are becoming user-friendly and can be done through the GUI, without having to edit YAML files.

At the same time, these people will rely more and more on HA. Some (most?) of them won’t necessarily have manual fallbacks for their devices if HA goes offline. And only when that happens for the first time may they even start thinking that it would be a good idea to have a backup stored somewhere.
Now imagine these are the same people who have purchased an HA Green, for example. Again, I’m sure you’re familiar with the HA Green marketing approach: “It’s plug-and-play”. Very reassuring statements there.

It would be soooo much better if there were a way to buy a second HA Green box and set it up as the passive half of a redundant pair with the first one: running, but doing nothing apart from receiving daily (or hourly) backups. When the primary Green goes offline, the second one takes over its MAC and IP address, switches to active mode and keeps on serving the home.

Of course, there are some potential obstacles (Z-Wave maybe being the most obvious, but even that could be worked around, I believe). But putting some effort into the stability of home automation, which we all love and increasingly rely on, would be inspiring to a lot of people who are looking into it.

Actually, one of my pet peeves here is that this sentiment has been expressed a few times, and the feeling I get from the responses is: “you don’t know that you don’t actually need this”. That gives me a feeling I’m not used to in this community: being patronized. And I don’t like it here. It’s mostly absent in other areas, though :slight_smile:

Don’t want to be too harsh. Product is great and I love it to bits. Community is awesome and developers and owners are kind of demi-gods in my opinion.

If the primary Home Assistant fails, the secondary can simply take over.

Also, during installation you could select that this is the secondary or …, and it would then make a clone of the original and keep doing so in the future.

I am still quite unfamiliar with the exact operation of Zigbee and Z-Wave, or whether this is feasible.
WiFi devices seem feasible to me, possibly with a VIP (virtual IP).

Zigbee and Z-wave have a single point of failure in the “bridge” between the networks.
WiFi is a bit more tricky, because there really is no control protocol for WiFi, only a communication protocol.
That means some WiFi devices can handle multiple control connections, while others can only handle one.
Multiple does not always mean many here.
Some WiFi devices can only handle a limited number, like 2, and one of them might go to a cloud service or to a user device running the vendor’s app, which can limit the number of connections available to HA to just one.

Matter is maybe the only protocol really designed for this, but the implementation of the multi-admin feature is still a long way from being free of problems. Maybe in some (not so) distant future.

I’m running Home Assistant highly available-ish using Kubernetes (k3s) on 3 small mini-PCs.
This works, but only 1 replica of Home Assistant can run at a time. The config directory is handled by Longhorn storage with automatic snapshots and backups.

Zigbee is fixed by running Zigbee2MQTT in k3s as well + a separate UZG-1 PoE Zigbee adapter.

Not sure if Z-Wave has anything similar with network-attached adapters, since I don’t use Z-Wave.

But for Home Assistant to be “truly” highly available, it would need to be able to run with multiple replicas at the same time. The simplest way would be to have the Home Assistant instances communicate and agree that one is “master” and the other(s) backups. When communication with the master is lost, a new master is negotiated. (Kind of what you described.)
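
The “agree on a master” idea can be sketched as a toy rule (hypothetical, and much weaker than the Raft consensus used by etcd): every node looks at which cluster members it can currently reach, and a master is only chosen when a strict majority of the cluster is reachable, so the two halves of a split network cannot both elect one. The node names are made up.

```python
from typing import Optional, Sequence, Set


def elect_master(all_nodes: Sequence[str], reachable: Set[str]) -> Optional[str]:
    """Pick a master as seen from one node.

    `reachable` is the set of nodes this node can talk to right now (including itself).
    Returns None when there is no strict majority: the node should then stay passive.
    """
    alive = [node for node in all_nodes if node in reachable]
    if len(alive) * 2 <= len(all_nodes):
        return None  # no quorum: acting now could create two masters
    return min(alive)  # deterministic tie-break: lowest ID wins
```

With three mini-PCs this tolerates one node being down. With only two nodes the majority rule never elects anyone after a failure, which is why such clusters usually have three members. Also note the lowest-ID rule hands the role back to `ha-1` as soon as it returns, which a real system would usually avoid.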

This is all rather complicated and overkill for most users. But a simple “HA” setup would be neat! It has been done elsewhere: Aqara’s M3 hub can have a secondary shadow backup in case the main one fails, and UniFi recently made their routers support shadow backup failover too.

TL;DR: I think it would be amazing, but sadly I don’t see it happening, since I feel like 99% of users would not use this and focusing on UI/UX improvements would benefit everyone more.

In my opinion, the best way to think about this topic is to imagine what happens if the system dies and you do too. Will the family suffer (e.g. no heating) without HA or not? If yes, then these are the ways to go:

  1. Make HA as fail-proof as possible, hardware- and software-wise. And write documentation about your system, so that someone in the family with IT knowledge, or a family friend or colleague, can solve the problem. That is the way I’m doing it, within viable limits. For example, I have a dashboard with panel views containing markdown cards where I write the documentation of the building; I print it out sometimes and put it in a sleeve with other important things, just in case. The documentation is important for me too, as I cannot remember everything: for example, when I last replaced the anode in the hot water tank, or which of the 17 Ethernet relays I use to control the heating opens the thermoelectric valve in a particular room.

  2. Make everything manually controllable, as tom_l suggested. I think in some cases this is much more work and money than making Home Assistant somewhat highly available. Think of putting smart thermostats into every room instead of using virtual ones in Home Assistant. And you will lose some flexibility with this approach too.

  3. Go with industrial grade building automation systems like Loxone, Bosch, Siemens etc. They have certified technicians all over the world, so your family can hire someone to solve the problem. I would recommend this to everyone, but these systems are much more limited than HA is, not very open and cost a fortune.

If your use case does not fall into the above scenario, then you can just run HA on a Raspberry Pi with an SD card.

In a few years, HA has gone from being something mildly useful and interesting to tinker with to an integral part of many houses, controlling lights, cameras, heating & hot water, batteries… the list is endless, and hardware failure is a real risk.

I have HAOS set up on x86 bare metal (N100) and have another N100 PC with HA installed in reserve, should the running PC fail. I have tried running both together but seem to get conflicts and strange behaviour, even after running “ha core stop” on the backup computer and ensuring the router assigns different IP addresses.

It would be great to have a setting on one instance so that it just sits there, waiting to be brought into action if needed, while mirroring the configuration of the running instance. My DB is on a NAS, but I appreciate that some would need their DB mirrored as well.

I have seen various instructions on attempting this using VMs or Docker, but I like the simplicity and ease of running it on bare metal.

I have never used it, and I am no fan of Docker, but look at Docker Swarm. I think this does what you want.

My home has over 100 sensors, lights, switches, etc. We would sorely miss Home Assistant should something go down. My Home Assistant is running HAOS bare-metal on an Intel NUC i3, and I do nightly backups to a NAS folder. So getting Home Assistant back online with a replacement NUC would only take an hour. That’s tolerable here. BTW, the NUC has been running continuously for more than two years, so this timeline hasn’t been tested.

Don’t forget, the vast majority of Home Assistant users run on Raspberry Pis. I think the audience for redundant hardware would be rather limited and probably not worth the effort to develop and support.

This has been brought up before, and the issue is hardware, not software.

Sensors can connect to multiple HA hosts
Cameras can connect to multiple HA hosts
Everything on HA can be mirrored to another host and run concurrently

The limitation is hardware

You cannot have

Maybe this is possible now.
Z-Wave allows multiple controllers, so if you have two HA hosts, I think you could attach a second stick to HA host 2, and if HA host 1 goes down it would take over.

I don’t know about Zigbee.

A reverse proxy can be set to fail over when HA host 1 is down, so I believe this is already possible.

So really, all you need to do is bring up the second host with a recent backup, adjust for hardware differences, and everything would work.

You can use Folder Watcher or a cron job with rsync to keep the configs in sync.
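
A minimal Python equivalent of that cron job, assuming both hosts see the config directories as local paths (e.g. via a network mount), copies only files whose content changed. The default pattern only covers YAML; configuration made in the UI lives in the `.storage` directory and would need handling too, and the live recorder database generally should not be copied file-by-file while HA is writing to it.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List, Sequence


def _digest(path: Path) -> str:
    """SHA-256 of a file's contents, used to detect changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_configs(src: Path, dst: Path, patterns: Sequence[str] = ("*.yaml",)) -> List[Path]:
    """One-way mirror of matching files from src to dst.

    A file is copied only when it is missing on dst or its content differs.
    Files on dst that no longer exist on src are left alone (unlike rsync --delete).
    Returns the destination paths that were written.
    """
    copied: List[Path] = []
    for pattern in patterns:
        for source in sorted(src.rglob(pattern)):
            if not source.is_file():
                continue
            target = dst / source.relative_to(src)
            if target.exists() and _digest(target) == _digest(source):
                continue  # unchanged since the last run
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

Returning the list of written files makes it easy to trigger a config check or reload on the standby only when something actually changed.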

It is not enough to just get the sensor data to each of the HA servers.
You need to design your HA setup to also be prepared for it.
That means no relative actions and a way to stop runaway actions.
And of course you need to have a way or multiple ways to detect a server being offline.

Relative actions can quickly go haywire if the HA servers get out of sync, and coming home to a house where the music has played at max volume all weekend and the heating has been on max is not that fun.
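
A toy model (the `Speaker` class is hypothetical) shows why relative actions are the dangerous ones: if two out-of-sync servers both fire the same automation, a relative step is applied twice, while an absolute setpoint ends up in the same state no matter how many servers send it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Speaker:
    volume: int = 20  # percent, 0-100

    def volume_up(self, step: int = 10) -> None:
        """Relative action: the result depends on how many times it is applied."""
        self.volume = min(100, self.volume + step)

    def set_volume(self, level: int) -> None:
        """Absolute action: applying it twice gives the same result as applying it once."""
        self.volume = max(0, min(100, level))


def both_servers_fire(action: Callable[[], None]) -> None:
    """Simulate two out-of-sync HA servers each running the same automation once."""
    for _server in ("ha-primary", "ha-secondary"):
        action()


relative = Speaker()
both_servers_fire(relative.volume_up)

absolute = Speaker()
both_servers_fire(lambda: absolute.set_volume(30))
```

The intended result is 30 % in both cases; the relative version ends at 40 % and the absolute one at 30 %. Repeat that over a weekend and the relative version walks all the way to 100 %.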

Absolute actions will not go that wild, but they can quickly drain a battery-powered device if it receives contradicting actions every second.
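
One way to get the “way to stop runaway actions” mentioned above is a simple sliding-window limit per device: allow at most N commands per time window and drop the rest. A sketch with made-up limits; the injectable clock is only there to make it testable, and where such a guard would live in a real setup (automation condition, proxy, custom integration) is left open.

```python
import time
from collections import deque
from typing import Callable, Deque


class ActionGuard:
    """Allow at most `max_calls` actions per `window` seconds; refuse the rest.

    Meant as a brake on runaway automations, e.g. two servers fighting over a
    battery-powered device. A refused call returns False so the caller can log or alert.
    """

    def __init__(self, max_calls: int, window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.stamps: Deque[float] = deque()  # times of recently allowed actions

    def allow(self) -> bool:
        now = self.clock()
        # Forget actions that have slid out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_calls:
            return False
        self.stamps.append(now)
        return True
```

For example, `ActionGuard(3, 60.0)` would let at most three commands per minute through to a given device, so a flip-flopping pair of servers burns a few transmissions instead of the whole battery.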

Sometimes the built-in functions of HA have to be controlled and reined in too with automations.