Redundancy

I have read old threads about this, and whoever proposed this request only “half” solved the problem.
Over the years HA has grown a lot, and many devices are now controlled by it.
The need to have HA always online has grown too.
Last week I fought with my wife because it wasn’t possible to turn any light on.
Why?
Because the 2021.10.0 update was in progress.
Do I have to run future updates during the night? I hope not. :joy:
I know 2 instances of HA are possible, but there should be some sort of link between them.
One “baptized” MASTER and the other one STANDBY.
On the STANDBY machine you set the IP of the MASTER; if it hears that the MASTER is online, it sleeps (or at least its automations are deactivated).
When it no longer hears the MASTER (or the MASTER sends a shutdown/reboot event), it takes control until the MASTER comes back online.
I know that, in my case, 2 instances are easy to set up, since my HA runs in a virtual machine, but I think this could be very useful for lots of people.
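
To make the idea concrete, here is a minimal Python sketch of the decision logic the STANDBY could run. It is only an illustration, not something HA provides: the class name `StandbyController`, the `max_missed` threshold and both method names are hypothetical, and how the heartbeat is actually obtained (ping, HTTP, MQTT) is left open.

```python
from dataclasses import dataclass


@dataclass
class StandbyController:
    """Decides whether the STANDBY instance should be in control."""

    max_missed: int = 3   # hypothetical: missed heartbeats tolerated before takeover
    missed: int = 0       # consecutive heartbeats missed so far
    active: bool = False  # True while the STANDBY is in control

    def on_heartbeat(self, master_alive: bool) -> bool:
        """Record one heartbeat check; return True if the STANDBY is in control."""
        if master_alive:
            # MASTER is online (again): the STANDBY goes back to sleep.
            self.missed = 0
            self.active = False
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                # MASTER has been silent for too long: take control.
                self.active = True
        return self.active

    def on_master_shutdown_event(self) -> bool:
        """MASTER announced a shutdown/reboot: take control immediately."""
        self.missed = self.max_missed
        self.active = True
        return self.active
```

Requiring a few missed heartbeats in a row avoids a takeover on a single dropped packet, while the explicit shutdown event lets a planned update hand over control without waiting.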

You should update during day time to avoid a fight with your wife about the lights :wink:
But how do you handle zigbee, zwave, etc devices? Do we need a robot arm to unplug the USB stick from your MASTER and plug it into STANDBY?

That is simple: put the Zigbee and Z-Wave sticks in another PC, and use MQTT to talk to HA.

I just don’t see the requirement for a MASTER and a STANDBY that are interconnected. My HA instances work fine without such a link. If they are aware of each other, it is not really a standby, is it?

Sure, but what if OP needs to update that PC? He still ends up fighting his wife.

That is why I don’t use Z-wave, and use an Ethernet based Zigbee coordinator :slight_smile:

It really doesn’t matter whether it’s Zigbee, Z-Wave or whatever. It’s not about lights either.
The point is that updating HA results in a system outage. AFAIK it took hours for some versions this year.

The motivation is to provide high availability for the home automation system. IMO it’s a fair need and a justified requirement.

That is why I don’t update my HA instances all at once, but one at a time. Another instance can take over at that point.

Can someone point to any useful information on how to set up and/or maintain 2 versions / installs of HA, so as to make life a little less painful when doing things like updates?

I do tend to agree with the OP. We run some car manufacturer platforms that download and install the latest version while the current version is still in use, and there is literally a SW toggle switch that allows us to choose which version we want to run.

That way there is no downtime during potentially large downloads and updates, and if it is decided that there is an issue / bug in the latest version, we can simply revert.

I read so many posts about running two versions, and posts like the OP’s, and can’t help but think most of these issues would be null and void if HA could be set up similarly to the above example.

I rely heavily on HA working 100% of the time. I currently use Home Assistant to automate my indoor garden. It controls everything: lights, temperature, humidity, cloner machines, etc. My automation server crashed yesterday while I was out. I didn’t realize until the next morning, and by that time it was too late. My cloner machine is automated by HA, which turns on the water pump for 45 sec every 3 mins (timeouts adjustable through HA). The server crashed while the cloner machine was off, which basically killed all the plants. So my suggestion is to implement server redundancy using a service like keepalived with a floating IP that allows a backup server to take over if the main server no longer responds. This would also help during system upgrades, as you would be able to upgrade the main system while the backup takes over. I am actually quite surprised that, for such fully fledged software, this is not already available in HA. If anyone has a solution, it would be greatly appreciated.
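
As a rough illustration of the "no longer responds" check, here is a small Python sketch of a health probe of the kind keepalived can run as a tracking script, where exit code 0 means healthy. It assumes HA listens on its default port 8123 and only checks that the TCP port accepts connections, which is weaker than checking that automations actually run; the function names `ha_is_reachable` and `exit_code` are made up for this example.

```python
import socket


def ha_is_reachable(host: str, port: int = 8123, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def exit_code(host: str, port: int = 8123) -> int:
    """Map the probe result to a script exit status: 0 = healthy, 1 = failed."""
    return 0 if ha_is_reachable(host, port) else 1
```

A wrapper script would end with `sys.exit(exit_code("127.0.0.1"))`; when the check fails on the main server, the floating IP can move to the backup.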

Maybe check this out:

Someone has a project they are currently working on for high availability (redundancy) of HA running on an RPi cluster. There is some interesting work here, and it would be nice to see if this could be baked into HA OS natively.

I’m the developer of this redundancy project.
I did not update the related thread, but my project has progressed and I have made the first prototype with a real PCB.
At this time it is capable of failing over a ZigBee coordinator and the I2C bus.
If someone is interested and can cooperate on the related software development (Python), I can share the schematics and the work I have done so far.
Below is a picture of the real thing in action.

Wait - This is interesting. Could you help elaborate on this part?

I would be interested in assisting with this if possible. Please send me what you have so far and I will work on getting the required equipment.

My board features a CC2652P coordinator which can be logically connected to either Raspberry A or B depending on which node Home Assistant is currently running on.

My cluster is based on the Pacemaker/Corosync stack. Although it is working fine and has proven to be stable, it is missing a real resource agent specifically designed for the Home Assistant service.
This is explained here

https://github.com/ClusterLabs/resource-agents/blob/a5f40b4c3ed3d8b2ff24692a10a2190ed7adcce6/doc/dev-guides/ra-dev-guide.asc

This agent is needed by Pacemaker to know the real status of the Home Assistant service and, based on this, decide whether to initiate a failover to the other node(s).

In my setup this is currently handled as a regular service.
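
To give an idea of what the missing piece involves, here is a minimal Python sketch of the `monitor` action such an agent would need. It is not taken from my repo: the two probe callables are hypothetical placeholders for "is the process there?" and "does HA actually answer?", and only the three return codes are the standard OCF values from the guide linked above.

```python
from typing import Callable

# Standard OCF resource agent return codes.
OCF_SUCCESS = 0        # resource is running and healthy
OCF_ERR_GENERIC = 1    # resource is present but failed
OCF_NOT_RUNNING = 7    # resource is cleanly stopped


def monitor(
    process_running: Callable[[], bool],
    api_responding: Callable[[], bool],
) -> int:
    """Return the OCF code describing the current state of Home Assistant.

    Both arguments are hypothetical probes supplied by the caller.
    """
    if not process_running():
        # Nothing is running on this node: report a clean "stopped" state.
        return OCF_NOT_RUNNING
    if not api_responding():
        # The process exists but HA does not answer: report a failure so
        # that Pacemaker can recover the resource or fail it over.
        return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

The difference from a regular service is the middle case: a hung Home Assistant whose process is still alive would be reported as failed instead of healthy.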

As soon as I can, I will post the updated procedure to build a 2-node cluster on my GitHub repo, adapted for the latest 64-bit Raspberry Pi OS.

I’ve just gone through the expense of purchasing a full set of redundant hardware (Pi4, DeskPi Pro case, SSD, and Zigbee coordinator) so that, in the event of a critical hardware failure, my system should only be down for some hours rather than days.

This setup looked promising, especially the Zigbee coordinator failover! Has there been any update?

You can find some more information on my blog: https://csrlabs.io/
I will be posting other articles about my project when I have some spare time.
If you have any specific questions, I will be glad to respond here.

I have an old small form factor PC powered off in the rack beside my main server. It has an old HA image ready to be upgraded and have the latest backup restored to it.

If the main server hardware fails for whatever reason it’s a matter of 10-20 minutes at most to be up and running again. My zigbee coordinator is connected via serial over Ethernet so that does not even have to be moved.

Everything in the house (lights, alarm, irrigation, media players, etc.) can be operated manually during this time. All that is missing is automation, which, while nice, isn’t really “life support critical” in nature.

I also have an approach to making Home Assistant highly available with Pacemaker and Corosync. My solution differs slightly by using DRBD to synchronously replicate Home Assistant’s data. I also prefer running Home Assistant in a Docker Container, but this general approach to High Availability doesn’t necessarily need to use Docker.