I like Frenck’s comments about “enterprise level HA” and the merits of keeping it simple.
As an IT consultant, I’ve often advised clients that all the additional complexity in attempting to achieve HA (high availability) can make a system MORE brittle, because it introduces new failure modes.
Finally I see someone considering death, and they have literally the same solution as me:
documentation, high availability, and making everything manually controllable.
High availability is the one missing piece because I didn’t want to spin up >= 2 Proxmox servers since they’re not lightweight. But now, I’m seriously considering adding more hardware because I don’t see any other solution that’s as reliable after all this time exploring.
Hey all, I am using a couple of older Z-Wave door keypad locks to activate a dry contact on an electric gate. This requires the HA server to be online and functional (locally). Is there a way to add a backup HA server with Z-Wave to take over in the event of a primary server failure?
Obviously this is not just “something nice to have”. This is a critical function for us and we need the HA to be online no matter what.
(HAOS standalone running on a 2013 Dell 6430 laptop).
Can two HA instances run on the same network? Can they share a nabu.casa account?
Additional: If automatic HAOS failure detection isn’t possible, I wouldn’t mind if I had to manually start up the secondary server.
I am just looking for recommendations to remotely access the secondary server and quickly restore a backup and get it online.
Is there a “paused” HA mode that doesn’t do anything until activated, but is still active enough to be available via Nabu.casa?
Does Nabu.casa support more than one HA server per account?
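For reference, the failure detection I have in mind could be as simple as a script on the secondary box polling the primary (a minimal sketch against Home Assistant’s documented REST API, where GET /api/ returns 200 while Core is up; the address and token are placeholders):

```python
import urllib.request

PRIMARY_URL = "http://192.168.1.10:8123/api/"  # placeholder address of primary
TOKEN = "<long-lived-access-token>"            # placeholder token

def primary_alive(timeout: float = 5.0) -> bool:
    """Return True if the primary HA instance answers GET /api/ with 200."""
    req = urllib.request.Request(
        PRIMARY_URL, headers={"Authorization": f"Bearer {TOKEN}"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False  # connection refused, timeout, HTTP error, ...

if not primary_alive():
    print("Primary HA is down: time to power up the standby")
```

That would at least tell me when to power up the standby, even if the actual fail-over stays manual.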
Your first issue might be Z-wave.
I am not certain about Z-Wave, but Zigbee for sure only supports one coordinator USB stick, and if that is the same with Z-Wave, then you need to manually move the USB stick anyway.
The Z-Wave specification does support having a secondary Z-Wave controller (a.k.a. controller slave) on the same Z-Wave network, but it does not have out-of-the-box support for automatic failover, so something needs to tell the secondary Z-Wave controller to promote itself into the primary Z-Wave controller (a.k.a. controller master). There are other existing Z-Wave controllers/hubs that feature this kind of failover; search for more info:
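To sketch what that missing glue would have to do (hypothetical helpers only; promotion, in Z-Wave terms a “controller shift”, has no standard API and depends entirely on the controller vendor/stack):

```python
import socket
import time

PRIMARY = ("192.168.1.10", 8123)   # placeholder address of the primary HA box
CHECK_INTERVAL = 10                # seconds between health checks
FAILURES_BEFORE_PROMOTION = 3      # debounce so one hiccup doesn't fail over

def primary_is_alive() -> bool:
    """Crude liveness check: can we still open a TCP connection?"""
    try:
        with socket.create_connection(PRIMARY, timeout=5):
            return True
    except OSError:
        return False

def promote_secondary_to_primary() -> None:
    """Hypothetical: trigger a Z-Wave 'controller shift' so the secondary
    controller becomes primary. There is no standard API for this; it is
    entirely vendor/stack specific."""
    raise NotImplementedError("vendor/stack specific")

misses = 0
while True:
    misses = 0 if primary_is_alive() else misses + 1
    if misses >= FAILURES_BEFORE_PROMOTION:
        promote_secondary_to_primary()
        break
    time.sleep(CHECK_INTERVAL)
```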
As for Zigbee, it could be solved by using an RCP radio design with “Zigbee on host” (a.k.a. host-side Zigbee stack) that works similarly to OpenThread RCP, where the IoT protocol stack runs on a remote host computer with a more or less dumb radio. Both ZHA and Zigbee2MQTT developers are actually working on that independently of each other, see:
And you also need to demote the first controller, which often means you need to sign into the ecosystem from the vendor of the controller to get it to work somewhat reliably.
I think you misunderstood the RCP design.
The design is to spread out the processing tasks over multiple CPUs, in order to assign CPUs most suited to the processing task at hand.
This will not remove the single point of failure or the single-coordinator limitation, and moving the Zigbee stack will not help, because that is where the issue is.
The network-attached Zigbee and Z-Wave radios are often the best solution if they are running independently. They will not remove the single point of failure or the single-coordinator limitation, but running them on a simpler setup with no other purpose often means the hardware can be better hardened against failure.
Of course this requires that the device is actually one that is supported, and not some cheap Chinese junk from a vendor that only exists for a couple of months before they close up and change names.
No, you misunderstand: the “Zigbee on Host” concept means that you could have multiple RCP radios in standby, ready to take over the task of being used as the radio. So while you can only use one radio at any time, it at least allows you to easily move to a different physical radio adapter, since the whole Zigbee stack is running on the host computer and not on the adapter. Again, see:
I understand that you can switch the radio chip out this way, but all the limitations are still in the Zigbee stack and that is still locked to only one device, your host.
It is only possible to switch radios if the actual radio chip dies and the host keeps running stably.
That means HA itself can not crash, and that is the component most likely to crash.
Yes, but I only said that the Zigbee and Z-Wave limitations can be worked around. You still need Home Assistant itself to feature high availability as well, hence the original feature request is still valid. I was only clarifying that you should not see Zigbee and Z-Wave as showstoppers, so no need to get hung up on those.
The Zigbee limitations are still present for users.
That project will not change the limitations for users, but for device driver developers it might change some.
It is meant to change Zigbee limitations for users, so I don’t think you understand how it will work.
puddly is working on the “ziggurat” Zigbee stack, meant to work in combination with “zigpy-spinel” to achieve “Zigbee on host”, which means nothing unique is stored on the physical Zigbee radio adapter.
The concept is basically that the user will flash an OpenThread RCP firmware onto the physical Zigbee radio adapter, and the host will send raw data to that dumb radio via OpenThread’s standard Spinel and HDLC protocols, while all your unique Zigbee network data and security are stored on your host computer running ziggurat and zigpy.
"This project aims to replace the functionality provided by existing radio adapters running Zigbee firmware and move all processing to the host, eliminating practically all limitations imposed by microcontroller-based Zigbee stacks.
Existing Zigbee applications (i.e. ZHA, Z2M, and OpenHAB) would implement a new radio type and communicate with the Ziggurat server over TCP, a UNIX socket, or possibly a virtual serial port, using a high-level wire protocol similar to that of existing Zigbee stacks.
Ziggurat communicates with 802.15.4 radio hardware over the OpenThread Spinel serial protocol. We currently use OpenThread RCP firmware to just send & receive packets and automatically send 802.15.4 ACKs. The stack handles all encryption, decryption, and processing, treating the radio hardware as just an 802.15.4 frontend. We aim to use OpenThread RCP firmware for the foreseeable future, as it provides a uniform and hardware-agnostic 802.15.4 frontend that theoretically runs on chips from every major vendor and eliminates the need to use multiple firmwares when switching between Zigbee and Thread applications."
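To make the “dumb radio” part concrete: in this design the host wraps Spinel commands in simple HDLC-Lite framing, and the adapter does little more than move raw 802.15.4 frames. A minimal sketch of that framing (a real host would use an existing library such as pyspinel):

```python
# Minimal HDLC-Lite framing as used by OpenThread's Spinel protocol.
FLAG, ESC = 0x7E, 0x7D

def _fcs16(data: bytes) -> int:
    """RFC 1662 FCS-16 checksum (init 0xFFFF, reflected poly 0x8408)."""
    fcs = 0xFFFF
    for byte in data:
        fcs ^= byte
        for _ in range(8):
            fcs = (fcs >> 1) ^ 0x8408 if fcs & 1 else fcs >> 1
    return fcs ^ 0xFFFF

def hdlc_encode(payload: bytes) -> bytes:
    """Wrap a Spinel payload in an HDLC-Lite frame."""
    fcs = _fcs16(payload)
    raw = payload + bytes([fcs & 0xFF, fcs >> 8])  # FCS appended LSB first
    out = bytearray([FLAG])
    for byte in raw:
        if byte in (FLAG, ESC):
            out += bytes([ESC, byte ^ 0x20])  # escape reserved bytes
        else:
            out.append(byte)
    out.append(FLAG)
    return bytes(out)

# Example: frame the Spinel reset command (header 0x80, CMD_RESET = 1)
print(hdlc_encode(bytes([0x80, 0x01])).hex())
```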
PS: Again, Nerivec is independently also working on a similar concept for Zigbee2MQTT:
PPS: Since this new Ziggurat stack will also support TCP/IP connections, it will also be able to work similarly to the old Z-Wave over IP Gateway (Z/IP Gateway) concept:
I do understand how it works, but it does not solve the limitation and single point of failure of the Zigbee protocol.
The single point of failure is the limitation to only one coordinator, and that part is baked into the design of the protocol, so you can not get around it by splitting up the Zigbee stack.
Splitting up the Zigbee stack can improve some efficiency over the microcontrollers, which are the limitations being talked about.
Moving part of the Zigbee stack to the HA hardware will however also mean it is running on hardware that is more complex and therefore more prone to failure.
While you are right that you can only have one physical “Zigbee Coordinator” adapter connected at any one time, you still misunderstand this new architecture concept and how it works around that limitation using these experimental stacks.
You see, with this “Zigbee on host” architecture, ziggurat is the “Zigbee Coordinator”, and it will be able to run on any computer or host on your local LAN, since it runs like any software application and has a built-in TCP server so you can connect to it over IP. That means that you could put ziggurat in some kind of high-availability solution.
Ziggurat can then in turn communicate with a “dumb” 802.15.4 hardware radio over the OpenThread Spinel serial protocol, which can also be done over TCP/IP. Since nothing unique is stored on the “dumb” 802.15.4 hardware radio, you can have multiple such radios on your local LAN and make ziggurat automatically fail over to the next radio if one fails. Yes, you will only be able to use one at any moment in time, but as long as you have redundant hardware it can fail over without manual recovery, so it will be robust enough IMHO.
The architecture will look something like this:
ZHA ↔ zigpy-spinel ↔ ziggurat ↔ any “dumb” 802.15.4 radio hardware ↔ Zigbee devices
So in practice it means that if developers implement all of that, then the user will be able to simply buy several network-attached 802.15.4-radio-to-Ethernet adapters and flash those with OpenThread RCP firmware, which ziggurat (as the software “Zigbee Coordinator”) will be able to use as its “dumb” radios. Example hardware users will be able to use includes the adapters from TubeZB or SMLIGHT, which use ESPHome firmware to enable a serial-over-network connection to the 802.15.4 radio.
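A sketch of what that radio fail-over could look like from ziggurat’s point of view (host names and port are made up, and a real stack would of course also have to re-run its Spinel initialization on the freshly chosen radio):

```python
import socket

# Hypothetical network-attached RCP radios (e.g. serial-over-TCP bridges
# like the TubeZB/SMLIGHT adapters mentioned above); names/port made up.
RADIOS = [("radio-1.local", 6638), ("radio-2.local", 6638)]

def connect_first_alive(timeout: float = 3.0) -> socket.socket:
    """Return a TCP connection to the first reachable 802.15.4 radio.

    Only the fail-over idea is sketched here: a real "Zigbee on host"
    stack would also re-tune the new radio to the network's channel
    before traffic can flow again.
    """
    for host, port in RADIOS:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            continue  # this radio is unreachable, try the next one
    raise RuntimeError("no 802.15.4 radio reachable")

radio = connect_first_alive()
print("using radio at", radio.getpeername())
```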
Again, both ZHA and Zigbee2MQTT developers are now working on solutions for that architecture.
If I’m not mistaken, you can have a fresh install of HAOS running and waiting for a backup to be restored.
You might have to update HAOS on the backup device from time to time, but I think that is a solution. It will restore the backup and then install current updates.
Also, if you have a hypervisor (Proxmox, VirtualBox), you can easily start a fresh HAOS on that, get the backup restored there, swap IP addresses… Done?
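For keeping the standby’s restore point fresh, you could even script the backup from outside (a sketch, assuming a HAOS/Supervised install where the hassio.backup_full service exists; URL and token are placeholders, and shipping the backup file to the standby is left to SMB/rsync/whatever):

```python
import json
import urllib.request

# Trigger a full backup on the primary via the documented
# /api/services/<domain>/<service> REST endpoint.
req = urllib.request.Request(
    "http://192.168.1.10:8123/api/services/hassio/backup_full",
    data=json.dumps({"name": "standby-restore-point"}).encode(),
    headers={
        "Authorization": "Bearer <long-lived-access-token>",
        "Content-Type": "application/json",
    },
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # 200 means the backup service was called
```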
By the way, interesting topic. I’m coming from consultancy, working in compliance and IT security. I have an RPi5 with NVMe running HAOS. Backups go to Google Drive, local storage, and SMB, at alternating times. A spare RPi4 with SSD is waiting for its job. Zigbee is managed by two UZGs, so I have a spare one just in case. A separate RPi3 runs Z2M in Docker with a daily backup to SMB. In hindsight, two medium-powered machines with Proxmox would have made sense, but I had zero experience and little time, so I had to rely on what I knew.
Two OPNsense firewalls set up with HA as well. That’s pretty neat, I must say.
Regarding Zigbee failover… Please correct me if I’m wrong, but in my head, and from my recent experience with backup and restore, you just need the device IDs and can easily move from one Z2M instance to another, and the router/coordinator (UZG) does not store anything at all, so it can be swapped with zero tasks to be done. So if the UZG fails, start a new one, set its IP address to the same one, done. What failover are you missing here? What is the scenario?
The problem is that a hardware-based Zigbee Coordinator actually contains the whole Zigbee network, including all the security keys and device IDs, so in a classic Zigbee gateway implementation the host application running on your computer is only acting as an interface to receive status information and send commands. That means that if you today want to migrate from one physical Zigbee Coordinator adapter to another, you need to back up all that data from the old adapter and restore it to the new adapter.
The limitation is that you can not have two Zigbee Coordinators in the same network. As such, the scenario is: even if you had two identical host computers running Home Assistant with a high-availability service that replicates all data between them and automatically fails over to the secondary node when the primary node fails, it would not be as simple for the Zigbee setup, even with two physically identical Zigbee Coordinator adapters, because the adapter on the secondary would first need to get a backup restored (flashed) with an up-to-date copy of the latest Zigbee network. Then you would need the same to happen if and when you want to fail back.
So could you achieve all that today by running Home Assistant in a virtual machine on something like a Proxmox high-availability cluster plus some scripting? Yes; you more or less only need to make sure you never have two Zigbee Coordinator adapters active at the same time. But that is the most complicated part in a classic setup, so one could easily see something go wrong with fail-over and fail-back of Zigbee network backups and restores. That is the part the “Zigbee on host” architecture would solve. However, you would still need to solve the problem of developing some kind of easy-to-use high-availability solution for the Home Assistant core itself and its add-ons.
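To illustrate what actually lives on a classic hardware coordinator: a coordinator backup (roughly following zigpy’s Open Coordinator Backup format, which ZHA and Zigbee2MQTT use for adapter migrations) contains more than what you see in configuration.yaml, for example (all values made up):

```python
# Abbreviated illustration of a coordinator backup; values are made up.
coordinator_backup = {
    "coordinator_ieee": "00:12:4b:00:de:ad:be:ef",
    "pan_id": "0x1A62",
    "extended_pan_id": "dd:dd:dd:dd:dd:dd:dd:dd",
    "channel": 15,
    "security_level": 5,
    "network_key": {
        "key": "01:03:05:07:09:0b:0d:0f:00:02:04:06:08:0a:0c:0d",
        "sequence_number": 0,
        "frame_counter": 118785,  # counters live on the chip, not in YAML
    },
    "devices": [
        {
            "ieee_address": "00:0b:57:ff:fe:11:22:33",
            "nwk_address": "0x3F4E",
            "link_key": None,  # per-device keys, if used, are also on-chip
        },
    ],
}
```

Note the frame counters and per-device state: those only exist on the adapter, which is why a plain configuration.yaml is not enough to swap coordinators without a backup/restore step.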
Are you really sure this is the case? Because I have the network key, the channel, the PAN ID, the extended PAN ID, and all device IDs in my configuration.yaml on my Docker Pi. Nothing indicates that the UZG hardware coordinator stores anything at all.