A/B Testing

I’ve been using an HA Yellow for a couple of years and am generally happy with the performance. One area that has always been a concern, though, is how to test changes and roll back with the least disruption. My house is quite large and my family rely on a lot of automations, so having the system down for more than a few minutes is not a popular option. The main integrations use ZHA, Thread/Matter or Wi-Fi.

I am considering having two hardware HA servers, so one can be live and the other on standby or used for testing. Each would probably sit behind a separate UPS to protect the equipment from surges, brownouts and power cuts.

A ZHA network only supports one coordinator. Can I use a PoE Zigbee antenna connected to two HA instances?

On my UniFi network, would some network isolation be useful to allow two HAs to coexist?

I’m not looking at full High-Availability, I just want the peace of mind that I have a functional system on standby if something fails.

Update Concept:
HA1 - live
Make a backup
HA2 - isolated from HA1
Restore backup to HA2
Upgrade HA2
Testing
Switch the network isolation so HA2 is now live
Check all ok.
If not, switch the network isolation again.

This gives time to review / address issues…
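For the "Testing" and "Check all ok" steps, one rough way to spot breakage is to compare which entities are unavailable before and after the upgrade. The sketch below is only an illustration: it assumes the standard Home Assistant REST API (`GET /api/states` with a long-lived access token), and the URL and token shown in the comments are placeholders, not real values.

```python
import json
import urllib.request


def fetch_states(base_url, token):
    """Fetch all entity states from a Home Assistant instance via its REST API."""
    req = urllib.request.Request(
        f"{base_url}/api/states",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def find_unavailable(states):
    """Return sorted entity IDs whose state is 'unavailable' or 'unknown'."""
    return sorted(
        s["entity_id"] for s in states if s["state"] in ("unavailable", "unknown")
    )


def newly_broken(before, after):
    """Entities that are unavailable after the upgrade but were not before."""
    return sorted(set(find_unavailable(after)) - set(find_unavailable(before)))


# Example usage (placeholder URL and token, adjust for your own setup):
# before = fetch_states("http://ha2.local:8123", "LONG_LIVED_TOKEN")
# ... upgrade HA2 and let it settle ...
# after = fetch_states("http://ha2.local:8123", "LONG_LIVED_TOKEN")
# print(newly_broken(before, after))
```

An empty result does not prove everything works, since an isolated HA2 may not be able to reach some devices at all, but a non-empty one is a quick hint about where to look first.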

What considerations are there?

I can’t say what happens if two instances try to talk to the same coordinator, but this sounds like a lot of work to do every time just in case you need it once. Restoring a backup is quite quick nowadays, so having a backup and spare hardware will go a long way and save you a lot of time.

I guess it can be done, but you’ll have to deal with add-ons and anything else that is configured with the HA IP address, because that will differ between the two instances. You’ll also run into rate limiting on external services if both instances are making the same requests.

I would invest in making sure your house will function (albeit less smart) without HA too, because hardware failure and faulty updates are not your only concerns. Even if HA stays the same, cloud services changing, device firmware updates, etc. will sometimes break things. Even a neighbor with a new Wi-Fi router can mess up your Zigbee. A/B testing won’t save you from that.

P.S. I’ve been beta testing on my (large) main instance for a couple of years now. I’m running a beta as we speak, with no problems. Sometimes integrations break for a short time, often because of changes forced by outside influences, such as Google deprecating an API recently or Tado changing login methods. I’ve never had serious problems, nor needed to revert to a backup because of one. And I’m still married 🙂

I run 2 HA systems, 1 for development and 1 for production. Many integrations allow both instances to work with the same devices. In the case of Zigbee and Z-Wave, I use the integrations “Remote Home Assistant” and “MQTT Discoverystream Alt” to synchronize the production devices/entities to the development system.

It’s more work, but it’s how I started my migration to Home Assistant and so far I’ve stuck with it. It allows me to work out how complex integrations can be deployed and how the Frontend cards can be configured. Since I keep the device and entity names the same on both systems, it’s easy to copy and paste dashboards between the two once I’m happy with a dashboard design on the dev system.

I also have duplicate Zigbee and Z-Wave hardware on the development system, mainly to load integration and add-on changes before pushing to production. They don’t have any devices paired, however, so it’s not a comprehensive test. This hardware can be pressed into service in production should a hardware failure occur.

I use Home Assistant CLI and regularly dump all the entity names and compare them between systems. It’s easy for them to get out of sync, so there is some maintenance involved.
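As a rough illustration of that comparison, here is a minimal sketch. It assumes the entity IDs from each system have already been dumped to plain text, one ID per line, by whatever means you prefer; the function names and file names are made up for the example and are not part of any Home Assistant tool.

```python
def load_ids(lines):
    """Turn an iterable of text lines into a set of entity IDs, skipping blanks."""
    return {line.strip() for line in lines if line.strip()}


def compare_entities(prod_lines, dev_lines):
    """Report entity IDs that exist on only one of the two systems."""
    prod = load_ids(prod_lines)
    dev = load_ids(dev_lines)
    return {
        "only_prod": sorted(prod - dev),  # missing on the development system
        "only_dev": sorted(dev - prod),   # missing on production
    }


# Example usage with two hypothetical dump files:
# with open("prod_entities.txt") as p, open("dev_entities.txt") as d:
#     diff = compare_entities(p, d)
# for side, ids in diff.items():
#     print(side, *ids, sep="\n  ")
```

Sorting the output keeps the report stable between runs, which makes it easier to see what changed since the last check.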

The Watchman integration is also useful for checking differences and keeping systems clean of orphaned devices and entities.

I nearly always load updates on the development system first. I run both systems on High Availability Proxmox, so the VMs can fail over between physical servers. I run Zigbee and Z-Wave as external TCP connections, so failover of the VMs still works with that hardware.

In the end, is it worth the effort? Hard to say. Probably not, but there is a lot of educational value in the process.
