Remote access to Zigbee/Z-Wave controllers for Live Migration (High Availability)

I’m trying to find a way to have true High Availability on my homelab cluster.
I’m currently running 3 NUC nodes in a cluster, running various tasks with the primary (and relevant) application being HomeAssistant (technically a VM running HASSOS).
On my “primary” node, I had 2 USB devices installed for use with HomeAssistant (1 Zigbee and 1 Z-Wave stick). Without fully thinking through how it would all work, I purchased a copy of VirtualHere, installed the server on my QNAP NAS, and have the clients running on each PVE node. That part worked flawlessly, and I have the sticks configured in HA without issue.
However, it didn’t solve the problem I really have in that the VM can’t be live migrated with “local devices” attached. I have to remove the USB passthrough to the VM, migrate, then re-add the USB to the destination node.
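For anyone wanting to script that detach, migrate, re-attach workaround, here is a minimal sketch in Python. It only wraps the `qm set` and `qm migrate` commands. The VM ID, node name, USB slot and vendor:product ID are placeholders, and it assumes root SSH between cluster nodes (the VM's config lives on the target after the move, so the re-attach runs there) and that USB hotplug is enabled for the VM.

```python
import subprocess
from typing import Callable, List


def migration_commands(vmid: int, target: str, usb_slot: str, usb_id: str) -> List[List[str]]:
    """Build the three commands: detach USB, live migrate, re-attach USB on the target."""
    reattach = ["qm", "set", str(vmid), f"--{usb_slot}", f"host={usb_id}"]
    return [
        # 1. Remove the USB passthrough entry so the VM has no "local device".
        ["qm", "set", str(vmid), "--delete", usb_slot],
        # 2. Live migrate the running VM to the target node.
        ["qm", "migrate", str(vmid), target, "--online"],
        # 3. Re-add the passthrough on the target node (assumes root SSH works).
        ["ssh", f"root@{target}"] + reattach,
    ]


def run_all(commands: List[List[str]], runner: Callable = subprocess.run) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        runner(cmd, check=True)
```

Calling `run_all(migration_commands(100, "pve2", "usb0", "1234:abcd"))` on the source node would run the three steps in order. The VM is still without its sticks during the migration itself, so this automates the workaround without giving real HA.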
Is there a way to either force live migration with local devices (since I can configure them the same on both nodes), or is there another way to set up remote access to these devices?
I’ve seen 2 alternatives, but both seem to not be real options:

  • USB/IP or USB2Ser, which is essentially what VirtualHere is, so it would suffer the same limitation as VH: Proxmox sees it as a local device.
  • A remote zigbee2mqtt server (same for Z-Wave?) - This just moves to a different SPOF and doesn’t really provide HA anywhere outside the cluster (if I lose the z2m server, I lose it all anyway).

Is there some solution out there that I’m missing? I can’t be the only one who’s had this need.

A seamless migration is difficult. You can get an Ethernet-connected coordinator and should then be able to move the VM, but the coordinator is then the SPOF.

The problem with having a warm spare coordinator is that both need to have the same MAC, and they can’t be powered simultaneously with the same MAC, even if not in active use. Some scripting to update the MAC on the spare and cut power to the failed unit would be needed. Even if apparently bricked, the failed coordinator could block the new coordinator if it has any power applied. Then the SPOF moves to whatever Z-Wave/Wi-Fi/etc. plug or relay you use to cut power to the coordinator.
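The ordering constraint above can be sketched roughly like this in Python. Both callables are hypothetical stand-ins for whatever your smart plug or relay and your coordinator's flashing tool provide. The only point is that the spare never gets the shared MAC until the power cut to the failed unit is confirmed.

```python
from typing import Callable, List


def fail_over(
    cut_power_to_failed: Callable[[], bool],
    write_mac_to_spare: Callable[[str], None],
    mac: str,
) -> List[str]:
    """Switch to the warm spare; returns the steps that were completed, in order.

    cut_power_to_failed: hypothetical hook, returns True only if the failed
        coordinator is confirmed to be without power.
    write_mac_to_spare: hypothetical hook that writes the shared MAC to the spare.
    """
    steps: List[str] = []
    # Step 1: the failed unit must be fully unpowered, even if it looks bricked.
    if not cut_power_to_failed():
        raise RuntimeError("failed coordinator may still be powered; not touching the spare")
    steps.append("failed_unit_off")
    # Step 2: only now is it safe for the spare to take over the MAC.
    write_mac_to_spare(mac)
    steps.append("spare_mac_written")
    return steps
```

If the power cut cannot be confirmed, the function raises and leaves the spare alone, which is the safer failure mode given that two powered coordinators with the same MAC would conflict.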

Not worth the trouble IMO, but I think it would be doable. I’ve done wilder things.

I’ve been playing around with the high-availability concept for Zigbee coordinators attached to Home Assistant for some months. Please find my prototype approach in the following repository:

Any feedback is welcome!