I'm looking for help troubleshooting a Home Assistant setup.
Hardware
- Mac Mini 2018 (3.0 GHz 6-Core i5, 16 GB RAM, 512 GB SSD)
- Home Assistant OS running in UTM
- Zooz 800 Z-Wave stick
- Sonoff Zigbee 3.0 USB Dongle Plus
- Aeotec Home Energy Meter Gen5 (Z-Wave)
- Two Third Reality Zigbee plugs
Automation
This issue has happened twice.
The water heater keeps cycling on and off and the HEM wattage still appears to update, but the automations stop responding and the history stops updating. If I manually turn the plugs on, they do turn off when the water heater is on, but they don't turn off when the water heater is on after 30 seconds.
When this happens:
- Re-interviewing the HEM hangs indefinitely
- HEM entities become unavailable
- Home Assistant restart does not help
- Host restart does not help
- Restarting the UTM VM immediately fixes everything
Error:
2026-06-01 10:41:48.258 ERROR (MainThread) [homeassistant.components.hassio] Failed to to call /addons/core_zwave_js/restart - Another job is running for job group addon_core_zwave_js
The same Aeotec HEM Gen5 ran on Hubitat for a long time without similar issues, so I'm trying to determine whether this is related to:
- Home Assistant OS in UTM
- USB passthrough
- Zooz 800 controller
- Z-Wave JS hanging
Has anyone experienced something similar?
The detail that cracks this is that only a UTM VM restart fixes it (an HA Core restart and a HAOS host reboot don't). Both of those happen inside the running QEMU process, which keeps holding the USB-passthrough device in its wedged state. Only stopping and starting the UTM VM tears down and re-creates the passthrough, which is what actually recovers the stick. So the failure is at the USB-passthrough / serial layer, not in Home Assistant. Z-Wave JS "hanging" and the "Another job is running for job group addon_core_zwave_js" error are just symptoms of the driver being stuck on a dead serial handle.
Most likely trigger: Z-Wave JS soft reset. Z-Wave JS soft-resets the controller on startup (and after certain operations), and on 700/800 sticks that causes a USB re-enumeration. In VMs with USB passthrough (Proxmox, libvirt, UTM/QEMU on macOS), the re-enumeration often doesn't reach the guest, so the controller disappears until the VM is restarted. That matches your symptom exactly.
Try this first (free, targeted): Settings → Add-ons → Z-Wave JS → Configuration → disable Soft Reset, save, restart the add-on. HAOS usually auto-disables it, but under UTM it may not detect the VM correctly.
Contributing factors worth addressing:
- SiLabs 800 SDK lockups. The ZST39 (SiLabs 800, SDK 7.19.x+) has known lockup bugs. Firmware updates exist but are risky (some ZST39 updates to 7.21.3 brick the stick or cause an endless unresponsive loop), so don't rush a flash. Try the soft-reset fix first, and only update via Zooz's official OTW tool if you're confident.
- HEM Gen5 report flood. A water heater hovering near your 500W threshold makes the Aeotec HEM spam power reports, flooding the network and raising the odds of a lockup (and thrashing your plugs). Set its reporting params to a sane interval and/or a meaningful % change instead of constant reporting (params ~101-103 choose what each report group sends, ~111-113 set the intervals, plus the change-threshold params), and add a `for:` debounce on the automation's "off" side too.
- USB hardening. Put the Zooz stick on a USB 2.0 powered hub or extension cable, away from the Mac Mini and especially away from the Sonoff Zigbee dongle (USB3 plus two adjacent radios is a classic interference combo). Make sure UTM binds the stick exclusively to the VM.
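The "hysteresis plus `for:` debounce" advice can be modelled as a small state machine. This is a plain-Python simulation, not Home Assistant YAML: the 500 W threshold and 30 s restore delay come from this thread, while the 400 W "back on" level, the 10 s off-side hold and the sample timeline are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# From the thread: shed the plugs above 500 W, wait 30 s before restoring them.
OFF_ABOVE_W = 500.0
ON_HOLD_S = 30.0
# Hypothetical values: a 10 s "for:" debounce on the off side, and a lower
# 400 W "back on" level so readings hovering near 500 W can't flip the plugs.
OFF_HOLD_S = 10.0
ON_BELOW_W = 400.0


@dataclass
class LoadShedder:
    plugs_on: bool = True
    high_since: Optional[float] = None  # when power first rose above OFF_ABOVE_W
    low_since: Optional[float] = None   # when power first fell below ON_BELOW_W

    def update(self, t: float, watts: float) -> bool:
        if watts > OFF_ABOVE_W:
            self.low_since = None
            if self.high_since is None:
                self.high_since = t
            # Off-side debounce: shed only after power stays high for OFF_HOLD_S.
            if self.plugs_on and t - self.high_since >= OFF_HOLD_S:
                self.plugs_on = False
        elif watts < ON_BELOW_W:
            self.high_since = None
            if self.low_since is None:
                self.low_since = t
            # On-side delay: restore only after power stays low for ON_HOLD_S.
            if not self.plugs_on and t - self.low_since >= ON_HOLD_S:
                self.plugs_on = True
        else:
            # Dead band between the two levels: keep the state, restart both timers.
            self.high_since = None
            self.low_since = None
        return self.plugs_on


def simulate(samples: Iterable[Tuple[float, float]]) -> List[bool]:
    """Feed (seconds, watts) samples through a fresh LoadShedder."""
    shedder = LoadShedder()
    return [shedder.update(t, w) for t, w in samples]


if __name__ == "__main__":
    # Made-up timeline: heater starts, runs, stops, small bump, then idle.
    timeline = [(0, 100), (5, 4500), (10, 4500), (15, 4500), (20, 350),
                (35, 450), (40, 350), (60, 350), (70, 350)]
    for (t, w), on in zip(timeline, simulate(timeline)):
        print(f"t={t:>3}s {w:>6.0f} W plugs_on={on}")
```

The sketch only re-evaluates when a new reading arrives, whereas Home Assistant's `for:` runs its own timer, so treat it as an illustration of the logic rather than a drop-in replacement. A single spike above 500 W no longer sheds the plugs, and a reading that sits between 400 W and 500 W can't restore them.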
Why it was fine on Hubitat: Hubitat talks to the radio natively, with no virtualization, no USB passthrough, and no Z-Wave JS soft-reset behavior, so none of this applied. The HEM itself is almost certainly not the culprit.
Order of attack: disable soft reset → tame HEM reporting + automation hysteresis → USB 2.0 powered hub away from the Zigbee dongle → (last resort, carefully) ZST39 firmware.
Thanks for the detailed explanation. Here's what I've done so far:
- I checked Z-Wave JS UI and confirmed Soft Reset was already OFF.
- I migrated from the standard Z-Wave JS add-on to Z-Wave JS UI.
- Controller Recovery is ON.
- Watchdog is ON.
- Log to File is ON.
- The Zooz 800 controller and Aeotec HEM are both visible and healthy in Z-Wave JS UI.
- I have not updated the Zooz stick firmware.
- My HEM reporting thresholds are currently set to 500W for clamp power changes and 10% for percentage changes.
- My automation uses the Whole HEM value with a 500W threshold and a 30-second delay before turning loads back on.
- The Zooz 800 stick and Sonoff Zigbee dongle are currently plugged in very close to each other. Based on your suggestion, I'm planning to move them onto USB extension cables and separate them.
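How much the 500 W / 10 % settings thin out the reports can be roughed out in Python. This sketch assumes the meter reports when the change since its last report reaches either threshold (an OR rule, measured against the last reported value); that is an assumption about the Gen5 firmware, not something taken from Aeotec's documentation.

```python
from typing import Iterable, List, Optional

WATT_THRESHOLD = 500.0    # from the thread: 500 W change threshold
PERCENT_THRESHOLD = 10.0  # from the thread: 10 % change threshold


def reported_values(readings: Iterable[float],
                    watt_threshold: float = WATT_THRESHOLD,
                    percent_threshold: float = PERCENT_THRESHOLD) -> List[float]:
    """Return the readings that would be sent under the assumed OR rule."""
    reports: List[float] = []
    last: Optional[float] = None
    for value in readings:
        if last is None:
            # Assumption: the first reading is always reported.
            reports.append(value)
            last = value
            continue
        delta = abs(value - last)
        if last == 0:
            pct = float("inf") if delta > 0 else 0.0
        else:
            pct = delta * 100.0 / abs(last)
        if delta >= watt_threshold or pct >= percent_threshold:
            reports.append(value)
            last = value  # thresholds are measured from the last *reported* value
    return reports


if __name__ == "__main__":
    # Made-up readings in watts: heater running, dipping, then idle loads.
    print(reported_values([4500, 4400, 3900, 300, 340]))
```

Under that assumption, the 10 % rule fires on small absolute changes at low loads (300 W to 340 W is only 40 W), while near 4.5 kW a drop of a few hundred watts can go unreported.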
One thing that still points me toward the VM/USB passthrough theory is that when the issue occurred, restarting Home Assistant and restarting the HAOS host did not recover Z-Wave. Only restarting the entire UTM VM restored the controller and automations.
At this point I'm actually leaning toward moving Home Assistant OS to bare metal on the Mac Mini to eliminate UTM, USB passthrough, and the virtualization layer entirely. The fact that only a full UTM restart recovers the issue makes me wonder if the VM is a bigger factor than the HEM itself.
I'll run Z-Wave JS UI for a while with logging enabled and see if the issue returns. If it does, I'll check whether the controller is still visible in Z-Wave JS UI before restarting anything.
Does the fact that Soft Reset was already OFF change your thinking on the most likely cause?
Soft reset being off rules that theory out and strengthens the USB-passthrough explanation. "Only a full UTM restart recovers it" means the failure is at the QEMU USB layer, not in HA. Next time it hangs, before restarting anything, check whether the serial device has vanished (ls /dev/serial/by-id/) and look for USB errors in dmesg; that tells you whether it's a passthrough drop or a stick lockup.
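Those two checks can be folded into one small script, as a sketch of the decision logic. Assumptions: the stick's /dev/serial/by-id name contains "zooz", and the kernel-log patterns below are typical USB error lines rather than an exhaustive list; compare both with what your own system shows. The HAOS host shell may not ship Python, so run it wherever Python has access to /dev and dmesg, or just follow the logic by hand.

```python
import os
import re
import subprocess
from typing import Iterable

BY_ID_DIR = "/dev/serial/by-id"
# Assumption: the Zooz stick's by-id name contains this substring.
STICK_HINT = "zooz"
# Assumed kernel-log patterns that point at USB trouble; adjust to your dmesg.
USB_ERROR_PATTERNS = [
    r"usb \S+: USB disconnect",
    r"device descriptor read.*error",
    r"reset (full|high)-speed USB device",
    r"cdc_acm.*(failed|error)",
]


def classify(by_id_entries: Iterable[str], kernel_lines: Iterable[str]) -> str:
    """Map the two observations to passthrough drop vs. stick lockup."""
    stick_present = any(STICK_HINT in name.lower() for name in by_id_entries)
    usb_errors = [line for line in kernel_lines
                  if any(re.search(p, line) for p in USB_ERROR_PATTERNS)]
    if not stick_present:
        return "passthrough drop: stick no longer visible in " + BY_ID_DIR
    if usb_errors:
        return "USB-layer errors logged: passthrough/link problem likely"
    return "stick visible, no USB errors: controller lockup more likely"


def main() -> None:
    try:
        entries = os.listdir(BY_ID_DIR)
    except OSError:
        entries = []  # the by-id directory vanishes when no serial devices exist
    try:
        result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
        kernel_log = result.stdout
    except OSError:
        kernel_log = ""  # dmesg missing or not permitted
    print(classify(entries, kernel_log.splitlines()))


if __name__ == "__main__":
    main()
```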
Bare metal would fix it, but the 2018 Mac's T2 chip makes it fiddly; running zwave-js-server on a cheap Pi is the easier way to drop the passthrough.
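If you go the Pi route, Home Assistant's Z-Wave JS integration connects to the server over a WebSocket URL, commonly on port 3000. Before repointing the integration, a quick TCP check like this can confirm the port is reachable from the Mac. The host name zwave-pi.local and port 3000 are assumptions; substitute your own. It only proves something is listening, not that zwave-js-server is healthy.

```python
import socket
import sys


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name resolution.
        return False


if __name__ == "__main__":
    # Hypothetical defaults: pass your Pi's address and port as arguments.
    host = sys.argv[1] if len(sys.argv) > 1 else "zwave-pi.local"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 3000
    status = "reachable" if port_open(host, port) else "NOT reachable"
    print(f"ws://{host}:{port} is {status}")
```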
Keep the USB extension/separation either way; it cuts the USB errors that can trigger the wedge. The HEM thresholds look fine. With soft reset off, I'd treat the HEM as a possible trigger, not the root cause.