Migrating Z-Wave-JS-UI from HAOS to standalone Docker

miztahsparklez · August 1, 2025, 7:51pm

I was trying to move my Z-Wave-JS-UI instance from an add-on in HAOS to its own Docker container on a Raspberry Pi, as I was trying to set up a cluster for my Proxmox environment. Everything seemed okay initially, but when I woke up today, all my Z-Wave stuff had stopped working, despite showing as available in HA.

I migrated by exporting the settings and all the files under store, then importing them into the new Docker instance. All my keys and stuff seem to have come over correctly. I then just unplugged the dongle from one of my Proxmox server nodes and plugged it into my Raspberry Pi.

I see about half of my devices showing as responding in Z-Wave JS UI, while the other half are reported dead. However, none of the devices can actually transmit or receive anything (basically not responding).

Any ideas on how to recover? I’m trying to do anything I can before blasting the whole thing and starting over. It’s funny that the Zooz devices are the worst offenders.

cornellrwilliams · August 1, 2025, 9:39pm

If they are battery-powered devices, try waking them up and performing a re-interview.

tmjpugh · August 1, 2025, 10:39pm

If you moved the Z-Wave stick, all the routes changed.

They may also be taking their time reconnecting if they are battery devices.
Look at the Z-Wave JS UI logs for hints. After that, maybe try triggering the device, re-interviewing it, then rebuilding its routes (in that order) for any device still not working.

miztahsparklez · August 2, 2025, 2:29am

Hmm. The physical location of the stick hasn’t really moved, maybe 1 ft at most. Also, most of the devices are LR devices, so it should be a hub-and-spoke model rather than a mesh.

I actually restored my HA back to yesterday’s backup and moved the dongle back to the original host without issue. My devices started responding again.

Fast forward to round 2 on the RPi: I blasted my config on the RPi and redeployed the Docker image. This time I only restored the configuration JSON to restore the keys. It does show the devices that were originally paired, but in a non-interviewed state (“unknown”), as I would expect. However, they have a continually spinning wheel on “protocolinfo”.

In the logs, I do see that the nodes have responded:

2025-08-01 19:07:11.246 DRIVER « [RES] [GetNodeProtocolInfo]
2025-08-01 19:07:11.249 CNTRLR « [Node 005] received response for protocol info:
                                 basic device class:    Routing End Node
                                 generic device class:  Entry Control
                                 specific device class: Door Lock
                                 node type:             End Node
                                 is always listening:   false
                                 is frequent listening: 1000ms
                                 can route messages:    true
                                 supports security:     false
                                 supports beaming:      true
                                 maximum data rate:     -Infinity kbps
                                 protocol version:      3
2025-08-01 19:07:11.251 INFO Z-WAVE: [Node 005] Interview stage PROTOCOLINFO completed
2025-08-01 19:07:11.252 CNTRLR   [Node 005] Interview stage completed: ProtocolInfo
2025-08-01 19:07:11.253 CNTRLR » [Node 005] pinging the node...

then… later all of them do this

2025-08-01 19:07:11.340 DRIVER » [Node 005] [REQ] [SendDataBridge]
                                 │ source node id:   1
                                 │ transmit options: 0x01
                                 │ callback id:      71
                                 └─[NoOperationCC]
2025-08-01 19:07:11.352 DRIVER « [RES] [SendDataBridge]
                                   was sent: true
2025-08-01 19:07:18.633 DRIVER « [REQ] [SendDataBridge]
                                   callback id:            71
                                   transmit status:        Fail, took 7270 ms
                                   routing attempts:       3
                                   protocol & route speed: Z-Wave, 40 kbit/s
                                   routing scheme:         NLWR
                                   TX channel no.:         1
                                   beam:                   1000 ms
2025-08-01 19:07:18.638 CNTRLR   [Node 005] The node did not respond after 1 attempts, it is presumed dead
2025-08-01 19:07:18.640 CNTRLR   [Node 005] The node is dead.
2025-08-01 19:07:18.643 INFO Z-WAVE: [Node 005] Is dead
2025-08-01 19:07:18.647 CNTRLR   [Node 005] ping failed: The node did not acknowledge the command (ZW0204)
2025-08-01 19:07:18.649 CNTRLR   [Node 005] Interview attempt (1/5) failed, node is dead.
2025-08-01 19:07:18.651 ERROR Z-WAVE: [Node 005] Interview FAILED: The node is dead

I’m scratching my head on this one… as I think it should be working

For fun, I re-imported the directories, and the names, etc. repopulated, but everything stays in the failed state. Nodes no longer respond to pings and everything is just marked dead.
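
For anyone triaging a log like the one above, here is a minimal Python sketch that tallies which nodes get marked dead and how long the failed transmits took. It is not part of Z-Wave JS UI; the function name `summarize` and the regexes are assumptions based only on the log lines quoted in this post, so other log formats or versions may need different patterns.

```python
import re

# Patterns are assumptions derived from the log excerpt quoted in this thread.
DEAD_RE = re.compile(r"\[Node (\d+)\] The node is dead")
FAIL_RE = re.compile(r"transmit status:\s+Fail, took (\d+) ms")

def summarize(log_text):
    """Return (sorted node ids marked dead, list of failed-transmit durations in ms)."""
    dead = set()
    fail_ms = []
    for line in log_text.splitlines():
        m = DEAD_RE.search(line)
        if m:
            dead.add(int(m.group(1)))
        m = FAIL_RE.search(line)
        if m:
            fail_ms.append(int(m.group(1)))
    return sorted(dead), fail_ms

# A few lines copied from the excerpt above, used as sample input.
SAMPLE = """\
2025-08-01 19:07:18.633 DRIVER « [REQ] [SendDataBridge]
                                   callback id:            71
                                   transmit status:        Fail, took 7270 ms
2025-08-01 19:07:18.638 CNTRLR   [Node 005] The node did not respond after 1 attempts, it is presumed dead
2025-08-01 19:07:18.640 CNTRLR   [Node 005] The node is dead.
"""

if __name__ == "__main__":
    dead_nodes, durations = summarize(SAMPLE)
    print("dead nodes:", dead_nodes)
    print("failed transmit durations (ms):", durations)
```

Running it over a full exported log would show whether every node fails the same way (which points at the stick or host) or only a subset (which points at routes or individual devices).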

tmjpugh · August 2, 2025, 3:01am

Cool. Cool. But did you check what I suggested?

So your WAG didn’t work. Can you restore the backup?

~~Me too. You were in a meaningful state but now I am kinda unsure what you did. Sounds like you just wiped the zwavejs config and now things don’t work. Did you restore the keys?~~
I presume you rebooted after things stabilized? Starting a fresh Docker container sometimes doesn’t result in a fully restored system, since the first run of the container is lacking data. A reboot is sometimes needed.

all nodes or just some nodes?
what happens if you “reinterview” node 5?
what happens if you “rebuild routes” for node 5?

PeteRage · August 2, 2025, 12:49pm

Since it worked for a while, it sounds like a stick problem on the new computer. Which stick is it, and what firmware version?

Start by putting it on an extender cable and a USB 2.0 powered hub and have the stick as far away as possible from other electronic equipment.

Also check the zwave Startup and Recovery Options.

miztahsparklez · August 3, 2025, 10:15am

Soooo… it seems like I’ve narrowed it down to the latest arm64 Docker image or some strange conflict with my RPi 4.

I tried all of your suggestions and even swapped out the existing extension cable and tried various powered USB hubs, with no progress (my extension is at least 4 feet away from anything and is of good quality). I ended up spinning up another Docker image on my Synology NAS to see if maybe it’s Docker related.

Now, it’s not exactly a walk in the park there either, but on the NAS (amd64) I was at least able to do a no-security inclusion, with the device responding fully to everything. Moving the dongle back to the RPi 4 showed some minor differences this time: it showed a smiley face for connectivity, but interviewing and rebuilding routes would all fail with no response. One major difference, however, is that the log actually registered some activity when I pressed buttons on the device, despite the detailed interview never completing.

In either case, I think I’ve exhausted everything I could try on the RPi… even my hacky Synology solution (lack of native USB support) seems to work better than this thing. I will most likely just go with a PoE ESP32 dongle at some point instead. Thanks for all of your suggestions.

PeteRage · August 3, 2025, 12:03pm

FWIW, I’ve run Z-Wave on Synology amd64 Docker with zero issues for 4 years.

tmjpugh · August 3, 2025, 3:39pm

Well you might want to save this date

jh95959 · August 3, 2025, 6:18pm

My experience with “timeout” kinds of problems turned out to be from occasional lack of computer power.

As I added more devices and more integrations, it became more common for devices to be declared dead, or for other failures to show up. This was especially annoying while HA was running on a NAS, so I switched to a dedicated RPi, which was better, for a while. I added more devices and integrations and then nothing seemed to work any more, until I added more memory by replacing my RPi3/1GB with an RPi4/8GB. Now I’m at 100+ devices, a mix of Z-Wave and Zigbee, and lots of integrations. Everything’s worked fine since.

My speculation is that when some “high priority” task is running, other tasks (like checking to see if a device is dead) don’t run fast enough to avoid timeouts. So you experience lots of “device dead”, or “failed to ACK” events.

I had a similar problem running Channels DVR on another RPi4. All those problems went away when I moved it to a Beelink mini-PC running Ubuntu.

If I have HA problems like this in the future, I’ll get another low-end mini-PC.

I suggest looking at your hardware setup and see if more CPU and/or memory might help.

miztahsparklez · August 3, 2025, 9:29pm

I thought that maybe it could be limited by device power consumption, as my Pi is PoE based. I tried adding two different powered USB hubs to see if the issue would go away, but no dice. The Pi doesn’t have anything else running on it at the moment, so CPU/RAM should not have taken too much of a hit. But I do think there is some validity to what you’re saying. My high-performing Proxmox nodes work immediately without issue, and inclusion/exclusion work with security. My NAS only seems to be able to reliably include via manual inclusion and without security. The Raspberry Pi won’t do anything.

That new Nabu Casa Z-Wave thing might be exactly what I’m looking for….

jh95959 · August 3, 2025, 10:08pm

In HA, there are tools to look at CPU and memory usage - in Settings, go to System/Hardware. This may give you an idea of what’s going on. But you have to be careful - the OS manages memory and tries to keep some free at all times. That means it may “swap out” some of the program to your external disk storage (which in my case was an SD Card), which is just asking for trouble with delays. When parts of the program are “swapped out”, timeouts are more likely. I also saw evidence of “memory leak” behavior, possibly from some integration that I had added.
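
Outside the HA UI, the same check can be roughly sketched in Python on a Linux host such as a Pi. This is only an illustration under assumptions: it reads the standard `/proc/meminfo` file, the helper names `parse_meminfo` and `swap_used_kb` are made up for this example, and the sample numbers are invented, not measurements from anyone's system.

```python
from pathlib import Path

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value (kB for most fields)}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values

def swap_used_kb(info):
    """Swap currently in use, in kB (0 if the host has no swap configured)."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)

# Invented sample values, used only when /proc/meminfo is not available.
SAMPLE = """\
MemTotal:        1000000 kB
MemAvailable:     120000 kB
SwapTotal:        102400 kB
SwapFree:          51200 kB
"""

if __name__ == "__main__":
    path = Path("/proc/meminfo")
    text = path.read_text() if path.exists() else SAMPLE
    info = parse_meminfo(text)
    print("MemAvailable kB:", info.get("MemAvailable"))
    print("Swap used kB:", swap_used_kb(info))
```

If swap in use keeps growing while devices are dropping out, that would fit the "swapped out to the SD card" explanation; if it stays near zero, memory pressure is probably not the cause.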

The system I used for my video (Channels DVR) server was a Beelink S12, which cost about $150 last year. I’ll probably get a similar mini PC if and when the Pi4 becomes too limited for HA.