Z-Wave JS Stopping Constantly

I’m at my wits’ end and honestly just about at the end of my rope with messing with HA. Trying to just take a deep breath, but I’m literally sinking hundreds of dollars into trying to fix Z-Wave and just hitting one brick wall after another. It started out being unreliable and slow, and I ended up replacing every single component, from the rPi to the USB hub to the power supply to the Z-Stick itself. The new Z-Stick helped make things much faster, but Z-Wave still dies multiple times a day.

Today I got up and none of my Z-Wave devices worked. The battery-operated motion sensors show motion, but not a single wired device will turn on or off. After restarting Z-Wave JS to MQTT, everything works great, and instantly. Now, just 8 hours or so later, it has happened again: battery devices seem to report back but wired devices no longer work. Restart Z-Wave JS to MQTT and, once again, it’s all working.

There’s not a peep in the logs about anything being wrong, either in Supervisor, Core or the Z-Wave JS to MQTT log. I just don’t know where else to look and don’t have any more things I can replace, and I am frustrated beyond belief that I have to hold this system’s hand every few hours just to have things work.

Any suggestions on what I can look at or do?

Are you using a 700-series USB controller? If so, those have a bug, and it could be related (hard to say).

Is your USB controller on an extension cord? If not, that nearly always improves things.

The latest version of zwavejs2mqtt is using a beta driver version. Did you notice this happening recently, or has it always been the case? Perhaps downgrading (if possible) could help.

A lot of work, but have you considered going to z-wave js? I have 89 z-wave devices and the network just works. Any chance you have a bad wired z-wave device? That can wreak havoc on your system/mesh. I have had ~10 devices go bad, and every time one goes bad the response is horrible and finding the bad node is not easy. I had to put surge protection on the panels to stop frying the electronics.

Bad SD card maybe??? Have you tried moving to a VM on a computer you own to see if it is the pi? The resources consumed on a modern computer are very little so it will happily run in the background while you do work.

Did you put your Z-Stick on a USB extension cable to get it away from any interference?

Are you on the latest versions of all the software?

Have you healed all the wired nodes?

Can you find the last node that Z-Wave JS to MQTT communicated with? If it is consistently the same node, that may give a clue to a bad one.

Hope this helps and I understand the frustration.

No, it’s a Gen5+

In my original post I mentioned using Z-Wave JS, so I’m already on that. It’s been working fine until a month or so ago and it’s been a battle ever since, one I am losing. As to the rest of your query:

  • Already did away with SD and moved to SSD
  • Yes, have an extension and that was replaced as well
  • Yes, I’m always up-to-date on HA and the plugins
  • Healing doesn’t really work in my experience, but based on the instant response times once the broker gets rebooted I don’t think I have a need to heal
  • I have tried to find a common node that is “last man until dead” and have some theories but nothing that is backed up by a log entry that points me to any device in particular. I just find it hard to believe that even a jabbering Z-Wave device can bring down the stick entirely. I could see it choking my network, and that’s what I thought was happening until I replaced the Z-Stick two days ago, but causing the Z-Stick to fail entirely would be odd

I’m not a home automation newbie by any stretch, nor am I shy about digging deep into HA because I’m a developer and I’m comfortable doing that.

I’ve already started working on a Home Assistant instance running in a VMware session to do away with the rPi entirely. I think I’ve outgrown its usefulness and maybe that is part of the problem. I spun up a VM with Home Assistant and am blown away at how responsive and fast it is - of course it’s devoid of all my automations and devices at this point, but perhaps I need to just go that route and see if it resolves anything.

It’s been working fine until a month or so ago and it’s been a battle ever since, one I am losing.

If you were able to pinpoint when it stopped working, that might help. Several releases lately have been problematic.

Bad:

  • z2m 6.0.0 to 6.0.2 (driver version 8.8.0 to 8.8.2)

Good:

  • z2m 6.0.3 (driver version 8.8.3 which reverted previous problems)

Questionable:

  • z2m 6.1.0 and 6.1.1 (using a beta version of driver 8.9.0)

I would say 6.0.3 is the latest stable version. Unfortunately, as you are using an add-on it might not be possible to switch to other versions.

I just find it hard to believe that even a jabbering Z-Wave device can bring down the stick entirely. I could see it choking my network, and that’s what I thought was happening until I replaced the Z-Stick two days ago, but causing the Z-Stick to fail entirely would be odd

An overly chatty device will easily bring down a Z-Wave network because it is low bandwidth. Has nothing to do with which USB stick you are using.

To have more of an idea what’s going on, you’d want to check the driver debug logs. That would tell you if devices are not responding, or if the controller is not, etc.
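For anyone who wants to automate the "which node was talking last" check on a saved log file, here is a minimal Python sketch. It assumes node-related lines carry a marker of the form `[Node 012]`; that pattern is an assumption about the log layout, so adjust the regex if your log looks different. `node_activity` and `tail` are illustrative names, not part of Z-Wave JS.

```python
import re
from collections import Counter

# ASSUMPTION: node-related log lines contain a marker like "[Node 012]".
NODE_RE = re.compile(r"\[Node (\d+)\]")


def node_activity(lines, tail=200):
    """Look at the last `tail` lines and return (last_node_seen, top_5_nodes).

    last_node_seen is the ID of the final node mentioned, or None.
    top_5_nodes is a list of (node_id, mention_count) pairs, busiest first.
    """
    recent = list(lines)[-tail:]
    counts = Counter()
    last_node = None
    for line in recent:
        for match in NODE_RE.finditer(line):
            node_id = int(match.group(1))
            counts[node_id] += 1
            last_node = node_id
    return last_node, counts.most_common(5)
```

Feed it the lines of the log from just before the network went quiet (for example, an open file object). If the same node keeps showing up as the last one seen, or dominates the counts, it is a candidate for a chatty or failing device. It is only a hint, not proof.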

Interesting. Here’s what my Z-Wave JS is:

zwavejs2mqtt: 6.1.1
zwave-js: 8.9.0-beta.1
home id: 3823806645
home hex: 0xe3eaa8b5

Perhaps I’m dealing with software gremlins.

I’ve been trying to keep an eye on that but haven’t found any smoking gun as of yet.

I should have been more clear: z-wave js without mqtt?

I have had a bad z-wave node do very odd things. A bad node has caused my lights to flash on and off when activated by an automation. I have had a total lack of communication with all the nodes - only a reboot will fix. In my humble opinion, there is no limit to the havoc a bad z-wave device can cause on a mesh network.

Hope the VM route gives you some answers.

I haven’t tried that, mostly because most things I read say that the interface with MQTT is superior to the native app. That, and redefining my entire Z-Wave network in HA a second time doesn’t sound fun (with a LOT of devices, this was a pain when moving from OZW).

True, I admit that perhaps this could be the problem but trying to narrow down which device it is could be difficult with so many devices. It’s curious that my wireless devices seem to report in and even trigger events on my Insteon network just fine, but no wired devices work until I restart the broker.

I’ve started debug logging all Z-Wave traffic to a file, but where is that file located? Up until now I’ve been doing live logging, but I figure this might be a better way to chase this down. It says store/zwavejs2mqtt_{date} but I don’t know where that is. I don’t see it in the config or the config/.storage folders.

I can see it in the Z-Wave JS “Store” folder, but that’s painful to read :).

You can download the file and open it in any editor you want.

Ok, this made me laugh. As a combat veteran, one of the mantras we lived by, and still do today to an extent, is “embrace the suck”. Z-Wave JS found it for me:

2021-12-21T00:00:37.176Z CNTRLR   finding SUC...
2021-12-21T00:00:37.194Z CNTRLR   This is the SUC

I’m having similar problems. I started bringing over all my devices from ST 2 weeks ago and it’s been a nightmare. I constantly have to hard reboot (pull the power cord on) my rpi. I’m running an rpi 4 and a zooz 700 stick. Z-wave js just starts acting stupid when trying to pair the next device after a couple have been added. The dialog box showing the status of the pairing never loads properly. Then I sometimes find a half-configured node in the list. Sometimes I can remove the device; other times I have to force remove it. Then force removing stopped working for a while. Do a hard reboot and I can pair a few more devices. I was trying to do a heal after every room, but the box never showed finished, even after 24 hours, on the couple of times I ran the heal. I have noticed that a few hours to 24 hours after doing a heal my zwave network crashes. It seems I’m getting data from the sensors but cannot control any of the lights or switches. I’ve tried restarting HA. I’ve tried restarting zwave js in supervisor. The only thing that works is pulling the power cord and letting everything reboot.

In the debug logs I’m not seeing any problem node, rather the controller just bombs and gives this message over and over until I restart:

2021-12-21T01:00:54.279Z CNTRLR   No response from controller after 1/3 attempts. Scheduling next try in 100 ms.
2021-12-21T01:00:54.595Z DRIVER   unexpected response, discarding...
2021-12-21T01:00:54.604Z CNTRLR   Failed to execute controller command after 2/3 attempts. Scheduling next try i
                                  n 1100 ms.
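To see when the controller started failing and how often, a saved log can be scanned for these two messages. Below is a minimal Python sketch under some assumptions: the marker strings are copied from the excerpt above and other driver versions may word them differently, and `summarize_controller_failures` is just an illustrative name. Wrapped continuation lines, which carry no timestamp, are skipped.

```python
import re
from datetime import datetime

# Marker text taken from the log excerpt; treat the exact wording as an assumption.
FAILURE_MARKERS = (
    "No response from controller",
    "Failed to execute controller command",
)

# Timestamp, then a tag such as CNTRLR or DRIVER, then the message.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\s+(\w+)\s+(.*)$"
)


def summarize_controller_failures(lines):
    """Return (time_of_first_failure, number_of_failure_lines)."""
    first = None
    count = 0
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # wrapped continuation line or blank line
        stamp, _tag, message = match.groups()
        if any(marker in message for marker in FAILURE_MARKERS):
            count += 1
            if first is None:
                first = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")
    return first, count


sample = [
    "2021-12-21T01:00:54.279Z CNTRLR   No response from controller after 1/3 attempts. Scheduling next try in 100 ms.",
    "2021-12-21T01:00:54.595Z DRIVER   unexpected response, discarding...",
    "2021-12-21T01:00:54.604Z CNTRLR   Failed to execute controller command after 2/3 attempts. Scheduling next try i",
    "                                  n 1100 ms.",
]
first, count = summarize_controller_failures(sample)
print(count, first)
```

Comparing the first failure time against what else was happening then (a heal, an add-on update, a particular device reporting) may help narrow things down.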

The controller is brand-spanking-new. The entire list of Z-Wave devices shows up in Z-Wave JS, but nothing responds and the status shows “Driver Ready”. Pinging any node comes back successful. The live debug window shows lots of activity and the re-interview works fine. The odd thing is that once I did all of that, it partially started working again without me restarting anything. It’s almost like it goes to sleep or something; however, some of my nodes are now dead until I restart or ping the dead nodes, and then they work again.

This is baffling.

I’ve been having this exact issue over the past several weeks. Unfortunately my issue started around the same time that I updated the firmware on my HUSBZB-1, so I thought this was the issue. I’m running driver version 8.9.0-beta.1 of zwavejs2mqtt - every 2 days or so my zwave network goes completely unresponsive until I reboot the container. Nothing informative is showing up in the logs - the network was rock solid for the year plus I’ve been running zwavejs2mqtt.

Common theme, mine was working rock solid all year as well until recently. That’s when I started sinking dollar after dollar trying to chase the issue.

For the moment, I’ve set up an automation to restart the ZWaveJS to MQTT add-on periodically to get me over the hump. Not ideal, but it gives me a little breathing room as I continue to chase down this issue and start the slow migration of everything over to a virtual machine instead of the rPi.

@sdc if you want to mimic what I’m doing, here is how I set up my automation:

alias: _DEBUG_ Restart ZwaveJS to MQTT
description: ''
trigger:
  - platform: time
    at: '06:00:00'
  - platform: time
    at: '11:00:00'
  - platform: time
    at: '15:00:00'
  - platform: time
    at: '21:30:00'
condition: []
action:
  - service: hassio.addon_restart
    data:
      addon: a0d7b954_zwavejs2mqtt
mode: single

I am able to force the state you seem to be having, but I doubt this is what is happening to you. I am new to HA, so I had to migrate a lot of devices over from another ZWave network. I used HA to exclude each device before including it in the HA network. Pretty much every time I excluded a device, added it to the network, and then put HA back into exclude mode, the entire HA zwave stack became unresponsive. It would not enter exclusion, inclusion or anything else. Sometimes I could get to a third exclude, but rarely.

To recover I had to hard reset HA completely. Resetting just parts did not seem to help at all.

Also, for debugging ZWave I found putting it in silly debug level was helpful so you can see the actual ZWave packets and try to figure out from there…

I am running ZWave JS but not mqtt

Also, when this was happening the only thing I had in HA were ZWave devices.

@yepher I also have it in silly mode at the moment. It’s an overwhelming amount of information but you can see the whole thing. Fortunately I have HA on a 250GB SSD so I can let that just eat up as much disk space as it wants - however I also understand that it’s highly likely that it is degrading my system performance by doing such verbose logging.

Very helpful - thank you. When did things stop working for you?

I’m running zwavejs2mqtt in a stand-alone docker container w/ watchtower (so it’s always being updated to the latest version). It looks like the container defaults to using the latest beta drivers vs stable. The beta release of the driver (v8.9.0) was 15 days ago, which is around when my issues popped up.