Z-Wave JS Stopping Constantly

I don’t have a hard date on when things started to go south on me, but I would say I’ve been chasing gremlins for around a month or so. That is just a guess, so purely anecdotal; it may be just a few weeks. If you read a few posts up you can see where @freshcoast gave some excellent reviews of the recent versions.

For me, I’m on HAOS, I don’t run docker (well, we all do technically, but it’s just the normal install). I had the add-on set to auto update and I’m leaving that on for now hoping for a stable release and then I’m toggling that off!

When it gets into the bad state, is the serial port still listed as registered with the OS? Or, maybe the more important question: when it is in the bad state, do you see any Z-Wave data flowing?

Yes, I do see data. I was watching it last night when it bugged out and saw packets flying through the Z-Wave logs, I also saw the driver was online and I could even see that my battery operated devices were reporting status to HA (I could see motion sensors “on” then “off” in normal fashion). It’s giving me the impression this may be software issues rather than hardware but I cannot yet say for sure.

OK, good. From what you say, it looks like data is flowing into the controller. When you flip a switch from the HA side, do you see the packet go out (I assume yes)? More importantly, do you see the ACK come back? That will indicate whether the message actually got out the RF interface.

Just trying to isolate whether the issue is before or after the serial interface.

I haven’t gone that deep yet; right now stability is my priority, but I’m planning to let it fail again today so I can do more testing like that. So far, I can see that when the system is not working I get lots of “Controller Did Not Respond” messages (see my post right above @sdc’s first post), so while I see packets outgoing in the log, I don’t see ACKs. My test will be to live-debug and see what actually flows in real time.
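For anyone wanting to compare outgoing failures against ACKs without watching the log live, here is a minimal sketch in Python. It only counts log lines that contain a given phrase; the helper name `count_phrases` and the sample lines are made up for illustration, and the real Z-Wave JS log layout may differ, so adjust the phrases to whatever your log actually prints.

```python
import re


def count_phrases(log_text: str, phrases: list[str]) -> dict[str, int]:
    """Count the log lines that contain each phrase as whole words.

    Matching is case-insensitive and uses word boundaries, so a short
    phrase such as "ACK" does not match inside a word like "packet".
    """
    patterns = {
        p: re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE)
        for p in phrases
    }
    counts = {p: 0 for p in phrases}
    for line in log_text.splitlines():
        for phrase, pattern in patterns.items():
            if pattern.search(line):
                counts[phrase] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not real Z-Wave JS output.
    sample = (
        "sent packet to node 5\n"
        "Controller did not respond\n"
        "received ACK from node 5\n"
    )
    print(count_phrases(sample, ["Controller Did Not Respond", "ACK"]))
```

If the "did not respond" count climbs while the ACK count stays flat after you flip a switch, that would point at the controller side of the serial interface rather than at HA.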

Yea, sorry if I am asking questions you already answered. The exact state was not clear to me from your answer above… the log messages about the controller not responding are what got me on the current path of questions :slight_smile:

I think one interesting way to try and recover is not reboot HA but instead power cycle just the ZWave controller (pull it out or unplug it and then put it back in and see if things recover).

I can’t tell from this thread if the controller is hanging or if the HA/driver software is the issue.

I think that since rebooting the software side fixes it, the problem is not in the rest of the Z-Wave network…

Not sure if this is related, but I’ve found that running a manual “heal” brought my network down (took several hours to lock up). Doesn’t zwavejs2mqtt automatically run nightly heals?

Had this exact issue start recently after the move to zwavejs2mqtt 6.1.0

Froze network once or twice a day. Reboots or just waiting 15 minutes (sometimes) would fix it

At the end of the day, I had a device at the edge of the network (far, far away) that had poor communication with the controller. I removed that and have had no problems since. I think the zwavejs beta driver bundled with it does not handle the communication issue well, but who knows.

EDIT
Someone asked about the controller hanging, and I’m not sure. I unplugged and replugged the controller once and got the network back, and another time it did not work. What I can clearly say is that it was related to the difficulty it had getting a response from the device that I removed.

I added a second network using a RasPi + zwavejs2mqtt Docker closer to the device and had no issue in that network, so it is not device-specific either

Restarting ZwaveJS to MQTT repairs the problem for me. I don’t have to reboot the rPi or unplug the Z-Stick like you do when Z-Wave really misbehaves, that is why I’m really thinking it may be software.

That’s almost always the case. All your battery devices have to wake up to heal, and I just don’t heal anymore; it brings Z-Wave to its knees and doesn’t resolve anything. Healing happens somewhat automatically as nodes respond faster or slower to relaying packets, so they dynamically adjust their neighbor list during regular activity. In my experience the things you need to “heal” work themselves out in a day or two.

Yea, I’m considering that a device might be the cause, but can’t identify it yet. While I have a couple of new devices in the mix, I’ve actually removed them one-by-one with no change so far. I also upgraded from a Gen5 to a Gen5+, which has better range - and that by itself is the single best upgrade to Z-Wave I’ve seen. It’s super fast when it is working, and the devices that were farther away and slower are now just as responsive as the ones right next to the stick.

The issue reported here seems to have a lot of similarity to yours… Not sure if you have seen it. It is from May:

Similar but I’m not sure it’s the same. That error message can probably pop up for a number of reasons. Since I don’t use a docker install I don’t know if that’s the same issue. Besides, that was in May and they released a fix for that issue.

I’ll probably open up a Git ticket when I have more information, right now I’m trying to get to the bottom of it. If it seems to really point at software then I’ll open one, if it seems to point at hardware then I have to find the culprit.

I don’t know if it’s hardware though. Since I don’t have to reset any hardware to fix the problem, I’m leaning hard towards software. It could be caused by something in hardware, I suppose, but why would restarting the plugin fix the problem if it’s hardware? The plugin is just relaying packets from Z-Wave JS to HA.

Good luck… I am new to this entire software stack, so I am not sure how to see if that fix has made it upstream. Based on that issue thread, it seems to have a lot of things in common beyond just the log message.

Honestly, Z-Wave JS has been pretty outstanding since it was released and a huge improvement over Open ZWave before it. This is the first major glitch since the release that I have experienced. When it’s working, and with the new Gen5+, it’s lightning fast, responsive and exactly what I want from home automation.

But this is currently far more maintenance on home automation than I think should be needed, so I hope to resolve whatever this gremlin is very quickly.

As another point of information…

I am running HA Container, Zwavejs2MQTT in a standalone container (non-add-on) on a NUC PC.

My zwavejs2mqtt is running the following:

ex

And I’m not experiencing any issues like you are describing above (yet…)

So it’s either being caused by hardware or possibly an add-on issue?

TBH, I never had any issues at all with OZW 1.4.

The only reason I switched is because I knew it was going to be removed at some point, so I figured I would get ahead of it. And I was pretty hesitant about doing it, even when I finally did just recently, for the reasons that you and others seem to be experiencing (using beta versions of things as things evolve and improve). I don’t know if it’s my overall system config that has saved me or maybe I’m just lucky. :man_shrugging: But I doubt it. I’m never the lucky one it seems. :laughing:

And I didn’t see any real improvement in performance after the switch compared to the old zwave either.

Neither did I, it wasn’t nearly as fast as Z-WaveJS though in my experience, plus you couldn’t tweak Z-Wave device configs as easily.

Same!

Actually, after a lot of debugging I found that the issue I’m having is a known issue that is addressed in the latest beta 3. It has to do with S2 security and locks: I was able to pinpoint that my system crashes when my locks are automatically locked or unlocked. At that point I get an exception in Z-Wave JS and Z-Wave dies (Communication with node stalls after `ZWaveError: Security CC requires a nonce to be sent` · Issue #3811 · zwave-js/node-zwave-js · GitHub).

My current solution isn’t elegant but I just have an automation restarting Z-Wave JS to MQTT periodically to try to band-aid the problem until a stable release is published that fixes the problem.
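As a rough illustration of that band-aid, here is a minimal sketch in Python of a periodic restart loop. The `restart` callable is a placeholder for whatever actually restarts Z-Wave JS to MQTT on your install (it is not a real Home Assistant or zwavejs2mqtt API), and the interval is an assumption to tune.

```python
import time
from typing import Callable


def restart_periodically(
    restart: Callable[[], None],
    interval_s: float,
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Wait interval_s seconds, call restart(), and repeat `cycles` times.

    `restart` is a hypothetical hook supplied by the caller. `sleep` is
    injectable so the loop can be exercised without really waiting.
    Returns the number of restarts performed.
    """
    performed = 0
    for _ in range(cycles):
        sleep(interval_s)
        restart()
        performed += 1
    return performed


if __name__ == "__main__":
    # Demo with a stand-in restart action and a very short interval.
    restart_periodically(lambda: print("restarting add-on"), 0.1, 2)
```

This restarts blindly on a timer, which matches the stopgap described above; it does nothing to detect whether the network is actually stuck.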

This sounds like: This.
If that is the case you should go back to 8.83 or 8.8.9 beta.3
Not sure what is causing this issue.

Yes, it does seem similar, but the error I reported on the ZWaveJS Git is identical to the one apparently addressed in Beta 3 for that. I may downgrade both to a more stable version (not even sure how I ended up on a beta anyway, I don’t subscribe to beta channels).

There is an update that should fix this issue.
zwave-js/zwavejs2mqtt/releases
It is installed with latest tag.
docker pull zwavejs/zwavejs2mqtt:latest

zwavejs2mqtt: 6.2.0
zwave-js: 8.9.1

The lock issue makes a ton of sense - my network was going down at night when my lock automation runs to lock all of the doors. I’m running 8.9.1 as of this morning, fingers crossed - did not see any issues with beta 3, but not sure it was out long enough to see the issue.

Put a 5-10 second delay between the commands to each of your locks; this will stop it from locking up your Z-Wave network.
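The staggering idea can be sketched in Python as below. This is only an illustration of the pattern: `send_lock` and the lock names are hypothetical stand-ins for however your setup issues a lock command, and the 5 second default simply follows the range suggested above.

```python
import time
from typing import Callable, Iterable


def lock_all_staggered(
    lock_ids: Iterable[str],
    send_lock: Callable[[str], None],
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Send a lock command to each lock, pausing delay_s between commands.

    No pause is inserted before the first lock or after the last one.
    `send_lock` is a caller-supplied hook (hypothetical), and `sleep`
    is injectable so the function can be tested without waiting.
    Returns the lock ids in the order they were commanded.
    """
    commanded = []
    for index, lock_id in enumerate(lock_ids):
        if index > 0:
            sleep(delay_s)  # give the previous secure exchange time to finish
        send_lock(lock_id)
        commanded.append(lock_id)
    return commanded


if __name__ == "__main__":
    # Made-up lock names; a short delay keeps the demo quick.
    lock_all_staggered(
        ["lock.front_door", "lock.back_door"],
        lambda lock_id: print("locking", lock_id),
        delay_s=0.1,
    )
```

In a Home Assistant automation the same effect would come from putting a delay step between the individual lock actions instead of targeting all locks in one call.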