zwaveJS network stops responding after 24 hours or so with "Timeout while waiting for an ACK"

In case it helps, my Z-Wave devices consist of:

2 Yale locks
6 Inovelli Red dimmer switches
1 T6 Pro thermostat
1 AEON Labs smart outlet
1 HomeSeer floodlight sensor (issues were happening before I added this device)
1 HomeSeer HS-FS100+ light sensor (issues were happening before I added this device)

Currently the locks are the only devices on battery power.

Minor update. I could have sworn a soft reboot of the Raspberry Pi didn’t fix anything but I tested it again today and it seems to work to fix things now. I also tried just resetting the zwavejs add on and that also fixes the issue temporarily.

At this point I’d settle for a way to detect errors and reset the add on, though I’m sure resetting the add on multiple times per day is likely to cause some signals to be missed, and I would still like to solve the root issue.
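
For anyone wanting the same stopgap, here is a rough sketch of the "detect and reset" idea. It counts log lines containing the error text from the title and builds a call to the `hassio.addon_restart` service over the Home Assistant REST API. The add-on slug `core_zwave_js`, the threshold of 3, and the way the log lines are obtained are all assumptions on my part, so adjust them to your setup. The request is only built here, not sent.

```python
import json
import urllib.request

# Error text taken from the thread title.
ACK_TIMEOUT = "Timeout while waiting for an ACK"


def count_ack_timeouts(log_lines):
    """Count log lines that contain the ACK timeout message."""
    return sum(1 for line in log_lines if ACK_TIMEOUT in line)


def should_restart(log_lines, threshold=3):
    """Return True once the number of ACK timeouts reaches the threshold.

    The threshold is an arbitrary assumption, not a recommended value.
    """
    return count_ack_timeouts(log_lines) >= threshold


def build_restart_request(base_url, token, addon_slug="core_zwave_js"):
    """Build (but do not send) a POST to the hassio.addon_restart service.

    addon_slug is assumed to be the slug of the Z-Wave JS add-on in use.
    """
    payload = json.dumps({"addon": addon_slug}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/services/hassio/addon_restart",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(build_restart_request(url, token))` with a long-lived access token. This only papers over the problem, which is why I still want the root cause.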

When restarting the add on I noticed my logs were only set to INFO. I’ve switched to DEBUG and will post new logs next time it hangs.

Try disabling the watchdog, seems to solve unexplainable issues sometimes.

I’ve had similar experience with an RPi. The timeouts and “failed to execute” behavior seemed to indicate that the system was struggling to keep up with the activity. My logs looked a lot like yours.

In my case, I eventually tracked the problem down to a lack of memory in my RPi setup. My RPi was a 1GB version. Checking the memory and processor usage (in System), after power up it was running at 80% or so memory utilization. Over time, maybe 24 hours or so, that would climb to 95% and I would start to see all those timeouts and similar errors.

I swapped the system over to a new RPi with 8GB memory and everything has worked fine ever since, with all the same devices, integrations, etc.

I suggest looking at your processor and memory usage to see if that might be the problem. HA seems to go berserk when it’s starved for resources. My system now is running at about 15% max utilization.
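
If you have shell access to the Pi, a minimal sketch like this gives a memory figure you can log yourself. It reads the standard Linux `/proc/meminfo` file and treats "used" as `MemTotal` minus `MemAvailable`. That definition is my assumption and may not match the number the HA System page shows exactly.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integer values (in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_used_percent(meminfo):
    """Percentage of memory in use, computed as (MemTotal - MemAvailable) / MemTotal."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def read_memory_used_percent(path="/proc/meminfo"):
    """Read the live value from the running system (Linux only)."""
    with open(path) as fh:
        return memory_used_percent(parse_meminfo(fh.read()))
```

Run `read_memory_used_percent()` right after boot and again a day later to see whether usage creeps up the way mine did.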

Good luck!
Jack

Thanks for the suggestions!

Last night I migrated from zwavejs integration to zwave js ui (still using websockets) for better visibility into what is going on. I’ll try out these options after the next crash so I’m only changing one thing at a time.

If you’re still having problems, check to see what your memory/processor usage is immediately after you power up the RPi. More details here: HA Strange Behavior -- SOLVED!

Try the advice from one of the zwavejs developers for when you get Timeout while waiting for an ACK from the controller:
https://zwave-js.github.io/node-zwave-js/#/troubleshooting/connectivity-issues?id=problems-communicating-with-the-stick

Improve the network health to avoid message flood: Z-Wave JS - Z-Wave driver written entirely in JavaScript/TypeScript

Use the network graph in zwavejs-ui to find how many hops the devices use to communicate with the controller. It should be using one hop to the controller: Z-Wave JS UI

Did you mean the watchdog for the zwave add-on? If so, that does not seem to be my issue as I just had a lockup with the watchdog disabled (though it did run for a bit longer than usual).

Thanks for the suggestion. Every time I look at the memory usage page it is between 30-40% used. I used to have the systemmonitor sensor logging the memory usage but that seems to have mysteriously broken sometime in September. I’ll try to get that fixed and then cross reference that with the crashes going forward.

For the record, I’m using a Raspberry Pi 4 with 4 GB RAM.
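
The cross-referencing I have in mind is roughly the sketch below: for each lockup time, look up the most recent memory reading taken at or before it. The data layout (a time-sorted list of `(datetime, percent)` pairs) is a hypothetical one I made up for illustration, not what the systemmonitor sensor actually exports.

```python
from bisect import bisect_right


def memory_before_crashes(samples, crashes):
    """Pair each crash time with the latest memory sample at or before it.

    samples: list of (datetime, percent) tuples, sorted by time.
    crashes: list of datetime objects.
    Returns a list of (crash_time, sample_or_None) tuples.
    """
    times = [t for t, _ in samples]
    result = []
    for crash in crashes:
        idx = bisect_right(times, crash)
        # idx == 0 means no sample was recorded before this crash.
        result.append((crash, samples[idx - 1] if idx else None))
    return result
```

If the readings just before each lockup look the same as the readings on a good day, memory is probably not the culprit.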

Thanks for those links! I’ll dig into those and see what I can tweak.

Looking at my graph, I currently have 3 nodes at the 1 hop level, 6 nodes at the 2 hop level, and 3 nodes at the 3 hop level. Two of those 3 hop devices are actually pretty close to my Z-Wave controller, so I’ll try healing the network to improve that. I have 2 devices inside and one device on the outside of a detached garage about 40 feet from the house which will likely need at least one extra hop, but they seem to be at least as reliable as the rest of the network.

You should not have nodes that have so many hops to the controller. It should be one hop to the controller.
I see you have a 700 series controller. I tried to use an Aeotec z-stick7 with a USB extension cable as a controller, but found that some of my nodes had two hops to the controller. The controller logged lots of packet loss and problems communicating with the controller. When I changed my controller to my old Aeotec z-stick5 all the logging disappeared. All the nodes now have one hop to the controller.
If you have an old zstick5 available, maybe restoring to that stick will help.

Unfortunately the 700 series stick is the only one that I have.

As for the memory usage theory, now that I’ve been logging memory usage I can say that usage is pretty steady at 37% used. I had a jump to about 47% during the last update while my database was migrating, but otherwise it has been extremely consistent.

You can try to ping the dead nodes: Automate ZwaveJS Ping Dead Nodes? - #3 by freshcoast and maybe it helps.
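
As a rough sketch of the idea (not the automation from that thread): take the state list that Home Assistant's `GET /api/states` returns, pick out node status sensors reporting `dead`, and build the body for a `zwave_js.ping` service call. The `_node_status` entity suffix is an assumption about how your entities are named, and those sensors may need to be enabled first.

```python
import json


def find_dead_nodes(states, suffix="_node_status"):
    """Return entity IDs of node status sensors whose state is 'dead'.

    states: list of dicts with 'entity_id' and 'state' keys.
    suffix: assumed naming convention for node status sensors.
    """
    return [
        s["entity_id"]
        for s in states
        if s.get("entity_id", "").endswith(suffix) and s.get("state") == "dead"
    ]


def build_ping_payload(dead_entities):
    """JSON body for POST /api/services/zwave_js/ping."""
    return json.dumps({"entity_id": dead_entities})
```

Only send the ping when `find_dead_nodes` returns something, otherwise there is nothing to target.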

Interesting. I’ll give that a try next time it locks up on me.

Based on the logs the problem is with the stick / USB interface. I have read on this forum about similar issues caused by low voltage levels on the Pi. See if you can check into that.

Other things I’d consider
a) use a different USB port
b) get a powered USB 2.0 hub, plug this into the Pi and the long cable into it. If it’s a low voltage on the stick this should solve it.
c) get another stick (you’ll want to have a backup stick anyways)

Thanks for the suggestion. I’ve borrowed a powered USB hub and connected the Z-Wave stick through that. Fingers crossed that helps. If so, I guess I’ll buy a dedicated powered hub for that server. If not, I guess I’ll be buying an alternate Z-Wave stick.

I’m now at 4 days of uptime without any Z-Wave issues, so I’m assuming the powered USB hub is the solution and have ordered a dedicated one.

Thinking back, I used to get a few days of runtime before it would lock up. I suspect that the firmware update fixed a software bug that a lot of people (including me) were experiencing, but maybe it also happened to draw more power, which actually made my system worse.

Either way, I’m happy I can finally properly use my Z-Wave network!

Glad that it is working. At some point try to take a look at the voltage going into the Pi from the power supply (or get a high quality one). If this is low - which it appears to be since the powered hub is helping - you may end up with other apparently random failures as the power from the utility fluctuates.
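
One software-side check, if you can get a shell where `vcgencmd` is available (that is an assumption, it may not be on every install): `vcgencmd get_throttled` prints a hex bitmask, and the sketch below decodes it. Keep in mind this reflects the Pi's own supply rail, not the voltage the stick sees on the USB port, so a clean result does not rule out a USB power problem.

```python
import subprocess

# Bit positions reported by `vcgencmd get_throttled`.
FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Decode a string like 'throttled=0x50005' into a list of set flags."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [desc for bit, desc in FLAGS.items() if value & (1 << bit)]


def read_throttled():
    """Run vcgencmd and decode its output (requires vcgencmd on the PATH)."""
    out = subprocess.run(
        ["vcgencmd", "get_throttled"],
        capture_output=True, text=True, check=True,
    ).stdout
    return decode_throttled(out)
```

An empty list from `read_throttled()` means no flags are set. Any "has occurred" entry means the condition has happened at some point since the last boot.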

I’m using the 5.1V 3.5A USB C power supply from CanaKit. I thought that was supposed to be a high quality supply, but maybe it isn’t (or maybe I got a dud). If I see more weirdness I’ll see about measuring it or swapping it out with something better.

Oh, and I just checked and I have had the Raspberry Pi Power Supply Checker integration installed and running. It has been in “ok” status for weeks.
