ZwaveJS is this normal? Failed to execute controller command after 1/3 attempts. Scheduling next try in 100 ms

Hi, is there any other thing I can do to get rid if these disconnects?

EDIT: created an issue:

Maybe it’s something with the hypervisor you’re running on? I see you mentioning ESXi, what version? There are quite some issues with v7.0 Update 3x. As you’re running in a VM environment, can you double check the console to see any dmesg logging about USB (re)connection? If so, please also check the same on your host to see if maybe your hardware is starting to fail.

Otherwise I’m all out of ideas and I’ll continue to monitor the conversation/issue created for future references.

6.7. u3 I run (if it ain’t broken don’t fix… well it might be actually :-))

I can get the dmesg log via the ssh SSH & Web Terminal addon but cannot copy.paste it… tips?

via the ESXi host I can get it… but I do not wish to share publicly…

Just wondering if you see your Z-Wave USB device triggering the logs every X seconds. Don’t need to actually see the logs :slight_smile: Should you ever need to paste long texts you can use Pastebin. (Web service)

Hi Rick, not sure how I can see that in the logs. I do not now how to look that up in linux…

Some more info. I investigated the homeassitant host, the hypervisor host ans discovered some weird errors. I google them and saw some hits but no solution. And all on esxi 6.5 and 7. I have 6.7

Preventive rebooted the ESXi host which had been running for 364 days :-).

But still errors in homeassitant.

Below errors were in esxi before reboot. Now gone in esxi.

e[7m2021-12-06T11:16:42.685Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.685Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.686Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.686Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.693Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.693Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.694Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.694Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.694Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.694Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.694Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.695Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.695Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.695Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.695Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.695Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.695Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.696Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.696Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.696Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.699Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.700Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.700Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.700Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.700Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.700Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.700Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.701Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.701Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.701Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.701Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.701Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.702Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.702Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.702Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.702Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.705Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.706Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.706Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.706Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.706Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.706Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.707Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.707Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.707Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.707Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.707Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.707Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.708Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.708Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.708Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.708Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.709Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.709Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.709Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m
e[7m2021-12-06T11:16:42.710Z cpu3:2097577)WARNING: vmkusb: transfer error [18]: endpt = 0x82, nframes = 1, timeout = 0, interval = 0, max_packet_size = 32, max_frame_size = 32: Failuree[0m

I have this host (6.7u3) running for over 2 years now with usb passed trough all the time:

[root@localhost:~] lsusb
Bus 001 Device 006: ID 0781:5583 SanDisk Corp. Ultra Fit
Bus 001 Device 005: ID 8087:0aaa Intel Corp.
Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 003: ID 0658:0200 Sigma Designs, Inc. Aeotec Z-Stick Gen5 (ZW090) - UZB
Bus 001 Device 002: ID 1cf1:0030 Dresden Elektronik
Bus 001 Device 001: ID 0e0f:8003 VMware, Inc. Root Hub

Never had issues… even with OZW Beta. Until now with ZwaveJs I see the disconnects…
I do see some topics about 6.5:
https://kb.vmware.com/s/article/2147650

But none of them clearly mention 6.7, only 6.5

I do not “dare” to execute the command esxcli system module set -m=vmkusb -e=FALSE on my 6.7 machine since I might get into worse troubles then.

Any advise?

What’s also to add. I have no error at all for my deconz stick…

Did you install the update yet? There was a bug in the old version. I posted about it here and it could be the cause of your errors. I would update and check again if you haven’t already.

Yes I did that that directly

The logs appear to be from the addon. What do the logs set at “silly” level from the zwavejs server itself show? They can give more of a clue as to what’s going on. To get them go through these screens/steps:

Screenshot from 2021-11-04 20-12-36

Screenshot from 2021-12-10 18-39-05

Someone hinted that the USB is disconnecting, but I don’t think it’s that. These are what logs look like for a disconnecting USB, and yours don’t look at all like that.

Yours seem more like a routing issue, like the controller can’t get a command through to a specific node and keeps retrying. You mentioned sometimes there is a lag so maybe it fails the first time and succeeds on the retry? The silly level zwavejs logs should reveal more clues about what command exactly is failing to go through and to which node(s). Try posting them here. Its likely the developers will also need those on your github issue to further diagnose.

1 Like

Hi Tim, thanks for taking the time to help here!

Is the log of the addon different than the one pointed by you? I see the same output from silly in the addon. I think it is just a matter of selecting the right choice in the integration?

I have captured a part of the log this way. I did try to heal around 2 weeks ago but that took 2 days and I stopped it…

Then I tried this morning to heal and took around 3 hours an I stopped it.

Did anyone ever had a “successful” heal?

I don’t use the addon myself (use zwavejs2mqtt in docker), but i think if you set them to “silly” in the addon configuration they’re probably the same. I know the ones generated on that integration configuration screen are the right ones for the zwavejs server though.

Rarely, and there was some discussion on another thread the network heal logic might actually be flawed, and could, at least temporarily, make issues worse.

70 Zwave device network stuck healing for over 24hrs - #20 by HeyImAlex .

I heal individual problem devices vs the whole network now, which for me works better. A node with log errors, or the device responding slowly, are the ones I heal.

I uploaded more logging to the github issue

Does anyone know how to interpret or where to find how to interpret the logging? What does CNTRLR SERIAL DRIVER mean?

This seems to be such an event of mine:

2021-12-11T17:17:41.027Z SERIAL « [CAN]                                                                   (0x18)
2021-12-11T17:17:41.028Z CNTRLR   Failed to execute controller command after 1/3 attempts. Scheduling next try i
                                  n 100 ms.
2021-12-11T17:17:41.130Z SERIAL » 0x0109001348022602250aa4                                            (11 bytes)

Take a look at the zwavejs documentation, specifically for the failed to send error

Failed to send the message after 3 attempts
If this happens, your network is either extremely busy at the moment or something is really wrong with the Z-Wave controller. Try waiting a bit, try removing and re-inserting the stick or try restarting your host.

The documentation overview also explains the difference between controller, driver, etc

I looked at your logs and there seems to be a lot of traffic in your network. Node 61 seems to cause the most issues. What is it, how far is it away, and have you noticed any issues with its performance? I’d try healing it individually (go to the device in the zwavejs integration and select “heal device”).

You might also be “stuck” in a previous heal attempt. Try removing the stick or restart the host as suggested by the documentation, and don’t do another whole network heal. Also, are you polling devices at all? If so try to decrease the frequency of polling. It’s also possible you have an automation stuck in a loop continuously calling a switch to be on/off. I’d check those too.

Update

I see they are working on this under a new issue, hopefully the developers can get to the bottom of it for you

Hi Tim, very much appreciated!!

node 61 is a cover device… I must say it’s an old one, but I have more of them (8 in total)
I successfully healed ID61 now.
image

I have restarted a lot already (also with the recent 2012.12 update and OS update and ESXi reboot, etc.)

May I ask, how did you see ID61 was a troublesome one?

And now you mention it…

I hall 8!) do not send back reliable instant status I have an automation running to poll them all at the same time every minute via zwave_js.refresh_value
I instantly disabled that now and will monitor…

Could this be the culprit? And if how could I overcome? Not to do the poll every minute for all 8 but do them 5-5-5-5 seconds or…?

I chimed in on the git hub issue. It looks like your well on your way to solving this - think the key to your issues is reducing traffic - avoid S0 security, decrease polling, and decrease frequency of power management reports.

After a lot of the fail to send 1/3 attempt message, the next message was something to node 61 so figured that was one causing most of the issues. I usually just look for log entries with the same Node ID over and over. Those are “chatty” nodes that usually are generating too much traffic.

I also mentioned I think you can start a new thread to ask for the best way to do automations to get the polling better, and not do them all at once. I’m not great with Home Assistant automations (use Node Red primarily) so starting a new thread will likely get you a very good response on that.

This smells like ESXi vmkusb issues, give this a shot.

You need to disable the new vmkusb module and load the older legacy USB drivers with this command:

esxcli system module set -m=vmkusb -e=FALSE

Execute this on the ESXi host via SSH.

Hi, thanks, since I stopped polling that error is totally gone… I think and thought this had nothing to do with esxi luckily!

Could just be a symptom of the vmkusb driver.