All my zwave controls stop working after a while (after a reboot)

All my Z-Wave controls stop working a while after a reboot.

The zwave network itself seems to be working, as I still get regular data packets from the energy monitor meter. Nonetheless, any switch (for lights, for example) does not do anything.

I have to reboot in order to get full control of my house again, but that does not last long (I am not sure how long it takes until the controls become unresponsive).

I know it is a long shot from this description, but is anyone experiencing the same problem? Any clue what could be causing this?

FYI: I first used the Z-Wave JS add-on and have now switched to Z-Wave JS UI (current version: 1.2.0).
The problem is the same.
Home Assistant 2022.10.5
Supervisor 2022.10.0
Operating System 9.3
Frontend version: 20221010.0 - latest

My English isn't great, so I hope it's clear.
If there's anything I can do, I'd like to hear about it.

  1. Z-Wave Stick and FW Version?
  2. Stick on ~3' USB Cable?
  3. Do they 'all' go down at the same time?
  4. Are you able to ping them in this state?
  5. Anything in the zwave-js logs?

I just had a similar problem, however most of my ~80 devices would fail on reboot and zwave-js UI would become almost unresponsive (very slow). In my case it was a Zooz ZEN20 Version 3 power strip failing that brought the entire network down.

My Z-Wave stick is an Aeotec Z-Stick Gen5 (ZW090), firmware 1.0.
The USB stick is on a USB extension cable.
After a reboot everything works, but after a while everything stops. It looks like this happens all at once.
The stopping seems to happen at random.
To find out when, I made an automation that toggles an unused switch every 2 minutes, so I can check the log when things go wrong.
Because I don't know when it will go wrong, I haven't seen anything in the log yet.
What I do see is that all connections are gone from the network graph; only the devices are shown.
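
One way to use the two-minute heartbeat described above is to compare the times at which the switch actually toggled against the expected interval: the last toggle before the first oversized gap approximates the moment the network stopped responding. The sketch below is only an illustration. The function name `last_good_heartbeat`, the 30-second slack, and the assumption that the toggle times have already been collected (for example by hand from the logbook) are hypothetical and not taken from the posts.

```python
from datetime import datetime, timedelta


def last_good_heartbeat(timestamps, now,
                        interval=timedelta(minutes=2),
                        slack=timedelta(seconds=30)):
    """Return the last heartbeat before the first gap longer than
    interval + slack, or None if no such gap exists.

    `timestamps` are the times the test switch actually toggled;
    `now` closes the series so a heartbeat that stopped and never
    resumed is also detected.
    """
    ordered = sorted(timestamps)
    if not ordered:
        return None
    limit = interval + slack
    # Pair every heartbeat with its successor; the final one is paired with `now`.
    for previous, current in zip(ordered, ordered[1:] + [now]):
        if current - previous > limit:
            return previous
    return None
```

Calling it with the recorded toggle times and the current time returns the time to look for in the Z-Wave JS log; entries shortly before that time are the interesting ones.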

The strange thing is that everything just worked for half a year (in Z-Wave JS).
I didn't change anything in Z-Wave; I only installed the updates.
Then the problem started. In the end I restored an older backup and converted everything to Z-Wave JS UI, and that also worked fine for a week until I did a host restart. Then it started again.

Pinging doesn't work either.

Yesterday it was enough to restart the Z-Wave JS UI add-on; today it was not. A Home Assistant restart does not help either, only a host reboot.

What is your setup for OS and device? Are you using a VM or a dedicated machine?
You could check some of your devices in the network graph and use 'check health' on a few of them. If the health is fine, then the problem is on another layer; it could be the communication between HA and the JS server (or somewhere else), but that needs to be checked step by step.

I am running Home Assistant Operating System on a Pi 4 (dedicated machine).
If the problem comes back I'll try what you suggest.

I have a similar HA setup, running on a Raspberry Pi3B+ with a HUSBZB interface for Zwave and ZHA devices and using ZWave-JS UI. Even using MQTT to interact with LinkTap irrigation valves.

Occasionally and unpredictably, everything becomes non-responsive. Perhaps once a week this happens. The UI still works, but switching devices on/off has no effect. Sensors don't do anything, etc. Restarting by using the RESTART button in the UI has no effect. But turning the power off and back on restores everything to a working state again.

I've looked at the logs after such an event, and there are lots of errors, but they seem to be a result of the lock-up rather than a cause. When a lockup occurs, one thing I noticed is that the light on the RPi stays a solid green, rather than flashing as it normally does. This may indicate that the RPi is running out of memory for some reason, e.g. as discussed in "pi 3 - Raspberry Pi Green LED Always On" on Raspberry Pi Stack Exchange. So perhaps this is a Pi problem...?

I have 100+ devices, and there are now 19 integrations. I may have been too willing to let HA add a new integration when it discovered a new device (Roku, Onvif, Sonos, Vizio, Synology, Cast, DLNA, and even an "iBeacon Tracker" which just appeared a week or two ago).

Seems like it would be a huge job to sort out what's happening with such an intermittent failure. I've been thinking the easiest solution might be a (mechanical) timer to turn the power off/on every day at 3am and see if that has any effect.

Hi, I now see that everything stopped working again at 11 am today.
I did the test with the health check, with the following results.

During those tests I see this in the log:

2022-10-30 14:53:24.074 INFO Z-WAVE: Node 42: value updated: 50-0-value-66049 0 => 1.997
2022-10-30 14:53:25.224 INFO Z-WAVE: Node 38: value updated: 50-0-value-66049 0 => 0.974
2022-10-30 14:53:26.074 INFO Z-WAVE: Node 42: value updated: 50-0-value-66049 1.997 => 0
2022-10-30 14:53:33.223 INFO Z-WAVE: Node 38: value updated: 50-0-value-66049 0.974 => 0
2022-10-30T13:53:33.528Z DRIVER Dropping message because it could not be deserialized: The command class Inclusion Controller is not implemented (ZW0303)
2022-10-30T13:53:33.987Z DRIVER Dropping message because it could not be deserialized: The command class Inclusion Controller is not implemented (ZW0303)
2022-10-30T13:53:33.998Z DRIVER Dropping message because it could not be deserialized: The command class Inclusion Controller is not implemented (ZW0303)
2022-10-30T13:53:34.032Z DRIVER Dropping message because it could not be deserialized: The command class Inclusion Controller is not implemented (ZW0303)
2022-10-30T13:53:34.053Z DRIVER Dropping message because it could not be deserialized: The command class Inclusion Controller is not implemented (ZW0303)
2022-10-30T13:53:34.057Z DRIVER Dropping message because it could not be deserialized: The command class Inclusion Controller is not implemented (ZW0303)
2022-10-30T13:53:34.066Z DRIVER Dropping message because it could not be deserialized: The command class Inclusion Controller is not implemented (ZW0303)
2022-10-30 14:53:39.074 INFO Z-WAVE: Node 42: value updated: 50-0-value-66049 0 => 1.663
2022-10-30 14:53:40.071 INFO Z-WAVE: Node 42: value updated: 50-0-value-66049 1.663 => 0
2022-10-30 14:53:42.225 INFO Z-WAVE: Node 38: value updated: 50-0-value-66049 0 => 1.31
2022-10-30 14:53:44.902 INFO Z-WAVE: Node 41: value updated: 50-0-value-66049 0 => 0.927

This one belongs to the last picture

2022-10-30 14:59:53.769 ERROR Z-WAVE-SERVER: Timeout while waiting for an ACK from the controller (ZW0200)
ZWaveError: Timeout while waiting for an ACK from the controller (ZW0200)
at Driver.sendMessage (/opt/node_modules/zwave-js/src/lib/driver/Driver.ts:3990:23)
at Driver.sendCommandInternal (/opt/node_modules/zwave-js/src/lib/driver/Driver.ts:4181:28)
at Driver.sendCommand (/opt/node_modules/zwave-js/src/lib/driver/Driver.ts:4296:15)
at BinarySwitchCCAPI.set (/opt/node_modules/@zwave-js/cc/src/cc/BinarySwitchCC.ts:131:24)
at Proxy.BinarySwitchCCAPI. (/opt/node_modules/@zwave-js/cc/src/cc/BinarySwitchCC.ts:146:29)
at ZWaveNode.setValue (/opt/node_modules/zwave-js/src/lib/node/Node.ts:932:29)
at NodeMessageHandler.handle (/opt/node_modules/@zwave-js/server/dist/lib/node/message_handler.js:23:38)
at Object.node (/opt/node_modules/@zwave-js/server/dist/lib/server.js:40:96)
at Client.receiveMessage (/opt/node_modules/@zwave-js/server/dist/lib/server.js:105:99)
at WebSocket. (/opt/node_modules/@zwave-js/server/dist/lib/server.js:49:45)
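
For anyone wanting to find the first occurrence of an error such as ZW0200 in a long log, a small script can count the Z-Wave JS error codes and note when each one first appeared. This is a rough sketch that assumes log lines shaped like the two excerpts above (a timestamp at the start and a `(ZWxxxx)` code at the end); the function name `summarize_zwave_errors` is made up for illustration. Judging by the one-hour offset in the excerpts, the driver lines ending in `Z` appear to be in UTC while the other lines are in local time, so the two kinds of timestamp should not be compared directly.

```python
import re
from collections import Counter

# Matches both timestamp styles seen in the excerpts:
#   "2022-10-30 14:59:53.769 ..." and "2022-10-30T13:53:33.528Z ..."
# and captures a trailing error code such as "(ZW0200)".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\.\d{3})Z?\s.*"
    r"\((?P<code>ZW\d{4})\)\s*$"
)


def summarize_zwave_errors(lines):
    """Return (counts, first_seen) for every ZWxxxx code found.

    counts     -- Counter mapping error code to number of occurrences
    first_seen -- dict mapping error code to the timestamp string of
                  its first occurrence, in file order
    Lines without a leading timestamp (e.g. stack-trace lines) are skipped.
    """
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        code = match.group("code")
        counts[code] += 1
        first_seen.setdefault(code, match.group("ts"))
    return counts, first_seen
```

Feeding it the lines of an exported log (for example `open(path).read().splitlines()`) gives a quick overview of which codes dominate and where to start reading.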

My HA locked up again sometime during the night. I had to power it off and on to get it working again. After it restarted I looked at the logs. The first few entries from the reboot are below. I'm wondering if the first entry about the sqlite3 database indicates a problem, or if that's just expected because I had to turn off the power to regain control. Then it seems that everything is "taking over 10 seconds". But eventually it's all working again.

Anybody see something in here that might indicate whatā€™s causing the lockups?

/Jack

Log from after a power cycle to recover HA operation:

2022-10-31 07:22:49.881 WARNING (Recorder) [homeassistant.components.recorder.util] The system could not validate that the sqlite3 database at //config/home-assistant_v2.db was shutdown cleanly

2022-10-31 07:22:50.080 WARNING (Recorder) [homeassistant.components.recorder.util] Ended unfinished session (id=68 from 2022-10-29 17:50:10.046258)

2022-10-31 07:23:37.773 WARNING (MainThread) [homeassistant.setup] Setup of input_boolean is taking over 10 seconds.

2022-10-31 07:23:38.328 WARNING (MainThread) [homeassistant.setup] Setup of zone is taking over 10 seconds.

2022-10-31 07:23:38.332 WARNING (MainThread) [homeassistant.setup] Setup of input_button is taking over 10 seconds.

2022-10-31 07:23:38.337 WARNING (MainThread) [homeassistant.setup] Setup of input_text is taking over 10 seconds.

2022-10-31 07:23:38.341 WARNING (MainThread) [homeassistant.setup] Setup of schedule is taking over 10 seconds.

2022-10-31 07:23:38.345 WARNING (MainThread) [homeassistant.setup] Setup of timer is taking over 10 seconds.

2022-10-31 07:23:38.349 WARNING (MainThread) [homeassistant.setup] Setup of group is taking over 10 seconds.

2022-10-31 07:23:38.353 WARNING (MainThread) [homeassistant.setup] Setup of counter is taking over 10 seconds.

2022-10-31 07:23:38.357 WARNING (MainThread) [homeassistant.setup] Setup of input_datetime is taking over 10 seconds.

2022-10-31 07:23:38.361 WARNING (MainThread) [homeassistant.setup] Setup of input_select is taking over 10 seconds.

2022-10-31 07:23:38.365 WARNING (MainThread) [homeassistant.components.scene] Setup of scene platform homeassistant is taking over 10 seconds.

2022-10-31 07:23:38.368 WARNING (MainThread) [homeassistant.setup] Setup of input_number is taking over 10 seconds.

2022-10-31 07:23:38.372 WARNING (MainThread) [homeassistant.setup] Setup of system_health is taking over 10 seconds.

2022-10-31 07:23:38.377 WARNING (MainThread) [homeassistant.setup] Setup of media_source is taking over 10 seconds.

2022-10-31 07:23:38.381 WARNING (MainThread) [homeassistant.setup] Setup of logbook is taking over 10 seconds.

2022-10-31 07:23:38.385 WARNING (MainThread) [homeassistant.setup] Setup of application_credentials is taking over 10 seconds.

2022-10-31 07:23:53.520 WARNING (MainThread) [homeassistant.setup] Setup of script is taking over 10 seconds.

2022-10-31 07:24:07.847 WARNING (MainThread) [homeassistant.config_entries] Config entry 'Canon MG6200 series-3' for ipp integration not ready yet: Invalid response from API: Timeout occurred while connecting to IPP server.; Retrying in background

2022-10-31 07:24:08.404 WARNING (MainThread) [homeassistant.config_entries] Config entry 'Z-Wave JS' for zwave_js integration not ready yet: Failed to connect: ; Retrying in background

2022-10-31 07:24:24.743 WARNING (MainThread) [homeassistant.components.weather] Setup of weather platform met is taking over 10 seconds.

It does seem that your Z-Wave network is not really stable and stops routing commands. That is not easy to resolve. I have had some devices break down and cause similar instability. The USB cable could also cause an occasional drop in power delivery (especially on a Raspberry Pi).
I am not sure where I would start here.
Could you show your network graph? And are there any 'Neo Coolcam' devices in your network? Also, are there lots of power-measuring devices on a frequent reporting schedule?

Not a solution, but a fix. My sister had a similar issue. An automation to restart Zwave JS every day at 3AM and it is no longer an issue.

More of a workaround :slight_smile:
For larger Z-Wave networks and battery-operated devices, this is not really advisable.

My current theory is that there is a memory problem. My RPi is a 1GB version, and HA memory utilization, as shown by the graph under System/Hardware, runs at around 85-89%. So any kind of occasional activity that uses more memory might drive the Pi into "swapping" behavior using the SD card, which leads to the symptom I see where the yellow LED is constantly on. Perhaps there's some kind of "memory leak" in the code of some integration that triggers the failure when something happens, or the system runs long enough to use up all the memory. Not easy to track such things down...

Also, it's not just Z-Wave that stops working. Everything stops, including Zigbee and MQTT interactions. I can't even connect to the HA GUI any more. So the only fix is to cycle power.

I suspect the real fix will be to replace the RPi with one that has more memory. I've been intending to do that, but I'm waiting for the hardware supply issues to get resolved so you can actually buy RPis again at normal prices.

Meanwhile, the 3AM reset looks like the way to go...
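
The memory theory above could be checked with numbers rather than the LED, by sampling the kernel's own figures over time. The following is a minimal sketch that assumes shell access to a Linux host where `/proc/meminfo` is readable (on Home Assistant OS that typically means some form of SSH access, which is an assumption here). The function names are illustrative, and the percentage is computed from `MemTotal` and `MemAvailable`, which may not match how the HA hardware graph calculates its value.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value}.

    Most fields are reported in kB; the unit suffix is ignored.
    """
    values = {}
    for line in text.splitlines():
        name, separator, rest = line.partition(":")
        parts = rest.split()
        if separator and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_pressure(text):
    """Return (used_percent, swap_used_kb) from /proc/meminfo text.

    used_percent counts everything that is not 'MemAvailable' as used.
    swap_used_kb is SwapTotal - SwapFree (0 if no swap is configured).
    """
    info = parse_meminfo(text)
    total = info["MemTotal"]
    available = info["MemAvailable"]
    used_percent = 100 * (total - available) / total
    swap_used_kb = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return used_percent, swap_used_kb
```

Passing it the contents of `/proc/meminfo` every few minutes and logging the result would show whether memory use creeps upward and swap use grows before a lockup, which is what a leak would look like.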

The entire installation has been running without any problems for years; of course there is always a moment when a malfunction starts.
I have a lot of Neo Coolcam devices.


In any case, I didn't deliberately set the power-measuring devices to send everything at the same time.

I had already thought about that too, but it doesn't solve it for me, because sometimes I can restart the add-on, but sometimes I have to reboot the whole Pi.

For me the Neo devices (I only had the wall plug switches) caused instability. A few also broke down after a year and a bit, causing even more network trouble. Maybe I just had a few bad ones and you were luckier :slight_smile:

o o i hope so :grimacing:

I also recently added the Zooz power strip. It worked fine for a while, but when I installed 2022.10 that's when my Z-Wave network became unresponsive. Did you exclude it entirely? I've also read that the power meters drive a lot of traffic, so with 5 coming from one device it could be overloading the network. I have about 40 Z-Wave devices. I may try to remove it from the recorder first. I'll report back if that or full exclusion fixes the issue.

I have 3 version 3 power strips. Besides the one that failed totally, there is a HUGE bug (Zooz has acknowledged it and is trying to fix it).

The bug is that these things will simply "lock up" and stop working (the buttons stop working and communication stops). You have to power-cycle them to get them back online and working for a while.

When I reported it soon after I received them (these 3 were discounted due to my failed version 2 units), they offered to take them back for store credit or have me wait until a fix is found. Well, I'm thinking now I made the wrong choice. I have a feeling it's a hardware issue and can't be fixed.

They also wanted me to pay warranty repair shipping for the one that totally failed, until I complained.

These strips (version 2 and 3) are the worst Zooz product I've ever had to deal with, and that's saying a lot since their door sensors are incredibly fragile.

/rant

Did this issue get any attention, or was it fixed? I read something about a "work-around", but that was just a reboot/restart. Has anyone from the Nabu Casa or HA team looked into this?

Now I've got devices that suddenly, over the last week, dropped and became unknown (no manufacturer name, no model #, but NOT dead).

The Z-Wave JS UI is behaving very erratically. When a node is included, the add-on freezes for about 20 seconds at the screen saying that a new node was added.

Nothing has changed in months on my system. I have not added anything new or changed any settings. As I've stated a number of times, I'm content with what I have, so there was NO user error, so to speak.

All three of my Schlage deadbolts have turned to "Unknown" and do not respond to any commands. 3 Aeotec extenders have also turned to "Unknown"; however, when I re-interview them, nothing happens; in inclusion mode, nothing happens; in exclusion mode, the device is removed.

I'm at a loss as to where to go from here.

Edit: Also, when I put Z-Wave JS UI into inclusion mode and walk through the steps of inclusion, the system seems to go into a continuous loop (only with the Schlage deadbolt) and restarts inclusion mode over and over. It does this regardless of whether I select any type of security or none.

Something is wrong!!!

I'm also seeing similar issues. Out of the blue, a large majority of my devices show up as dead. Not sure what is going on...