Help: Z-Wave JS UI Addon breaking my system

Yes, power saving is not enabled on my host (/sys/module/usbcore/parameters/autosuspend contains -1). In any case, if that were the issue I would expect the controller to stop working after some idle time, rather than the system failing every time I try to stop the addon.
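For anyone who wants to script the same check, here is a minimal sketch. The function name and its optional path argument are made up for illustration; it only assumes that a negative value in the usbcore parameter means USB devices are never autosuspended by default.

```shell
# Report whether USB autosuspend is disabled by default on this host.
# The optional first argument is a hypothetical hook for pointing at another file.
usb_autosuspend_status() {
    param_file="${1:-/sys/module/usbcore/parameters/autosuspend}"
    if [ ! -r "$param_file" ]; then
        echo "unknown"
        return 1
    fi
    delay="$(cat "$param_file")"
    # A negative idle delay means devices are not autosuspended by default.
    if [ "$delay" -lt 0 ]; then
        echo "disabled"
    else
        echo "enabled (idle delay: ${delay}s)"
    fi
}
```

Calling usb_autosuspend_status with no argument reads the real sysfs file. Individual devices can still override the default, so this only covers the global setting.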

And by crashing the system I do mean HAOS inside the VM (i.e. the whole VM freezes and I need to kill the process).

Maybe try to switch to QEMU instead of VirtualBox?

I don’t agree. The whole point of virtualization is sharing resources. If memory or disk is bad, the effects show up all over the place. Just check SMART on the drive and run a RAM test from a live CD/USB.
The main thing in troubleshooting and finding the culprit is to cross off possible troublemakers.

I’ve never used QEMU. I wonder how it works. Would I be able to migrate my whole assistant to another emulator through a backup?

If you mean the HA backup, yes, that’s the way to go. Make a backup, set up the new VM in QEMU, restore the HA backup.

OK, I’ll play by the book here, you are right.

My BIOS has support for memtest and I ran it. The RAM is fine. I doubt a RAM issue would only affect one thing, and always the same thing, because of how RAM gets allocated… but it was still worth a try, as you pointed out.

I’ve also run SMART checks on the physical volume behind my file system, and it’s healthy with no errors logged. I have a logical volume created on top (LVM gives me a lot of flexibility). Again, since the virtual HDs (images) of the VMs are fully allocated in their own files, I think a hard drive issue would not affect two different virtual machines equally, but it was worth a try.

It’s worth noting that this server runs a lot of other things, not only Home Assistant, so I keep it pretty healthy and I think I would have noticed HD or RAM issues.

Worth trying! Investigating now 🙂

Follow-up: Looks like it’s not “as easy as it seems”. My system has slowed down a lot since I installed QEMU; I’m not sure whether that is QEMU itself or something I did wrong (I suspect the way I’m bridging the network). Also, my Home Assistant OS is stuck loading at "A start job is running for HAOS swap". When it gets there, the VM “pauses” due to an I/O error… so I’m not sure how to proceed here.

I think I’m going to put this on hold and wait until I get my new ZStick and upgrade to the newest firmware there, or even to the 700 series… and if that doesn’t work I’ll come back to trying a different emulator.

Just to make sure: you did evaluate the S.M.A.R.T. attributes of the drive and its lifespan?
Which version of HA are you running (HAOS or…)

I don’t have that many options for this Hard Drive, unfortunately. I just checked that there were no errors logged, no critical errors and that health assessment passed.

I also did a fsck on the file system… that took longer and I don’t have the output, but the SMART output is the following:

root@madre:~# smartctl -l error  /dev/nvme0n1p3

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-28-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
root@madre:~# smartctl -H  /dev/nvme0n1p3

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-28-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
root@madre:~# smartctl -a  /dev/nvme0n1p3

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-28-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WDC PC SN530 SDBPNPZ-256G-1002
Serial Number:                      2135FU474613
Firmware Version:                   21106000
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 256.060.514.304 [256 GB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256.060.514.304 [256 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 4a49595984
Local Time is:                      Tue Aug 13 09:40:07 2024 CEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     85 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.50W    1.80W       -    0  0  0  0        0       0
 1 +     2.40W    1.60W       -    0  0  0  0        0       0
 2 +     1.90W    1.50W       -    0  0  0  0        0       0
 3 -   0.0250W       -        -    3  3  3  3     3900   11000
 4 -   0.0050W       -        -    4  4  4  4     5000   39000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    4%
Data Units Read:                    3.807.215 [1,94 TB]
Data Units Written:                 13.910.073 [7,12 TB]
Host Read Commands:                 25.010.185
Host Write Commands:                416.061.161
Controller Busy Time:               2.025
Power Cycles:                       111
Power On Hours:                     6.408
Unsafe Shutdowns:                   63
Media and Data Integrity Errors:    0
Error Information Log Entries:      1
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
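Since the full dump is long, here is a rough sketch for pulling out only the wear and integrity counters. The function name is made up; the field names are the ones that appear in the NVMe health section above, and other drives or smartctl versions may label them differently.

```shell
# Print selected health counters from `smartctl -a` NVMe output read on stdin.
# Each matching line is emitted as "Field name=value".
smart_summary() {
    awk -F': *' '
        /^(Critical Warning|Available Spare|Percentage Used|Unsafe Shutdowns|Media and Data Integrity Errors|Error Information Log Entries):/ {
            print $1 "=" $2
        }
    '
}
```

Usage would be something like: smartctl -a /dev/nvme0n1p3 | smart_summary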

I’m running Home Assistant OS 12.4 on top of a VirtualBox VM. My original HAOS image was much older (HAOS 7.2) and it has been updated in place ever since. But I also tried with a fresh 12.4 HAOS image (the one available right now on the HA website) and the issue was exactly the same.

I’ve been collecting the Home Assistant OS logs. I wonder if accessing the Home Assistant Docker container would give me additional logs that may be useful.

What do you think?

The container? Doubtful.

Yup. Basically I would ssh into Home Assistant OS (using the Advanced SSH addon) and then from there run docker exec to get a bash shell inside the container, so I would effectively be ssh-ing into Home Assistant itself. Would I find anything there?

Is there another docker container specific to the Zwave JS UI Addon, that I may be able to ssh into and get additional data?

Every addon is a container, yes. But you can see the container logs under Settings/System/Logs, just switch between containers with the dropdown in the top right corner.
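If you do want the same logs from a shell, a minimal sketch could look like the following. It assumes the docker CLI is reachable from your SSH session and that add-on containers are named with an "addon_" prefix; both function names are made up for illustration.

```shell
# Filter add-on containers out of a list of container names read on stdin.
# Assumption: add-on containers carry an "addon_" name prefix.
addon_containers() {
    grep '^addon_' || true
}

# Print the last N log lines (default 200) of one container, stderr included.
container_tail() {
    docker logs --tail "${2:-200}" "$1" 2>&1
}
```

Run docker ps --format '{{.Names}}' | addon_containers to see the candidates, then pass whichever name the listing shows for Z-Wave JS UI to container_tail.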

I have fresh updates.

I finally got my ZStick Gen5+ and I have migrated from Gen5 to Gen5+, i.e. to firmware 1.2.

After that, I still get the main issue (whenever I try to stop ZWave JS UI, the whole HAOS crashes), but I was able to update ZWave JS UI to the latest version (by setting it not to start on boot, rebooting, and then updating before starting it) and I’m no longer getting the old error about an invalid discovery/type in the s6-overlay (whatever that means).

So at least I got partial success.

Once I was on 1.2 I tried plugging the USB stick directly into my server (I had it on a hub, because for some reason I was having trouble with the Gen5 when plugging it in directly). I had hoped this might fix the HAOS freeze when stopping ZWave JS UI, but it did not help.

Next step is trying to migrate to Gen7 and see if that changes anything.

BTW, something I may never have mentioned: if I unplug the USB stick before stopping the ZWave JS UI addon, then it does not crash. So it seems to be related to how ZWave JS UI tries to disconnect from the stick.

It is very unlikely that migrating to a 700 series stick will work any better. At this point, it’s pretty certain the issue lies somewhere with your VM setup. Have you tried setting up a new HAOS VM in QEMU and restoring the HA backup into that?

I did try but did not succeed: for some reason I could not get HAOS running on QEMU on Debian. I think it had to do with how I was creating the network bridge.

I also tried a fresh HAOS image on VirtualBox and, even without restoring the backup, I had the same issue.

I think I just solved it by enabling USB 3.0 in VirtualBox (I had it as USB 2.0 before). I’m pretty sure in the past I had to set it to USB 2.0 for some reason (maybe related to Aeotec ZStick Gen5 vs Gen5+?)

So, to summarise in case someone ever gets here

Error 1 - After updating ZWave JS UI, the addon will not start
I was getting the following error

s6-rc-compile: fatal: invalid /etc/s6-overlay/s6-rc.d/discovery/type: must be oneshot, longrun, or bundle
s6-rc: fatal: unable to take locks: No such file or directory
s6-linux-init-shutdownd: warning: /run/s6/basedir/scripts/rc.shutdown exited 111

I sorted it out by

  • Updating Aeotec ZWave Stick to Gen5+ / Firmware 1.2. (Not sure if that may have been solved by some of the Home Assistant Core updates, but I don’t think so)

Error 2 - HAOS crash when trying to stop ZWave JS UI / disconnect from the ZStick
Whenever I tried to stop the ZWave JS UI addon (including any attempt to reboot the system or to update the ZWave addon), the whole HAOS would crash, with no errors in any of the logs.

I sorted it out by

  • Upgrading Aeotec ZStick Gen 5 to Gen5+ (firmware 1.1 to firmware 1.2)
  • Setting up the USB controller as USB 3.0 on VirtualBox (I had USB 2.0 before, which I think was a requirement at some point, maybe only for firmware 1.1)
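For reference, the USB controller change can also be made from the host command line instead of the GUI. This is only a sketch: "haos" is a placeholder VM name, the VBOXMANAGE override is a made-up convenience for dry runs, the VM has to be powered off first, and depending on the VirtualBox version the flag may be documented as --usb-xhci and may require the Extension Pack.

```shell
# Switch a powered-off VM to the xHCI (USB 3.0) controller.
# $1 is the VM name. VBOXMANAGE is a hypothetical override for the binary,
# so setting it to "echo" prints the arguments instead of running them.
enable_usb3() {
    ${VBOXMANAGE:-VBoxManage} modifyvm "$1" --usbxhci on
}
```

Example: enable_usb3 haos (or VBOXMANAGE=echo enable_usb3 haos to see what would be run).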

Hope it helps someone in the future, and thanks everyone for your ideas and suggestions, I don’t think I would have been able to debug without you (I learned a lot in this process).

You may have done it to avoid USB 3 interference on the stick. (USB 3 is known to interfere with Zigbee and some Z-Wave sticks.) This is where the recommendation to use a short extension cable or a USB 2 hub to get the stick off the USB 3 bus comes from.

Maybe… I did indeed have a USB 2.0 hub before for some reason, and maybe on that hub things did not work if I set it up as USB 3.0.

But at this point everything seems to be working fine again.

I wonder: since I already got a Gen7 ZStick, is there any value in switching to it now that I have everything working?

(totally personal opinion)

There are a lot of known issues with the 700 series sticks.

So many that early on I reverted from a 700 back to that very 500 you have (on current firmware), and it works great.

You’ve been through the wringer; I’d sit and wait a bit before intentionally changing anything else.
