OZW_Log stops logging after a few hours, Z-Wave devices stop reporting to Home Assistant

ozwcp
zwave
Tags: #<Tag:0x00007f1b93b35f50> #<Tag:0x00007f1b93b35c80>

#1

Hi All,

I’ve been using Home-Assistant and Z-Wave for about 6 months now. I’m running a Raspberry Pi 3B with:

  • Raspian Stretch OS
  • Home assistant (0.84.6 current) in a python virtualenv
  • Razberry daughter card plugged into the GPIO header.

I have five connected z-wave devices so far, four Aeotec MultiSensors (three mains powered, one on battery) and one Aeotec SmartSwitch.
I also have a number of other components in the mix including:

  • A systemmonitor getting the memory and disk usage of the pi.
  • A bom sensor getting weather data from the internet.
  • A custom udp sensor listening for data from an Arduino.

By and large the system has been reliable, the most challenging problem was switching the SQLlite DB to MySQL and troubleshooting some referential integrity issues. Aside from that, everything works fine.

Execpt for this problem…

Upon a fresh boot of the Pi, home-assistant starts up OK, all the sensors including z-wave devices and others send data in a timely fashion. For the z wave devices, the battery Multisensor reports in half hour intervals, while the powered sensors report every thirty seconds.

I check the home-assistant GUI to see all is well, then I go to sleep (or work).

Unfortunately, when I get back home to check it, I almost always find that after a few hours of successful operation, Home-Assistant has stopped receiving data from ALL z-wave devices; the charts have flatlined and the last sensor report from any z-wave device was something like “5 hours ago”.

Checking the OZW_Log, I find that the timestamp of the last line of the OZW_Log corresponds exactly with the time that home-assistant stopped receiving z-wave sensor data.
There is nothing in the OZW_Log I can see that would point towards a crash or error. Here are the last several lines of the log before the last outage:

2019-01-08 23:53:07.650 Info, Node006, Received SensorMultiLevel report from node 6, instance 1, Ultraviolet: value=0
2019-01-08 23:53:07.650 Detail, Node006, Refreshed Value: old value=0, new value=0, type=decimal
2019-01-08 23:53:07.650 Detail, Node006, Changes to this value are not verified
2019-01-08 23:53:07.650 Detail, Node006, Notification: ValueChanged //LAST LINE

And here’s the first few lines after a heal:
2019-01-09 19:49:25.283 Detail, Node001, Queuing (Controller) Request Node Neighbor Update
2019-01-09 19:49:25.304 Detail, Node004, Queuing (Controller) Request Node Neighbor Update
2019-01-09 19:49:25.304 Detail, Node005, Queuing (Controller) Request Node Neighbor Update
2019-01-09 19:49:25.304 Detail, Node006, Queuing (Controller) Request Node Neighbor Update
2019-01-09 19:49:25.305 Detail, Node007, Queuing (Controller) Request Node Neighbor Update
2019-01-09 19:49:25.305 Detail, Node008, Queuing (Controller) Request Node Neighbor Update
2019-01-09 19:49:25.310 Info, Requesting Neighbor Update for node 1
]
Note that all other devices continue to display in Home assistant as normal.

See the graph below for a picture of what’s going on.

`
This graph has all the temperature sensors that I currently have; The “Room” z-wave senors were unresponsive since 08/01/2019 23:53:07 until I did a heal of the network through the homeassistant GUI at 2019-01-09 19:49:25. Not that the Front Yard sensor restored a little later, presumably because it is on battery and only wakes up every 30 minutes. The BOM sensor I included works from an API call to a website and continued to function while the z-wave sensors were down.

I also noticed that the Home Assistant would sometimes restore the sensors before I could check again. I later discovered this was due to the auto-heal function which was enabled by default. I disabled it.

I have also tried setting the polling interval to 120000ms (2 minutes) to attempt to automatically restore the devices if they didn’t report themselves. This has not changed the behaviour.

Here’s my z-wave config out of home-assistant at this present time. Like I said, there’s nothing wrong with the configuration of the sensors per-se, just the fact that they all stop at the same time.

zwave:
  usb_path: /dev/ttyAMA0
  network_key: "Secret"
  #config_path: /srv/homeassistant/lib/python3.5/site-packages/python_openzwave/ozw_config
  debug: False
  polling_interval: 120000
  autoheal: False

Also relevant, the SQL config. I have it purging stuff older than14 days to keep the DB manageable within the limited resources of the Pi

 recorder:
 purge_keep_days: 14
 db_url: mysql://hass:[email protected]:3307/hass_db?charset=utf8

The fact that they all fail at once implies there’s either an error in OZW which is not writing to the logs, or a hardware issue with the Razberry board itself. I do not know how to begin troubleshooting these devices short of ordering replacement parts or factory resetting the whole system, something I am not yet prepared to do.

So I’m currently out of options. I am sure there must be some way to dig into OZW or the Razberry controller, but I have not discovered a method to do that yet.

Right now I’m documenting the timestamps between when OZW_Starts and stops logging, hopefully trying to observe any common conditions that might be causing the issue, but nothing stands out as of yet.

I’ll post my findings here as I troubleshoot.

In the meantime, does anyone have any ideas what could be going on of where else I could check?

Thanks in advance :slight_smile:


#2

What do you have to do to get it working again? Restart Home Assistant or restart the whole Pi? Does the /dev/ttyAMA0 still exist? Any debg lines in dmesg?


#3

Just about every time someone reports issues like this it ends up being the power supply.


#4

@walt
The quickest way to get them back online is to do a heal of the network in Z-Wave config, however in practice this only keeps them up for about an hour or so. Doing a sudo reboot on the pi from ssh will reset everything; as of today it’s been going strong for 23 hours since a full reboot. I haven’t tried restarting Home Assistant only, but my hunch is it won’t work. Will test next time it fails.

Also checked dmesg but nothing notable; will be sure to check output when the sensors fail again. Will also check /dev/ttyAMA0 next time it fails.

@firstof9
Not ruling out a power supply issue, hoping to troubleshoot software side before buying more parts. Do you have any references to these other issues that I could check out?


#5

A quick search of the forums and you’ll find them, keywords: “Zwave raspberry power supply”


#6

Ok, so after a sudo reboot it went for 26h57m58s before OZW stopped.

I checked dmesg; the only line I could see that relates is the intial config message:
[ 0.830837] 3f201000.serial: ttyAMA0 at MMIO 0x3f201000 (irq = 87, base_baud = 0) is a PL011 rev2

Also, ttyAMA0 is still present in /dev

I just tried a restart via the HA GUI, so here are the preliminary findings from three restoration methods:

Heal Network
From HA GUI, go Configuration->Z-Wave and click “Heal Network”.
This will cause OZW_Log to resume without clearing the past logs. The sensors will restore.
From two attempts, the time alive was 1h16m14s, then 1h09m00s.

Sudo Reboot
SSH into the pi and reboot the whole device.
This will cause OZW_Log to start fresh.
From one attempt, the time alive was 26h57m58

HA Server restart
From the HA GUI, go to Configuration -> General and click “Restart”.
This will cause OZW_Log to start fresh.
From one attempt, the time alive is 10m50s

I’m thinking I need to look for any low level logs around serial communication. Other than that, I’m going to make another image with the original z-way software and no home-assistant to try and narrow down the problem.


#7

Ok, so what I did actually was:
Created a new raspbian stretch image on a different SD card,
Installed home assistant and followed the Razberry instructions exactly.

I then copied my Z-wave config from a backup, also the customize.yaml file to place in the folder of the fresh install.

I got my Z-wave devices displayed again, although two are now named “Unknown Node 6” and “Unknown Node 8”

Nonetheless, the sensors have been reporting for a full 36 hours now with no interruptions, so this suggests that there’s either misconfiguration/bug in the original system, or the hardware is underpowered for what I’m trying to do with it.

For now, I will go back to the original SD card and start retracing my steps.