Status of ZHA integration?

I like to monitor the health (where possible) of all the various components of my HA setup. At times, all of my Zigbee devices “go down” for a period of minutes, and then start talking again. Just now, ZHA had a problem and I had to reload it.

  • Is there a way of determining the health of the ZHA integration?

If so, I would send myself some sort of alert (probably email), and also mask the alerts for all the individual Zigbee devices.

You’re assuming that ZHA is the root cause of the issue, when it could be any number of things.

It could be anything: high system CPU utilisation, a Zigbee channel overlapping a saturated wifi channel, the location of the Zigbee coordinator, a lack of Zigbee routing, etc.

The worrying thing for me is the blackouts that you’re currently experiencing, and I would concentrate on fixing those instead of resorting to reloading the integration from time to time.

I agree with @_dev_null, but a “dirty hack” I use on two integrations (Tibber, Speedtest) that go unavailable from time to time is to look for stale data and then reload. You could check whether you have a ZHA-related sensor that normally never goes long without updating (lux or temperature, maybe?) and reload based on that. If the sensors report unavailable you can use that too. Here’s an example of how I’ve done it for the Speedtest integration. I haven’t tried it on add-ons, but it might work.

alias: Reload Speedtest if we have stale readings
description: ""
trigger:
  # True once the sensor's last_updated timestamp is more than 180 minutes old.
  - platform: template
    value_template: >-
      {{ now() - states.sensor.speedtest_ping.last_updated >
      timedelta(minutes=180) }}
condition: []
action:
  # Reload the config entry (integration) that owns this entity.
  - service: homeassistant.reload_config_entry
    data: {}
    target:
      entity_id: sensor.speedtest_ping
  # Let me know that a reload happened.
  - service: notify.mobile_app_iphone
    data:
      message: speedtest reload
      title: speedtest
mode: single

Because every single Zigbee device becomes unavailable simultaneously, and they all become available again simultaneously, I thought the first spot to look would be the coordinator and ZHA. I had been hoping there was an entity for the coordinator and/or ZHA that could be monitored that would give some indication of health.
Only once have I found that ZHA needed to be reloaded - other times, by the time I notice the flood of “failed” and “now OK again” emails, everything is the picture of health.

System Monitor on the Pi 4B shows CPU usage usually sitting around 15% and peaking around 20%, so I don’t think it would be CPU loading.

Sorry, we’re only working with limited information.
So are you saying that you’ve always had to intervene and restart the ZHA integration in order for it to start working again?

I think we need full details of your setup: specs, add-ons, versions… Was ZHA ever stable, and if so, under what configuration?

@steerage250 it would also help to enable debug logs.

I’ve only had ZHA and Zigbee running for about a month. My system runs on a Pi4. I have a Nabu Casa SkyConnect coordinator. I have a LoraTap signal extender, and 3 Tradfri Outlets as repeaters. I have about half-a-dozen Zigbee sensors.

If any device (repeater or end device) reports “unavailable” for 2 minutes, the system sends me an email. Similarly, when a device that has been unavailable becomes “not unavailable” another email is sent.

I get a flood of emails where all devices report they are unavailable (for which there is a 2 minute delay), then a flood of emails as they all report they are OK again.

The outages happen once every day-or-so, and might last around 5 minutes.

I have enabled debugging ready for the next outage.

Sorry - I have only once had to reload ZHA (normally the system just comes good by itself). I have never been present during an outage to do any investigation.

Sounds as if you only have four router devices, which should be able to handle half-a-dozen sensors, but it does suggest that most of the Zigbee traffic is following the same small number of routes - it would be quite possible for a single failure to bring down the whole network.

I don’t suppose someone else in the household could be switching off power to something? :grin: Or does wi-fi traffic (yours or your neighbour’s) change? Someone posted recently that interference from their smart meter brought things to a halt.

Zigbee works best when every light and every socket in the house is a router, so that messages can always find an alternative path. It’s not a good choice for point to point communication between a small number of devices.

The only single point of failure is the coordinator or ZHA - or wireless interference. (The routers all go down at the same time as the end devices).

I can’t imagine the wifi traffic suddenly changing - but always possible I suppose.

My 4 end-devices are connected to 3 different routers.

I think I’ve got a reasonable mesh happening

Perfect, keep us posted :+1:

Oh, I forgot to add… Do you have any add-ons running?

Thanks

The ZHA “map” shows all discovered routes between devices, not just the ones it’s actually using. From the colours, it looks as if there is only one error-free connection to the coordinator (the one going downwards), so most of the traffic will be going that way. If you’re happy with the LoraTap, I would add a couple more near the coordinator. (They’re not really “extenders”, by the way, in the wi-fi sense. Their function is to make the mesh denser so there are more possible routes for messages to take.)

You might also try checking how much noise there is on the channel you are using. Finding a channel that doesn’t clash with wi-fi can improve any Zigbee network. There’s a good blog post on choosing a channel here.

With ZHA you can check channels by going to the ZHA card in Devices & Services, clicking on the three dots next to Configure and selecting Download diagnostics. Right at the end of the file you get will be something like this:

    "energy_scan": {
      "11": 73.50699819621309,
      "12": 3.2311094587038967,
      "13": 70.89933442360993,
      "14": 91.05606689948522,
      "15": 85.82097888710312,
      "16": 65.26028270288712,
      "17": 7.659755505061292,
      "18": 43.057636198227904,
      "19": 1.1664179210724432,
      "20": 62.257682586134884,
      "21": 2.509919386096536,
      "22": 3.6632469452765037,
      "23": 4.15070068297423,
      "24": 0.9017765778954641,
      "25": 1.5075412082833717,
      "26": 21.09014924761344
    }

The percentages show the amount of noise on each channel, including Zigbee traffic, wi-fi traffic (your neighbour’s as well as yours) and other interference.
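
If you would rather not eyeball the numbers, a short script can pull the scan out of the downloaded diagnostics file and sort it. This is only a sketch: the function names are made up, and because I’m not assuming where `energy_scan` sits inside the file, it simply searches the whole JSON for that key. The sample values are the ones from the scan above, rounded to two decimals.

```python
import json
from pathlib import Path
from typing import Any, Optional


def find_energy_scan(node: Any) -> Optional[dict]:
    """Depth-first search for the first "energy_scan" object in parsed JSON."""
    if isinstance(node, dict):
        scan = node.get("energy_scan")
        if isinstance(scan, dict):
            return scan
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_energy_scan(child)
        if found is not None:
            return found
    return None


def rank_channels(scan: dict) -> list:
    """Return (channel, energy) pairs sorted from quietest to noisiest."""
    pairs = [(int(channel), float(energy)) for channel, energy in scan.items()]
    return sorted(pairs, key=lambda pair: pair[1])


def load_scan(path: str) -> Optional[dict]:
    """Read a downloaded diagnostics file (any path you like) and locate the scan."""
    return find_energy_scan(json.loads(Path(path).read_text(encoding="utf-8")))


# Values from the scan above, rounded to two decimals. The "data" wrapper is
# only an illustrative guess at the nesting; the search does not rely on it.
SAMPLE = {
    "data": {
        "energy_scan": {
            "11": 73.51, "12": 3.23, "13": 70.90, "14": 91.06,
            "15": 85.82, "16": 65.26, "17": 7.66, "18": 43.06,
            "19": 1.17, "20": 62.26, "21": 2.51, "22": 3.66,
            "23": 4.15, "24": 0.90, "25": 1.51, "26": 21.09,
        }
    }
}

if __name__ == "__main__":
    for channel, energy in rank_channels(find_energy_scan(SAMPLE)):
        print(f"channel {channel}: {energy:.2f}%")
```

On that scan, channels 24, 19 and 25 come out quietest and channel 14 noisiest. A single scan is only a snapshot, so it is worth repeating it at different times of day before changing channel.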

Edit: I assume you’re following all the advice about putting the SkyConnect on the end of a long USB cable etc. There’s lots of good stuff here:

Thank you all for the time you are putting in to help me with my Zigbee issues.

I will delve deeper into Stiltjack’s last posts re wifi interference and report back.

I have moved my 3 WAPs to 2.4GHz wifi channels 1 and 6, leaving the upper end of the band free. Any signal from neighbours’ wifi on ch 11 is about 80dB down. My Zigbee coordinator was on ch 20 (default?), and I have now moved it to ch 24. This is the latest energy scan - I presume a small number is good?
    "energy_scan": {
      "11": 93.76433891498253,
      "12": 92.0598007161209,
      "13": 91.05606689948522,
      "14": 87.33047519856483,
      "15": 28.30261646762903,
      "16": 10.914542804728702,
      "17": 59.15797905332195,
      "18": 52.75969252664325,
      "19": 10.914542804728702,
      "20": 2.2107128772756957,
      "21": 2.84844209578687,
      "22": 2.84844209578687,
      "23": 3.2311094587038967,
      "24": 10.914542804728702,
      "25": 12.244260188723507,
      "26": 70.89933442360993
    }
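
For anyone wondering which Zigbee channels a given wifi layout leaves free, the arithmetic is simple enough to sketch. Zigbee channels 11 to 26 are centred at 2405 MHz plus 5 MHz per channel, and 2.4 GHz wifi channels 1 to 13 at 2407 MHz plus 5 MHz per channel. The 22 MHz wifi width and 2 MHz Zigbee width below are assumptions for illustration (real signals spill past those nominal edges), and the function names are made up.

```python
# Nominal channel widths in MHz. These are assumptions for the sketch; real
# transmitters leak some energy outside these edges.
WIFI_WIDTH_MHZ = 22
ZIGBEE_WIDTH_MHZ = 2


def zigbee_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Zigbee channel (11 to 26)."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz wifi channel (1 to 13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch covers wifi channels 1 to 13 only")
    return 2407 + 5 * channel


def overlaps(zigbee_channel: int, wifi_channel: int) -> bool:
    """True if the nominal Zigbee and wifi channel bands overlap."""
    gap = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return gap < (WIFI_WIDTH_MHZ + ZIGBEE_WIDTH_MHZ) / 2


def clear_zigbee_channels(wifi_channels: list) -> list:
    """Zigbee channels that overlap none of the given wifi channels."""
    return [
        zigbee
        for zigbee in range(11, 27)
        if not any(overlaps(zigbee, wifi) for wifi in wifi_channels)
    ]


if __name__ == "__main__":
    print("wifi on 1 and 6:", clear_zigbee_channels([1, 6]))
    print("wifi on 1, 6 and 11:", clear_zigbee_channels([1, 6, 11]))
```

With wifi on channels 1 and 6 this leaves Zigbee 15 and 20 to 26 clear on paper, which fits the move to channel 24. A neighbour on wifi 11 would narrow that to 15, 20, 25 and 26. The energy scan is still the better guide, since it measures what is actually in the air.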

My SkyConnect is on the end of the supplied USB cable (500mm?)

Yes indeed.

You might try a longer cable. Mine is 2m, which allows me to place the dongle well away from any other electrical equipment. It also makes it possible to turn the dongle easily - that sometimes improves the signal. You need to make sure it’s connected to a USB 2.0 port on the Pi (not the blue one which is USB 3.0), although oddly a USB 3.0 extension cable is better because it has more shielding.

If you have anything else connected to the Pi, like an SSD, you might consider using a powered USB hub - the Pi can struggle to deliver enough power through its USB ports. Again, it needs to be USB 2.0.

I agree, interference can be a big factor with a Zigbee network.

I have a Coral stick, a wifi router and an rtl433 stick all in close proximity to my Zigbee coordinator. If I move the rtl433 stick 5 cm (even though in theory it isn’t a wireless polluter), it will take down part of the Zigbee network.
As soon as I got the Coral it also took down Zigbee. Get at least a metre’s worth of space between these devices.

As an aside, I’m not really convinced by the SkyConnect stick. At least my Sonoff has an antenna :grinning:

Longer cable coming-up.

Yes, connected to a USB 2 port. Also have a rotating USB drive connected to the Pi.

Finally caught a Zigbee outage, and Stiltjack - I see references to undervoltage being detected!!
I will try putting the rotating hard drive on a powered hub as you suggested.

I was looking at the list of most used integrations and noticed this

Might be of use to you

It was that very integration that reported the issue in the logs.

I have now configured it to send me an email if it ever happens again.