I like to monitor the health (where possible) of all the various components of my HA setup. At times, all of my Zigbee devices “go down” for a period of minutes, and then start talking again. Just now, ZHA had a problem and I had to reload it.
Is there a way of determining the health of the ZHA integration?
If so, I would send myself some sort of alert (probably email), and also mask the alerts for all the individual Zigbee devices.
You’re assuming that ZHA is the root cause of the issue, when it could be any number of things.
Anything from high system CPU utilisation, to the Zigbee channel overlapping a saturated wifi channel, to the location of the Zigbee coordinator, to a lack of Zigbee routing, etc.
The worrying thing for me is the blackouts you’re currently experiencing. I would concentrate on fixing those instead of resorting to reloading the integration from time to time.
I agree with @_dev_null , but a “dirty hack” I use on two integrations (Tibber, Speedtest) that go unavailable from time to time is to look for stale data and then reload. You could check whether you have some ZHA-related sensor that never normally goes a long time without updating (lux, temperature maybe?) and reload based on that. If the sensors report unavailable you can use that too. Here’s an example of how I’ve done it on the Speedtest integration; I haven’t tried it on add-ons but it might work.
alias: Reload Speedtest if we have stale readings
description: ""
trigger:
  - platform: template
    value_template: >-
      {{ now() - states.sensor.speedtest_ping.last_updated >
      timedelta(minutes=180) }}
condition: []
action:
  - service: homeassistant.reload_config_entry
    data: {}
    target:
      entity_id: sensor.speedtest_ping
  - service: notify.mobile_app_iphone
    data:
      message: speedtest reload
      title: speedtest
mode: single
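For anyone wanting to try the same trick on ZHA, here is a rough sketch of the Speedtest automation above pointed at a Zigbee sensor instead. It is untested against ZHA. The entity `sensor.hallway_temperature` is a made-up name (substitute any ZHA sensor that normally reports often), and the 60 minute and 5 minute thresholds are guesses, not recommendations. The notify service is the one from the original example.

```yaml
alias: Reload ZHA if a Zigbee sensor goes stale
description: ""
trigger:
  # Fires when the (hypothetical) sensor has not updated for 60 minutes.
  - platform: template
    value_template: >-
      {{ now() - states.sensor.hallway_temperature.last_updated >
      timedelta(minutes=60) }}
  # Also fires if the same sensor has been unavailable for 5 minutes.
  - platform: state
    entity_id: sensor.hallway_temperature
    to: unavailable
    for:
      minutes: 5
condition: []
action:
  # Reloads the config entry that owns the target entity, i.e. ZHA here.
  - service: homeassistant.reload_config_entry
    target:
      entity_id: sensor.hallway_temperature
  - service: notify.mobile_app_iphone
    data:
      title: ZHA
      message: ZHA reloaded after stale or unavailable readings
mode: single
```

As with the Speedtest version, this only papers over the symptom. It does nothing about whatever is causing the outages in the first place.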
Because every single Zigbee device becomes unavailable simultaneously, and they all become available again simultaneously, I thought the first spot to look would be the coordinator and ZHA. I had been hoping there was an entity for the coordinator and/or ZHA that could be monitored that would give some indication of health.
Only once have I found that ZHA needed to be reloaded - other times, by the time I notice the flood of “failed” and “now OK again” emails, everything is the picture of health.
System Monitor shows CPU load on the Pi4B usually sitting around 15% and peaking around 20%, so I don’t think it is CPU loading.
Sorry, we’re only working with limited information.
So are you saying that you’ve always had to intervene and restart the ZHA integration in order for it to start working again?
I think we need full details of your setup: specs, add-ons, versions… Was ZHA ever stable, and if so, under what configuration?
I’ve only had ZHA and zigbee running for about a month. My system runs on a Pi4. I have a Nabu Casa SkyConnect coordinator. I have a LoraTap signal extender, and 3 Tradfri Outlets as repeaters. I have about half-a-dozen zigbee sensors.
If any device (repeater or end device) reports “unavailable” for 2 minutes, the system sends me an email. Similarly, when a device that has been unavailable becomes “not unavailable” another email is sent.
I get a flood of emails where all devices report they are unavailable (for which there is a 2 minute delay), then a flood of emails as they all report they are OK again.
The outages happen once every day-or-so, and might last around 5 minutes.
I have enabled debugging ready for the next outage.
Sorry - I have only once had to reload ZHA (normally the system just comes good by itself). I have never been present during an outage to do any investigation.
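On the original question of masking the per-device alerts: one possible approach, sketched below, is a template binary sensor that turns on when most ZHA entities are unavailable at once. This is an assumption-laden sketch, not a tested config. The sensor name, the 80% threshold and the 1 minute delay are all made up for illustration. It relies on the `integration_entities` template function and the `is_state` test.

```yaml
template:
  - binary_sensor:
      - name: Zigbee network down
        # Hypothetical rule: "down" means at least 80% of ZHA entities
        # are unavailable at the same time.
        state: >-
          {% set zigbee = integration_entities('zha') | list %}
          {% set down = zigbee | select('is_state', 'unavailable') | list %}
          {{ (zigbee | count) > 0 and (down | count) >= 0.8 * (zigbee | count) }}
        # Shorter than the 2 minute per-device delay, so this sensor is
        # already on by the time the individual alerts would fire.
        delay_on:
          minutes: 1
```

The per-device email automations could then carry a condition that `binary_sensor.zigbee_network_down` is `off`, and a separate automation could send a single email when that sensor turns on or off.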
Sounds as if you only have four router devices, which should be able to handle half-a-dozen sensors, but it does suggest that most of the Zigbee traffic is following the same small number of routes - it would be quite possible for a single failure to bring down the whole network.
I don’t suppose someone else in the household could be switching off power to something? Or does wi-fi traffic (yours or your neighbour’s) change? Someone posted recently that interference from their smart meter brought things to a halt.
Zigbee works best when every light and every socket in the house is a router, so that messages can always find an alternative path. It’s not a good choice for point to point communication between a small number of devices.
The ZHA “map” shows all discovered routes between devices, not just the ones it’s actually using. From the colours, it looks as if there is only one error-free connection to the coordinator (the one going downwards), so most of the traffic will be going that way. If you’re happy with the LoraTap, I would add a couple more near the coordinator. (They’re not really “extenders”, by the way, in the wi-fi sense. Their function is to make the mesh denser so there are more possible routes for messages to take.)
You might also try checking how much noise there is on the channel you are using. Finding a channel that doesn’t clash with wi-fi can improve any Zigbee network. There’s a good blog post on choosing a channel here.
With ZHA you can check channels by going to the ZHA card in Devices & Services, clicking on the three dots next to Configure and selecting Download diagnostics. Right at the end of the file you get will be something like this:
The percentages show the amount of noise on each one, including Zigbee traffic, wi-fi traffic (your neighbour’s as well as yours) and other interference.
Edit: I assume you’re following all the advice about putting the SkyConnect on the end of a long USB cable etc. There’s lots of good stuff here:
I have moved my 3 WAPs to 2.4GHz wifi channels 1 and 6, leaving the upper end of the band free. Any signal from neighbours’ wifi on ch 11 is about 80dB down. My Zigbee coordinator was on ch 20 (default?), and I have now moved it to ch 24. This is the latest energy scan - I presume a small number is good?
"energy_scan": {
  "11": 93.76433891498253,
  "12": 92.0598007161209,
  "13": 91.05606689948522,
  "14": 87.33047519856483,
  "15": 28.30261646762903,
  "16": 10.914542804728702,
  "17": 59.15797905332195,
  "18": 52.75969252664325,
  "19": 10.914542804728702,
  "20": 2.2107128772756957,
  "21": 2.84844209578687,
  "22": 2.84844209578687,
  "23": 3.2311094587038967,
  "24": 10.914542804728702,
  "25": 12.244260188723507,
  "26": 70.89933442360993
}
My SkyConnect is on the end of the supplied USB cable (500mm?)
You might try a longer cable. Mine is 2m, which allows me to place the dongle well away from any other electrical equipment. It also makes it possible to turn the dongle easily - that sometimes improves the signal. You need to make sure it’s connected to a USB 2.0 port on the Pi (not the blue one which is USB 3.0), although oddly a USB 3.0 extension cable is better because it has more shielding.
If you have anything else connected to the Pi, like an SSD, you might consider using a powered USB hub - the Pi can struggle to deliver enough power through its USB ports. Again, it needs to be USB 2.0.
I agree interference can be a big factor with a Zigbee network.
I have a Coral stick, a wifi router and an rtl433 stick all in close proximity to my Zigbee coordinator. If I move the rtl433 stick 5 cm (despite it not theoretically being a wireless polluter) it will take down part of the Zigbee network.
As soon as I got the Coral it also took down Zigbee. Get at least a metre’s worth of space between these devices.
Out of curiosity, I’m not really convinced by the SkyConnect stick; at least my Sonoff has an antenna.
Finally caught a Zigbee outage, and Stiltjack - I see references to undervoltage being detected !!
I will try putting the rotating hard drive on a powered hub as you suggested.