Best way to troubleshoot ZHA losing all devices?

@fleskefjes is right. Your two routers and the coordinator will be handling messages to all 18 battery-powered devices. They may be capable of that in theory, but there will be no alternative paths for messages to take - in the case of interference, for example.

Normally you would start by building a robust network of routers and add the battery devices afterwards.

At the moment you might expect to see errors relating to message delivery failures after multiple attempts, or to timeouts.

If you want to see how much interference there could be on the channel you’re using, you can download diagnostics from the SkyConnect. Towards the end of the report there will be a section a bit like this:

    "energy_scan": {
      "11": 52.75969252664325,
      "12": 88.70042934643088,
      "13": 84.164247274957,
      "14": 15.32285793082191,
      "15": 52.75969252664325,
      "16": 31.01324838787301,
      "17": 80.38447947821754,
      "18": 82.35373987514762,
      "19": 15.32285793082191,
      "20": 80.38447947821754,
      "21": 1.9464625152460222,
      "22": 2.84844209578687,
      "23": 46.26944564832987,
      "24": 1.5075412082833717,
      "25": 2.2107128772756957,
      "26": 52.75969252664325
    },

The percentages represent everything on each channel: Zigbee, your Wi-Fi, your neighbour’s Wi-Fi, etc.
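If you’d rather not eyeball sixteen long decimals, a few lines of Python will rank the channels for you. This is a minimal sketch, assuming you’ve copied the `{ ... }` object that follows `"energy_scan":` out of the diagnostics file; `rank_channels` is just a name made up for this example, not part of ZHA. Lower numbers mean less was detected on that channel, and the channel your own Zigbee network uses is expected to show some activity.

```python
import json

def rank_channels(energy_scan_json: str) -> list[tuple[int, float]]:
    """Return (channel, energy) pairs sorted from quietest to busiest."""
    scan = json.loads(energy_scan_json)
    # The diagnostics store channel numbers as string keys ("11" to "26").
    pairs = [(int(channel), float(energy)) for channel, energy in scan.items()]
    return sorted(pairs, key=lambda pair: pair[1])

# Channels 21-26 from the scan above, trimmed for brevity.
sample = """{
  "21": 1.9464625152460222, "22": 2.84844209578687,
  "23": 46.26944564832987, "24": 1.5075412082833717,
  "25": 2.2107128772756957, "26": 52.75969252664325
}"""

for channel, energy in rank_channels(sample):
    print(f"channel {channel}: {energy:.1f}")
```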

But the problem is almost certainly too few routers. How many you need will depend entirely on the structure and layout of your home, but I would expect it to work out at a dozen or more.

Incidentally, you should expect all your end devices to be unavailable sometimes - after a restart, for example. The coordinator has to wait for them to check in, which can take an hour or more with some devices.

Do read the community guides; there’s lots of sensible stuff there. Zigbee is not a simple point-to-point thing, and it can take a lot of tuning to get it right.


Thanks for the suggestion about energy_scan. Mine looks way different from yours.

    "energy_scan": {
      "11": 0.3649532476334485,
      "12": 0.6967547825628676,
      "13": 0.792717332355823,
      "14": 0.5380922496244791,
      "15": 0.41540864658928767,
      "16": 0.5380922496244791,
      "17": 0.6123372955913717,
      "18": 0.6967547825628676,
      "19": 0.792717332355823,
      "20": 0.2816331001848671,
      "21": 0.3649532476334485,
      "22": 0.3649532476334485,
      "23": 0.2816331001848671,
      "24": 0.24738567181200594,
      "25": 0.2816331001848671,
      "26": 0.24738567181200594
    },

I did two scans a couple hours apart, and they are similar. I’m not sure what conclusion to draw from my numbers.

You are both suggesting that I need on the order of a router or more per battery device. That’s different from what I was naively expecting. If it’s really true, I’d be more likely to just punt these Zigbee sensors and go back to the wifi sensors I was using previously.

Thanks again for your input.


0 on every channel? I doubt you’d get that unless you’re living in a concrete bunker without any WiFi.

Double check the coordinator antenna for hardware issues. At the very least you should be getting a higher number on the channel your ZigBee is on.

Edit: just spotted you’re using multiprotocol, which is no longer recommended due to instability issues like this. You might need to flash ZigBee-only firmware back onto your dongle.
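The reasoning in this reply can be written down as a quick check: your own network should register on its channel, so near-zero readings everywhere point at the radio rather than the environment. A minimal sketch, assuming you know which channel your network uses; `looks_suspicious` and the 5.0 threshold are arbitrary illustrative choices, not values documented by ZHA or Silicon Labs.

```python
def looks_suspicious(scan: dict[str, float], zigbee_channel: int, threshold: float = 5.0) -> bool:
    """Flag a scan where every channel, including our own, reads close to zero.

    `threshold` is an arbitrary illustrative cut-off, not a documented value.
    """
    own_channel = scan[str(zigbee_channel)]
    busiest = max(scan.values())
    # Our own network should show up on its channel, and most homes have some
    # Wi-Fi nearby; near-zero everywhere hints at a radio or firmware problem.
    return own_channel < threshold and busiest < threshold

# First three channels of the low scan posted earlier (network on channel 11).
low_scan = {"11": 0.3649532476334485, "12": 0.6967547825628676, "13": 0.792717332355823}
print(looks_suspicious(low_scan, zigbee_channel=11))
```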


Looks like we were wrong… :face_exhaling:


Hey I took a 1% chance that I might be wrong! :smiley:

Even if this is caused by a faulty stick I still stick with the recommendation of more routers though :slight_smile:


Yeah, you should not be using the Multi-PAN RCP / multiprotocol firmware. Disable multiprotocol and flash the EmberZNet NCP firmware instead. Buy a separate radio dongle for the Thread protocol.

That is not so. It does not sound like you read and understood it, as it also covers best practices and actions that everyone should take regardless of setup. If you currently only have two Zigbee Router devices then you are not following those best-practice tips, because Zigbee relies heavily on Zigbee Router devices and mesh networking (which battery-powered devices cannot do on their own). It also mentions that multiprotocol firmware is not recommended and that you should use a dedicated radio dongle with NCP firmware for Zigbee. See: Zigbee networks: how to guide for avoiding interference + optimize using Zigbee Router devices (repeaters/extenders) to get a stable mesh network with best possible range and coverage


That energy scan is 100% weird. This is mine:-

    "energy_scan": {
      "11": 1.0256846852618655,
      "12": 2.509919386096536,
      "13": 2.2107128772756957,
      "14": 2.84844209578687,
      "15": 82.35373987514762,
      "16": 1.0256846852618655,
      "17": 2.84844209578687,
      "18": 6.789392891308996,
      "19": 85.82097888710312,
      "20": 75.96022321405563,
      "21": 82.35373987514762,
      "22": 75.96022321405563,
      "23": 85.82097888710312,
      "24": 84.164247274957,
      "25": 96.64469941013013,
      "26": 68.14622793558128
    },

You can clearly see my Zigbee network on channel 15, and my 2.4 GHz Wi-Fi on the higher channels.
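To see which Zigbee channels a given 2.4 GHz Wi-Fi channel can step on, you can compare centre frequencies. A rough sketch, assuming nominal widths of about 20 MHz for a Wi-Fi channel and 2 MHz for a Zigbee channel; real transmissions spill past those edges, so treat the result as a guide only. The function names are illustrative.

```python
def zigbee_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz IEEE 802.15.4 (Zigbee) channel, 11 to 26."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)

def wifi_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Wi-Fi channel, 1 to 13."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only handles Wi-Fi channels 1 to 13")
    return 2412 + 5 * (channel - 1)

def overlapping_zigbee_channels(wifi_channel: int,
                                wifi_width_mhz: float = 20.0,
                                zigbee_width_mhz: float = 2.0) -> list[int]:
    """Zigbee channels whose nominal band intersects the Wi-Fi channel's nominal band."""
    wifi_lo = wifi_center_mhz(wifi_channel) - wifi_width_mhz / 2
    wifi_hi = wifi_center_mhz(wifi_channel) + wifi_width_mhz / 2
    hits = []
    for channel in range(11, 27):
        z_lo = zigbee_center_mhz(channel) - zigbee_width_mhz / 2
        z_hi = zigbee_center_mhz(channel) + zigbee_width_mhz / 2
        # Two intervals overlap when each one starts before the other ends.
        if z_lo < wifi_hi and z_hi > wifi_lo:
            hits.append(channel)
    return hits

for wifi_channel in (1, 6, 11):
    print(f"Wi-Fi {wifi_channel}: Zigbee {overlapping_zigbee_channels(wifi_channel)}")
```

Under these assumptions, Wi-Fi channels 1, 6 and 11 cover Zigbee channels 11 to 14, 16 to 19 and 21 to 24, leaving 15, 20, 25 and 26 in the gaps between them.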

Thanks for all of the recent input. It definitely gives me some things to try. I don’t want to sound defensive, but it seems like some of the advice about interference and distance and such didn’t pay attention to the fact that it’s all of the devices coming and going at the same time. It’s not just some subset with marginal connections. So, maybe you didn’t catch that detail, or maybe you were thinking that it could happen that way with a non-optimal network.

0 on every channel? I doubt you’d get that unless you’re living in a concrete bunker without any WiFi. Double check the coordinator antenna for hardware issues. At the very least you should be getting a higher number on the channel your ZigBee is on.

All I can say is that I’ve dumped those diagnostics multiple times, and they are always in that ballpark. My Zigbees are using channel 11. The SkyConnect dongle does not have an externally visible antenna.

Yeah you should not be using the Multi-PAN RCP / multiprotocol firmware. Disable multiprotocol and flash the EmberZNet NCP firmware instead.

Fair enough. I will give those a try. I currently only have Zigbee devices, no Thread devices. I’ve avoided stuff like this so far because I’m not sure what will trigger the need to re-pair all my devices, and a couple of them are in inconvenient locations. The advice about the firmware seems solid enough that it’s worth it even if I have to re-pair.

if you now only have two Zigbee Router devices then you are not following those best practice tips

First, let me say that’s a great guide, and I appreciate the effort that went into it. I’ve just re-read it specifically to see what it said about routers. The graphic shows a lot of routers, and in one place the text recommends a “swarm” of routers. On the other hand, it also says “Personally, I suggest buying and adding at least three such devices.” Well, 2 is not so very different from 3 :slight_smile:.

If you really need about as many router devices as battery devices (leaving aside the need due to distance or walls or interference), then maybe Zigbee is not for me. All of my “real” devices (mostly contact and motion sensors) are battery powered.

That energy scan is 100% weird

I wonder if SkyConnect isn’t the best coordinator to use. I got it because I figured the HA people behind it would have figured out all the best tricks and so on. Or maybe part of the firmware situation also leads to these strange energy scan results. Beats me.

Thanks again to everyone for the input. I’m off to update my firmware, etc, and will report back with the outcome.

Ah, missed the Skyconnect part. I’d still suspect a dodgy antenna, but first I would flash it with zigbee-only firmware. That’s the only way to narrow down whether it’s a software issue or a hardware issue.

I’m back to the Zigbee firmware on the SkyConnect. Now the wait to see what happens (while hoping it doesn’t happen … how do I prove a negative again? :slight_smile:)

BTW, I went down some wrong pathways for the firmware flashing before I finally discovered the very convenient built-in option described here: Home Assistant Connect ZBT-1 (Disable multiprotocol support)


Good luck! Out of curiosity, now that you’re on zigbee-only firmware, what does the energy scan look like?

Because if interference is affecting the Zigbee Coordinator then it will affect all your Zigbee devices. If reception at the Zigbee Coordinator is interfered with, you will see exactly these symptoms. That is why one of the main pieces of advice is to connect your Zigbee Coordinator via a long USB extension cable to a USB 2.0 port or USB 2.0 hub (do not use USB 3.0), and to move it away from all possible sources of interference.

Hence it is so important to understand that Zigbee is not only extremely sensitive to EMI/RFI/EMF interference, but its signals are also weak and have very poor radio propagation (i.e. poor penetration of the building materials in walls). On top of that, the communication protocol uses short messages, and if those are not received the sender will only retry so many times before giving up.
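A back-of-the-envelope way to see why a weak link hurts: if every attempt to deliver a message fails independently with probability p, a message that is tried n times is lost only when all n attempts fail, i.e. with probability p^n. The sketch below assumes that independence (real interference is often bursty, which makes things worse), and the attempt count is a hypothetical number, not the actual retry limit used by EmberZNet or ZHA.

```python
def loss_probability(per_attempt_failure: float, attempts: int) -> float:
    """Chance that all attempts fail, assuming each fails independently."""
    if not 0.0 <= per_attempt_failure <= 1.0:
        raise ValueError("per_attempt_failure must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return per_attempt_failure ** attempts

# Hypothetical per-attempt failure rates; 3 attempts is illustrative, not a real limit.
for p in (0.1, 0.5, 0.8):
    print(f"p={p}: lost after 3 attempts with probability {loss_probability(p, 3):.3f}")
```

With these made-up numbers, a link that fails half the time still loses about one message in eight even after three tries, which is the kind of "delivery failure after multiple attempts" error mentioned earlier in the thread.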

Maybe I need to clarify that I personally recommend at least three DEDICATED Zigbee Router devices. Dedicated Zigbee Router devices all work MUCH better as routers than non-dedicated ones. If you are using dedicated Zigbee Router devices then you do not need as many, which is great when you are just getting started, before you build out your Zigbee network by adding loads of mains-powered Zigbee devices that act as Zigbee Routers.

If you are not using dedicated Zigbee Router devices then you should have MANY more than three! In that case I recommend, at the very least, one Zigbee Router device close to each and every battery-powered device, and preferably two for redundancy. That way no battery-powered device needs to communicate directly with the Zigbee Coordinator; it can always go through a Zigbee Router device that is close to it.
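The one-router-per-battery-device rule (two for redundancy) is easy to turn into a checklist. A minimal sketch, assuming you write down by hand which routers you consider close to each battery device, for example by walking the house; the device names and `under_covered` are hypothetical, and nothing here reads data from ZHA.

```python
def under_covered(nearby_routers: dict[str, list[str]], wanted: int = 2) -> dict[str, int]:
    """Return battery devices with fewer than `wanted` routers close by.

    `nearby_routers` is a hand-made map (battery device -> routers you judge
    to be close), not data pulled from ZHA.
    """
    return {
        device: len(routers)
        for device, routers in nearby_routers.items()
        if len(routers) < wanted
    }

# Hypothetical layout for illustration only.
layout = {
    "hallway_motion": ["plug_hallway", "plug_kitchen"],
    "back_door_contact": ["plug_kitchen"],
    "garage_contact": [],
}
print(under_covered(layout))
```

Anything the function reports is a spot where adding a mains-powered router would, by this rule of thumb, be the next step.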

Again, Zigbee relies heavily on mesh networking to extend range and coverage, and your battery-powered devices do not extend the Zigbee mesh at all. They can only make use of the mesh formed by the Zigbee Router devices that have joined that same network.

So the general best practice, if you have not added a few dedicated “known good” or “known great” Zigbee Router devices, is to have more mains-powered products acting as Zigbee Router devices. If you have added at least a few dedicated “known good” or “known great” Zigbee Router devices, then you do not need as many non-dedicated ones.

Also be aware that not all products that act as Zigbee Router devices are created equal. Some are bad, some are OK, and others are good or even great.

Another common symptom of not having a mains-powered Zigbee Router device close to each battery-powered device is much faster battery drain, because the device has to resend messages when reception is poor. This can be the difference between batteries lasting a couple of months and lasting a couple of years.
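The resend cost can be put in numbers. Assuming each attempt fails independently with probability p and the sender gives up after n attempts, the average number of transmissions per message is 1 + p + p^2 + ... + p^(n-1), because attempt k+1 only happens when the first k have all failed. This sketch only counts transmissions and is not a battery-life model; the failure rates and the three-attempt limit are hypothetical.

```python
def expected_transmissions(per_attempt_failure: float, max_attempts: int) -> float:
    """Average transmissions per message under independent failures.

    Attempt k+1 is only made if the first k all failed (probability p**k),
    so the expectation is the sum of p**k for k = 0 .. max_attempts - 1.
    """
    if not 0.0 <= per_attempt_failure <= 1.0:
        raise ValueError("per_attempt_failure must be between 0 and 1")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    p = per_attempt_failure
    return sum(p ** k for k in range(max_attempts))

# Hypothetical link qualities with an illustrative 3-attempt limit.
good = expected_transmissions(0.05, 3)
poor = expected_transmissions(0.6, 3)
print(f"good link: {good:.2f} sends/message, poor link: {poor:.2f} sends/message")
```

Here the poor link needs roughly twice as many transmissions per message as the good one, before counting any extra time spent listening for acknowledgements.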

That is actually not all you need to do. The firmware that ships with the SkyConnect is a “Multi-PAN RCP” (multiprotocol) firmware image, not the EmberZNet NCP firmware you want when only using Zigbee. So you need to both disable the multiprotocol add-on AND manually flash the Silicon Labs EmberZNet NCP firmware image, see:

and

Specifically, one of the images under the EmberZNet directory there:

Then use either the official SL Web Tools flasher (or some other compatible programmer application):

More info about what each of these is here:

Summary of the three different firmware variants available for Silicon Labs-based adapters:

  • EmberZNet NCP = Zigbee NCP (Network Co-Processor) is used as a dedicated Zigbee Coordinator for Zigbee-only environments, for direct use with Zigbee2MQTT, Home Assistant’s ZHA integration, other Zigpy based Zigbee Gateway implementations, or other Zigbee gateways/frameworks that support the EZSP (EmberZNet Serial Protocol) interface.
  • OpenThread RCP firmware (experimental) = This Thread RCP (Radio Co-Processor) is used directly as a dedicated Thread Border Router in Thread-only environments, for use with the OpenThread Border Router add-on or wpantund.
  • RCP Multi-PAN (no longer recommended) = Multiprotocol firmware for concurrent communication over Zigbee and Thread via Home Assistant SiliconLabs Multiprotocol add-on.

Again, be aware that RCP Multi-PAN in multiprotocol mode is no longer recommended, because running multiple active networks on a single radio adapter has proven not to be stable when using the Zigbee and Thread protocols simultaneously. It also increases the complexity of the software component dependencies needed. So if you are already using RCP Multi-PAN, it is highly recommended that you plan to migrate to separate dedicated radio adapters instead (using Zigbee NCP and Thread RCP firmware respectively), even if RCP Multi-PAN on a single radio adapter dongle has been working fine for you so far.
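The advice above boils down to one dedicated radio per protocol. Here it is as a tiny decision function; a sketch only, where `firmware_plan` and the dongle labels are made up for illustration and the firmware names are the ones used in this thread. Multi-PAN RCP is deliberately never suggested.

```python
# Firmware names as used in this thread; the dongle labels are just descriptive.
ZIGBEE_FIRMWARE = "EmberZNet NCP"
THREAD_FIRMWARE = "OpenThread RCP"

def firmware_plan(need_zigbee: bool, need_thread: bool) -> dict[str, str]:
    """Suggest one dedicated dongle per protocol instead of a shared Multi-PAN radio."""
    plan: dict[str, str] = {}
    if need_zigbee:
        plan["Zigbee dongle"] = ZIGBEE_FIRMWARE
    if need_thread:
        plan["Thread dongle"] = THREAD_FIRMWARE
    return plan

print(firmware_plan(need_zigbee=True, need_thread=False))
print(firmware_plan(need_zigbee=True, need_thread=True))
```

For a Zigbee-only setup, that means a single dongle running EmberZNet NCP.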

Further external reference explaining these different co-processor designs at a high level:


In my case, my 2 routers are doing nothing else, which is what I think you mean by “dedicated”. Or did you mean something else, like some specific flavor of firmware?

Still, it’s counter to my intuition that it seems to be so easy to overwhelm non-dedicated router nodes. (I understand completely about wanting more routers due to distance or path redundancy factors.) But I certainly won’t argue against the voice of experience.

Interesting. One of the wrong paths I mentioned was just flashing the NCP firmware. Although the flashing step worked, HA could not initialize the SkyConnect. After I discovered that GUI procedure, I followed it. After my command-line flashing, multiprotocol was already marked as disabled in the GUI, so I enabled it and then disabled it again. After that, HA could initialize the SkyConnect and all my previous pairings were still good.

I’ve just done the additional command line flashing step that you described. Things still look good at present.

I hope some of your advice makes its way into the upstream documentation. Searching forum postings for the right way to do things is obviously not ideal. Thanks again for your efforts.

It now looks a lot more like what people expect:

    "energy_scan": {
      "11": 84.164247274957,
      "12": 49.512515447068886,
      "13": 70.89933442360993,
      "14": 87.33047519856483,
      "15": 98.21983128611214,
      "16": 99.06269548719737,
      "17": 99.56814169794553,
      "18": 99.44062726818147,
      "19": 98.21983128611214,
      "20": 84.164247274957,
      "21": 62.257682586134884,
      "22": 73.50699819621309,
      "23": 75.96022321405563,
      "24": 12.244260188723507,
      "25": 49.512515447068886,
      "26": 98.62178092672917
    },

Now THAT’S a proper energy scan!

This is a free and open-source project which means that anyone in the community can submit contributions of changes to the documentation, including yourself.

No, you have products that are primarily designed to do something else (being a light or a smart plug is their main function), but as a bonus they also act as non-dedicated Zigbee Routers. What I mean by a dedicated Zigbee Router device is something that is designed to only be a repeater/extender and nothing else. And yes, those dedicated Zigbee Router devices not only have optimised firmware but also use a better radio SoC chip and have better antennas.

No offence, but it feels like you are spending more effort trying to misunderstand on purpose in the hope that you do not have to make any changes yourself. Sorry, but I do not have the time or energy to go down that rabbit hole, so do what you like and follow our advice or not, but personally I will stop responding to this thread now.

That’s not the case at all. I’m sorry if anything I’ve said has seemed that way. In any case, your time is your own, and I appreciate the effort you have taken already.

I am trying to make sure I don’t buy unnecessary equipment for my relatively simple Zigbee network, so I’m trying to get an understanding of various things. It might turn out that using Zigbee is not a good idea for my home. I don’t know yet.

It’s been about a week with no problems so far. I am cautiously optimistic. :slight_smile: