Zigbee network suddenly very unstable

Our Zigbee network has been very unstable for a couple of days now.
Every now and then, most (but not all) devices become “unavailable”. Lamps, plugs, buttons… everything.

A HA restart solves the problem for a while, but a while later it’s like this again.
I have a Home Assistant Green with a SkyConnect for Zigbee. And I’ve been working on all this for less than half a year.
It all worked so nicely without any problems, but now it doesn’t anymore.

I think this issue started with the March HA core update, but I’m not sure.

What could be wrong?

First, welcome, and sorry you’re having issues. But honestly, asking ‘what could be wrong’?

Sunspots? Hippos? New neighbor moved in and lit up a new wifi network? (#3 and maybe #2 could happen) No clue. We need a lot more to go on.

Fortunately the community built a great zigbee guide and a great zigbee troubleshooting guide here as part of the cookbook…

Scroll down to both zigbee and troubleshooting (zigbee has its own troubleshooting section)

Thanks for the link. I’ve searched that page for ‘zigbee’ and I can’t find any troubleshooting guides. The “troubleshooting” section doesn’t list anything zigbee related. Do you have any direct links to the zigbee troubleshooting guide you’re referring to?

Start with this. Zigbee network optimization: a how-to guide for avoiding radio frequency interference + adding Zigbee Router devices (repeaters/extenders) to get a stable Zigbee network mesh with best possible range and coverage by fully utilizing Zigbee mesh networking

I… Did.

This is literally halfway down the page I linked:


???

Yes. I saw that. You said there was “a great zigbee troubleshooting guide here as part of the cookbook”, and “zigbee has its own troubleshooting section”. I see lots of articles about zigbee, but I don’t see anything that looks like a zigbee troubleshooting guide.

You have a direct link now. Start there, if you follow the guide and it doesn’t get better you need to produce logs and more information for anyone to be able to help.

Ok, I understand that my message may have been too general.

But my HA setup has not changed for a few weeks. No new devices, no removed devices.
And Zigbee was running fine next to our wifi, for which the setup is also still the same as before.

But to try something, I now changed the 2.4 GHz wifi channel from 1 to 11.
Since Zigbee is on channel 15, they should not interfere with each other.

I hope this has the desired effect on the stability of our Zigbee network.
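
For anyone wanting to check the channel reasoning above, here is a rough sketch that compares where a Zigbee channel and a Wi-Fi channel sit in the 2.4 GHz band. The centre-frequency formulas are the standard ones; the channel widths (about 2 MHz for Zigbee, 22 MHz for Wi-Fi) are simplifying assumptions, and real Wi-Fi signals spill somewhat beyond their nominal width, so treat the result as an estimate only. The function names are made up for this example.

```python
# Centre frequencies in MHz. Zigbee (IEEE 802.15.4, 2.4 GHz) channels 11-26
# are 5 MHz apart starting at 2405; Wi-Fi channels 1-13 start at 2412.
ZIGBEE_WIDTH_MHZ = 2.0   # assumed occupied bandwidth
WIFI_WIDTH_MHZ = 22.0    # assumed (conservative) width of a 20 MHz Wi-Fi channel


def zigbee_centre(channel: int) -> float:
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11-26")
    return 2405.0 + 5.0 * (channel - 11)


def wifi_centre(channel: int) -> float:
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers Wi-Fi channels 1-13")
    return 2407.0 + 5.0 * channel


def gap_mhz(zigbee_channel: int, wifi_channel: int) -> float:
    """Distance between the band edges; zero or negative means they touch or overlap."""
    distance = abs(zigbee_centre(zigbee_channel) - wifi_centre(wifi_channel))
    return distance - (ZIGBEE_WIDTH_MHZ + WIFI_WIDTH_MHZ) / 2


for wifi in (1, 6, 11):
    print(f"Zigbee 15 vs Wi-Fi {wifi}: gap {gap_mhz(15, wifi):+.1f} MHz")
```

Under these assumed widths, Zigbee channel 15 sits in the narrow gap between Wi-Fi channels 1 and 6, so moving Wi-Fi from 1 to 11 widens the margin from roughly 1 MHz to roughly 25 MHz.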

Have you looked through the guide and made sure you are following best practices, like using a USB extension cable and so on?

This is a misnomer. YOU have not changed anything.

Something changed; we just haven’t figured out what yet.

I said #3 on my list was a real possibility because my neighbor getting a shiny new wifi AP knocked my entire zigbee network out one afternoon and it took me a week to figure out what was going on.

Yes, it was very much too general, and the zigbee guide is designed to walk you through best practice so that when you’re done you know what questions to ask and how to answer ours. It will only take twenty minutes to read the whole thing, or to skim it and get an idea of how complicated your ask really was.

Edit:
Zigbee network changes are major events and not to be taken lightly. Be prepared for more issues as the network stabilizes (hours, depending on how many hops you have), and some older devices don’t take to channel changes and need to be manually re-added. (read: be prepared for it to get worse before it gets better)

We don’t have neighbors. :grinning:

I changed the Wi-Fi channel, not the Zigbee channel. So that shouldn’t influence any Zigbee devices?

Man I wis… (don’t tell my neighbor)

Yes wifi first.

Do this: run through the best practices config guide with your setup so you know how far from ‘best’ you may be before you start changing a lot of stuff. As you read it you’ll figure out some likely candidates. (oh, maybe a routing device here, or channel x, or a USB extension)

Then we can absolutely help with working through it.

Thank you @fleskefjes . That’s probably the best link for OP to use to troubleshoot their stability issue.

My issue is different than OP’s. I found this thread searching for other users reporting zha problems after recent HA updates. That’s why I was interested in the zigbee troubleshooting guide previously mentioned. It seems that’s not a specific named article as much as a general reference to the zigbee section in the cookbook.

I won’t get into troubleshooting my own issue here since it’s off-topic for this thread. I just wanted to inquire about the referenced support document. :slightly_smiling_face:

If the guide doesn’t help you @timborino we’re glad to help, you can create a new thread and specify what you’re having issues with and provide logs :slight_smile:

You don’t say whether you’re using the ZHA integration. If you are, you can get a report on how busy the various channels are - better than making haphazard changes.

On the ZHA integration page, select the “three dot” menu and “Download diagnostics”. You’ll get a long json file, and somewhere towards the end will be something like this:

    "energy_scan": {
      "11": 55.9836862725909,
      "12": 59.15797905332195,
      "13": 55.9836862725909,
      "14": 3.6632469452765037,
      "15": 36.830390267097734,
      "16": 3.2311094587038967,
      "17": 6.789392891308996,
      "18": 15.32285793082191,
      "19": 7.659755505061292,
      "20": 84.164247274957,
      "21": 87.33047519856483,
      "22": 87.33047519856483,
      "23": 92.95959997754716,
      "24": 80.38447947821754,
      "25": 6.789392891308996,
      "26": 97.7033852118351
    },

These are your channels and the numbers are the percentage of noise on each one. “Noise” means everything - Zigbee traffic, interference from your wi-fi, your neighbour’s wi-fi, your microwave, etc. etc. - and something completely unrelated to Zigbee can contribute to it. An extra Bluetooth proxy, for example.
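
If you’d rather not hunt through the file by eye, a small script can pull the scan out and sort it. This is only a sketch: the values are the ones from my scan above (rounded to two decimals), the `"data"` wrapper is an assumption about where the key sits in the download, and `find_key` is a helper made up for this example that simply searches the whole JSON for `energy_scan` wherever it is.

```python
import json

# Values from the energy scan above, rounded to two decimals.
# The outer "data" wrapper is an assumption about the file layout.
diagnostics_text = """
{"data": {"energy_scan": {
  "11": 55.98, "12": 59.16, "13": 55.98, "14": 3.66,
  "15": 36.83, "16": 3.23, "17": 6.79, "18": 15.32,
  "19": 7.66, "20": 84.16, "21": 87.33, "22": 87.33,
  "23": 92.96, "24": 80.38, "25": 6.79, "26": 97.70
}}}
"""


def find_key(node, key):
    """Depth-first search for `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


scan = find_key(json.loads(diagnostics_text), "energy_scan")

# Sort channels from quietest to noisiest.
ranked = sorted(scan.items(), key=lambda item: item[1])
for channel, energy in ranked:
    print(f"channel {channel}: {energy:5.1f}%")

quietest = ranked[0][0]
```

To use it on a real download, replace `diagnostics_text` with the contents of the file. A quiet channel in the scan is only a hint; it doesn’t by itself mean you should move the network there.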

Okay, I’ve read through that entire Zigbee section in the cookbook.

Looked up my ZHA network visualisation:

As you can see, I have one coordinator (square box) and lots of routers (oval).
(the 2 red ovals are plugs that are currently unplugged)

Yes, I’m using ZHA.
Regretfully, downloading diagnostics doesn’t work:

Logger: homeassistant.components.diagnostics
Source: components/diagnostics/__init__.py:219
integration: Diagnostics (documentation, issues)
First occurred: 7:02:17 PM (5 occurrences)
Last logged: 8:40:40 PM

Failed to serialize to JSON: config_entry/01JBY70BXZNVNQYSX5HWT0ZHEW. Bad data at $.data.application_state.network_info.nwk_addresses<key: a4:c1:38:09:81:a3:f5:ab>=a4:c1:38:09:81:a3:f5:ab(<class 'zigpy.types.named.EUI64'>, $.data.application_state.network_info.nwk_addresses<key: 00:15:bc:00:1a:10:8c:df>=00:15:bc:00:1a:10:8c:df(<class 'zigpy.types.named.EUI64'>, $.data.application_state.network_info.nwk_addresses<key: 0c:ef:f6:ff:fe:59:20:fb>=0c:ef:f6:ff:fe:59:20:fb(<class 'zigpy.types.named.EUI64'>, $.data.application_state.network_info.nwk_addresses<key: 0c:ef:f6:ff:fe:59:a3:47>=0c:ef:f6:ff:fe:59:a3:47(<class 'zigpy.types.named.EUI64'>, $.data.application_state.counters.ezsp_counters.MAC_RX_BROADCAST=MAC_RX_BROADCAST = 262140(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.MAC_TX_BROADCAST=MAC_TX_BROADCAST = 27612(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.MAC_RX_UNICAST=MAC_RX_UNICAST = 118804(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.MAC_TX_UNICAST_SUCCESS=MAC_TX_UNICAST_SUCCESS = 86296(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.MAC_TX_UNICAST_RETRY=MAC_TX_UNICAST_RETRY = 67539(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.MAC_TX_UNICAST_FAILED=MAC_TX_UNICAST_FAILED = 17264(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.APS_DATA_RX_BROADCAST=APS_DATA_RX_BROADCAST = 3285(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.APS_DATA_TX_BROADCAST=APS_DATA_TX_BROADCAST = 3179(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.APS_DATA_RX_UNICAST=APS_DATA_RX_UNICAST = 31572(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.APS_DATA_TX_UNICAST_SUCCESS=APS_DATA_TX_UNICAST_SUCCESS = 23801(<class 'zigpy.state.Counter'>, 
$.data.application_state.counters.ezsp_counters.APS_DATA_TX_UNICAST_RETRY=APS_DATA_TX_UNICAST_RETRY = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.APS_DATA_TX_UNICAST_FAILED=APS_DATA_TX_UNICAST_FAILED = 2510(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.ROUTE_DISCOVERY_INITIATED=ROUTE_DISCOVERY_INITIATED = 2152(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.NEIGHBOR_ADDED=NEIGHBOR_ADDED = 20(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.NEIGHBOR_REMOVED=NEIGHBOR_REMOVED = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.NEIGHBOR_STALE=NEIGHBOR_STALE = 395(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.JOIN_INDICATION=JOIN_INDICATION = 8018(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.CHILD_REMOVED=CHILD_REMOVED = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.ASH_OVERFLOW_ERROR=ASH_OVERFLOW_ERROR = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.ASH_FRAMING_ERROR=ASH_FRAMING_ERROR = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.ASH_OVERRUN_ERROR=ASH_OVERRUN_ERROR = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.NWK_FRAME_COUNTER_FAILURE=NWK_FRAME_COUNTER_FAILURE = 4(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.APS_FRAME_COUNTER_FAILURE=APS_FRAME_COUNTER_FAILURE = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.UTILITY=UTILITY = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.APS_LINK_KEY_NOT_AUTHORIZED=APS_LINK_KEY_NOT_AUTHORIZED = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.NWK_DECRYPTION_FAILURE=NWK_DECRYPTION_FAILURE = 1(<class 'zigpy.state.Counter'>, 
$.data.application_state.counters.ezsp_counters.APS_DECRYPTION_FAILURE=APS_DECRYPTION_FAILURE = 2402(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.ALLOCATE_PACKET_BUFFER_FAILURE=ALLOCATE_PACKET_BUFFER_FAILURE = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.RELAYED_UNICAST=RELAYED_UNICAST = 5(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.PHY_TO_MAC_QUEUE_LIMIT_REACHED=PHY_TO_MAC_QUEUE_LIMIT_REACHED = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.PACKET_VALIDATE_LIBRARY_DROPPED_COUNT=PACKET_VALIDATE_LIBRARY_DROPPED_COUNT = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.TYPE_NWK_RETRY_OVERFLOW=TYPE_NWK_RETRY_OVERFLOW = 1430(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.PHY_CCA_FAIL_COUNT=PHY_CCA_FAIL_COUNT = 19036(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.BROADCAST_TABLE_FULL=BROADCAST_TABLE_FULL = 177036(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.PTA_LO_PRI_REQUESTED=PTA_LO_PRI_REQUESTED = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.PTA_HI_PRI_REQUESTED=PTA_HI_PRI_REQUESTED = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.PTA_LO_PRI_DENIED=PTA_LO_PRI_DENIED = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.PTA_HI_PRI_DENIED=PTA_HI_PRI_DENIED = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.PTA_LO_PRI_TX_ABORTED=PTA_LO_PRI_TX_ABORTED = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.PTA_HI_PRI_TX_ABORTED=PTA_HI_PRI_TX_ABORTED = 0(<class 'zigpy.state.Counter'>, $.data.application_state.counters.ezsp_counters.ADDRESS_CONFLICT_SENT=ADDRESS_CONFLICT_SENT = 0(<class 'zigpy.state.Counter'>, 
$.data.application_state.counters.ezsp_counters.EZSP_FREE_BUFFERS=EZSP_FREE_BUFFERS = 250(<class 'zigpy.state.Counter'>, $.data.application_state.counters.controller_app_counters.unicast_rx=unicast_rx = 31576(<class 'zigpy.state.Counter'>, $.data.application_state.counters.controller_app_counters.unicast_tx_success=unicast_tx_success = 12446(<class 'zigpy.state.Counter'>, $.data.application_state.counters.controller_app_counters.broadcast_rx=broadcast_rx = 3279(<class 'zigpy.state.Counter'>, $.data.application_state.counters.controller_app_counters.broadcast_tx_success_unexpected=broadcast_tx_success_unexpected = 3177(<class 'zigpy.state.Counter'>, $.data.application_state.counters.controller_app_counters.unicast_tx_success_unexpected=unicast_tx_success_unexpected = 11357(<class 'zigpy.state.Counter'>, $.data.application_state.counters.controller_app_counters.unicast_tx_failure_unexpected=unicast_tx_failure_unexpected = 2112(<class 'zigpy.state.Counter'>, $.data.application_state.counters.controller_app_counters.unicast_tx_failure=unicast_tx_failure = 398(<class 'zigpy.state.Counter'>, $.data.application_state.counters.controller_app_counters.broadcast_tx_failure_unexpected=broadcast_tx_failure_unexpected = 2(<class 'zigpy.state.Counter'>
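
Even though the download failed, the error text itself contains the radio counters, and they can be pulled out with a regular expression. Below is a minimal sketch using four of the counters copied from the message above; the regex is written against the `NAME = value(<class 'zigpy.state.Counter'>` pattern as it appears in this log, and the two ratios are just simple arithmetic on those counters, not an official health metric.

```python
import re

# Excerpt of the error text above, limited to the counters used below.
log_excerpt = (
    "MAC_TX_UNICAST_SUCCESS=MAC_TX_UNICAST_SUCCESS = 86296(<class 'zigpy.state.Counter'>, "
    "MAC_TX_UNICAST_RETRY=MAC_TX_UNICAST_RETRY = 67539(<class 'zigpy.state.Counter'>, "
    "MAC_TX_UNICAST_FAILED=MAC_TX_UNICAST_FAILED = 17264(<class 'zigpy.state.Counter'>, "
    "PHY_CCA_FAIL_COUNT=PHY_CCA_FAIL_COUNT = 19036(<class 'zigpy.state.Counter'>, "
)

# Capture "NAME = 123" pairs that are followed by the Counter class marker.
pattern = re.compile(r"(\w+) = (\d+)\(<class 'zigpy\.state\.Counter'>")
counters = {name: int(value) for name, value in pattern.findall(log_excerpt)}

success = counters["MAC_TX_UNICAST_SUCCESS"]
failed = counters["MAC_TX_UNICAST_FAILED"]
retries = counters["MAC_TX_UNICAST_RETRY"]

# Share of unicast transmissions that ultimately failed.
failure_rate = failed / (success + failed)
# Average number of retries per successful unicast transmission.
retries_per_success = retries / success

print(f"unicast failure rate: {failure_rate:.1%}")
print(f"retries per success:  {retries_per_success:.2f}")
```

The same pattern should work on the full error text if it is pasted in place of `log_excerpt`. What counts as a healthy value for these ratios is not something this thread establishes, so they are best used for comparing before and after a change.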

Try avoiding unplugged / unpowered router devices. Some of your devices only have a single connection and I bet that is the problem.

Those plugs were unplugged this afternoon. They were active up until then.

And lots of devices become randomly unavailable, also ones that have multiple connections according to the map.

I managed to download those diagnostics, and this is the energy scan:

My Zigbee network runs on channel 15.