ZHA: bunch of devices need to be re-paired after a power outage

Hey everyone,
I’ve had this issue for a while now.

Whenever I have a power outage, at least 10-15 of my devices go offline and I need to re-pair each and every one of them.
I cannot figure out why this happens.
The only thing I notice (although I can’t be 100% sure) is that the devices that disconnect are the ones more than 1 hop away from the ZHA bridge.

Where should I start looking?
Any tips on how to start troubleshooting would be greatly appreciated.
Thanks in advance for any input.
K.

What are the ‘router’ devices, aka the devices in the middle? Any pattern to them, all the same? Another thought is that perhaps these router devices come back online after the coordinator; as a result, the end devices have already gone down the rabbit hole of connecting directly to the coordinator. I am far from a Zigbee mesh expert, but I have found that some of my devices do display some ‘different’ behaviors when it comes to mesh routing. Some Aqara products fit into this category, for example.

Re-pair them at their current location. That way the device will pair up with the closest repeater (plugged-in Zigbee device) rather than the coordinator.

Definitely sounds like a mesh issue.

How many devices of what types?
(model number matters)

You use ZHA; what kind of coordinator stick?

Considering it eventually gets running, we’ll skip 2.4 GHz wireless or USB interference - those would probably just stop you dead.

What do you usually have to do to get it working again?

I use a Conbee II running on a Pi 3B+. My “mesh” consists of 6 strategically placed Zigbee plug-in outlets connecting about 40 light bulbs (all Cree) and a couple dozen battery-powered sensors: motion, door, environment, and a half dozen multi-switch remotes (all Aqara). The lights also act as routers; although I don’t try to force them to be used as routers, they do so automatically and seem to help out. This is all on a 150’ x 70’ lot with a 40’ x 60’ house plunked in the middle. The coordinator is smack in the middle of the house and the routers are pretty much on the perimeter of it (each corner and two long walls in the middle). I have good connectivity even all the way out to the vibration sensor stuck in the mailbox at the road.

Thanks for all the replies. Replying to everyone here:

I have left everything as it was after the power failure, in order to figure out what’s up.

Here is a list of my devices, sorted by last seen:

Most are Xiaomi/Aqara.
The first two, I think, have been out of battery since last month, so ignore them.
The ones that were last seen on the 9th of Jan: I don’t know what happened, I was out of the house.

Here is a pic of my current map; it looks like some devices are orphans:

I see that some of the router devices are offline:

Yes.

My coordinator (Sonoff ZB Bridge flashed with Tasmota) seems offline too, but that is not accurate, as some of my devices do work currently.

They were paired in their location initially, and again after the last power failure, when I had a similar issue.

This could happen any time, but I do not think it is the issue: I’ve had a partial power failure (one floor of the house) while the coordinator remained online, and I still had the issue with the affected devices.

I put HA into device adding mode, then go to each device and press its button for several seconds.
They get discovered and recognized with their previous name and settings.

Is this a matter of distance / connectivity? If I run around and re-pair everything, they will all work with no issue … until there’s a power outage. Then the random stuff starts

My first “eyebrow raise” is: why is your coordinator showing offline? Is that intermittent? Is it offline now?

No idea, I never noticed it before.
When I took the screenshot I checked the devices that were still connected, and they were working normally.
It looks online now.

Try putting this in your configuration.yaml file. topology_scan_period is the interval, in minutes, at which the coordinator scans your mesh. If it eats up your response times, you can raise it to a few hours or just get rid of it.

zha:
  zigpy_config:
    topology_scan_period: 30
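
For reference, here is a sketch of the "few hours" variant mentioned above. It assumes 240 minutes (4 hours) is an acceptable interval for your mesh; the number is only illustrative and is not a value anyone in this thread recommended.

```yaml
# configuration.yaml
# Assumption: 240 minutes (4 hours) is just an example of "a few hours".
zha:
  zigpy_config:
    # Interval between topology scans, in minutes.
    topology_scan_period: 240
```

Changes under the zha: key in configuration.yaml only take effect after Home Assistant is restarted.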


I entered it. Will check and report back.

The thing is that right now, even if I manually re-pair all the routers, the end devices won’t connect; they need to be re-paired too. That does not make any sense, because in half of my house they came back online nicely.

Oh… I forgot one thing. If you have a recorder.yaml file, exclude the button domain, or your database will get really big.

exclude:
  domains:
    - button
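
If the recorder settings are not split out into their own file, the same exclusion would sit under the recorder: key in configuration.yaml. This is a minimal sketch, assuming no other recorder options are set there yet:

```yaml
# configuration.yaml
# Assumption: there is no existing "recorder:" section in this file.
recorder:
  exclude:
    domains:
      # Keep button entities out of the recorder database.
      - button
```

If a recorder: section already exists, merge the exclude: block into it rather than adding a second recorder: key.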

Try increasing the time for your router devices before they become unavailable.

My IKEA repeaters report every 4+ hours and the initial value was lower than that, so the routers would go offline and take out all their children with them.

I’m also running a flashed ZB Bridge and everything has been reliable for me since I made that change.

My configuration was like this.
Changed to your values to see if it solves the issue.
Thank you.

Yeah, mine were initially like yours, but since increasing the values I’ve been very happy. Have a look at how often your router devices check in and adjust accordingly. I found that with my IKEA repeaters it was 4 hours, so I entered a value a little higher. Report back on how you go and whether it has helped stabilise your Zigbee mesh. :+1:

How do I figure this out? From the Tasmota console?
All I get is this stuff.

That’s showing commands being sent and received. Looks like it’s doing stuff, which is good.

In the ZHA integration, clicking on a device will bring up some details like this.


Which devices do you have as routers in your ZigBee setup?

I had a new power failure yesterday.
Unfortunately I have the same issues:
everything below the red line is offline


(more or less, the same devices that do not reconnect after a power failure)

any ideas?
:frowning: I’m sick of re-pairing everything

EDIT:
I am also getting this error in the logs; I don’t know if it’s relevant:

Logger: homeassistant.components.zha.core.channels.base
Source: components/zha/core/channels/base.py:428
Integration: Zigbee Home Automation (documentation, issues)
First occurred: 1:31:08 PM (18 occurrences)
Last logged: 1:32:51 PM

[0xE96A:22:0x000c]: async_initialize: all attempts have failed: [DeliveryError('[0xe96a:22:0x000c]: Message send failure'), DeliveryError('[0xe96a:22:0x000c]: Message send failure'), DeliveryError('[0xe96a:22:0x000c]: Message send failure'), DeliveryError('[0xe96a:22:0x000c]: Message send failure')]
[0x8708:1:0x0006]: async_initialize: all attempts have failed: [DeliveryError('[0x8708:1:0x0006]: Message send failure'), DeliveryError('[0x8708:1:0x0006]: Message send failure'), DeliveryError('[0x8708:1:0x0006]: Message send failure'), DeliveryError('[0x8708:1:0x0006]: Message send failure')]
[0x8708:21:0x000c]: async_initialize: all attempts have failed: [DeliveryError('[0x8708:21:0x000c]: Message send failure'), DeliveryError('[0x8708:21:0x000c]: Message send failure'), DeliveryError('[0x8708:21:0x000c]: Message send failure'), DeliveryError('[0x8708:21:0x000c]: Message send failure')]
[0x8708:1:0x0002]: async_initialize: all attempts have failed: [DeliveryError('[0x8708:1:0x0002]: Message send failure'), DeliveryError('[0x8708:1:0x0002]: Message send failure'), DeliveryError('[0x8708:1:0x0002]: Message send failure'), DeliveryError('[0x8708:1:0x0002]: Message send failure')]
[0x8708:22:0x000c]: async_initialize: all attempts have failed: [DeliveryError('[0x8708:22:0x000c]: Message send failure'), DeliveryError('[0x8708:22:0x000c]: Message send failure'), DeliveryError('[0x8708:22:0x000c]: Message send failure'), DeliveryError('[0x8708:22:0x000c]: Message send failure')]
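
To get more detail than these summary errors, ZHA debug logging can be turned on in configuration.yaml. This is a minimal sketch, assuming the Sonoff ZB Bridge's radio is driven by the bellows (EZSP) library; with a ConBee II the radio library would be zigpy_deconz instead:

```yaml
# configuration.yaml
# Assumption: the coordinator uses an EZSP radio (bellows).
# For a ConBee II, replace "bellows" with "zigpy_deconz".
logger:
  default: info
  logs:
    homeassistant.components.zha: debug
    zigpy: debug
    bellows: debug
```

Debug logging is verbose, so it is probably best enabled only while reproducing the problem, for example around a deliberate power cycle of one router.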

[0xE96A:22:0x000c] I believe is your DN BM Main Room Xiaomi plug. What I am suggesting is probably a PIA exercise… However, you could unplug this device while it is in the ‘no connected’ state, move it right next to your coordinator, and plug it back in WITHOUT re-pairing. Give it some time and see if it connects from there. If it does, this might point to an issue with the devices that sit between the failing devices and the coordinator.

I just removed everything, deleted the db and reinstalled ZHA.
I just finished renaming the last device… took all afternoon :slight_smile:

We’ll see what happens now.

I have the same issue, almost same setup:
ZHA + Conbee II + 5x IKEA Repeater

Whenever I have to shut down the electrical circuit of a power outlet where an IKEA repeater is plugged in, the repeater does not reconnect afterwards. I don’t think it is a timeout issue, as I usually switch the circuit breaker back on after a few seconds, max 1-2 minutes.

The “Last seen” timestamp in HA just stops updating. I have to re-pair the repeater and some of the devices that were connected to said repeater.

I now changed the topology_scan_period and the “Consider mains & battery powered devices unavailable after (seconds)” values as suggested in this thread. Hope this will help, but I doubt it. This is a reconnection issue and not a timeout issue it seems to me…

By the way, what does it even mean for a device to “become unavailable after (seconds)”? Does it mean it’s just listed as unavailable, but once it’s seen again it will reconnect automatically? Or will I have to re-pair any Zigbee device that runs into such a timeout?

Thanks :slight_smile: