Daily Zigbee Coordinator Backup Blueprint - ZNP - ezsp/bellows

le_top · January 8, 2022, 5:08pm

You’ll be able to do a daily backup of your ZNP Coordinator data.
That may be useful to restore your network to another Coordinator when the original fails for some unfortunate reason.

DAILY BACKUP OF ZNP DONGLE (not ezsp/bellows): this blueprint facilitates installation of zha-toolkit/backup_znp.yaml (at 34e5a5013759d9b00adaf1aaeda702ad608d62b0, mdeweerd/zha-toolkit on GitHub).

You need to install the zha_custom component providing the backup service (GitHub: mdeweerd/zha-toolkit, Zigbee Home Assistant Toolkit - service for "rare" Zigbee operations using ZHA on Home Assistant).

NOTE/EDIT: The ‘zha_custom’ fork is renamed to ‘zha-toolkit’ - the blueprint link above has been updated accordingly.

EDIT: I created a blueprint (and code) to back up any supported coordinator (the toolkit detects the method to use). Supports only ZNP and bellows/ezsp.
General blueprint: . I included the generation of zha_backup_success, zha_backup_failed and zha_backup_done events in the blueprint.

Note - the ZHA team (puddly) is working on a next generation library and recommends using the CLI utility instead for migration, as it uses more recent code for EZSP and ZNP radios.

Danny2100 · January 9, 2022, 11:37am

Does this run even with zha running? Or does zha need to be stopped?

le_top · January 9, 2022, 12:26pm

Backup runs with ZHA running. Restoring with ZHA running also works (but a restart is required after that).

The backup ran for me last night at 4 AM and I used the blueprint to set this up.
The main difference between the backups I made already is the frame counter.
Daily backups will also help determine by how much the frame counter normally increases to set a better offset during restore. Mine increased about 50000 in one day.

Danny2100 · January 9, 2022, 3:31pm

If the frame counter is too far back, will it take longer on restore for the network to get set up?

le_top · January 9, 2022, 3:37pm

I suppose that it will restore eventually, once the expected frame counter is reached.

I am not 100% sure, but I think it is there to protect from replays, so if the router or end device is expecting a minimum value, then your coordinator has to send at least that many packets before the target accepts a packet. That could take longer than a day, as the coordinator might be sending fewer messages in case of communication issues.
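
As a rough illustration of that replay-protection idea, here is a toy Python model. It assumes a receiver simply remembers the last counter it accepted and rejects anything that is not newer; the class name and the numbers are made up for the example and this is not how any particular Zigbee stack is written.

```python
class FrameCounterFilter:
    """Toy receiver that rejects frames whose counter is not newer (assumed behaviour)."""

    def __init__(self, last_seen: int) -> None:
        self.last_seen = last_seen

    def accept(self, frame_counter: int) -> bool:
        # A counter that is not strictly newer looks like a replayed frame.
        if frame_counter <= self.last_seen:
            return False
        self.last_seen = frame_counter
        return True


device = FrameCounterFilter(last_seen=1000)
print(device.accept(900))   # coordinator restored with an old counter: rejected
print(device.accept(1001))  # counter has caught up: accepted
```

In this model a coordinator restored with a counter that is too low stays "silent" for the device until its counter passes the remembered value, which is why adding an offset at restore time helps.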

Danny2100 · January 11, 2022, 9:38pm

I’m a little surprised this has not blown up yet. I didn’t try it, but this is something people were asking for and it doesn’t appear to be gaining traction.

le_top · January 11, 2022, 9:50pm

You mean that the HA system should break because of the backup?

Anyway, I am using it in two locations and happily back up daily.

[core-ssh local]$ diff nwk_backup_10.json  nwk_backup_11.json
--- nwk_backup_10.json
+++ nwk_backup_11.json
@@ -4,7 +4,7 @@
         "format": "zigpy/open-coordinator-backup",
         "source": "[email protected]",
         "internal": {
-            "creation_time": "2022-01-10T04:00:02+01:00",
+            "creation_time": "2022-01-11T04:00:03+01:00",
             "zstack": {
                 "version": 3.0
             }
@@ -22,7 +22,7 @@
     "network_key": {
         "key": "d232aca2c434309bd0e0dae5c074315c",
         "sequence_number": 0,
-        "frame_counter": 4741430
+        "frame_counter": 4791727
     },
     "devices": [
         {
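
The daily increase can be read off the two values in the diff above. A minimal Python sketch, using only the numbers shown there (the function names are just for illustration):

```python
from datetime import datetime


def counter_increase(old_counter: int, new_counter: int) -> int:
    """Number of frame counter steps between two backups."""
    return new_counter - old_counter


def increase_per_day(old_counter: int, old_time: str,
                     new_counter: int, new_time: str) -> float:
    """Scale the increase to a 24 hour period using the backup timestamps."""
    elapsed = datetime.fromisoformat(new_time) - datetime.fromisoformat(old_time)
    return counter_increase(old_counter, new_counter) * 86400 / elapsed.total_seconds()


# Values copied from the nwk_backup_10.json / nwk_backup_11.json diff.
delta = counter_increase(4741430, 4791727)
rate = increase_per_day(4741430, "2022-01-10T04:00:02+01:00",
                        4791727, "2022-01-11T04:00:03+01:00")
print(delta)        # 50297
print(round(rate))
```

Multiplying that rate by the age of the backup in days (plus some margin) would give a starting point for the frame counter offset at restore time.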

Danny2100 · January 12, 2022, 10:08am

Blown up as in a ton of comments from people who are going to use it, not broken. Whoops lol, that was a bad choice of words. I am going to set this up today.

Danny2100 · January 13, 2022, 2:15pm

I didn’t realize there was another post with nearly 500 views. I thought this was the only one.

I just set it up and it ran its first backup.

If I wanted to restore the configuration to another stick, do I use the nwk_backup or the nvram_backup? When would I use either of those?

le_top · January 13, 2022, 2:50pm

To restore, the nwk_backup is better IMHO. There is a parameter that allows increasing the TX Counter.
It’s also the method that was tested successfully, while the NVRAM restore did not work perfectly.

I am mainly backing up both because we can and you never know what kind of information can be useful.

500 views on the zha_custom post is not too bad, but with hindsight I expected the component to be used “more”. I dropped a message about the blueprint here so that things are announced where you might look for them - proof is that you didn’t see the other post earlier!

Danny2100 · January 14, 2022, 12:14am

Sounds good. I just wasn’t sure if the NVRAM was better, as I never did an NVRAM restore when switching between ZNP coordinators.

Yep, I didn’t see the other post until days later. I’m glad I saw this though, as it will be very helpful for restoring backups if the coordinator dies; I have had quite a few hiccups with my Sonoff Zigbee 3.0 coordinator. It will be especially helpful because ZHA does not have to be stopped in order for a backup to be performed, which is a massive plus. I ended up getting 4 coordinators as backups in case a dongle dies, as I rely on Zigbee for all of my lighting. Over 100 devices.

MoridinTX · January 14, 2022, 12:23am

When I try to import the blueprint, I get an error “Unsupported domain” and it will not import.

I am on HA Blue, 2021.12.8. Any suggestions? I have never tried to import a blueprint before.

le_top · January 14, 2022, 7:26am

AFAIK Zigbee provides the possibility to have more than one coordinator in the network.
That should be to ensure coordinator redundancy - but it’s not something I see supported in the current (public) implementations.

le_top · January 14, 2022, 7:28am

I think that you need to install the zha_custom component first.
Each (custom) component is considered a domain.

Danny2100 · January 14, 2022, 9:48am

I never knew that. I wonder how that works? When would one take over to avoid interference?

MoridinTX · January 14, 2022, 10:43am

Thanks, I did try this (installing zha_custom first) as well. Not really sure why it would not work, so I manually added the automation, based on your blueprint. I’ll work on figuring out the why later, after I get some coffee… Thanks for the work on this, the first backup ran successfully.

[Solved] Adding (automation: !include automations.yaml) to my configuration.yaml was the issue. I had several automations in my automations.yaml, but they did not show up when going to Configure. After restarting, all the automations show up, and I am able to import a blueprint.

le_top · January 14, 2022, 12:49pm

When you look here and there on the web you’ll read that having multiple coordinators in a Zigbee network is not possible, but the Zigbee Specification (2.3.2.3.6 MAC Capability Flags Field) indicates there is an “Alternate PAN Coordinator” bit: “this node is capable of becoming a PAN coordinator.”

So in full, there is only one PAN Coordinator at a time, but it looks like there can be devices that could become the coordinator.
Possibly this is a MAC sub-layer requirement and this was not specified for the Zigbee network.
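
For reference, a small Python sketch of how that bit could be checked in a device's MAC capability byte. The bit positions follow the usual 802.15.4 capability field layout; the enum and function names are my own for this example and not taken from any library.

```python
from enum import IntFlag


class MacCapability(IntFlag):
    """MAC capability flags as laid out in the node descriptor (names are illustrative)."""
    ALTERNATE_PAN_COORDINATOR = 0x01
    FULL_FUNCTION_DEVICE = 0x02
    MAINS_POWERED = 0x04
    RX_ON_WHEN_IDLE = 0x08
    SECURITY_CAPABLE = 0x40
    ALLOCATE_ADDRESS = 0x80


def can_become_coordinator(capability_byte: int) -> bool:
    # Bit 0 is the "Alternate PAN Coordinator" flag.
    return bool(capability_byte & MacCapability.ALTERNATE_PAN_COORDINATOR)


# 0x8E: full function device, mains powered, receiver on when idle, allocate address.
print(can_become_coordinator(0x8E))  # False
print(can_become_coordinator(0x8F))  # True
```

A device reporting the flag only says it is capable; nothing here implies that a stack actually implements a takeover.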

I could not yet find an official document explaining how that works, but there was a mention of it in Microchip’s code.

So I assumed it’s possible because of that mention in the specification.

(Un)fortunately there is a paper about it though.
This suggests that there is no official method, but that it’s not impossible to do.

I had a quick look and it proposes to have a second device which acts as “backup router” that “pings” the coordinator and takes over after several unsuccessful pings.
The coordinator also sends network settings to the backup on a regular basis.

The switch implies that the backup “restarts” to the coordinator software and reuses the coordinator’s IEEE address.
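
The “ping and take over” part of that proposal boils down to counting consecutive missed pings. A minimal Python sketch of that logic, assuming a threshold of 3 (the paper's actual threshold and all names here are assumptions on my side):

```python
class CoordinatorWatchdog:
    """Counts consecutive unanswered pings to the coordinator (illustrative only)."""

    def __init__(self, max_missed: int = 3) -> None:
        self.max_missed = max_missed
        self.missed = 0

    def record_ping(self, answered: bool) -> bool:
        """Record one ping result; return True when the backup should take over."""
        # Any answered ping resets the streak of failures.
        self.missed = 0 if answered else self.missed + 1
        return self.missed >= self.max_missed


watchdog = CoordinatorWatchdog(max_missed=3)
results = [watchdog.record_ping(ok) for ok in (True, False, False, True, False, False, False)]
print(results)  # only the last entry is True
```

Requiring several misses in a row avoids a takeover on a single lost packet, which matters because two active coordinators with the same IEEE address would conflict.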

So based on that, the Zigbee 2008 specification did not define how an “online” backup would work. And I do not think that has changed in the latest documents.

So there is no official method for a backup coordinator, but there is a proposal for what could be called a hack.

Danny2100 · January 14, 2022, 2:10pm

Interesting. I’m certainly not a developer, but I wonder if puddly, the ZHA developer, can work on that functionality in the future.

le_top · January 14, 2022, 4:34pm

If the restore functionality can be done just with the service, then an automation might (almost) suffice:

Very regular backups of the original key;
Some method for monitoring - trigger in case of failure;

When triggered:

Ensure communication is disabled on the old key;
Change the port configuration;
Restore to the new key, already installed (but disabled);
Do the appropriate actions to enable the new key (could imply a restart of the core).

Edit (following revisit of my post, evolution of ZHA/Zigpy):

ZHA now has an interactive procedure to migrate from one key to another, which worked well for me. This could be used to implement some failover mechanism. However, the original Zigbee coordinator would be out of order and could not be reset, so there would be a risk that it comes back online, which would still need to be carefully avoided.

Hedda · January 18, 2022, 8:47am

While I am no longer using a Silicon Labs based Zigbee Coordinator myself, I was wondering if you would be willing to extend this Blueprint to also support Silabs backup via bellows CLI commands?

https://github.com/zigpy/zigpy/wiki/Coordinator-Backup-and-Migration

https://github.com/zigpy/bellows/blob/dev/README.md#nvram-backup-and-restore

https://community.home-assistant.io/t/backup-your-zha-husbzb-1-stick-and-even-seamlessly-migrate-to-a-new-stick-without-re-pairing/229044/