[Help!] Corrupted Zigbee Database

Hi there, I hope someone can help me with problems that recently started cropping up with my Zigbee network.

My Zigbee network of about 130 devices, running on ZHA with a TubesZB CC2652 coordinator, is the backbone of my HA setup and has been largely stable for years. About a week ago, pairing new devices started behaving strangely. A device would pair successfully and could communicate with HA (send and receive commands), but after a few seconds ZHA would restart the pairing process. This looped until pairing timed out. Some errors I had never seen before appeared in the logs:

  • Cancelling previous initialization task for device xx:xx:xx:xx:xx:xx:xx:xx
  • [0x1265:1:0x1000]: Couldn’t get list of groups: Device has re-joined the network
Error doing job: Exception in callback Gateway.device_initialized.<locals>.<lambda>() at /usr/local/lib/python3.13/site-packages/zha/application/gateway.py:457 (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/asyncio/events.py", line 89, in _run
    self._context.run(self._callback, *self._args)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/zha/application/gateway.py", line 457, in <lambda>
    init_task.add_done_callback(lambda _: self._device_init_tasks.pop(device.ieee))
                                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: xx:xx:xx:xx:xx:xx:xx:xx

When I restarted HA, ZHA refused to start at all, reporting a corrupted Zigbee database. Restoring a backup from before the restart lets ZHA start properly, but I still can't add new devices to my network. I'm also unsure whether I can safely restart HA.

I tried to repair the database with this zigpy-cli command, running it in a virtual environment from the Advanced SSH Terminal Add-on, but it fails with FileNotFoundError: [Errno 2] No such file or directory: 'sqlite3'. I then copied the database to a Windows machine, installed zigpy-cli in a virtual environment under WSL, and ran the command again. The process starts, but then fails with sql error: no such table: sqlite_dbpage (1).
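For anyone hitting the same errors: the 'sqlite3' FileNotFoundError suggests the command shells out to the separate sqlite3 command-line tool, which the add-on doesn't have installed. The later "no such table: sqlite_dbpage" error plausibly means the sqlite3 build under WSL lacks the sqlite_dbpage virtual table that SQLite's .recover command relies on (an unverified guess about what zigpy-cli runs). A first diagnostic that needs no extra binary is SQLite's own PRAGMA integrity_check, which Python's standard-library sqlite3 module can run. The sketch below assumes HA is stopped and works on a throwaway copy of zigbee.db (plus its -wal file if present); check_zigbee_db is a hypothetical helper name, and it only reports problems, it does not repair anything.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def check_zigbee_db(db_path: str) -> list[str]:
    """Run SQLite's integrity check on a throwaway copy of the database.

    Uses only Python's standard-library sqlite3 module, so the separate
    `sqlite3` command-line binary does not need to be installed.
    Returns ["ok"] for a healthy file, otherwise one message per problem.
    """
    src = Path(db_path)
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / src.name
        shutil.copyfile(src, copy)  # never open the live file
        wal = src.with_name(src.name + "-wal")
        if wal.exists():
            # Copy the write-ahead log too, so recent writes are included.
            shutil.copyfile(wal, Path(tmp) / wal.name)
        conn = sqlite3.connect(copy)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        except sqlite3.DatabaseError as exc:
            # Badly damaged files can fail before the check even runs.
            return [f"error: {exc}"]
        finally:
            conn.close()
    return [row[0] for row in rows]
```

Calling check_zigbee_db("zigbee.db") returns ["ok"] for a healthy file; anything else at least tells you whether restoring from backup is the more realistic path.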

At this point, I feel like I am out of options. Can anyone guide me through repairing zigbee.db? Or am I stuck restoring the database from backups and, if that fails, rebuilding the Zigbee network from scratch?

Please let me know if I can provide more information. I can collect some more logs if it helps!

Reporting back in case anyone runs into the same issue: it was solved by re-flashing the coordinator firmware with TubesZB's tools, which seems to have cleared whatever was blocking devices from joining. I also deleted every trace of the problematic device's IEEE address from zigbee.db and from the coordinator backup JSON (which is needed to restore the coordinator's configuration after flashing). However, I think the firmware flash, not the database cleanup, did the trick: other devices that showed the same behaviour were able to rejoin my network after the flash without any database cleanup.
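Since the poster suspects the re-flash rather than the cleanup fixed things, treat removing a device's IEEE address as optional. If you do want to see where an address still appears, here is a minimal Python sketch under loudly stated assumptions: that zigpy's tables reference devices in text columns whose names end in "ieee", and that the coordinator backup JSON follows the Open Coordinator Backup layout, with a top-level "devices" list whose entries carry an "ieee_address" key. The helper names find_ieee_in_db, drop_ieee_from_backup and clean_backup_file are hypothetical, not part of zigpy or ZHA. The database is opened read-only, and the filtered backup is written to a new file so the original stays untouched.

```python
import json
import sqlite3
from pathlib import Path


def _norm(ieee: str) -> str:
    """Normalise an IEEE address so '00:12:4B:..' and '00124b..' compare equal."""
    return ieee.replace(":", "").lower()


def find_ieee_in_db(db_path: str, ieee: str) -> dict[str, int]:
    """Count matching rows per table, opening the database read-only.

    ASSUMPTION: any column whose name ends in "ieee" holds a device
    address stored as text.
    """
    target = _norm(ieee)
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    hits: dict[str, int] = {}
    try:
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for table in tables:
            cols = [c[1] for c in conn.execute(f'PRAGMA table_info("{table}")')]
            ieee_cols = [c for c in cols if c.lower().endswith("ieee")]
            if not ieee_cols:
                continue
            # Match the address in any of the IEEE-like columns.
            where = " OR ".join(
                f"lower(replace(\"{c}\", ':', '')) = ?" for c in ieee_cols)
            (count,) = conn.execute(
                f'SELECT COUNT(*) FROM "{table}" WHERE {where}',
                [target] * len(ieee_cols)).fetchone()
            if count:
                hits[table] = count
    finally:
        conn.close()
    return hits


def drop_ieee_from_backup(backup: dict, ieee: str) -> dict:
    """Return a copy of the backup without device entries for `ieee`.

    ASSUMPTION: Open Coordinator Backup layout, i.e. a top-level "devices"
    list whose entries carry an "ieee_address" string.
    """
    target = _norm(ieee)
    cleaned = dict(backup)  # shallow copy; the input dict is not modified
    cleaned["devices"] = [
        d for d in backup.get("devices", [])
        if _norm(d.get("ieee_address", "")) != target
    ]
    return cleaned


def clean_backup_file(src: str, dst: str, ieee: str) -> int:
    """Write a filtered backup to a NEW file; return how many entries were dropped."""
    with open(src, encoding="utf-8") as f:
        backup = json.load(f)
    cleaned = drop_ieee_from_backup(backup, ieee)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(cleaned, f, indent=4)
    return len(backup.get("devices", [])) - len(cleaned["devices"])
```

Run find_ieee_in_db against a copy of zigbee.db with HA stopped to see which tables still mention the device. Actually deleting those rows is deliberately left out of the sketch; per the report, the firmware re-flash alone was enough for the other affected devices.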