Zigbee2MQTT Crashing, Restarting, Network Failing

I wonder if this was a result of trying to add a device…

The very last entry in the Z2M log before it was terminated:

info  2023-08-19 20:41:32: Accepting joining not in blocklist device '0x001e5e090215f768'

Supervisor

23-08-19 20:41:32 WARNING (MainThread) [supervisor.addons.addon] Watchdog found addon Zigbee2MQTT is failed, restarting...

Coincidence?

Likely not a coincidence. What is that device?

The only time I’ve ever had Zigbee2MQTT actually crash was when attempting to load a customization for a specific type of device. You may have just found the culprit.

Can you remove that device? If not can you revert to a backup?

If you google that message, this thread on the Zigbee2MQTT forum seems useful, as does the link cited in the thread.

Are you factory resetting each device before you add them back? I think you have some bad baggage. Also, I would recommend AGAINST doing any restores from your backups; again, I think you have some :poop: :poop: that need to be cleaned out.

Also, I know many are successful running Zigbee2MQTT inside their Home Assistant setup; however, I just find running them in separate Docker containers, on separate VMs, or on physical machines a more robust way to go.

It’s a plug-in socket:
https://www.zigbee2mqtt.io/devices/1613V.html#hive-1613v

It’s connected to a shower pump.

I’ve got several spares of these.

I had actually joined it to the network, then thought I didn’t really need power monitoring for a shower pump. So I deleted it from the network with joining disabled, leaving it in a searching state for the next time it was powered on for something else.

Later I realised I had put this in to automate a routine. It detects whether the shower is being used and, if the water tank temperature drops below a set point, turns the boiler on to bring the tank back to temperature.
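The routine described above boils down to two checks: is the pump drawing power, and is the tank too cold? This is a minimal sketch of that decision logic, not the poster's actual automation. The wattage threshold, the tank set point and the `Readings` type are hypothetical. In Home Assistant this would normally be an automation with a power trigger and a temperature condition rather than Python.

```python
from dataclasses import dataclass

# Hypothetical thresholds: tune these to the real pump and tank.
PUMP_RUNNING_WATTS = 50.0  # plug power reading above this means the shower pump is running
TANK_MIN_CELSIUS = 45.0    # below this the tank needs reheating


@dataclass
class Readings:
    pump_power_w: float  # from the power-monitoring plug the pump is connected to
    tank_temp_c: float   # from a tank temperature sensor (hypothetical)


def boiler_should_turn_on(r: Readings) -> bool:
    """Return True when the shower is in use and the tank has dropped below the set point."""
    shower_in_use = r.pump_power_w > PUMP_RUNNING_WATTS
    tank_too_cold = r.tank_temp_c < TANK_MIN_CELSIUS
    return shower_in_use and tank_too_cold


if __name__ == "__main__":
    print(boiler_should_turn_on(Readings(pump_power_w=320.0, tank_temp_c=41.5)))  # True
    print(boiler_should_turn_on(Readings(pump_power_w=0.5, tank_temp_c=41.5)))    # False
```

This is why the plug's power monitoring turned out to matter: without a power reading, there is no simple signal that the pump is running.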

I’d added a couple more routers by that point. I plugged it in, realised something wasn’t working, and held the power button to force a join. When it finished, I realised Z2M had restarted.

All has been well since.


I’m not seeing this join message issue now. I was earlier, with around 10 devices continually producing these join messages. That was before changing the channel and network key.

What do you mean by factory resetting, though? Since the reset, all devices are gone from the device view, but their names and unique identifier strings are still in the database. To bring them back:

- Most devices: I hold whichever button the device has.
- A few Hue bulbs: I use a Hue remote to reset them.
- One or two things: I have to do a multi on-off-on-off cycle of physical power.

So if that is factory resetting, then yes. There is no other way for them to join the network with the new channel and network key.

The join message logged as Z2M restarted came when I plugged in that device, which had already been removed. After the restart, it joined and completed correctly after a bit of back and forth.

Thanks for the info, hope you are feeling some progress!

The Zigbee2MQTT forum thread and linked article that the other chap and I posted show how complex a Zigbee network can be. Devices can cache their ‘join request’ and credentials. This is supported so that end devices can go to sleep, wake up and do things. Routers have some similarly interesting properties to support ‘sleepy end devices’.

All that said, it sounds like changing the network key (and maybe the channel) kiboshed all those cached creds. Still, knowing how to factory reset a Zigbee device is a good thing to have in your quiver. I try to do it whenever a device shows odd behavior or I am moving it to another location.

Allons-y!

What do you do for a typical reset? Is it like mine? I hold the re-pair button and/or force-delete the device from the controller, then re-pair. Or is there something else?

All is stable this morning. I’ll add a few more devices.

From one disaster to another… I restarted Home Assistant and all automations have disappeared :frowning:


Log Details (ERROR)


Logger: homeassistant.setup
Source: helpers/entity_registry.py:1364 
First occurred: 13:30:21 (1 occurrences) 
Last logged: 13:30:21

Error during setup of component automation
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/setup.py", line 288, in _async_setup_component
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/automation/__init__.py", line 259, in async_setup
    await _async_process_config(hass, config, component)
  File "/usr/src/homeassistant/homeassistant/components/automation/__init__.py", line 976, in _async_process_config
    entities = await _create_automation_entities(hass, updated_automation_configs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/automation/__init__.py", line 860, in _create_automation_entities
    cond_func = await _async_process_if(hass, name, config_block)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/automation/__init__.py", line 989, in _async_process_if
    checks.append(await condition.async_from_config(hass, if_config))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/condition.py", line 246, in async_from_config
    return cast(ConditionCheckerType, await factory(hass, config))
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/device_automation/condition.py", line 55, in async_condition_from_config
    return trace_condition_function(platform.async_condition_from_config(hass, config))
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/binary_sensor/device_condition.py", line 317, in async_condition_from_config
    state_config = condition.state_validate_config(hass, state_config)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/condition.py", line 1022, in state_validate_config
    config[CONF_ENTITY_ID] = er.async_validate_entity_ids(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/entity_registry.py", line 1395, in async_validate_entity_ids
    return [async_validate_entity_id(registry, item) for item in entity_ids_or_uuids]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/entity_registry.py", line 1395, in <listcomp>
    return [async_validate_entity_id(registry, item) for item in entity_ids_or_uuids]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/entity_registry.py", line 1364, in async_validate_entity_id
    raise vol.Invalid(f"Unknown entity registry entry {entity_id_or_uuid}")
voluptuous.error.Invalid: Unknown entity registry entry d72cdf30922dec23c129342137418eb0

I’ll start a new topic for this. Restoring from last night’s backup has not helped.
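The traceback shows setup failing in a binary_sensor device condition. That condition refers to an entity registry ID (`d72cdf30922dec23c129342137418eb0`) that no longer exists, and the automation component as a whole fails to set up.

One way to find the offending automation is to search `automations.yaml` for `entity_id:` lines holding a 32-character hex ID and check each ID against the entity registry. This is a rough sketch under stated assumptions:

- The registry is read as JSON with entries under `data.entities`, each carrying an `id` field.
- Only the single-line `entity_id: <id>` form is matched.
- The owning automation is guessed from the nearest preceding `alias:` line.

The sample data is made up for illustration.

```python
import json
import re

# Assumption: device conditions store the entity as a 32-character hex registry ID on a
# single "entity_id: <id>" line. device_id lines are deliberately not matched.
ENTITY_REF = re.compile(r"\bentity_id:\s*['\"]?([0-9a-f]{32})\b")
# Nearest preceding "alias:" line, used as a rough pointer to the owning automation.
ALIAS = re.compile(r"^\s*-?\s*alias:\s*(.+?)\s*$")


def known_registry_ids(registry_json: str) -> set[str]:
    """Collect registry IDs, assuming the layout {"data": {"entities": [{"id": ...}, ...]}}."""
    data = json.loads(registry_json)
    return {entry["id"] for entry in data["data"]["entities"]}


def find_dangling_refs(automations_yaml: str, known_ids: set[str]) -> list[tuple[str, str]]:
    """Return (alias, registry ID) pairs whose ID is missing from the registry."""
    dangling = []
    current_alias = "<no alias yet>"
    for line in automations_yaml.splitlines():
        alias_match = ALIAS.match(line)
        if alias_match:
            current_alias = alias_match.group(1).strip("'\"")
        for ref in ENTITY_REF.findall(line):
            if ref not in known_ids:
                dangling.append((current_alias, ref))
    return dangling


# Illustrative sample data (hypothetical IDs and aliases).
SAMPLE_REGISTRY = json.dumps({"data": {"entities": [
    {"id": "0123456789abcdef0123456789abcdef", "entity_id": "binary_sensor.hall_motion"},
]}})

SAMPLE_AUTOMATIONS = """\
- id: '1'
  alias: Hall light
  condition:
  - condition: device
    device_id: ffffffffffffffffffffffffffffffff
    entity_id: 0123456789abcdef0123456789abcdef
    domain: binary_sensor
    type: is_motion
- id: '2'
  alias: Shower boost
  condition:
  - condition: device
    device_id: eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    entity_id: d72cdf30922dec23c129342137418eb0
    domain: binary_sensor
    type: is_motion
"""

if __name__ == "__main__":
    print(find_dangling_refs(SAMPLE_AUTOMATIONS, known_registry_ids(SAMPLE_REGISTRY)))
    # [('Shower boost', 'd72cdf30922dec23c129342137418eb0')]
```

Against a real install, you would pass in the contents of `automations.yaml` and of the registry file, which is typically `.storage/core.entity_registry` in the config directory. Read that file, but don't edit it while Home Assistant is running. Re-selecting the entity in the flagged device condition, or removing that condition, should let the automations load again.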

I restored from a week-old backup and then went through the channel change, network key, and PAN change again.

So far nothing has crashed, but I have lost a couple of devices: again, light dimmer switches.

I still see “Accepting joining” many times from devices. I had removed all of them, added them to an Echo with Zigbee, left them for some time, deleted them, then joined them back to Z2M. It seems to be mostly Hue motion sensors.

It is a complete mystery to me.
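To see which devices are actually producing the repeated “Accepting joining” lines, the log can be tallied per IEEE address. This is a minimal sketch, assuming the message text matches the lines quoted in this thread. The sample log is made up from those two quoted lines for illustration and is not real output.

```python
import re
from collections import Counter

# Matches the Z2M info line quoted in this thread. Straight or curly quotes are allowed
# around the IEEE address, because forum copies often convert them.
JOIN_LINE = re.compile(
    r"Accepting joining not in blocklist device ['\u2018\u2019]?(0x[0-9a-fA-F]{16})"
)


def count_join_attempts(log_text: str) -> Counter:
    """Count 'Accepting joining' lines per IEEE address in Zigbee2MQTT log text."""
    return Counter(m.group(1).lower() for m in JOIN_LINE.finditer(log_text))


# Illustrative sample only (the third line is a hypothetical repeat).
SAMPLE_LOG = (
    "info  2023-08-19 20:41:32: Accepting joining not in blocklist device '0x001e5e090215f768'\n"
    "info  2023-05-11 13:25:08: Accepting joining not in blocklist device \u20180xbc33acfffe26e4a7\u2019\n"
    "info  2023-05-11 13:26:08: Accepting joining not in blocklist device '0xbc33acfffe26e4a7'\n"
)

if __name__ == "__main__":
    for ieee, hits in count_join_attempts(SAMPLE_LOG).most_common():
        print(ieee, hits)
```

You can then match each IEEE address to its friendly name in the Z2M frontend. A device whose count keeps climbing is the one to check for a weak or missing parent router.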

It is difficult to follow your steps from your writing.

  1. Not sure what you ‘restored from a week old backup’. Zigbee2MQTT config? Home Assistant config? Both?
  2. ‘went through the channel change, network key and PAN change again’. Did you change them to new values, or to the same values from some prior evolution?
  3. ‘added to an echo with zigbee’? WTF :upside_down_face:. So you have another Zigbee network? I think that’s the first time I’ve seen you share this. And what purpose do these steps serve in your setup?

The mystery seems to be why you are trying to paint a Jackson Pollock :wink:

You seem to have a fairly complex Zigbee setup. It is hard to do, and I’ve been in similar boats before. I think you have to be regimented in your changes, let things settle, and divide the problem into simpler parts. Stop the spaghetti :spaghetti: testing.

I do not think you have shared the full list of your 40+ devices. However, from what I’ve seen in your posts, from Tuya to Hue Motion Sensors, you seem to have picked some of the more, might I say, ‘difficult/odd/abbynormal’ devices. I would recommend reviewing the Zigbee2MQTT forums for others’ experiences with each of your devices. There is a lot of knowledge in these forums and in folks’ experiences.

  1. All automations had disappeared in Home Assistant. Even when I added a new automation, nothing would show. Restoring automations.yaml and configuration.yaml on their own did not bring them back. I had made several changes since then:
     - a test moving some devices to ZHA;
     - changes to some automations, scripts, templates, and dashboards to support the new entity name formats;
     - and a kind of bug found along the way.
     So I decided the best option was a full restore back one week.

  2. I changed the channel again, to 25, and changed the PAN ID and network key to something completely new.

  3. In post 10 you kindly shared a link about the “accepting joining” info line that was showing in the debug output, e.g.: info 2023-05-11 13:25:08: Accepting joining not in blocklist device ‘0xbc33acfffe26e4a7’

Part of the way through that post, Koenkk mentions that joining the device to a totally different Zigbee network should clear out its network details. I mentioned I could do this by joining it to an Echo with Zigbee, then re-joining it to my actual Zigbee network. In theory this should give it the right details and stop the messages; however, it does not. Nothing is normally connected to the Echo, so I assume it doesn’t actively run anything.

The end of that post actually mentions that this “accepting joining” behaviour is relatively normal: it happens when a device can’t reach its parent.

  4. I have around 100 devices. All worked perfectly in harmony until a few weeks ago. By category:
     - Motion sensors: mostly Hue, plus 2 or 3 Tuya radar sensors (they were cheap and I couldn’t be bothered to build more ESP-based ones).
     - Plugs and sockets: Hue for some plug-ins without power monitoring (I find those pretty robust); Hive for power-monitoring sockets (although I have used Aurora and Samsung before); Aurora for wired sockets.
     - Dimmers and lighting: Aurora for some dimmers; Gledopto LED controllers.
     - Buttons: Aqara, plus a couple of Tuya buttons from LoraTap. I’ve found those to be perhaps the most reliable button, and they can hop instantly to any parent.
     - Other: Xiaomi contact sensors, some Sonoff relays, Develco smoke alarms.
     Which do you consider oddballs? And what are your go-to devices for motion / microwave / switching / plugs / contact sensors / buttons?

I’m pretty sure this was all down to Zigbee interference. I think I mentioned above that, when I was on channel 11, simply turning on the solar optimiser controller would reset all the Aurora plugs, some Aurora lights, and a few other brands. I suspect the solar optimiser controller jumped channels and/or upped its transmit power following a software update or similar. I will not be using Tigo again, that is for sure!
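Channel choice comes up repeatedly here (11 before, 25 now), and the 2.4 GHz overlap arithmetic is easy to check. Zigbee channels 11-26 are centred at 2405 + 5(k - 11) MHz, and Wi-Fi channels 1-13 at 2407 + 5n MHz.

The sketch below makes simplifying assumptions:

- A Zigbee channel is about 2 MHz wide and a Wi-Fi channel is 20 MHz wide.
- Any overlap of those nominal bands counts as a clash.

Real transmitters spill wider than this. The sketch also says nothing about non-Wi-Fi interferers such as the solar optimiser controller, whose radio isn't described in this thread.

```python
# Nominal occupied bandwidths (assumptions; 802.11b DSSS is closer to 22 MHz).
ZIGBEE_WIDTH_MHZ = 2.0
WIFI_WIDTH_MHZ = 20.0


def zigbee_center_mhz(ch: int) -> int:
    """Centre frequency of a 2.4 GHz IEEE 802.15.4 (Zigbee) channel."""
    if not 11 <= ch <= 26:
        raise ValueError("2.4 GHz Zigbee channels are 11-26")
    return 2405 + 5 * (ch - 11)


def wifi_center_mhz(ch: int) -> int:
    """Centre frequency of a 2.4 GHz Wi-Fi channel (1-13 only; channel 14 is irregular)."""
    if not 1 <= ch <= 13:
        raise ValueError("this sketch covers Wi-Fi channels 1-13 only")
    return 2407 + 5 * ch


def overlapping_wifi_channels(zigbee_ch: int) -> list[int]:
    """Wi-Fi channels whose nominal band overlaps the given Zigbee channel."""
    zc = zigbee_center_mhz(zigbee_ch)
    # Two bands overlap when their centres are closer than half the summed widths.
    half_sum = (ZIGBEE_WIDTH_MHZ + WIFI_WIDTH_MHZ) / 2
    return [w for w in range(1, 14) if abs(wifi_center_mhz(w) - zc) < half_sum]


if __name__ == "__main__":
    for ch in (11, 15, 20, 25, 26):
        print(ch, zigbee_center_mhz(ch), overlapping_wifi_channels(ch))
```

On these assumptions, Zigbee channel 11 only overlaps Wi-Fi channel 1. Channel 25 clears Wi-Fi 1-11 and overlaps only 12 and 13, which are permitted in the UK/EU, so it is worth checking what the local access points actually use.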

So far, though, no more crashing. I’m not convinced about the Aurora Aone dimmers at the moment, but we will see. If one plays up, it will be removed and replaced, though there seem to be few suitable devices in this space.