Managing a 100+ device ZHA ZigBee network and device drop offs

I’ve invested time in analysing and optimising my ZigBee network as much as possible to improve stability and reduce drop offs. Unfortunately, with over 100 devices I’m still seeing issues I need to tend to daily. I feel like I have a strong router backbone and shouldn’t need any more dedicated routers.

Do I accept that there will always be a few devices to tend to daily or is it possible to have a bulletproof solid network with 100+ devices?

Typical offenders that drop off include:
Tuya soil moisture sensors in the garden
Xiaomi Illuminance Lux sensors (a nightmare to pair again)
Lockwood/Yale locks
Hue remotes

With the exception of the garden sensors, all other sensors are in good proximity to a router or dedicated router.

Another thought is whether I should increase the time before ZHA considers a device to be unavailable. I do get some false negatives, especially with IKEA Tradfri remotes.
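Before changing that timeout, it may help to see how long the quiet devices actually go between updates. Below is a minimal sketch for the Template editor or a Markdown card. `max_age_hours` is a made-up threshold, not a ZHA setting, and `last_updated` only changes when an entity's state or attributes change (and resets when Home Assistant restarts), so a healthy sensor with a steady reading can still show up here.

```jinja
{#- Sketch: list ZHA sensors that have not updated for a while. -#}
{#- max_age_hours is a hypothetical threshold; tune it to your devices. -#}
{%- set max_age_hours = 6 -%}
{%- set ns = namespace(stale=[]) -%}
{%- for entity_id in integration_entities('zha') | select('match', '^(sensor|binary_sensor)\.') -%}
  {%- set s = states[entity_id] -%}
  {%- if s is not none -%}
    {#- Age in hours since the state object last changed -#}
    {%- set age_h = (now() - s.last_updated).total_seconds() / 3600 -%}
    {%- if age_h > max_age_hours -%}
      {%- set ns.stale = ns.stale + [{'id': entity_id, 'age': age_h | round(1)}] -%}
    {%- endif -%}
  {%- endif -%}
{%- endfor -%}
{% for item in ns.stale | sort(attribute='age', reverse=true) %}
{{ item.id }}: {{ item.age }} h since last update
{%- else %}
No ZHA sensors older than {{ max_age_hours }} h.
{%- endfor %}
```

If the devices that get flagged as unavailable regularly sit just past the current timeout in this list, that would suggest the timeout is too tight rather than the devices actually dropping off.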

Attached is how I track the health of my dedicated routers. It’s a two storey house but I’ve just overlaid everything on the ground level. I only have two dedicated routers upstairs along with my coordinator, a Sonoff Dongle Max. FYI, LQI is higher with the Dongle Max compared with the Dongle P I was using.

Hello JJW.AU,
Happy Forum Anniversary!

All the things you are mentioning are battery powered end devices.
Yes, 100 is a lot, especially if 70 of them are battery or something.
I would look at the mesh and make sure there aren’t any nodes with too many children. I don’t know how many is too many, but look for ones that look like they may be in trouble and add a router there.

Splitting the network into 2 meshes is not going to help with disconnects (IMO), if that is what you were thinking. You would have 2 meshes to try to keep the stragglers happy with. If you were having nodes that miss commands, long delays, or other data type problems, then 2 coordinators might be the fix.


Hi @Sir_Goodenough

I got AI to help generate a template to count up my battery vs router devices:

{% set zha_devices = integration_entities('zha') | map('device_id') | unique | list %}
{% set ns = namespace(battery=0, mains=0) %}

{% for device in zha_devices %}
  {# Get all entities for this device #}
  {% set entities = device_entities(device) %}
  
  {# Check if any entity for this device has device_class: battery #}
  {% if entities | select('is_state_attr', 'device_class', 'battery') | list | count > 0 %}
    {% set ns.battery = ns.battery + 1 %}
  {% else %}
    {% set ns.mains = ns.mains + 1 %}
  {% endif %}
{% endfor %}

🔋 Battery Devices: {{ ns.battery }}
🔌 Mains Devices: {{ ns.mains }}
Total ZHA Devices: {{ zha_devices | count }}

Output is:
🔋 Battery Devices: 88
🔌 Mains Devices: 58
Total ZHA Devices: 146

This seems fairly accurate (the assumption is that all the non-battery devices are mains powered and act as routers; I’ve removed and replaced my Sengled bulbs as I know they don’t act as routers).
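To sanity-check that assumption, one option is to list the devices the count above treats as mains, together with their manufacturer and model, and look for anything known not to route. This is only a sketch reusing the same battery test as the template above. It cannot tell whether a device actually routes, and the coordinator will most likely appear in the list too.

```jinja
{#- Sketch: list ZHA devices with no battery entity (counted as "mains" above). -#}
{%- set zha_devices = integration_entities('zha') | map('device_id') | reject('none') | unique | list -%}
{% for device in zha_devices -%}
  {%- set entities = device_entities(device) -%}
  {#- Same test as the counting template: no entity with device_class battery -#}
  {%- if entities | select('is_state_attr', 'device_class', 'battery') | list | count == 0 %}
- {{ device_attr(device, 'name_by_user') | default(device_attr(device, 'name'), true) }} ({{ device_attr(device, 'manufacturer') }} {{ device_attr(device, 'model') }})
  {%- endif -%}
{%- endfor %}
```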

It feels like I should have enough router coverage especially with quite a few dedicated routers (there’s a mix of Sonoff, generic Dongle E / P clones, Tuya (?)).

Possibly my next step is to rationalise my end devices, especially temperature/humidity sensors, and potentially look at ‘all-in-one’ devices to get the same entity functions with fewer actual devices. Perhaps that will help, especially given you can get some pretty decent devices very cheaply off Aliexpress these days. They’re also AAA powered, which means cheaper running costs and longer battery life.

With some more help from AI, here’s a code snippet to copy-paste into a Markdown card on your dashboard. It gives a health status of online vs offline devices on your Zigbee network, broken down by room, so you can easily find the culprits to fix. As a byproduct, it also shows the ratio of routers to battery devices in each room 🙂


{%- set zha_devices = integration_entities('zha') | map('device_id') | unique | list -%}
{%- set ns = namespace(
    bat_tot=0, bat_on=0, bat_off=0,
    main_tot=0, main_on=0, main_off=0,
    rows=[],
    offline_list=[]
) -%}

{#- --- PROCESS DEVICES (Silent Loop) --- -#}
{%- for device in zha_devices -%}
  {%- set entities = device_entities(device) -%}
  
  {#- FILTER: Only look at 'Real' controls. Ignore buttons/updates. -#}
  {%- set real_entities = entities | select('match', '^(light|switch|sensor|binary_sensor|lock|cover|climate|fan)\..*') | list -%}
  {%- set check_list = real_entities if real_entities | count > 0 else entities -%}

  {#- OFFLINE LOGIC: If ANY primary entity is unavailable, the device is Offline. -#}
  {%- set offline_entities = check_list | select('is_state', 'unavailable') | list -%}
  {%- set is_offline = offline_entities | count > 0 -%}
  
  {#- BATTERY CHECK -#}
  {%- set is_battery = entities | select('is_state_attr', 'device_class', 'battery') | list | count > 0 -%}
  
  {#- DATA PREP -#}
  {%- set area = area_name(device) | default('Unassigned', true) -%}
  {%- set dev_name = device_attr(device, 'name_by_user') | default(device_attr(device, 'name'), true) -%}
  {%- set type_str = "Bat" if is_battery else "Pwr" -%}
  
  {#- Update Counters -#}
  {%- if is_battery -%}
    {%- set ns.bat_tot = ns.bat_tot + 1 -%}
    {%- if is_offline -%}
      {%- set ns.bat_off = ns.bat_off + 1 -%}
    {%- else -%}
      {%- set ns.bat_on = ns.bat_on + 1 -%}
    {%- endif -%}
  {%- else -%}
    {%- set ns.main_tot = ns.main_tot + 1 -%}
    {%- if is_offline -%}
      {%- set ns.main_off = ns.main_off + 1 -%}
    {%- else -%}
      {%- set ns.main_on = ns.main_on + 1 -%}
    {%- endif -%}
  {%- endif -%}

  {#- Add to row list for Summary -#}
  {%- set status_str = "OFF" if is_offline else " OK" -%}
  {%- set ns.rows = ns.rows + [{'area': area, 'type': type_str, 'status': status_str}] -%}

  {#- Add to Offline List if dead -#}
  {%- if is_offline -%}
    {%- set ns.offline_list = ns.offline_list + [{'name': dev_name, 'area': area, 'type': type_str}] -%}
  {%- endif -%}

{%- endfor -%}

{#- --- RENDER OUTPUT --- -#}
## 📡 Zigbee Network Health
```text
SUMMARY
---------------------------------
TYPE      | TOTAL |  OK  | OFF
---------------------------------
Mains(Pwr)|  {{ '%-5d' % ns.main_tot }}| {{ '%-5d' % ns.main_on }}| {{ ns.main_off }}
Battery   |  {{ '%-5d' % ns.bat_tot }}| {{ '%-5d' % ns.bat_on }}| {{ ns.bat_off }}
---------------------------------

DETAILS (Online/Offline count)
---------------------------------
ROOM          | PWR (OK/OFF) | BAT (OK/OFF)
---------------------------------
{% for area in ns.rows | map(attribute='area') | unique | sort %}
  {%- set area_devices = ns.rows | selectattr('area', 'eq', area) | list -%}
  {%- set pwr_ok = area_devices | selectattr('type','eq','Pwr') | selectattr('status','eq',' OK') | list | count -%}
  {%- set pwr_off = area_devices | selectattr('type','eq','Pwr') | selectattr('status','eq','OFF') | list | count -%}
  {%- set bat_ok = area_devices | selectattr('type','eq','Bat') | selectattr('status','eq',' OK') | list | count -%}
  {%- set bat_off = area_devices | selectattr('type','eq','Bat') | selectattr('status','eq','OFF') | list | count -%}
  {%- set area_print = (area[:12] + '..') if area|length > 14 else area -%}
{{ '%-14s' % area_print }}| {{ '%2d' % pwr_ok }} / {{ '%-2d' % pwr_off }}    | {{ '%2d' % bat_ok }} / {{ '%-2d' % bat_off }}
{% endfor %}
```
### 💀 Offline Devices
{% if ns.offline_list | count > 0 %}
```text
{{ '%-25s' % 'DEVICE NAME' }} | {{ '%-25s' % 'ROOM' }} | TYPE
---------------------------------------------------------------
{% for dev in ns.offline_list | sort(attribute='area') %}
  {#- Truncate name to 23 chars to fit column -#}
  {%- set name_print = (dev.name[:23] + '..') if dev.name|length > 25 else dev.name -%}
  {#- Truncate area to 23 chars to fit column -#}
  {%- set area_print = (dev.area[:23] + '..') if dev.area|length > 25 else dev.area -%}
{{ '%-25s' % name_print }} | {{ '%-25s' % area_print }} | {{ dev.type }}
{% endfor %}
```
{% else %}
🎉 **All devices are online!**
{% endif %}

Example of what this looks like: