Some lightbulbs drop off (tradfri)

Hello,

I have Home Assistant running on a NUC with ZHA installed, and I use a Tasmotized Sonoff Zigbee Bridge to communicate with my Zigbee devices (all components at their latest available versions).
I have managed to add 36 devices (mostly IKEA Tradfri GU10 bulbs, some E27 bulbs, and some Xiaomi sensors and buttons).

My problems are:

  • some lights occasionally drop off the Zigbee network and cannot be controlled (turned on or off), even though they were added to the network and recognized by the ZHA component
  • my log file shows a lot of “Discarding %s event” messages, which come from zigpy (I do not know whether this is related and, if so, how to fix it)

image

I have enabled tradfri OTA so I assume they are up to date.

This is my network visualization (text descriptions likely stopped working in some of the recent updates):

I am not sure whether I am hitting the Sonoff Zigbee bridge limit on the number of connected devices. If I am, how can I use some of the bulbs as routers? I have tried adding devices through existing bulbs, but I cannot see any difference in the config afterwards.

Any ideas on where to look and what to try in order to reliably control all of the lights?

From your network graph picture, it appears that most of your devices (the ovals) are acting as routers. There is a lot of active discussion about bulbs that act as routers but do a poor job of it, and bulbs that do not act as routers at all. As far as I can tell, there is no way to switch a Zigbee device between routing and not routing without a firmware update. So whatever your IKEA bulbs are doing, you will have to work with it.

Two things I thought about that you might consider if you have not already:

  1. What is the link quality of your bulbs over time? Are some of them in metal cans that cause low LQI?
  2. Any chance that some of the bulbs are getting powered off at the mains at times? I am not an expert here, but it appears that it takes a while (maybe hours) for a network to reconfigure after that.

I have been experimenting with ZHA and a Sonoff Zigbee Bridge, currently on HA 2021.1.5. ZHA does seem to be ‘evolving’ with each release, and some of the changes might have a large effect on certain network configurations. You might follow the issue trackers on GitHub for bellows and the other Zigbee components that HA uses.

My configuration.yaml section is below. Have a look at the app layer source routing setting, and review the links for the ‘tons and tons’ (their words, not mine) of options, especially for bellows. It sounds like you are already reviewing the voluminous logs you get when you turn on debug, but below are also the loggers I have been toggling to watch and learn.

I wrote a ragged Python app, linked below, that records the status, links, RSSI and LQI of each link for all the devices on my network to a SQLite database. I can then look at each device over time and see what I see. It also tries to display a realtime dump of device status, which would be pretty noisy for a larger network. As I said, it is a Python program by a poor programmer (me), so take it with a grain of salt, but it might spur some ideas for you. You can run it on any Linux machine that can reach the HA server. You canNOT run it inside HA, but you can run it beside HA on the HA server; I use tmux to do this. Good hunting!

    # https://github.com/zigpy

    homeassistant.components.zha: warning
    bellows: warning
    bellows.zigbee.application: warning
    bellows.ezsp: warning
    zigpy: warning
    zigpy.ota: debug
    zhaquirks: warning

    # homeassistant.components.zha: debug
    # bellows: debug
    # bellows.zigbee.application: debug
    # bellows.ezsp: debug
    # zigpy: debug
    # zigpy.ota.image: debug
    # zhaquirks: debug

# Zigbee
# https://community.home-assistant.io/t/zha-conbee-ii-source-routing/270711
# https://github.com/zigpy/bellows/blob/3861c9af340763ea030737dd017cb251e7da2ff6/bellows/ezsp/v8/config.py
# https://github.com/zigpy/bellows
# https://github.com/home-assistant/home-assistant.io/issues/15307
# https://github.com/home-assistant/home-assistant.io/blob/f52cc941cea69b4b9c1df680ba7a79009df998b4/source/_integrations/zha.markdown


zha:
  # Used by bellows
  zigpy_config:
    source_routing: true
    ezsp_config:
      CONFIG_MAX_END_DEVICE_CHILDREN: 16
      CONFIG_SOURCE_ROUTE_TABLE_SIZE: 24
    # OTA config. Nothing is enabled by default
    ota:
      ikea_provider: true                       # Auto update Trådfri devices
      ledvance_provider: true                   # Auto update LEDVANCE devices
      otau_directory: /config/zigbee-ota        # Utilize .ota files to update everything else

# python web socket query tool:

https://github.com/deepcoder/ha-zha-query-tools
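
The idea of capturing link data over time can be sketched roughly as follows. This is a minimal illustration only, not the code from the tool linked above: the table name `link_samples`, its columns, and the shape of the sample dictionaries are all assumptions, and how the samples get fetched from HA is left out entirely.

```python
import sqlite3
import time


def open_db(path=":memory:"):
    """Open (or create) the database. The table layout is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS link_samples (
               ts REAL NOT NULL,
               ieee TEXT NOT NULL,
               neighbor TEXT NOT NULL,
               lqi INTEGER,
               rssi INTEGER
           )"""
    )
    return conn


def record_samples(conn, samples, ts=None):
    """Store one timestamped row per (device, neighbor) link."""
    ts = time.time() if ts is None else ts
    conn.executemany(
        "INSERT INTO link_samples (ts, ieee, neighbor, lqi, rssi) "
        "VALUES (?, ?, ?, ?, ?)",
        [(ts, s["ieee"], s["neighbor"], s.get("lqi"), s.get("rssi")) for s in samples],
    )
    conn.commit()


def lqi_history(conn, ieee):
    """Return (timestamp, neighbor, lqi) rows for one device, oldest first."""
    return conn.execute(
        "SELECT ts, neighbor, lqi FROM link_samples WHERE ieee = ? ORDER BY ts",
        (ieee,),
    ).fetchall()
```

Calling `record_samples` from a loop every few minutes and then reading back `lqi_history` for a misbehaving bulb would show whether its links degrade before it drops off.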


This is a handy utility as well. It shows the near-realtime state of all of your devices, and you can sort by LQI, RSSI or status. But it is ephemeral, which is why I went down the rabbit hole of writing a program to capture this over time:

# configuration.yaml
    # https://github.com/dmulcahey/zha-network-card
    - type: module
      url: /local/custom-lovelace/zha-network-card/zha-network-card.js

# ui-lovelace-some-page.yaml

title: ZHA
# icon: mdi:home-outline

cards:
  # https://github.com/dmulcahey/zha-network-card
  - type: 'custom:zha-network-card'
    clickable: true
    columns:
      - name: Name
        prop: name
      - attr: available
        id: available
        modify: x || "false"
        name: Online
      - attr: manufacturer
        name: Manufacturer
      - attr: manufacturer_code
        name: Manufacture Code
      - attr: model
        name: Model
      - attr: ieee
        name: IEEE
      # - attr: device_reg_id
      #   name: Device Reg ID
      - attr: device_type
        name: Device Type
      - name: NWK
        prop: nwk
      - attr: rssi
        name: RSSI
      - attr: lqi
        name: LQI
      - attr: last_seen
        name: Last Seen
      - attr: power_source
        name: Power Source
      - attr: quirk_class
        name: Quirk
      - attr: quirk_applied
        name: Quirk Applied
    sort_by: available



Thanks for the detailed explanation.

I was thinking that link quality might be bad: my bridge currently sits basically in a corner of the apartment, though later it will be in a more central location. But the distance between the two most distant bulbs is less than 10 m, and the distance between two neighbouring bulbs is 3-4 m at most (with no physical barrier between them).

The bulbs are in metal casings. Interestingly, I have four bulbs on a single rail, and one has an LQI of 200 (which should be near perfect) while another just centimeters away is at about LQI 100. It also happens that three out of the four bulbs work correctly and just one decides to misbehave.

I know that there is a limit on the number of devices connected to the bridge, and I was wondering whether I can control the path between a bulb and the bridge (so that I can choose which “router” is used as the intermediate hop). It seems that I cannot influence that.

I will catalogue LQI values to see if I can improve connection quality.

But I would like to understand this log error message (it repeats a lot, and my Google search did not produce any results).

If you power cycle the bulbs, do they work fine for a while?

No, they remain uncontrollable (although they paired correctly and are registered in Home Assistant).

If you can get to a command prompt, make a copy of your home-assistant.log with these errors in it and look at the detailed entries in a text editor. It is sometimes hard to extract the full detail of log entries from the HA user interface. From what I can find on GitHub based on the log entry you showed, these repeating errors seem to be related to the small SQLite database that ZHA uses to keep track of the Zigbee network and devices. Again, it is hard to tell what is going on without more detail from the log entries. Is there a problem with the SQLite database itself, or is something happening so fast that ZHA cannot write to the database? There are a number of possibilities. The zigbee.db is a small database and does not keep data over time.

From the graph, your network does not look ‘unhealthy’, but the devil is in the details.
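
Once you have a copy of the log, a few lines of Python can show which messages repeat and how often. This is only a sketch: it assumes log lines shaped like `2021-01-25 09:03:59 INFO (MainThread) [logger.name] message`, and the file path in the usage note is a guess that depends on your install.

```python
import re
from collections import Counter

# Assumed line shape: "<date> <time> <LEVEL> (<thread>) [<logger>] <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?) "
    r"(\w+) \(([^)]*)\) \[([^\]]+)\] (.*)$"
)


def count_messages(lines, logger_prefix="zigpy"):
    """Count identical (level, logger, message) entries for matching loggers."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # traceback or continuation line, not a new log entry
        _timestamp, level, _thread, logger, message = match.groups()
        if logger.startswith(logger_prefix):
            counts[(level, logger, message)] += 1
    return counts
```

For example, `with open("home-assistant.log") as f: print(count_messages(f).most_common(10))` (hypothetical path) lists the ten most frequent zigpy entries together with the exact logger that emits them, which is the detail the UI tends to hide.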

Remember that LQI is a point-to-point measure, so you will have one or two LQI values for each neighbor that a device connects to: an LQI value for the connection TO a neighbor and possibly another for the connection FROM that neighbor. So each bulb will have a ‘cloud’ of LQI values around it :grin:. I have some devices that have a really crappy LQI value to a far away device, but good LQI with other, closer devices.
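
To make the ‘cloud’ idea concrete, here is a small sketch that groups per-link readings by device. The `(source, destination, lqi)` tuples and the device names are made up for illustration; real values would come from whatever neighbor data you collect.

```python
from collections import defaultdict


def lqi_cloud(links):
    """Group directed link readings (source, destination, lqi) by source device."""
    cloud = defaultdict(dict)
    for source, destination, lqi in links:
        cloud[source][destination] = lqi
    return dict(cloud)


def weakest_link(cloud, device):
    """Return the (neighbor, lqi) pair with the lowest LQI for one device."""
    neighbors = cloud[device]
    neighbor = min(neighbors, key=neighbors.get)
    return neighbor, neighbors[neighbor]


# Hypothetical readings: the same pair of devices can report
# a different LQI in each direction.
links = [
    ("bulb_1", "coordinator", 200),
    ("coordinator", "bulb_1", 180),
    ("bulb_1", "bulb_4", 100),
]
cloud = lqi_cloud(links)
print(weakest_link(cloud, "bulb_1"))  # ('bulb_4', 100)
```

A single low number for a bulb is therefore not alarming by itself; what matters is whether the bulb has at least one good link to a router or the coordinator.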

Based on your description, it does not sound like you have any signal barriers, walls, distance, etc. playing into the problem.

You can add devices ‘VIA’ other devices. I am again a noob here, but this appears to be a way to hint to your network the relative placement of devices. Your coordinator and routers will still change routes on their own, and it does not appear that you can force any particular route to stay in place. But again, I am still learning.

To your question about the firmware versions of your bulbs: see the picture below. Do your bulbs show a firmware version like this? Some devices do and some don’t on ZHA, in my experience. Are they all the same? I have yet to see a firmware upgrade occur on my ZHA network. You can turn on debug level for this logger to see OTA detail, so something there might be of interest if your bulbs are not all at the same firmware level:

logger:
  default: warning
  logs:
    asyncio: info
    homeassistant.core: info
    zigpy.ota: debug

There should be a new firmware for the EFR32MG1P in the Sonoff bridge that fixes some issues. Look for 6.7.8 on the Tasmota website.

Also enable source routing in zigpy.

Add the following to your configuration.yaml:

zha:
  zigpy_config:
    source_routing: true

Curious: if you do this firmware update on the Sonoff Zigbee coordinator, do you lose your network and have to rebuild it? Thx!

No, you keep the network. You can, however, back up everything beforehand:

You can even restore the backup on a different hub, or EZSP stick (Nortek GoControl HUSBZB-1 / Elelabs ELU013) or even TI stick and migrate your network seamlessly.

Good to know, thanks for the info and link!

Now, that is a real option :wink:

--i-understand-i-can-update-eui64-only-once-and-i-still-want-to-do-it

Thanks. I noticed the new firmware, but it was a “release candidate”. I see now that the readme suggests 6.7.8 as the preferred firmware. Will update.

What is the difference between current networking and source routing? What will change?

When I download only the Zigbee firmware, Tasmota reports an error (file signature error or something like that). I have done an OTA update, and I can only guess that the Zigbee part was updated along with the Tasmota part.

I have moved to source routing and used your ezsp_config data, and after some resetting and rejoining my bulbs seem to work fully OK now. I will test this a little more in the coming days, but currently I have 100% of the desired functionality, which is great!

Now when I look at my Zigbee network, some bulbs are connected to other bulbs, which in turn are connected to the bridge (which is what I wanted to achieve). This means that either I hit a limit of some kind when I reached 32 Zigbee devices on the Sonoff bridge, OR the change to source routing made the difference. (I have read some articles about source routing, but frankly could not figure out the difference between default and source routing in a Zigbee network.)

I have not managed to create a card for checking all Zigbee devices in one spot, but I will likely do that in the next few days (or at least try).

I will report back in a day or two to confirm that my setup still works as expected.

Thank you!

Can I kindly ask for someone to explain what ‘source routing’ is? I have already attempted to search for this info but have come up short. thanks in advance.

That is some super news! Nothing better than doing that ‘I’m bad’ dance in front of the significant other after she was giving you the ‘eye’ as the lights flashed on and off :grinning:

Do you attribute the improvements to the Sonoff Zigbee bridge firmware upgrade or to setting the source routing at the ZHA level, or both or ¯_(ツ)_/¯ ?

Congrats, tech that works, what a concept!

Maybe this write-up might help. I am still learning, but as I understand it, Zigbee networks allow for several different routing methods for the network and its devices to decide how to get a packet from the source node to the destination node (or nodes). Some of the methods are old and deprecated; I think there is a tree routing one that is for the history books.

AODV = Ad-hoc On-demand Distance Vector

Digi - Source routing
https://www.digi.com/resources/documentation/Digidocs/90002002/Concepts/c_zb_source_routing.htm?TocPath=Transmission%2C%20addressing%2C%20and%20routing|RF%20packet%20routing|Source%20routing|_____0#:~:text=Zigbee%20source%20routing%20helps%20solve,specify%20routes%20for%20many%20remotes.&text=A%20remote%20device%20sends%20an%20RF%20data%20packet%20to%20the%20data%20collector.
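
A toy model may make the difference easier to see. With table-based mesh routing (AODV), every router on the path keeps its own next-hop entry for the destination. With source routing, the sender (typically the coordinator) keeps the whole route and writes the list of relays into the frame, so the relays only follow the list. The sketch below is purely conceptual: the node names are invented, `max_hops=30` is an arbitrary loop guard, and none of this is how zigpy or bellows implement routing internally.

```python
def route_hop_by_hop(next_hop_tables, source, destination, max_hops=30):
    """Table routing: each node on the way looks up its own next hop."""
    path = [source]
    node = source
    while node != destination:
        if len(path) > max_hops:
            raise RuntimeError("routing loop or path too long")
        node = next_hop_tables[node][destination]  # per-node routing table lookup
        path.append(node)
    return path


def route_source_routed(source, relays, destination):
    """Source routing: the sender embeds the full relay list in the frame."""
    return [source, *relays, destination]


# Table routing: three separate nodes each hold an entry for "bulb_c".
tables = {
    "coordinator": {"bulb_c": "bulb_a"},
    "bulb_a": {"bulb_c": "bulb_b"},
    "bulb_b": {"bulb_c": "bulb_c"},
}
print(route_hop_by_hop(tables, "coordinator", "bulb_c"))

# Source routing: only the coordinator stores the route, as one table entry.
source_route_table = {"bulb_c": ["bulb_a", "bulb_b"]}
print(route_source_routed("coordinator", source_route_table["bulb_c"], "bulb_c"))
```

Both calls produce the same path; the difference is where the routing state lives. In the source-routed case the state sits in the coordinator's source route table, which is presumably what `CONFIG_SOURCE_ROUTE_TABLE_SIZE` in the ezsp_config above is sizing.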

If I had to guess, I would say that the change of routing parameters solved the problem.

I am not entirely sure that I have upgraded the EZSP firmware (I do not know how to check that; the firmware version is not reported in HA. Maybe there is some Tasmota console command to get that info; I will check).

P.S. One other conclusion from two days of playing with lights is that they perform much better when grouped at the ZHA level rather than at the Home Assistant group level.

Thanks for the link. I sorta understand it but not 100%.
What I could not glean from the link is if it’s ‘better’ to have source routing on?
My ZHA network with approx 30 devices is very stable so not sure that I want to tempt fate by turning source routing on unless there is a specific advantage.
thanks

Your conclusion about commanding groups of Zigbee devices at the ZHA level makes sense. I think this was/is one of the big selling points of Zigbee for pro lighting: low latency and devices moving in unison.

Here is something you can try to get the version, but be careful! I am not sure how cool it is to execute bellows commands within HA while HA is running its own bellows instance. I have received a Python timeout error a couple of times, but HA and ZHA seem to recover and continue. This shows that I am running 6.7.6.0 firmware in my Tasmota’ed Sonoff Zigbee hub. I am running HA in a Docker container, so I can basically get a shell in it via the docker command line or the Portainer GUI, as shown. The version is also dumped in the log if you turn the right debug on, shown below:

# https://github.com/zigpy/bellows


export EZSP_DEVICE=socket://192.168.2.190:8888

bellows info

[60:a4:23:ff:fe:00:00:00]
[0x0000]
[<EmberNetworkStatus.JOINED_NETWORK: 2>]
[<EmberStatus.SUCCESS: 0>, <EmberNodeType.COORDINATOR: 1>, EmberNetworkParameters(extendedPanId=cc:cc:cc:cc:e3:ab:00:00, panId=0x3498, radioTxPower=20, radioChannel=11, joinMethod=<EmberJoinMethod.USE_MAC_ASSOCIATION: 0>, nwkManagerId=0x0000, nwkUpdateId=0, channels=<Channels.ALL_CHANNELS: 134215680>)]
[<EmberStatus.SUCCESS: 0>, EmberCurrentSecurityState(bitmask=<EmberCurrentSecurityBitmask.64|32|HAVE_TRUST_CENTER_LINK_KEY|8|GLOBAL_LINK_KEY: 124>, trustCenterLongAddress=60:a4:23:ff:fe:00:00:00)]
Manufacturer: 
Board name: 
EmberZNet version: 6.7.6.0 build 327

2021-01-25 09:03:59 INFO (MainThread) [bellows.zigbee.application] EZSP Radio manufacturer: 
2021-01-25 09:03:59 INFO (MainThread) [bellows.zigbee.application] EZSP Radio board name: 
2021-01-25 09:03:59 INFO (MainThread) [bellows.zigbee.application] EmberZNet version: 6.7.6.0 build 327
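
If you want to check the version automatically, for instance to confirm that an update to 6.7.8 actually took, the line above is easy to parse. A small sketch, assuming the text keeps the `EmberZNet version: a.b.c.d build n` shape seen in this output:

```python
import re

# Matches e.g. "EmberZNet version: 6.7.6.0 build 327"
VERSION_RE = re.compile(r"EmberZNet version: (\d+)\.(\d+)\.(\d+)\.(\d+) build (\d+)")


def parse_emberznet_version(text):
    """Return ((major, minor, patch, sub), build), or None if no version is found."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    *version, build = (int(group) for group in match.groups())
    return tuple(version), build


line = (
    "2021-01-25 09:03:59 INFO (MainThread) [bellows.zigbee.application] "
    "EmberZNet version: 6.7.6.0 build 327"
)
version, build = parse_emberznet_version(line)
print(version, build)         # (6, 7, 6, 0) 327
print(version < (6, 7, 8, 0))  # True: older than the suggested 6.7.8
```

Tuples compare element by element, so the check keeps working if a later release bumps the minor or major number.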