Zigbee (ZHA) general understanding when creating network

I think it depends on the device. My understanding is that routers will find their own connections, so adding them through the co-ordinator is fine as that will change over time. End devices may stay where you put them, unless the parent router becomes unavailable - I have read that this varies with the manufacturer - so it may be worth adding them via the nearest router with a strong signal.

One thing that affects network performance is the proportion of routers to end devices - you don’t say how many of each you have. There isn’t a magic ratio - it will depend on your home and the devices you have (lightbulbs are not necessarily the best routers), but it may be worth adding a “repeater” router in each room - that is, a router that doesn’t do anything else.

I assume that you already have your SkyConnect on a USB extension cable to avoid interference. A longer one than the one in the box doesn’t do any harm - mine’s 2m.

I wouldn’t be in too much of a hurry to change channels. Some people report good results, but it isn’t a magic bullet, and you have to re-pair all the devices, which is a pain. You also need to be aware that zigbee channels are not the same as wi-fi channels. There’s a post about it here.

The trouble is, a hub like Philips Hue has a much bigger antenna than your SkyConnect. You need to build a stronger network to compensate. “Rock solid” takes a while to achieve, and it isn’t entirely under your control - the network has a life of its own.

I assume you’ve looked at this:

Note that not all products are Zigbee Router devices. As a general rule, most (but not all) mains-powered products are Zigbee Router devices, while battery-powered devices are always end devices (i.e. non-routers).

As mentioned, I suggest you start by reading and following all the tips collected here, as that should take care of most things → Zigbee networks: how to guide for avoiding interference + optimizing using Zigbee Router devices (repeaters/extenders) to get best possible range and coverage

It is primarily a feature because some devices with buggy firmware will not automatically move to a closer Zigbee Router device, but it can also make it easier to add/join/pair a device that is far away.

By design, every Zigbee device should constantly evaluate its neighbours and automatically migrate to a better Zigbee Router, and most devices will do so about once every 24 hours if there is a better choice available.

So you cannot force a device to stay with a specific Zigbee Router unless its firmware is badly written and has a bug that prevents it from migrating.

You should not really need to bother with how the Zigbee network meshes, it should all be done automagically, and the only thing you should need to consider is adding enough Zigbee Router devices.

Normally you have no control whatsoever over how a Zigbee network meshes, as the only thing you can and should do is to add more Zigbee Router devices.

Thanks to both of you for the input. Much appreciated.

To update with some input: I do have the SkyConnect on a 2m USB extension cord, and not on a USB 3 port.

As for devices, as a rule of thumb I have a smart plug (Aubess) in each room, plus some main lights and, for example, a battery-operated button. In some rooms I also have an IKEA repeater. But I didn't consider that some main lights or other routers perhaps weren't good at it. Are the IKEA repeater and the smart plug good choices for routers?
When I look at the Zigbee network overview there are a lot of red lines, which there shouldn't be, as everything should be a short distance from a router. So perhaps there is something to it with the type of router I use.
The main devices I use are Philips (E14 and E27 lights, light strips, buttons, sensors) and IKEA (E14, GU10, buttons, repeater), and then I have the metered Aubess 16A smart plug in each room, as mentioned.

Regarding Wi-Fi, I did read up a lot on UniFi networks and Zigbee, so as mentioned I have moved all APs to channels 1 & 6 and away from channel 11. So Zigbee channel 15 should sit in between Wi-Fi channels 1 & 6, but that's why I thought about Zigbee channel 25, as it should then be free from my Wi-Fi disturbance. But again, it's a big hassle to redo. I also don't know why I have so much interference and delay when I don't have any neighbours that could interfere with me. But yeah, maybe I should stay on 15 and look more at the device side!
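To make the channel numbering concrete, here is a rough Python sketch that converts Zigbee and Wi-Fi channel numbers to centre frequencies and checks for overlap. It assumes the standard 2.4 GHz channel plans (Zigbee channels 11 to 26 spaced 5 MHz apart from 2405 MHz, Wi-Fi channel n centred on 2407 + 5n MHz) and treats each Wi-Fi channel as a flat 22 MHz block, which is a simplification: real Wi-Fi signals bleed beyond their nominal width. The function names are made up for this illustration.

```python
def zigbee_center_mhz(channel: int) -> int:
    """Centre frequency in MHz of a 2.4 GHz Zigbee channel (11 to 26)."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Centre frequency in MHz of a 2.4 GHz Wi-Fi channel (1 to 13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers Wi-Fi channels 1 to 13")
    return 2407 + 5 * channel


def overlapping_wifi_channels(zigbee_channel, wifi_channels, wifi_width_mhz=22.0):
    """Return the Wi-Fi channels whose nominal band overlaps the Zigbee channel."""
    zb = zigbee_center_mhz(zigbee_channel)
    zb_low, zb_high = zb - 1.0, zb + 1.0  # a Zigbee channel is roughly 2 MHz wide
    hits = []
    for ch in wifi_channels:
        centre = wifi_center_mhz(ch)
        low = centre - wifi_width_mhz / 2
        high = centre + wifi_width_mhz / 2
        # Two frequency ranges overlap when each starts before the other ends.
        if zb_low < high and zb_high > low:
            hits.append(ch)
    return hits


if __name__ == "__main__":
    # Access points on Wi-Fi 1 and 6, with 11 included for comparison.
    for zb in (15, 25):
        print(zb, zigbee_center_mhz(zb), overlapping_wifi_channels(zb, [1, 6, 11]))
```

Under these assumptions Zigbee 15 (2425 MHz) falls in the gap between Wi-Fi 1 and 6, and Zigbee 25 (2475 MHz) sits just above Wi-Fi 11, so on paper neither overlaps access points on channels 1 and 6.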

I have almost managed to get rid of my red lines (not quite), but as the network changes from hour to hour I’m not sure that’s completely achievable - they come and go.

For comparison, I have five smallish rooms spread over three floors. The co-ordinator is on the middle floor. On each floor I have a “repeater” router, plus three or four sockets/smart plugs and around five lights. That seems to produce a network strong enough to support buttons, dimmer switches and sensors.

I found it helpful to create an entities card with the LQI of all my routers.

The numbers individually are not very useful and they change all the time, but it does give you a real time overview of the network’s health - I find that when I add a new router, a couple of hours later they have all increased a little.
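For anyone who wants to try such a card, here is a small Python sketch that prints the YAML for a dashboard entities card from a list of router LQI sensors. The entity IDs and names are hypothetical placeholders; the real IDs depend on how your devices are named, and the LQI sensors may first need enabling on each device's page. It only builds text to paste into the card's code editor and does not talk to Home Assistant.

```python
def lqi_entities_card(router_entities, title="Zigbee router LQI"):
    """Build the YAML text of an entities card listing one LQI sensor per router.

    router_entities maps an entity ID to the label shown on the card.
    """
    lines = ["type: entities", f"title: {title}", "entities:"]
    for entity_id, name in router_entities.items():
        lines.append(f"  - entity: {entity_id}")
        lines.append(f"    name: {name}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical entity IDs, replace them with the LQI sensors of your own routers.
    routers = {
        "sensor.hall_repeater_lqi": "Hall repeater",
        "sensor.kitchen_plug_lqi": "Kitchen plug",
        "sensor.landing_light_lqi": "Landing light",
    }
    print(lqi_entities_card(routers))
```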

You mention Philips buttons and sensors. If you’re having problems with the motion sensors this may be the sensors themselves rather than the network. There is a known issue where they leave the network, then try to rejoin as a different device (which the network doesn’t allow). There is a simple workaround here.

Thanks again for the input.
It’s a good idea to create a card for the entities. Will try that. What devices do you use for routers besides the smart plugs? Anything you can recommend?

I know all too well about the Philips sensors. I am currently replacing them with ESP presence sensors instead.

Note that if you have Zigbee lightbulbs, make sure that they are never powered off via an off switch. Zigbee Router devices need to be always available, as otherwise devices connected to them will lose connection until they recover (which usually takes an hour or so), and that will mess up your mesh network.

IKEA Trådfri Signal Repeater is a good Zigbee Router, while their IKEA Trådfri Wireless Outlet Plug is not a good router but that could be compensated by having a lot of them.

If you want some great Zigbee Routers, then flash Sonoff ZBDongle-E and ZBDongle-P USB dongles with Zigbee Router firmware and then power them with USB chargers.

Should maybe add that this is another reason why I and many others prefer to use smart switches (that replace dumb switch) to control dumb lightbulbs (instead of using dumb switches that can switch off and on smart lightbulbs).

Unless you specifically buy Zigbee lightbulbs that are designed to only be Zigbee End Devices (and not Zigbee Router devices), you never want to have a smart lightbulb connected to a dumb switch, as then you risk someone switching it off.

Zigbee lightbulbs can still have their place in your home if you remove any dumb switch from them, for example for stand-alone corded stand lights like window lights and floor lights that have a dumb switch on their cord that can relatively easily be removed.

Since all smart switches are mains powered most of them also act as a Zigbee Router, which is great because if you replace all your wall switches then you should get very good coverage in your whole home.

The main reason why you may prefer to use smart switches is that it enables you to still manually control each individual light/switch, even when Home Assistant is not available.

I second this. I started my smart home with smart lights everywhere, only to find out that I actually wanted smart switches all along. I am slowly transitioning to switches in some rooms where applicable. But if I could redo my house it would for sure look different. And yes, it was a pain in the beginning telling people not to turn off the dumb switches :slight_smile: but they get it now. They use voice, or the lights mostly turn on everywhere by motion or presence. Thanks for the input.

Got so many good inputs here. I don’t know how it works: do I have to mark a solution, or can I leave it? There were several comments with valuable feedback on this, so I wouldn’t want to point to only one specific post if not needed.

Following ALL the tips here SHOULD very likely resolve most issues, or at the very least rule out other common issues → Zigbee networks: how to guide for avoiding interference and optimize for getting better range + coverage

The exception is if you have one or more bad Zigbee Routers with buggy firmware that do not pass along all messages correctly. If that is the root cause, then you need to open a new issue in the Home Assistant core repository on GitHub and provide debug logs for diagnostic analysis, to get help determining which Zigbee Router device(s) are bad → https://www.home-assistant.io/integrations/zha#reporting-issues

Don’t worry about it - lots of ideas here for all of us. :grin:

Thanks for the tips above. In honesty, the ZHA network is still the least reliable part of my HA setup and a regular cause of frustration.

E.g. this morning my family told me that the zigbee lamps (ikea) that are supposed to be triggered to turn on with the ceiling light switch (sonoff), just stopped working. In fact everything zigbee had just stopped working. Christmas lights all off etc.

I have now done specific things to mitigate the issues, but they don’t permanently fix anything. For example putting the sky connect stick on a long cable near the middle of the house, and adding an ikea zigbee repeater and an ikea outlet (which I understand acts as a zigbee router too), within 1m of the sky connect. My idea was that these would act as redundant routers and make everything else super reliable. But I was wrong. This morning they were both offline.

Rather than inviting people to list lots of suggestions like ‘change channels’, ‘don’t use usb3’, ‘leave everything on’, I’d like to list my observations and what I conclude from them. I’d really value specific comments if my conclusions are wrong, and also why. Please don’t take these as complaints. My aim is to clarify them - possibly these are bugs that could be fixed (and I’m willing to help).

  1. Observation: Most of the time I can resolve things by restarting the ZHA integration within HA
    Conclusion: The ZHA integration is not sufficiently able to detect network problems and resolve them by itself in all cases in a timely manner.

  2. Observation: Clicking reconfigure on a device in HA often results in a ‘reconfiguration complete’ success message, but the device is still inoperable via HA and its last seen time is still >12h in the past.
    Conclusion: ‘Reconfigure’ doesn’t verify the wireless connection to the device. A green success message does not mean that it is fully working.

  3. Observation: Some devices only seem to rejoin the network (e.g. my two routers above), if I either power cycle them or press the button with a paper clip.
    Conclusion: The combination of my ZHA devices, their firmware, the sky connect stick, its firmware, the ZHA integration or HA is not able to coerce some devices to rejoin the network in a timely manner. Given that the network can be reliable for days then fail into this state and not self recover, I interpret that this is nothing to do with locations, number of devices, or wireless channel configuration. Simply that this combination is unable to self recover (for some reason)

  4. Observation: The ZHA ‘network visualisation’ shows some devices with an orange line. But these also show as ‘offline’ on the visualisation, display as ‘unavailable’ in ZHA, and have a ‘last seen’ value of e.g. >2 weeks ago.
    Conclusion: The network strength line is not guaranteed to be a live or even recent value. It represents what the strength was at some point in the past, even though that connection may now be lost.

  5. Observation: Updating the HA version on my raspberry Pi 4 almost always causes some or all devices on my zigbee network to go offline requiring some of the above steps to get them back online.
    Conclusion: the combination of ZHA, skyconnect stick, my devices and all of their firmware, is not able to reliably cope with the [network restart?] sequence that occurs during a HA version update.

Anyone got any thoughts or ideas on the above? Have I misinterpreted something?

Sorry to hear you’re having so much trouble.

I agree with your conclusion. However, I’m not sure that detecting network problems is really the function of the integration. Zigbee is by design self-regulating and self-healing - a bit of a black box. ZHA allows you to pair, group and bind devices, and provides controls for monitoring them and turning them on and off. It can’t resolve problems which the devices themselves should be addressing.

I’ve probably misread this, but “two coordinators”? You can only have one per network.

Again, I agree with your conclusion, but a Zigbee network is self-regulating, you can’t coerce it. That’s not what Home Assistant and ZHA are supposed to do.

The ZHA network visualisation was originally a separate card - now deprecated and absorbed into the integration. There’s a post from the developer here in which he describes the lines between devices as showing all the possible paths through the Zigbee mesh. As far as I know, this is still the case with the current integration, so your conclusion is correct.

This has not been my experience (ZHA and Pi4 like you). Some battery-powered sensors may show as unavailable for a few minutes, but they all respond as soon as they have something to report (movement, door opening, etc.). My conclusion is that they’re just sleeping (as they’re supposed to).

Not much help, I’m afraid - sorry. When I started Zigbee I was rather taken aback by the fact that it seemed to have a mind of its own (and also, I confess, by the number of “magic” solutions being passed around on the forum). Still, the “more routers, less interference” approach did seem to work. Nowadays, when something goes wrong the usual suspects are individual devices not following the spec properly.

An afterthought:

Do you have many Zigbee groups? Apparently it’s very easy to overload a Zigbee network by sending a lot of group commands, particularly in quick succession with an automation. The reason is that, whilst messages to devices follow the most direct route possible, messages to groups are blasted out to every node in the network just to make sure every member of the group receives them. (This is a Zigbee thing, not ZHA.)
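A back-of-the-envelope way to see the difference is to count radio transmissions. The Python sketch below is a deliberately crude toy model and not how any real Zigbee stack accounts for traffic: it assumes a direct message costs one transmission per hop, and that a group message is sent once by the co-ordinator and then repeated once by every router. The router count, hop count and function names are invented for illustration.

```python
def unicast_transmissions(hops_per_target):
    """Toy cost of direct messages: one transmission per hop, per target."""
    return sum(hops_per_target)


def group_transmissions(router_count, rebroadcasts_per_router=1):
    """Toy cost of one group message: the co-ordinator sends it once,
    then every router in the mesh repeats it."""
    return 1 + router_count * rebroadcasts_per_router


if __name__ == "__main__":
    routers = 15           # hypothetical number of routers in the mesh
    commands_in_burst = 5  # e.g. an automation firing several commands in a row

    direct = unicast_transmissions([2]) * commands_in_burst
    grouped = group_transmissions(routers) * commands_in_burst
    print(f"direct to one device, 2 hops away: {direct} transmissions")
    print(f"same burst sent as group commands: {grouped} transmissions")
```

In this toy model the cost of a group command grows with the number of routers rather than with the size of the group, which is why a quick burst of group commands can add up.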

Thanks for your reply!

Ok, fair enough, but if restarting the integration fixes it, that suggests either something is wrong with the integration, or the act of restarting the integration triggers something else (such as restarting the network) which helps to fix things. Either way I’d want the integration to be a bit more helpful, by telling me about the problem and potentially attempting to solve it.

Ok, so that bit’s clearly not working in my case. In practice I find that even remotes bound directly to lights stop working when my network goes down, so it’s like the network itself is broken, and all communication stops. It’s not just ZHA thinking it’s broken.

Sorry, my bad, I meant routers, corrected.

None with multiple zigbee devices. I created some groups with just one device for convenience of automations (I find it easier to edit a group than a complex automation with lots of rules)

But following on from your point, if it’s an issue that simultaneous messages break the network, doesn’t that mean the coordinator firmware needs to be improved, perhaps by spacing them out or waiting for the network to be quiet after each one?

Where I’m heading: I feel that way better diagnostic information is needed. Which devices does ZHA or the coordinator actually think it has a working connection to right now? Which has it given up on and when/why? Is it trying to self recover? How is it getting on? Does it need me to intervene? Perhaps with that information we could find patterns then maybe experiment by blacklisting a specific router or device.
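As a very small step in that direction, here is a Python sketch that flags devices which have not been heard from for longer than a chosen limit. It assumes you have already collected a "last seen" timestamp per device by some means (how to get them out of ZHA is not shown here), and the device names, timestamps and 12-hour limit are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def stale_devices(last_seen, now, max_age=timedelta(hours=12)):
    """Return (name, age) pairs for devices silent for longer than max_age,
    with the longest-silent device first."""
    stale = [
        (name, now - seen)
        for name, seen in last_seen.items()
        if now - seen > max_age
    ]
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
    # Hypothetical readings, replace with timestamps from your own network.
    last_seen = {
        "hall_repeater": now - timedelta(minutes=30),
        "kitchen_plug": now - timedelta(hours=13),
        "attic_sensor": now - timedelta(days=15),
    }
    for name, age in stale_devices(last_seen, now):
        print(f"{name}: silent for {age}")
```

Running something like this on a schedule and keeping the output would at least show which devices drop out first and whether the same routers are involved each time.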

It feels highly likely that I have some misbehaving devices/firmware. But that on its own is not enough to help. If any of my 28 devices could break the network, then I’m making my network less reliable and harder to debug the more I expand it. This works against the ‘more devices=stronger network’ approach, and I don’t have any tools to fix it.

Just had a brief look at Certified Products Search | IOT - CSA-IOT which seems to be where you can check what is ZB certified (many of my devices aren’t). It feels like we as a community could do more to help the average consumer here. Are there any open source compliance testing suites?

Zigbee is an open standard, which means anyone can put it on the box - there’s no certification (as there is with z-wave, for example).

Sorry to be boring, but as your problems are so widespread it seems unlikely to be one or two devices (though they may not be helping!). I would go for the “more routers/less interference” approach. Check your wi-fi channels (bearing in mind that the numbering for wi-fi and zigbee channels is not the same). Over time remove all your devices (routers and end devices) one by one, starting with the one furthest from the co-ordinator, until you reach a point where the network is stable. Add an extra router at that point then work back out adding routers when you encounter instability.

28 devices is not many - if about half of them are routers they may be spread too thinly.

I also struggled a lot with my Zigbee network over the last few months and nearly gave up on it, but then I did the following, and that really saved the day for me:

I performed energy scans of the zigbee channels. It is very easy to do in ZHA:
Settings → Devices → SkyConnect (or whichever ZHA device) → ⋮ menu → Download diagnostics

At the bottom you can see a list of all channels and how busy each one is at the moment. There I saw that the channel I had been operating on the whole time was really busy, at > 80% all the time. I then switched to another, empty channel and voila … no problems whatsoever since then!

So apparently something I don’t know about interferes heavily with some channels but not with others. Maybe that is also the case for you?
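If you want to compare channels without eyeballing the list, a short Python sketch like the one below can rank them. It assumes you have copied the energy scan section of the diagnostics file into a dictionary that maps channel number to a busy percentage; the exact layout of the file may differ between versions, and the sample values here are invented. The 80% threshold simply mirrors the figure mentioned above.

```python
def rank_channels(energy_scan, busy_threshold=80.0):
    """Split channels into quiet ones (quietest first) and busy ones.

    energy_scan maps a channel number (int or str) to a busy percentage.
    """
    readings = {int(ch): float(pct) for ch, pct in energy_scan.items()}
    quiet = sorted(
        (ch for ch, pct in readings.items() if pct < busy_threshold),
        key=lambda ch: (readings[ch], ch),
    )
    busy = sorted(ch for ch, pct in readings.items() if pct >= busy_threshold)
    return quiet, busy


if __name__ == "__main__":
    # Invented sample values, paste in the numbers from your own diagnostics.
    sample = {"11": 92.5, "15": 85.0, "20": 12.0, "25": 3.5}
    quiet, busy = rank_channels(sample)
    print("quietest first:", quiet)
    print("avoid:", busy)
```

A single scan is only a snapshot, so it is worth downloading the diagnostics a few times at different times of day before deciding on a channel.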

Probably your neighbour’s wi-fi… :roll_eyes:

Nope… my nearest neighbours are really far away, and a scan for WLAN networks does not show anything transmitting on the lower channels. The interference on that low channel must be coming from somewhere else within my home, but good luck finding that source… However, it is sufficient to know which channels to avoid, as long as there are some remaining that are free :slight_smile:

Indeed, it could be that I have a channel interference issue that makes my network stop working for a short period.

It works for days or weeks without problems, so I’m certain that I don’t have some background continuous disturbance that’s blocking everything.

But in the event of a failure, my observation is that the self recovery does not work in practice.

Part of my job involves EMC testing where we deliberately subject things to interference to see what they do. For immunity testing there are 3 possible criteria for the item being tested:
-A: keeps working during the test
-B: stops working during the test but self recovers afterwards
-C: stops working during the test and needs user intervention to recover

I thought zigbee was supposed to be B. But in practice for me it’s C.

So for a moment, please ignore interference. Let’s assume something disturbed my network and now that disturbance has gone. Now I’m expecting the zigbee mesh to self heal and find a new route, for it to re-find its routers. But it doesn’t. It sits there for days in a broken state, until I do one of the above interventions. Then it immediately works.

My conclusion: the self-healing feature is broken (in my combination of coordinator+devices+ZHA+HA) and needs fixing, regardless of what anyone says about interference.

So my feeling is - to get a permanent reliable solution, I should look at what should be happening during self healing, and try to work out why that’s not happening. To me this has almost nothing to do with 2.4GHz channels.