Zigbee & third reality device off-line repeated cycling & login failure

I’ve had this weird (to me:-) ) sequence of events. I’ve been discussing it with the ZHA team but it has its challenges and now I wonder if it is completely an integration issue or not. I do have this as an open issue on the zha integration page - but wanted to expose it to a broader audience in case anyone had some ideas.

The zha issue discussion is at (for further background):

I have about 20 zigbee devices, multiple brands and device types. Of those 20 - 10 are Third Reality Smart Power Monitoring plugs.

They worked perfectly for many months, and then they started going off-line and then back on-line. All of them at once, and all together each time. Nothing else goes offline though. Here is the log of them doing this:

2024-01-28 15:38 - goes offline for 21.9 hours
2024-01-29 13:35 - comes back online for .7 hours
2024-01-29 14:15 - goes offline for 52.2 hours
2024-01-31 18:24 - comes back online for 40.5 hours
2024-02-02 10:57 - goes offline for 110.5 hours
2024-02-07 01:24 - comes back online for 43.5 hours
2024-02-08 21:05 - goes offline for 122 hours
2024-02-12 03:38 - comes back online for 31 hours
2024-02-13 10:00 - goes offline (time approx - might be with 2024.2.1 upgrade)
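
A small sketch for recomputing the durations from the timestamps above, in case it helps spot a pattern. The event list is copied from the log; the function and variable names are my own, and the last timestamp is approximate. The hand-entered figures above may not match the recomputed ones exactly.

```python
from datetime import datetime

# (timestamp, state entered at that time) pairs copied from the log above.
EVENTS = [
    ("2024-01-28 15:38", "offline"),
    ("2024-01-29 13:35", "online"),
    ("2024-01-29 14:15", "offline"),
    ("2024-01-31 18:24", "online"),
    ("2024-02-02 10:57", "offline"),
    ("2024-02-07 01:24", "online"),
    ("2024-02-08 21:05", "offline"),
    ("2024-02-12 03:38", "online"),
    ("2024-02-13 10:00", "offline"),  # approximate, per the log
]


def durations_hours(events):
    """Return (state, start, hours) for each interval between consecutive events."""
    parsed = [(datetime.strptime(ts, "%Y-%m-%d %H:%M"), state) for ts, state in events]
    rows = []
    for (start, state), (end, _next_state) in zip(parsed, parsed[1:]):
        rows.append((state, start, (end - start).total_seconds() / 3600))
    return rows


for state, start, hours in durations_hours(EVENTS):
    print(f"{start:%Y-%m-%d %H:%M}  {state:<7}  {hours:6.1f} h")
```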

Nothing else seems to be happening when this goes offline or online - I can’t see any pattern of other events.

I just upgraded to 2024.2.1 today (Feb 13 2024) to give me the ota firmware upgrade capability - but I need the devices to be online before I can do a firmware upgrade - so I’m waiting for them to come back online sometime.

I’m skeptical about this being a firmware issue as I know of several instances (both personally and through online discussion groups) of people with similar configurations to mine who do not have this problem. As a matter of fact, it would appear I am the only person on any HA forum to have this problem.

Recently I noticed another item related to this, but I’m not sure if there is a cause & effect relationship here. The ZHA team asked me for the integration “download diagnostics” file from when the devices were online and when they were offline. When the devices were online, no problem, I got the diagnostics file and posted it to the issue forum. However, when these devices were off-line, I could not get a diagnostics file. Using a Chrome browser session, it tried to download the file, but it reported that it didn’t exist. At the same time that I tried to do this, I got an HA notification of a failed login. I can repeat this behaviour: login error and no diagnostics file when these devices are off-line, all good when the devices are on-line.

And the log shows:

Logger: homeassistant.components.http.ban
Source: components/http/ban.py:129
Integration: HTTP (documentation, issues)
First occurred: 8:59:12 AM (4 occurrences)
Last logged: 10:52:04 AM

Login attempt or request with invalid authentication from localhost (127.0.0.1). Requested URL: '/api/diagnostics/config_entry/16f8552c2e8d924ff5a1b58eabde4e02?authSig=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3Mi…
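
For anyone reading the log line: the `authSig` value has the shape of a JSON Web Token, and its first segment can be decoded without any secret. A minimal sketch (the function name is mine; only the header segment is decoded because the rest of the token is truncated in the log):

```python
import base64
import json


def decode_jwt_header(token: str) -> dict:
    """Decode only the first (header) segment of a JWT-style token."""
    header_b64 = token.split(".")[0]
    # base64url segments omit padding; restore it before decoding.
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Header segment copied from the log line above.
header = decode_jwt_header("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9")
print(header)
```

This only shows that the download URL carries a signed token. It does not by itself explain why the request was rejected.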

I don’t know if the diagnostics file issue and the device offline issue have a cause & effect relationship (and which would be the cause side), but they always occur at the same time.

One other piece of information: I am doing this all remotely. The house has been empty since November and will remain so until mid April. Thus I have certain limitations as to what I can do (i.e. I can’t physically touch or reset anything).

If anyone has any ideas on how to further troubleshoot this - I am all ears!

thanks,
Ken

There is no such thing.
It’s ZHA (the builtin zigbee support) or Z2M (zigbee2mqtt, a 3rd party project).

From the bug report, I understand you’re using ZHA

You nailed it, in my opinion. ZHA gave me issues with ThirdReality. ZigBee2MQTT works fine.

Correct - I have edited and corrected.

What kind of Third Reality issues did you have?

I have this issue with Z2M. All plugs are on the latest .74 firmware. For example, a plug that is directly connected to my coordinator (SLZB-06M) with an LQI of 194 goes offline sometimes. Zigbee commands sent to it all time out. This also happens with plugs that are connected to other routers. Some plugs seem to get temporarily fixed if I unplug them and plug them back in.

Z2M reports all of my plugs as 3RSP02028BZ.

Weirdly enough, I’ve also found that restarting Z2M appears to help sometimes. Similarly, restarting my coordinator helps sometimes. Other times, nothing helps and the plugs are just dead to the world. I’m beginning to suspect that the rapid power cycling issue that was fixed in the .71 firmware may have caused other problems, because I don’t recall this happening with my plugs when they were on the older versions.

zwavejs-ui has a test release that supports z-wave LR, and I’m honestly quite tempted to just transition my problematic plugs to z-wave.

Edit: although it appears that it may be an issue with Z2M 1.35+: A LOT of traffic and errors after 1.35 upgrade - "No network route (205)" and timeouts · Issue #20526 · Koenkk/zigbee2mqtt · GitHub

I concur, I am seeing these 3rd Reality plugs disappearing from Zigbee2MQTT version 1.36.0 since doing the firmware upgrade to v1.00.74. Model : 3RSP02028BZ.

I am having issues with these plugs as well. I have 110+ ZigBee devices (65 routers & 45 end devices), 17 of which are the Third Reality Smart Plugs Gen 2 (3RSP02028BZ), all on firmware 1.01.01. I am using Zigbee2MQTT.

Controller

Initially I thought my SMLIGHT SLZB-06 controller was hitting its upper limit of too many routers, so I upgraded to the SLZB-MRU4, which did improve LQI, but the plugs continued to drop.

Zigbee Channel

Through some research, I discovered that in the United States, the FCC requires reduced transmit power levels on channels 25 and 26 (Source: Page 3, section 2, Wi-Fi Impact on Bluetooth and 802.15.4 Radios). Furthermore, older Zigbee devices were known to have issues on channel 25. So I switched to channel 20.

I’m not sure what helped more, the new controller or switching to channel 20 but my LQI has greatly improved.
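
For reference, a small sketch of where the 2.4 GHz Zigbee channels sit relative to the common non-overlapping Wi-Fi channels 1, 6 and 11. The center-frequency formulas are the standard ones for IEEE 802.15.4 and 2.4 GHz Wi-Fi; the 22 MHz Wi-Fi width and 2 MHz Zigbee width are simplifying assumptions, and the function names are made up for illustration.

```python
def zigbee_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz IEEE 802.15.4 channel (11..26)."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels are 11..26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz Wi-Fi channel (1..13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers Wi-Fi channels 1..13")
    return 2407 + 5 * channel


def overlapping_wifi(zigbee_channel: int, wifi_channels=(1, 6, 11)):
    """Wi-Fi channels whose assumed 22 MHz band overlaps the 2 MHz Zigbee channel."""
    fz = zigbee_center_mhz(zigbee_channel)
    # Bands overlap when the centers are closer than the sum of half-widths (11 + 1 MHz).
    return [w for w in wifi_channels if abs(fz - wifi_center_mhz(w)) < 12]


for ch in (11, 15, 20, 25, 26):
    print(ch, zigbee_center_mhz(ch), "MHz, overlaps Wi-Fi:", overlapping_wifi(ch))
```

Under these assumptions channels 15, 20, 25 and 26 fall in the gaps between Wi-Fi 1, 6 and 11. This says nothing about the transmit power limits mentioned above.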

Zigbee2MQTT log errors

Nevertheless, the issue persists, with the plugs dropping off. I get these errors in z2m when trying to toggle them on or off:

z2m: Publish 'set' 'state' to 'Office Plug' failed: 'Error: ZCL command 0x282c02bfffec5f1b/1 genOnOff.off({}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"reservedBits":0,"writeUndiv":false}) failed (Timeout - 27830 - 1 - 92 - 6 - 11 after 10000ms)' 
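
To tally which plugs time out, the error line can be pulled apart with a regular expression. A rough sketch: the field names are my own labels, and reading the five numbers after `Timeout` as network address, endpoint, transaction sequence number, cluster ID and command ID is an assumption about the log format, not something confirmed here.

```python
import re
from typing import Optional

# Error line copied verbatim from the log above.
LINE = """z2m: Publish 'set' 'state' to 'Office Plug' failed: 'Error: ZCL command 0x282c02bfffec5f1b/1 genOnOff.off({}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"reservedBits":0,"writeUndiv":false}) failed (Timeout - 27830 - 1 - 92 - 6 - 11 after 10000ms)'"""

PATTERN = re.compile(
    r"to '(?P<name>[^']+)' failed: 'Error: ZCL command "
    r"(?P<ieee>0x[0-9a-fA-F]+)/(?P<endpoint>\d+) "
    r"(?P<cluster>\w+)\.(?P<command>\w+)\(.*\) failed "
    r"\(Timeout - (?P<nwk>\d+) - (?P<ep>\d+) - (?P<tsn>\d+) - "
    r"(?P<cluster_id>\d+) - (?P<command_id>\d+) after (?P<timeout_ms>\d+)ms\)"
)


def parse_timeout(line: str) -> Optional[dict]:
    """Extract device and timeout details from one error line, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "name": m["name"],
        "ieee": m["ieee"],
        "cluster": m["cluster"],
        "command": m["command"],
        "network_address": int(m["nwk"]),  # assumed meaning
        "cluster_id": int(m["cluster_id"]),  # assumed meaning
        "command_id": int(m["command_id"]),  # assumed meaning
        "timeout_ms": int(m["timeout_ms"]),
    }


print(parse_timeout(LINE))
```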

Interference

While re-pairing all my devices, I had issues re-pairing the Aqara Temperature Sensors (WSDCGQ11LM). No matter what I did for 2 days, I could not re-pair them. Then I unplugged all my Third Reality plugs and sure enough I was able to reconnect all my Aqara Temperature Sensors.

Conclusion

I really think there is a firmware issue with these plugs or possibly a design issue. I am going to transition to the new Shelly Gen 4 plugs with Zigbee and see if that helps.

Zigbee routing continues to be a black art for me; it can really get you to the point of complete ‘throw the whole f$#ing thing out’. That said, after too many years of farting around with mesh networks and getting grey, my experience tells me that 80+ percent of my issues are caused by PEBKAC :disguised_face: . A golden rule for Zigbee is ‘Make one and only one change, then let the mesh network adapt for a period of time. Evaluate the results. Then either revert or move forward to the next single change.’ This is so hard to obey, however my experiences have taught me that not following it is a recipe for eventual failure and frustration.

My two Zigbee2MQTT networks (Coordinator ZStack3x0 Revision: 20230507), each with 10+ of the 3rd Reality power monitoring plugs on v1.00.98 firmware, are stable after going around and resetting, re-adding and re-interviewing each of the 3rd Reality devices using the ‘golden rule’.

Prior to this ‘fix’, I had upgraded the 3rd Reality plugs to v1.00.98 firmware and then violated the ‘golden rule’. My ‘gut’, with little evidence :clown_face: is that this firmware on these plugs retained routes to devices that I had removed and seemed to never purge them. My little evidence was that I could run the Zigbee2MQTT network ‘map’, in ‘data’ output mode (requires newer versions of Zigbee2MQTT) and see these routes to ‘gone’ devices still in the 3rd reality plugs.

After doing the above-mentioned ‘resetting, re-adding and re-interviewing’ using the golden rule on each of the 3rd Reality plugs, the network is now performing nicely.

Again, I do not have evidence as solid as I would like to say this is the ‘answer’. However, I do note that 3rd Reality has a new firmware version out for these plugs. I have yet to try it on any of the plugs. As is the case with most Zigbee product vendors, there are no release notes that I can find (even with those vendors that have release notes, I often find they do not have the kind of detail needed to really pinpoint changes to specific problems).

To give some props to 3rd Reality, they seem to be watching their products in use and listening for issues. They release firmware upgrades that seem DIY and especially Zigbee2MQTT friendly at a useful pace.

Blah blah blah… all that said, IMHO I would try what I described above before ‘rip and replacing’.

Good hunting!

Appreciate the detailed response. I am definitely approaching the point of: throw the whole f$#ing thing out!

Routing Table errors

I suspected that devices were retaining old routing tables and I took your recommendation and checked the Zigbee2MQTT network map with data output mode. Sure enough all 17x Third Reality plugs have this caution triangle with Failed: routingTable message:

The only other devices that throw this error are 2x cheap Tuya 3 channel USB switches, which isn’t surprising or important.

Golden rule

Curious if you could expand on the golden rule and how much time to leave between adding devices. For example, do you recommend I remove all the Third Reality plugs from Zigbee2MQTT and pair them one at a time? How long do you recommend I wait between adding each of the plugs?

Appreciate the help!

I did a little research in the Zigbee2MQTT issues on GitHub on the ‘failed: routingTable’ message. Again, I am FAR from an expert on Zigbee and Zigbee2MQTT, however from what I found, this specific message is a warning from Zigbee2MQTT that, while not good, is not a deal breaker. As I understand it, the Zigbee API (perhaps the 3.0 version) has a call where the coordinator can ask a router device to share its current routing table, and apparently the 3rd Reality plug does not respond to this request. I have other router devices that also do not ‘share’ their routing table and show this error. From what I found in the Zigbee2MQTT issues discussion, this is not a fatal issue. If you look at the output of the ‘text’ version of the Zigbee2MQTT map, the program seems to be able to deduce all the routes that routers ‘hold’ even without a response to this API call. As I indicated, in my review of the same kind of output that you show, I found routes in my 3rd Reality plugs that were not valid, aka to non-existent devices. After I did my ‘reset, re-add and re-interview’ step one by one on the 3rd Reality plugs, I was able to purge all these phantoms.
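
The kind of check described above could be scripted roughly like this. It is only a sketch: it assumes the map has been exported as JSON with a `nodes` list and a `links` list whose entries carry a `routes` array, and the sample data and all addresses below are invented for illustration, so the field names should be checked against a real export.

```python
import json

# Invented sample in an ASSUMED export shape; not real network data.
SAMPLE_MAP = json.loads("""
{
  "nodes": [
    {"friendlyName": "Coordinator", "networkAddress": 0},
    {"friendlyName": "Office Plug", "networkAddress": 27830},
    {"friendlyName": "Hall Sensor", "networkAddress": 4511}
  ],
  "links": [
    {"source": {"networkAddress": 27830},
     "target": {"networkAddress": 0},
     "routes": [
       {"destinationAddress": 4511, "nextHop": 0, "status": "ACTIVE"},
       {"destinationAddress": 9999, "nextHop": 0, "status": "ACTIVE"}
     ]}
  ]
}
""")


def find_phantom_routes(network_map):
    """List (link source, link target, destination) for routes to unknown addresses."""
    names = {n["networkAddress"]: n.get("friendlyName", "?") for n in network_map["nodes"]}
    phantoms = []
    for link in network_map["links"]:
        src = link["source"]["networkAddress"]
        dst = link["target"]["networkAddress"]
        for route in link.get("routes", []):
            dest = route["destinationAddress"]
            if dest not in names:  # destination is not a device in the map
                phantoms.append((names.get(src, str(src)), names.get(dst, str(dst)), dest))
    return phantoms


for src, dst, dest in find_phantom_routes(SAMPLE_MAP):
    print(f"link {src} -> {dst} lists a route to unknown address {dest}")
```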

As to the amount of time between changes, so far my ‘seat of the pants’ number is a 5 minute minimum between changes, as long as I have all my router devices powered on. There is really nothing you can do about ‘end devices’, especially battery powered ‘end devices’. The wake up and ‘report in’ time delta is up to the manufacturer of each of these devices, and they rarely publish this info. So I focus on the router devices and getting them stable and connected.

I did not remove the 3rd Reality plugs that were showing offline as I reset each of them. I just worked my way thru each unit. Since doing this, I no longer see any of the plugs going offline. Again, a ‘dark art’ for sure!!! I am pretty sure when the first ‘mesh network’ was documented, they incorrectly spelled MESS as MESH :wink: .

Good hunting!

All your posts have been extremely helpful mate! Zigbee is certainly a mess but luckily, my issues are limited to the Third Reality plugs. Go figure, a router device that doesn’t adhere to the proper Zigbee standard…

Just so I get this right, here are the steps I am going to attempt, please correct me if anything is wrong or you have a better recommendation:

  1. Open Zigbee2MQTT, locate the plug, click Remove device.
  2. Go over to the physical plug, make sure it’s on, click and hold the On/Off button for 10 seconds.
  3. Back on Zigbee2MQTT, click Permit Join All.
  4. Once plug successfully pairs, Interview the plug.
  5. Wait 5 minutes.
  6. Repeat next plug.
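
The software side of those steps could be scripted as a list of MQTT messages. This is a hedged sketch, not a tested procedure: the topic names and payload shapes are assumptions based on my reading of the Zigbee2MQTT bridge request API and may differ between versions, the plug names are made up, and the button press in step 2 still has to be done by hand. It only builds the plan; publishing it with an MQTT client is left out.

```python
import json

BASE = "zigbee2mqtt"      # assumed default base topic
SETTLE_SECONDS = 5 * 60   # the 5 minute wait from step 5


def repair_plan(plugs):
    """Build an ordered list of (kind, target, payload) steps, one plug at a time."""
    plan = []
    for name in plugs:
        plan += [
            # Step 1: remove the device (assumed topic and payload).
            ("publish", f"{BASE}/bridge/request/device/remove", json.dumps({"id": name})),
            # Step 2: physical reset, cannot be automated.
            ("manual", f"hold the On/Off button on '{name}' for 10 seconds", None),
            # Step 3: open the network for joining (assumed payload shape).
            ("publish", f"{BASE}/bridge/request/permit_join", json.dumps({"value": True})),
            # Step 4: interview once the plug has paired (assumed topic).
            ("publish", f"{BASE}/bridge/request/device/interview", json.dumps({"id": name})),
            # Step 5: let the mesh settle before touching the next plug.
            ("wait", f"{SETTLE_SECONDS} seconds", None),
        ]
    return plan


for kind, target, payload in repair_plan(["Office Plug", "Den Plug"]):
    print(kind, target, payload or "")
```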

Appreciate the help!

I have to acknowledge my appreciation as well, for this supportive head-banging you both have recently done.

I’m absorbing/learning, but I wanted to add that I see the same routing issue with the 2 INNR bulbs and 6 INNR outlets in my system. Things are very stable, however. Well, at the moment. :slight_smile: (outlet firmware ID: 1.9.29, 20250210)

That is basically it, wish I had a concrete formula that I could play back. If you go the total removal path, you might have to ‘force the removal’ in Zigbee2MQTT, since the device is ‘off line’ as far as Zigbee2MQTT is concerned.

My steps, as my poor memory allows:
I did not remove the device in Zigbee2MQTT. Rather, I put Zigbee2MQTT in pairing mode and then ‘long pressed’ the button on the 3rd Reality plug. I then let the plug ‘re-pair’ with Zigbee2MQTT, and after it appeared to ‘complete’, I took Zigbee2MQTT out of ‘pairing’ mode. I let the network sit for a minute or so, then I ‘reinterviewed’ the plug and let it sit for a minute or so. Then I moved to my ‘golden rule’ wait period and had a beer :beer: . Then I moved to the next plug. Two ‘sixers’ of :beer: later in the repeat mode, all the plugs were back online and I was totally :face_with_spiral_eyes: sloshed and feeling ‘whatever’…

I have no real facts to support this, however my gut feeling is that some routers get rather confused (maybe a particular firmware rev in the 3rd Reality plug) when you add one or more routers while also removing one or more routers without a ‘settling’ period between adds and removes, or maybe even with one. Maybe it happens when it is the same router device, but being ‘routed’ to via a different route. Maybe the ‘wrong’ route gets ‘stuck’ in the device. Total guess.

Good hunting!

Appreciate the explanation.

Removing 4 plugs that were only being used for power monitoring really helped with the other plugs becoming Unavailable. I’m going to try your method over the coming weeks and see how the ZigBee network stabilizes.

Hoping you get a stable setup! When it works it works :microphone::droplet:

Hoping you are able to get back to using the 3rd Reality power monitoring plug for power monitoring. With a stable network they seem to do a good job at power monitoring for ‘back of the envelope’ level watt hour data. They do not update very often with their default setting in Zigbee2MQTT, but if you need a little faster power updating you can vary the setting for each device in Zigbee2MQTT. Don’t go overboard however, none of the ‘consumer level’ power monitoring devices, Zigbee or not, will capture ‘real’ power usage. You need to sample at 100’s or 1000’s of samples per minute to get accurate values, and that will never happen at a USD 10 price point for a monitoring device. 90%+ of home level powered devices vary their ‘power levels’ using what is known as ‘bang bang’, basically turning on and off, so it is very hard to get the true ‘data under the curve’ without a very high sampling rate.
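
A toy illustration of the sampling point: a made-up 1500 W load that switches on for 10 seconds out of every 30, measured at different sampling intervals. All numbers here are invented for the example and say nothing about how any particular plug samples internally.

```python
def load_watts(t_seconds: int) -> float:
    """Hypothetical 'bang bang' load: 1500 W for 10 s, then off for 20 s, repeating."""
    return 1500.0 if (t_seconds % 30) < 10 else 0.0


def estimate_wh(interval_s: int, duration_s: int = 3600) -> float:
    """Estimate energy by averaging instantaneous samples taken every interval_s seconds."""
    samples = [load_watts(t) for t in range(0, duration_s, interval_s)]
    average_watts = sum(samples) / len(samples)
    return average_watts * duration_s / 3600


for interval in (1, 7, 60):
    print(f"sample every {interval:>2} s -> {estimate_wh(interval):7.1f} Wh")
```

With a 1 second interval the estimate matches the true 500 Wh for the hour. With a 60 second interval every sample happens to land in an ‘on’ phase, so the estimate comes out at 1500 Wh, three times too high.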

Completely agree, not looking for accuracy or even speed, just looking to be aware what certain devices are consuming and for some automation to detect when they are actively used.

Thank you again for the pointers and your help!

I did find the 3rd Reality firmware release notes, link below. Nothing is noted for the plug we are discussing and our issue, however at least 3rd Reality does do bug fixes and is somewhat transparent. Better than many vendors.

When doing firmware updates in Zigbee2MQTT, I have found that if a given OTA update fails, it is worth looking for a newer release of Zigbee2MQTT. This often solves OTA failures for me with 3rd Reality, Hue and others.

https://3reality.com/release-note/

Good hunting!

Ouch! here is a reason to wait on all releases …