ZHA fails with Silicon Labs 2.4.4 Update - HA Yellow

I have a Yellow device that is very stable.

Since the 2.4.4 update, ZHA goes offline every 24-36h. The fix is to restart the OS or device.

Is anyone else seeing this?

Also, is there an easy button to get back to 2.4.3?

Thanks!

Yes - same problem for me. HA Yellow with ZHA, which has been failing since the Silicon Labs 2.4.4 update.

Here is the log:

Listening on port 9999 for connection...
Accepting connection.
Accepted connection 7.
otbr-agent[298]: 00:10:13.790 [W] Platform------: Error processing result: InvalidState
otbr-agent[298]: 00:10:13.790 [W] Platform------: Error waiting response: InvalidState
otbr-agent[298]: 00:10:13.790 [W] SubMac--------: RadioReceive() failed, error: InvalidState
otbr-agent[298]: 00:37:07.847 [W] Platform------: Handle transmit done failed: Parse
otbr-agent: ../../third_party/openthread/repo/src/core/mac/sub_mac.cpp:624: void ot::Mac::SubMac::HandleTransmitDone(ot::Mac::TxFrame&, ot::Mac::RxFrame*, ot::Error): Assertion `false' failed.
otbr-agent[298]: 00:37:07.848 [C] Platform------: ------------------ BEGINNING OF CRASH -------------
otbr-agent[298]: 00:37:07.849 [C] Platform------: *** FATAL ERROR: Caught signal: 6 (Aborted)
otbr-agent[298]: 00:37:07.855 [C] Platform------: # 0: /usr/sbin/otbr-agent(+0x20786c) [0x555c72786c]
otbr-agent[298]: 00:37:07.855 [C] Platform------: # 1: /usr/sbin/otbr-agent(+0x2079b0) [0x555c7279b0]
otbr-agent[298]: 00:37:07.855 [C] Platform------: # 2: linux-vdso.so.1 __kernel_rt_sigreturn+0x0 [0xb9a287bc]
otbr-agent[298]: 00:37:07.855 [C] Platform------: # 3: /lib/aarch64-linux-gnu/libc.so.6 gsignal+0xdc [0xb9579eac]
otbr-agent[298]: 00:37:07.855 [C] Platform------: # 4: /lib/aarch64-linux-gnu/libc.so.6 abort+0x108 [0xb9566aa0]
otbr-agent[298]: 00:37:07.855 [C] Platform------: # 5: /lib/aarch64-linux-gnu/libc.so.6(+0x2d478) [0x7fb9573478]
otbr-agent[298]: 00:37:07.856 [C] Platform------: # 6: /lib/aarch64-linux-gnu/libc.so.6(+0x2d4dc) [0x7fb95734dc]
otbr-agent[298]: 00:37:07.856 [C] Platform------: # 7: /usr/sbin/otbr-agent ot::Mac::SubMac::HandleTransmitDone(ot::Mac::TxFrame&, ot::Mac::RxFrame*, otError)+0xc8 [0x5c789bd0]
otbr-agent[298]: 00:37:07.856 [C] Platform------: # 8: /usr/sbin/otbr-agent ot::Radio::Callbacks::HandleTransmitDone(ot::Mac::TxFrame&, ot::Mac::RxFrame*, otError)+0x30 [0x5c823258]
otbr-agent[298]: 00:37:07.856 [C] Platform------: # 9: /usr/sbin/otbr-agent otPlatRadioTxDone+0x68 [0x5c7c755c]
otbr-agent[298]: 00:37:07.857 [C] Platform------: #10: /usr/sbin/otbr-agent ot::Spinel::RadioSpinel<ot::Posix::VendorInterface>::TransmitDone(otRadioFrame*, otRadioFrame*, otError)+0x58 [0x5c721518]
otbr-agent[298]: 00:37:07.857 [C] Platform------: #11: /usr/sbin/otbr-agent ot::Spinel::RadioSpinel<ot::Posix::VendorInterface>::ProcessRadioStateMachine()+0x80 [0x5c71f7e4]
otbr-agent[298]: 00:37:07.857 [C] Platform------: #12: /usr/sbin/otbr-agent ot::Spinel::RadioSpinel<ot::Posix::VendorInterface>::Process(void const*)+0x8c [0x5c71d238]
otbr-agent[298]: 00:37:07.857 [C] Platform------: #13: /usr/sbin/otbr-agent platformRadioProcess+0x20 [0x5c71a21c]
otbr-agent[298]: 00:37:07.857 [C] Platform------: #14: /usr/sbin/otbr-agent otSysMainloopProcess+0x28 [0x5c725548]
otbr-agent[298]: 00:37:07.857 [C] Platform------: #15: /usr/sbin/otbr-agent otbr::Ncp::ControllerOpenThread::Process(otSysMainloopContext const&)+0x2c [0x5c82dc78]
otbr-agent[298]: 00:37:07.857 [C] Platform------: #16: /usr/sbin/otbr-agent otbr::MainloopManager::Process(otSysMainloopContext const&)+0x7c [0x5c83284c]
otbr-agent[298]: 00:37:07.858 [C] Platform------: #17: /usr/sbin/otbr-agent otbr::Application::Run()+0x204 [0x5c6cee40]
otbr-agent[298]: 00:37:07.858 [C] Platform------: #18: /usr/sbin/otbr-agent(+0x1b0580) [0x555c6d0580]
otbr-agent[298]: 00:37:07.858 [C] Platform------: #19: /usr/sbin/otbr-agent main+0x88 [0x5c6d0740]
otbr-agent[298]: 00:37:07.858 [C] Platform------: #20: /lib/aarch64-linux-gnu/libc.so.6 __libc_start_main+0xe8 [0xb9566e18]
otbr-agent[298]: 00:37:07.858 [C] Platform------: #21: /usr/sbin/otbr-agent(+0x1ae8e8) [0x555c6ce8e8]
otbr-agent[298]: 00:37:07.858 [C] Platform------: ------------------ END OF CRASH ------------------
[11:26:11] INFO: otbr-agent ended with exit code 256 (signal 6)...
OTBR_FORWARD_INGRESS  all opt    in * out wpan0  ::/0  -> ::/0  
Chain OTBR_FORWARD_INGRESS (0 references)
target     prot opt source               destination         
DROP       all      anywhere             anywhere             PKTTYPE = unicast
DROP       all      anywhere             anywhere             match-set otbr-ingress-deny-src src
ACCEPT     all      anywhere             anywhere             match-set otbr-ingress-allow-dst dst
DROP       all      anywhere             anywhere             PKTTYPE = unicast
ACCEPT     all      anywhere             anywhere            
otbr-ingress-deny-src
otbr-ingress-deny-src-swap
otbr-ingress-allow-dst
otbr-ingress-allow-dst-swap
OTBR_FORWARD_EGRESS  all opt    in wpan0 out *  ::/0  -> ::/0  
[11:26:10:990163] Info : Endpoint socket #12: Client disconnected. 1 connections
[11:26:10:990317] Info : Client disconnected
Chain OTBR_FORWARD_EGRESS (0 references)
target     prot opt source               destination         
ACCEPT     all      anywhere             anywhere            
[11:26:11] INFO: OTBR firewall teardown completed.
[11:26:11] WARNING: otbr-agent exited with code 134 (by signal 6).
s6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service otbr-agent-rest-discovery: stopping
s6-rc: info: service zigbeed: stopping
s6-rc: info: service mdns: stopping
s6-rc: info: service otbr-agent-rest-discovery successfully stopped
s6-rc: info: service otbr-agent: stopping
Default: mDNSResponder (Engineering Build) (Jan 24 2024 17:58:11) stopping
[11:26:11] INFO: zigbeed ended with exit code 256 (signal 15)...
s6-rc: info: service zigbeed successfully stopped
[11:26:11] INFO: otbr-agent ended with exit code 256 (signal 15)...
[11:26:11] INFO: mDNS ended with exit code 4 (signal 0)...
s6-rc: info: service mdns successfully stopped
[11:26:11] INFO: OTBR firewall teardown completed.
[11:26:11] WARNING: otbr-agent exited with code 143 (by signal 15).
s6-rc: info: service otbr-agent successfully stopped
s6-rc: info: service cpcd: stopping
[11:26:11:485981] Info : Endpoint socket #12: Client disconnected. 0 connections
[11:26:11:486270] Info : Client disconnected
[11:26:11:653138] Info : Server core cleanup
[11:26:11:653279] Info : Daemon exiting with status EXIT_SUCCESS
Logger buffer size = 28672, highwater mark = 2498 : 8.71%. Lost logs : 0
[11:26:11] INFO: CPC ended with exit code 0 (signal 0)...
s6-rc: info: service cpcd successfully stopped
s6-rc: info: service cpcd-config: stopping
s6-rc: info: service cpcd-config successfully stopped
s6-rc: info: service universal-silabs-flasher: stopping
s6-rc: info: service universal-silabs-flasher successfully stopped
s6-rc: info: service banner: stopping
s6-rc: info: service banner successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
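
For anyone comparing their own add-on logs against this one, here is a minimal Python sketch that looks for the same two markers seen above: the `Assertion ... failed` line and the `FATAL ERROR: Caught signal` line. Treating those two strings as the crash signature is an assumption based only on this log, and `find_crash` is a hypothetical helper name, not part of any Home Assistant or OpenThread tooling.

```python
import re

# Patterns copied from the log above. Using them as the crash signature
# is an assumption drawn from this one log, not from any documentation.
ASSERT_RE = re.compile(r"Assertion `.*' failed")
SIGNAL_RE = re.compile(r"FATAL ERROR: Caught signal: (\d+)")


def find_crash(log_text):
    """Return (assertion_line, signal_number) for the first crash, else None.

    assertion_line is the most recent assertion message seen before the
    fatal-signal line, or None if no assertion line preceded it.
    """
    assertion = None
    for line in log_text.splitlines():
        if ASSERT_RE.search(line):
            assertion = line.strip()
        match = SIGNAL_RE.search(line)
        if match:
            return assertion, int(match.group(1))
    return None
```

Paste the add-on log into a string and pass it to `find_crash`. A result with signal 6 and an assertion in `sub_mac.cpp` would suggest the same failure as in this post.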

I found another post with the same problem, but on a non-Yellow box.

I also have a temporary fix: I built an automation that reboots my box at 4 a.m. every day, and that seems to fix things. It feels like my help desk days… “Hey, did you reboot the system? Try that and call me back if you still have issues.” LOL

+1, I’m experiencing the same issue. In the meantime, I created an automation that reboots whenever several Zigbee devices become unavailable, but this is suboptimal.

Same issue with HA Yellow. Since the update to 2.4.4, I sometimes need to reboot the unit twice in a day.

+1 from me as well.
I already tried disabling the multiprotocol support of the Yellow, but I still need to restart frequently.

Same here, ZHA randomly becomes unavailable. How did you set up the automation?

@erger, easily, in the automation GUI.

I can also confirm this problem. The issue is also present in the previous version, 2.4.3, which I had chosen to skip. I was waiting for this release, hoping they had fixed it. I’ve reverted to version 2.3.2, which is stable.

I also have this problem, and I’d say the multiprotocol add-on or OTBR is the issue here.
Disabling the OTBR options in the add-on helped for some time, and then the issue returned. Even rebooting the Yellow didn’t help.
Finally, I removed the multiprotocol add-on, disabled multiprotocol on the hardware as well as the OTBR integration, and then reflashed with the Silicon Labs Flasher add-on to revert to the Zigbee firmware. ZHA is more stable now.

As an FYI, I disabled my reboot automation to see if ZHA was somehow fixed in the last few updates… Well, after 3 days, it seems to be back to working again.

It’s definitely still happening for me. I have an automation that reboots Home Assistant when all ZHA entities have been unavailable for > 1 minute, and it triggers between 2 and 8 times daily. It’s been getting worse - it was maybe once every day or two initially, but now it’s several times per day.
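
For anyone wanting to build a similar watchdog, the condition described above (every ZHA entity unavailable for more than a minute) can be sketched in plain Python. This is only an illustration of the logic, not Home Assistant automation syntax. `should_reboot`, the entity ids, and the dictionary layout are all hypothetical and not taken from the poster's setup.

```python
from datetime import datetime, timedelta


def should_reboot(unavailable_since, now, grace=timedelta(minutes=1)):
    """Return True only if every tracked entity has been unavailable
    for longer than `grace`.

    unavailable_since maps a (hypothetical) entity id to the time it
    became unavailable, or to None if the entity is currently available.
    """
    if not unavailable_since:
        return False  # nothing is tracked, so never trigger a reboot
    return all(
        since is not None and now - since > grace
        for since in unavailable_since.values()
    )
```

Requiring all entities to be down, rather than just one, keeps a single dead battery from triggering a reboot. The grace period filters out brief dropouts.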

I returned to version 2.3.2. This one is stable. With all the later versions I lose connection after a few hours. This is also on an HA Yellow.