So there I was, sitting comfortably. Perhaps too comfortably. Smug, even. I thought I’d thought everything through. I’d gone from 0 to 50 devices in a year in our new flat, and everything was performing pretty reliably.
I wasn’t sure which category to post this in. It might be my config, it might be hardware related, or it might be the O/S, so I plumped for “Configuration” as a catch-all. Because something for sure spiked my Zigbee network about 20-22 hours ago. Just like that. Unfortunately I don’t know the exact time, because my automations only act noticeably at night, which is when we realised that sunset must have been a while ago and the lights hadn’t come on. That was about 10pm, but I thought it was just one of those glitches that happen from time to time. I took a look later and tried the usual reboot, but then found that every device had gone offline.
I posted some detail here last night showing some of the error messages, but I have not yet become clever enough to know how to obtain and download all the relevant detailed logs to try to find the event responsible. I don’t think I got to them in time, although this one seems to scream “culprit”:
Your network is using the insecure Zigbee2MQTT network key!
which is odd because, although Z2M is installed, I have not yet had time to edumacate myself in its use. It’s not actually configured.
In addition to the error messages posted there, here is what might be a significant Z2M event:
Zigbee2MQTT:error 2023-07-01 22:02:47: Error: Failed to connect to the adapter (Error: SRSP - SYS - ping after 6000ms)
at ZStackAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:103:27)
at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:132:29)
at Zigbee.start (/app/lib/zigbee.ts:58:27)
at Controller.start (/app/lib/controller.ts:101:27)
at start (/app/index.js:107:5)
I’m not sure how to interpret it, though.
This is from Log Viewer:
Add-on version: 0.15.1
You are running the latest version of this add-on.
System: Home Assistant OS 10.3 (aarch64 / raspberrypi4-64)
Home Assistant Core: 2023.6.3
Home Assistant Supervisor: 2023.06.4
-----------------------------------------------------------
Please, share the above information when looking for help
or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service base-addon-banner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service base-addon-log-level: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service base-addon-log-level successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/nginx.sh
cont-init: info: /etc/cont-init.d/nginx.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun logviewer (no readiness notification)
services-up: info: copying legacy longrun nginx (no readiness notification)
s6-rc: info: service legacy-services successfully started
[22:02:15] INFO: Starting Log Viewer...
2023-07-01T21:02:19.313Z logview:debug start tailing /config/home-assistant.log
2023-07-01T21:02:19.328Z logview:info listening on port 4277 (HTTP)
[22:02:19] INFO: Starting NGINX...
2023-07-01T23:11:09.089Z logview:error 'change' event for /config/home-assistant.log. Error: ENOENT: no such file or directory, stat '/config/home-assistant.log'
2023-07-01T23:11:10.092Z logview:error watch for /config/home-assistant.log failed: Error: ENOENT: no such file or directory, stat '/config/home-assistant.log'
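For anyone else who doesn’t yet know how to dig the detailed events out of the logs, here is a rough sketch (not something I have run against my own system) of a small Python helper that scans /config/home-assistant.log, the file Log Viewer is tailing above, for ZHA/Zigbee/MQTT warnings and errors within a time window. The function names `zigbee_events` and `scan_log`, the keyword list and the log line layout it expects are all assumptions on my part; adjust them if your log lines look different.

```python
import re
from datetime import datetime

# ASSUMPTION: home-assistant.log lines look like
# "2023-07-01 22:02:47.123 ERROR (MainThread) [homeassistant.components.zha] message".
# Lines that don't match (e.g. traceback continuation lines) are skipped.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d{3} "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]*)\) "
    r"\[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)

# Hypothetical keyword list: anything whose logger or message mentions ZHA, Zigbee or MQTT.
KEYWORDS = ("zha", "zigbee", "mqtt")
LEVELS = ("WARNING", "ERROR", "CRITICAL")


def zigbee_events(lines, start, end, keywords=KEYWORDS):
    """Yield (timestamp, level, logger, message) for warning/error lines
    between start and end (inclusive) that mention one of the keywords."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # not a log header line, e.g. part of a traceback
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if not start <= ts <= end or m["level"] not in LEVELS:
            continue  # outside the time window, or only INFO/DEBUG
        haystack = f"{m['logger']} {m['msg']}".lower()
        if any(k in haystack for k in keywords):
            yield ts, m["level"], m["logger"], m["msg"]


def scan_log(path, start, end):
    """Print the matching lines from the log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for ts, level, logger, msg in zigbee_events(f, start, end):
            print(ts, level, logger, msg)
```

Something like `scan_log("/config/home-assistant.log", datetime(2023, 7, 1, 18, 0), datetime(2023, 7, 2, 0, 0))` would then list only the Zigbee-related warnings and errors from the evening things fell over, instead of scrolling through the whole file. Filtering by logger name as well as message text is deliberate, since ZHA errors don’t always mention “zigbee” in the message itself.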
When I started this smart-home project a year ago, I wanted light and blind automation and some motion sensors. I had some prior experience, having played with and enjoyed a SmartThings starter kit a few years back. My research led me to focus on Zigbee, although I already had a few Wi-Fi sockets and had reliable experience with the e-family cloud (Smart Life now, I believe; at least that’s how I have them integrated into HA, which acts as the brain).
The HA brain is installed on a Raspberry Pi 4B equipped with plenty of storage and a Sonoff Zigbee 3.0 USB Dongle Plus. There’s a Sonoff Bridge to handle distance. The remaining devices are mainly IKEA Tradfri spots and sensors, Linkind GU10 spots, Sonoff zbminis and Candeo dimmers, and the more recent additions have been a ZY-M100 and a Somfy Connectivity kit via Overkiz. Curiously, the Somfy devices stayed connected and functioning throughout. Aside from the Overkiz and (unused) MQTT integrations, there’s Sonos, Tuya and ZHA (which, I notice, is showing debug logging enabled).
So nothing much exotic going on, I would say, and no great churn over the last couple of months, other than the addition of the Somfy Connectivity kit about a week before this event and the removal of a Tradfri repeater that seemed overkill alongside the Sonoff kit. That was also the week I upgraded to 10.6.3, and I may have noticed the odd wobble, but nothing I would consider a precursor to an event on this scale. Like the England cricket team, I am left stumped. It has seriously made me question my tech choice.
It has taken about six hours of rebuilding to get about 80% of everything back. The worst thing is that I am the only person with the knowledge to put it all back together, what with some devices only coming back after power-cycling one lighting circuit and one ring circuit at the consumer board. Twice, in fact, once you discover that some devices aren’t going to come back on board until you remove them from HA first. Other devices need you to remember whether their buttons have to be held in, pressed for five or ten seconds, or pressed two or four times, with or without holding the last press, or, again, whether they have to be removed from HA. I still haven’t magicked the SmartThings plug back. The serious one is a zbmini wired into a hefty pendant lamp that’s going to need a couple of ladders and people to get to the switch to reset it. That’s the one I really hate myself for configuring, but it was the only way to split a large lighting circuit without demolition.
So I am left with a trust issue. None of my research, before or since, suggested that such an event was possible. It “feels” like something happened to compromise the Zigbee network itself. A key change? Might Somfy be involved somehow? Zigbee2MQTT? Any insights into what might cause such a collapse, or other suggestions on how to rebuild trust, would be much appreciated; otherwise I will start backing out and maybe introduce more PIRs until something more reliable emerges in the marketplace.
Thanks for listening. You can wake up now!