ZIGBEE2MQTT Crashing, Restarting, network failing

I think the only next thing I can try is to change the zigbee channel again, and move my wifi down.

Is the USB autosuspend feature enabled? If so:

Disable the USB autosuspend feature. If cat /sys/module/usbcore/parameters/autosuspend returns 1 or 2, it is enabled. To disable it, execute:

sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="/&usbcore.autosuspend=-1 /' /etc/default/grub
update-grub
systemctl reboot
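
As a rough sketch of how to read that value: the kernel parameter is the idle delay in seconds before a USB device may be suspended, and a negative value turns autosuspend off. The helper name autosuspend_state is made up for illustration.

```shell
# Interpret a usbcore autosuspend value: a negative number means
# autosuspend is disabled, anything else is the idle delay in seconds.
autosuspend_state() {
    case "$1" in
        -*) echo "disabled" ;;
        *)  echo "enabled (${1}s delay)" ;;
    esac
}

# Report the current setting if the parameter file is readable on this system.
param=/sys/module/usbcore/parameters/autosuspend
if [ -r "$param" ]; then
    autosuspend_state "$(cat "$param")"
fi
```

After the change and a reboot, the same check should report "disabled".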

OK, first: you need to slow down.

Zigbee is not a race car. It’s more of a freight train. You put things in motion and wait.

In the last thirty minutes you’ve posted 4 different things. It takes time to see differences in the mesh. And when things break it takes time to recover. I used to have a periodic signal issue on the east side of the house and although it only happened a few moments at a time it took two to three hours for the mesh to fully recover on that side of the house.

Out of all of that, what I see is that the power restart event restored connectivity to your coordinator from your HA box (which would also explain the entire network crashing). This isn’t a signal issue. That’s a crash of your stick or your Zigbee software.

I’d love to slow down @NathanCu - what should I do?

I am still at square one; I cannot be sure of any pattern. All I can say is that with the Sonoff stick, Supervisor is terminating and restarting Zigbee2MQTT multiple times per day, and I always notice the same few devices which seem to not work.

Normally 2 or 3 of these: Aurora Lighting AU-A1ZBDSS control via MQTT | Zigbee2MQTT
or some of these: Aurora Lighting AU-A1ZB2WDM control via MQTT | Zigbee2MQTT
but having replaced some of these, am also finding it with Iolloi ID-UK21FW09 control via MQTT | Zigbee2MQTT

It is enabled, but I cannot seem to proceed with the next step as it can’t find /etc/default/grub

I am running Advanced SSH & Web Terminal, with protected mode switched off

➜  ~ cat /sys/module/usbcore/parameters/autosuspend                                     
2
➜  ~ sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="/&usbcore.autosuspend=-1 /' /etc/default/grub
sed: /etc/default/grub: No such file or directory
➜  ~ 

Seems difficult on HA OS

Currently Zigbee2MQTT is lasting an average of 20 minutes before supervisor does this:

23-08-19 15:34:17 WARNING (MainThread) [supervisor.addons.addon] Watchdog found addon Zigbee2MQTT is failed, restarting...
23-08-19 15:34:17 INFO (SyncWorker_7) [supervisor.docker.manager] Cleaning addon_45df7312_zigbee2mqtt application
23-08-19 15:34:17 INFO (MainThread) [supervisor.docker.addon] Starting Docker add-on zigbee2mqtt/zigbee2mqtt-amd64 with version 1.32.2-1

Frustrating for sure!

Do you have some type of physical connection between your solar controller and the computer with the Zigbee2MQTT software and dongle? Perhaps an RS-485 to USB cable connection? If so, my thought is that you might have some type of RF or grounding issue that is causing problems with your zigbee coordinator device.

If not that, do you have some other zigbee network that was stood up as part of your solar install, or something else? From my brief googling, it does not look like your solar inverters are available with a zigbee interface. That said, if the log you shared is from zigbee2mqtt only, it is odd that there are messages in it regarding your solar inverters. I would think the zigbee2mqtt log would only have entries related to zigbee devices. If you have another zigbee network and it is open to ‘joining’ devices, or some of your zigbee2mqtt devices are somehow on both zigbee networks, that would be a possible issue.

I am very doubtful of any wifi and zigbee network interference. I am far from a solar expert, however I doubt any devices are generating enough traffic to cause issues. Zigbee is very robust and needs very little bandwidth.

The solar inverter is connected to a Pi via RS-485 running Solar Assistant, which is connected by ethernet to a mesh router; its uplink is on channel 108. Mosquitto is in bridge mode and exchanges data between Home Assistant and Solar Assistant. The solar gear is connected to a separate house electrical distribution board, and links back to the main house at the grid meter.

The solar panels have Tigo optimisers. These communicate via zigbee also. When it was first installed I noticed that whenever the Tigo control unit was powered on or off, or when the panels were just going on or off with light, that devices in the house would reset.

I made a video of this happening here. I am plugging in the solar optimiser controller (Tigo TAP and CCN, which is out of shot) and you will see almost instantly many smart sockets reset to default off.

Tigo interference

Tigo was an impossible company to work with. They deny all issues and blame “the open source software”, or anything else but themselves. They would not share any details of their system and accused me of being a hacker when I pushed for answers, noting they would cut off all warranty and support to me.

In the end I shifted the Zigbee channel I was using from the default 11 to 15, and this at the time stopped all issues. Everything was stable for 6 months, and nothing changed except for updating HA, Z2M and other stuff.

Inside I feel like perhaps Tigo changed their zigbee channel - they have decided they own all consumer purchased stuff and can do whatever they want. This is why I was also thinking to change the zigbee channel I’m using.

Wow, you’ve got a real snake’s nest there. :grimacing:

Another brief googling does seem to show others seeing some RF interference from some models of Tigo devices. It is not clear what RF technology your devices have; there do seem to be some Tigo devices using Bluetooth. It is also not clear what you mean when you talk about channel numbers. Wifi, Zigbee and Bluetooth use different numbering systems to align to physical RF frequencies in the 2.4 and 5 GHz bands. For example, you refer to channel 108; if you are talking Wifi, I think that is a 5 GHz channel.
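
To make the different numbering schemes concrete, here is a small sketch that converts channel numbers to centre frequencies. The formulas are the standard channel plans for 2.4 GHz IEEE 802.15.4 (which Zigbee uses) and for 2.4 and 5 GHz Wi-Fi; the function names are purely illustrative.

```shell
# Centre frequency in MHz of a 2.4 GHz Zigbee (IEEE 802.15.4) channel, 11..26.
zigbee_mhz() { echo $(( 2405 + 5 * ($1 - 11) )); }

# Centre frequency in MHz of a 2.4 GHz Wi-Fi channel, 1..13.
wifi24_mhz() { echo $(( 2407 + 5 * $1 )); }

# Centre frequency in MHz of a 5 GHz Wi-Fi channel (for example 36 or 108).
wifi5_mhz() { echo $(( 5000 + 5 * $1 )); }

# The Zigbee channels mentioned earlier in the thread, plus Wi-Fi channel 108.
for ch in 11 15; do
    echo "Zigbee channel $ch: $(zigbee_mhz "$ch") MHz"
done
echo "Wi-Fi channel 108: $(wifi5_mhz 108) MHz"
```

Zigbee channel 11 sits at 2405 MHz and channel 15 at 2425 MHz, while Wi-Fi channel 108 is at 5540 MHz, so a 5 GHz backhaul on channel 108 is nowhere near any Zigbee channel.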

I would divide your solar system and Home Assistant system onto two different physical computers and get each stable. Having an electrical connection from your solar stuff to your computer seems to offer an opportunity for RF noise to get into the computer and USB. USB is not a really stable electrical system and does not like grounding issues or noise.

It is a bit hard to tell from your video what is going on. FYI, music really does not add value :wink: . It appears you have a physical electrical / ground connection in common between whatever you plug in and the zigbee plugs. The one plug of a different kind does not seem to have an issue when you plug in. What brands? Are all the plugs connected to a zigbee network?

Good hunting!

Yes, there are lots of reports of Tigo causing issues, especially in Germany, where interference rules are a lot stricter. I’ve had several Germans reach out to me directly and report a similar experience with Tigo support.

Tigo uses 2.4 GHz wifi, bluetooth and Zigbee for its controller, and zigbee for the panel level units. Tigo even suggested I must line my roof with metal to block the signals!

For my house wifi I run 2.4 GHz and 5 GHz. Channel 108 is a 5 GHz backhaul channel.

Solar assistant is on a separate system. It’s on a Pi 4B, and home assistant is on a NUC5i7RYH, in completely different parts of the house and on completely different electrical circuits, not even connected with an ethernet cable, as solar assistant is using a mesh wifi backhaul. Both are stable; it’s only the home assistant zigbee network that is an issue.

Hate to spend more of your coin…

That said, I am still pushing ‘divide and conquer’, but this time maybe try standing up your zigbee2mqtt system on a separate computer. I would isolate the MQTT systems as well: let zigbee2mqtt have its own broker with no messages from any other system, talking in isolation to Home Assistant, and let your other MQTT-based systems have their own MQTT server. I realize this has nothing to do with RF interference; my point is to use Docker on this isolated system for zigbee2mqtt so that you can run 2 zigbee2mqtt instances with two separate zigbee dongles on different channels. Put some devices on one zigbee2mqtt system and some on the second. See if you can get one of the two stable on some channel that is ‘out of the way’ of your solar stuff. If your solar stuff were so noisy that it was hosing the whole 2.4 GHz spectrum, you would see wifi and bluetooth issues. Turn on a microwave oven and do a 2.4 GHz wifi speed test on a computer right next to the microwave to see that kind of interference. I really cannot see how the solar stuff can hose every zigbee channel.

Wow, for the vendor to have these possible issues and that attitude is a real pain, considering the coin I am guessing you spent with them.

Good hunting!

Here is a point that is ‘way out there on the spectrum of possibilities’. That said, you might set your own encryption key for your zigbee coordinators to use, see link below. You have to totally rebuild your zigbee network after changing this key: reset the coordinator and ALL devices to factory and then re-add/rebuild the network. I am thinking that Tigo might have just taken the Zigbee2MQTT code at some point in time and is using the same defaults…

‘Zigbee2MQTT uses a known default encryption key (Zigbee Transport Key). Therefore it is recommended to use a different one.’
https://www.zigbee2mqtt.io/advanced/zigbee/03_secure_network.html
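
As I understand the linked page, the key goes under advanced: network_key in Zigbee2MQTT's configuration.yaml, either as the word GENERATE (Zigbee2MQTT then creates a random key itself) or as a list of 16 byte values. If you prefer to produce the list yourself, a minimal sketch could look like this; the function name is made up for illustration.

```shell
# Print 16 random bytes as a bracketed, comma-separated list of decimal
# values, in a form that can be pasted as a YAML list.
generate_network_key() {
    # od prints the 16 bytes as unsigned decimals separated by whitespace;
    # the unquoted expansion splits them into positional parameters.
    set -- $(od -An -N16 -tu1 /dev/urandom)
    key=$1
    shift
    for byte in "$@"; do
        key="$key, $byte"
    done
    echo "[$key]"
}

generate_network_key
```

Keep the generated key somewhere safe, since every device has to be re-paired if it changes again.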

If you follow David Proffer’s advice to create 2 Zigbee networks, be sure to assign a new IEEE address to one of your dongles. If you migrated from the ZZH to the Sonoff, they will have the same IEEE address, and that can create a lot of new problems.

I never really got my head around multiple MQTT bridges and brokers and the like, and while I do have a NAS running 24x7 with some docker images already there like pi-hole, I really don’t want to complicate things any more. Perhaps I just don’t have the head space for this now; work is tough at the moment and I need things to be easy right now… It was working before, it should work now…

I have 3 major systems that talk MQTT via mosquitto running on the HA OS box: 1) Zigbee2MQTT, 2) an alarm2mqtt system (HACS), and 3) Solar Assistant. I also have some tasmota, appdaemon and others in there.

Now I changed Z2M to channel 25, maybe I should do the network key also… I suppose it makes no difference now… I remember I could simply re-pair the devices before and they just joined back with the same name, so hoping the same if I change the network encryption key also.

Z2M is seeing the updates from solar assistant in its debug log, but nothing has crashed… Everything is working - but of course no devices added so far. That’s one hour stable, and before it was 20 minutes or so.

I will change the key, then add the room closest to HA back. I may add some things from the bedroom, as bedside lights and buttons are among the few things that do not have a manual override.

Hope you are on a good track with channel 25!

I was just recommending simplifying as much as possible, as I know nothing about your solar assistant zigbee. And the whole multiple MQTT thing, again, was just to isolate. I have a single mosquitto MQTT broker that serves lots of different services just fine and has run nonstop for over a year.

And doing the 2 zigbee2mqtt networks was just to speed up the analysis of the various channels, to see if you can find one ‘out of the way’ of your Tigo stuff.

If you can add one or two of the problematic zigbee plugs and the one that looked to not be affected by the Tigo stuff, then, as you are doing, just let it sit for a while and see its stability. You could create a little bash script with mosquitto_pub to just toggle the plugs a couple of times a minute and see how they run. Low tech, however just hearing a relay ‘click’ at a regular, steady rate is a good test and kind of cathartic :wink:
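
A minimal sketch of such a toggle script is below. The broker address, the default zigbee2mqtt base topic and the plug friendly names (test_plug_1, test_plug_2) are assumptions, not details from this thread; if the broker needs a login, mosquitto_pub also takes -u and -P.

```shell
#!/bin/sh
# Assumptions: broker on localhost, default "zigbee2mqtt" base topic.
MQTT_HOST="${MQTT_HOST:-localhost}"
# The publish command is overridable for a dry run, e.g. MQTT_PUB=echo.
MQTT_PUB="${MQTT_PUB:-mosquitto_pub}"

# Publish a TOGGLE command to one Zigbee2MQTT device by friendly name.
toggle_plug() {
    "$MQTT_PUB" -h "$MQTT_HOST" -t "zigbee2mqtt/$1/set" -m '{"state": "TOGGLE"}'
}

# Toggle every named plug once per round, pausing between rounds.
# Usage: toggle_loop ROUNDS PAUSE_SECONDS PLUG [PLUG...]
toggle_loop() {
    rounds=$1
    pause=$2
    shift 2
    i=0
    while [ "$i" -lt "$rounds" ]; do
        for plug in "$@"; do
            toggle_plug "$plug"
        done
        i=$((i + 1))
        sleep "$pause"
    done
}

# Example (hypothetical names): two plugs, every 30 seconds, for about an hour.
# toggle_loop 120 30 test_plug_1 test_plug_2
```

Running it first with MQTT_PUB=echo prints the arguments instead of publishing, which is a cheap way to check the topic names before letting it click away.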

So far there have been no issues since channel change and adding select devices back - 41 devices so far.

You only realise just how many things you have when you have to re-add them, change icons, check automations, check the voice assistant of choice and so on… I’m beat.

I went a bit further than I thought, have added quite a bit back, but have mostly avoided the brand that seemed to have the most issues when solar was first installed, I’m about 2/5ths through overall.

Solar assistant doesn’t use zigbee; it is just an RS485 reader of inverters and a nice graphing tool that sends data over MQTT over LAN. It was an easy option as I already had a spare Pi lying around. It’s only the solar optimisers that use zigbee. These basically allow each panel wired in series to deal with individual shading issues which may otherwise drop the total output of all panels.

Originally when I was dealing with Tigo, they promised they would drop the transmit power of their devices, but I bet some update has reset it to default. Their box is now blocked from the internet, which means all its functionality is also lost really. It is running a local web server but they won’t let you access that!

I spoke too soon… supervisor just restarted Zigbee2MQTT :frowning:

Take it slow, see if there is a pattern or a device.

I’ll bring the :beer: :beer:

Again, standing up a 2nd zigbee2mqtt coordinator and network on different channel might help in finding a pattern or at least a stable end game.

I wonder if this was a result of trying to add a device…

The very last entry in Z2M before it was terminated:

info  2023-08-19 20:41:32: Accepting joining not in blocklist device '0x001e5e090215f768'

Supervisor

23-08-19 20:41:32 WARNING (MainThread) [supervisor.addons.addon] Watchdog found addon Zigbee2MQTT is failed, restarting...

Coincidence?
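
One way to test that hunch is to list every watchdog restart time from the Supervisor log and compare each one with the last Zigbee2MQTT entry before it. A rough sketch, assuming the Supervisor log format shown above and reading the log text from stdin; the function name is illustrative.

```shell
# Print the "YY-MM-DD HH:MM:SS" timestamp of every watchdog restart of the
# Zigbee2MQTT add-on found in Supervisor log text read from stdin.
watchdog_restarts() {
    # Strip ANSI colour codes first so the date is the first field.
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*m//g" \
        | grep 'Watchdog found addon Zigbee2MQTT is failed' \
        | awk '{print $1, $2}'
}

# Example on HA OS (assumes the ha CLI is available in the terminal add-on):
# ha supervisor logs | watchdog_restarts
```

If most of the listed times line up with a device join or one particular device in the Zigbee2MQTT log, that points at a pattern rather than coincidence.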