MoesGo Switch Crashes HAss

I’ve been running Home Assistant on an RPi4 for the past few months with no issues (ultra stable, no crashes), admittedly without doing much beyond reporting the status of my Tado system and turning a smart light bulb on and off.

I recently added a 2-way MoesGo smart light switch (TS0012). It controls the (dumb) main light in that room, both electrically and via the HAss interface, and adds a physical switch for the uplighter, which has the smart bulb (Innr RB279T) in it.

I am using a Sonoff Zigbee USB dongle connected to the RPi.

Ever since installing the switch and connecting it to HAss, the system has crashed on a daily basis. When it has crashed, the physical/electric switch still controls the (dumb) main light, but the soft switch for the uplighter does nothing. The only way to recover from the crash is to power-cycle the RPi.

I connected the switch to HAss by discovering it directly within HAss. I didn’t install any of the “Smart Life” apps or anything.

Am I doing something wrong?

I’m still quite new to Home Assistant and “smart” things in general.

Edit: device signature below:

> {
>   "node_descriptor": "NodeDescriptor(logical_type=<LogicalType.EndDevice: 2>, complex_descriptor_available=0, user_descriptor_available=0, reserved=0, aps_flags=0, frequency_band=<FrequencyBand.Freq2400MHz: 8>, mac_capability_flags=<MACCapabilityFlags.AllocateAddress: 128>, manufacturer_code=4417, maximum_buffer_size=66, maximum_incoming_transfer_size=66, server_mask=10752, maximum_outgoing_transfer_size=66, descriptor_capability_field=<DescriptorCapability.NONE: 0>, *allocate_address=True, *is_alternate_pan_coordinator=False, *is_coordinator=False, *is_end_device=True, *is_full_function_device=False, *is_mains_powered=False, *is_receiver_on_when_idle=False, *is_router=False, *is_security_capable=False)",
>   "endpoints": {
>     "1": {
>       "profile_id": 260,
>       "device_type": "0x0100",
>       "in_clusters": [
>         "0x0000",
>         "0x0003",
>         "0x0004",
>         "0x0005",
>         "0x0006",
>         "0xe000",
>         "0xe001"
>       ],
>       "out_clusters": [
>         "0x000a",
>         "0x0019"
>       ]
>     },
>     "2": {
>       "profile_id": 260,
>       "device_type": "0x0100",
>       "in_clusters": [
>         "0x0004",
>         "0x0005",
>         "0x0006",
>         "0xe001"
>       ],
>       "out_clusters": []
>     }
>   },
>   "manufacturer": "_TZ3000_18ejxno0",
>   "model": "TS0012",
>   "class": "zhaquirks.tuya.ts001x.Tuya_Double_No_N_Plus"
> }

Further edit: I managed to catch it restarting itself and got this from the logs:

2022-11-23 14:19:20.385 WARNING (Recorder) [homeassistant.components.recorder.util] The system could not validate that the sqlite3 database at //config/home-assistant_v2.db was shutdown cleanly
2022-11-23 14:19:20.653 WARNING (Recorder) [homeassistant.components.recorder.util] Ended unfinished session (id=54 from 2022-11-23 13:12:28.169670)
2022-11-23 14:19:42.610 WARNING (MainThread) [homeassistant.components.onvif] Could not retrieve date/time on this camera
2022-11-23 14:19:42.626 WARNING (MainThread) [homeassistant.components.onvif] Could not retrieve date/time on this camera
2022-11-23 14:20:01.105 ERROR (MainThread) [zigpy.application] Couldn't start application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 998, in request
    response = await response_future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/zigpy/application.py", line 124, in startup
    await self.connect()
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 106, in connect
    await znp.connect()
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 706, in connect
    self.capabilities = (await self._skip_bootloader()).Capabilities
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 686, in _skip_bootloader
    return await self.request(c.SYS.Ping.Req())
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 994, in request
    async with async_timeout.timeout(
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
2022-11-23 14:20:01.137 WARNING (MainThread) [homeassistant.components.zha.core.gateway] Couldn't start ZNP = Texas Instruments Z-Stack ZNP protocol: CC253x, CC26x2, CC13x2 coordinator (attempt 1 of 3)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 998, in request
    response = await response_future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 172, in async_initialize
    self.application_controller = await app_controller_cls.new(
  File "/usr/local/lib/python3.10/site-packages/zigpy/application.py", line 144, in new
    await app.startup(auto_form=auto_form)
  File "/usr/local/lib/python3.10/site-packages/zigpy/application.py", line 124, in startup
    await self.connect()
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 106, in connect
    await znp.connect()
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 706, in connect
    self.capabilities = (await self._skip_bootloader()).Capabilities
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 686, in _skip_bootloader
    return await self.request(c.SYS.Ping.Req())
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 994, in request
    async with async_timeout.timeout(
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
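Both tracebacks come down to the same thing: the `SYS.Ping` request sent to the dongle during startup got no reply before the deadline. The waiting `await response_future` is cancelled (`CancelledError`) and a `TimeoutError` is raised in its place. The sketch below reproduces that pattern with plain `asyncio.wait_for`. The `ping_with_timeout` helper is hypothetical and is not zigpy-znp's actual code, which uses `async_timeout` instead.

```python
import asyncio
from typing import Optional


async def ping_with_timeout(reply_delay: Optional[float], timeout: float) -> str:
    """Await a reply future under a deadline (hypothetical stand-in, not zigpy-znp code).

    reply_delay=None simulates a coordinator that never answers the ping.
    """
    loop = asyncio.get_running_loop()
    response_future = loop.create_future()

    def deliver_reply() -> None:
        # Ignore a late reply if the wait has already been cancelled.
        if not response_future.done():
            response_future.set_result("pong")

    if reply_delay is not None:
        loop.call_later(reply_delay, deliver_reply)

    try:
        # On timeout, wait_for cancels the pending wait and raises TimeoutError.
        return await asyncio.wait_for(response_future, timeout)
    except asyncio.TimeoutError:
        return "timeout"


if __name__ == "__main__":
    print(asyncio.run(ping_with_timeout(0.01, timeout=0.5)))  # dongle answers in time
    print(asyncio.run(ping_with_timeout(None, timeout=0.05)))  # dongle stays silent
```

Read this way, the log says the RPi could not talk to the dongle at all while ZHA was starting up. It does not show a misbehaving switch.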

Did you install the Sonoff coordinator at the same time as the switch? I assume you use ZHA as your Zigbee software?

The problem is not the switches; it is most likely something to do with your Zigbee setup or the coordinator. Have you read the many good guides on creating a stable Zigbee network (adequate power supply, long USB extension cable, etc.)?

Thanks khvej8. The Sonoff co-ordinator was installed when the system was first set up, along with the single ‘smart’ bulb. It worked fine for a few months, with the bulb controlled via the app on my phone or the web interface on my PC/laptop, hence my assertion that the switch is at least part of the problem. The integration is ZHA, which was the default when I set it up.

I’ve read a guide, but I’m not sure which one (I’ve slept since then). The co-ordinator is connected via a 2ft extension and is in a separate enclosure from the RPi itself; I do remember reading about the potential interference. As for the power supply, I’ll get back to you on that, as it’s buried in the cable salad under the telly. If it helps, the co-ordinator, RPi, switch and lamp are all in the same room…

Another issue that may be linked is that, under certain conditions, the ‘dumb’ LED bulbs flash. This typically happens when the uplighter is on but the main light is off, but also sometimes when both lights are switched off by software rather than with the switch. Some accounts of this issue mention a capacitor across the LEDs, and there isn’t one here; however, MoesGo states that a capacitor isn’t needed with this switch.
Interestingly, it’s the freshest* of the three bulbs in the main light that flashes. If I remove it, one of the older ones flashes instead, but at a longer interval.

*bulbs are 4 years old, and one failed recently.

OK, I do not think it is the power supply. It is the Zigbee network not being stable.

Which Sonoff coordinator is it? Have you upgraded its firmware?

So your Zigbee network consists of 3 devices: the coordinator, an Innr bulb and the Moes switch? And they are close to each other?

To be honest, everything is being run ‘out of the box’ and with pretty much default configurations, so a FW upgrade is probably worth looking into. All items are in the same room, which isn’t big, and the only obstruction is a wooden enclosure around the co-ordinator.

Try and do a FW upgrade.

If you have Wi-Fi access points, put some distance between them and the coordinator, and make sure your channels do not overlap.

I think it could still be related to the power supply (an unstable Zigbee network should not crash Home Assistant, but maybe I misunderstood the crash). Anyway, Raspberry Pis are known to sometimes develop weird and intermittent problems when they have a bad or underpowered power supply → Raspberry Pi power supply problems - Google Search

Regardless, be sure to follow these best-practice guidelines. They include upgrading the firmware and connecting the Zigbee Coordinator USB adapter to a USB 2.0 port or via a powered USB 2.0 hub (not a USB 3.0 port). They also recommend a long shielded USB extension cable, both to avoid interference and to get the Zigbee Coordinator USB dongle away from all electronics and electrical appliances → Generic best practice tips on improving Zigbee network range and general stability · zigpy/zigpy Wiki · GitHub

I’ve just been looking into the firmware on the dongle, and to be honest, upgrading it (https://sonoff.tech/wp-content/uploads/2022/11/SONOFF-Zigbee-3.0-USB-dongle-plus-firmware-flashing-.pdf) looks pretty convoluted. I’d rather not go down that route if I can help it.

My firmware appears to be build 20210708. If this is a cursed configuration then I may consider upgrading, but if it’s just a ‘would be good’ thing, I’m tempted to look elsewhere first.

The power supply came as part of a ‘starter kit’ I ordered from CPC over the summer (at the time, that was the best way of actually getting one). It’s marked as 5.1V and 3A, which, all things being equal, is what it’s producing. I don’t have another PSU of that rating to swap in and test at this point. If it’s likely the PSU is the culprit, I’ll get another one.
Is there any way of checking within HAss that the PSU is adequate?
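A common way to check for power problems, outside Home Assistant itself, is the Raspberry Pi firmware's throttling bitmask. `vcgencmd get_throttled` prints a line such as `throttled=0x50005`, and each bit records a condition such as under-voltage. Whether you can run `vcgencmd` from a shell on Home Assistant OS is an assumption to check. Home Assistant also has a Raspberry Pi Power Supply Checker integration intended to surface under-voltage. The sketch below decodes that output line, using the bit positions documented by the Raspberry Pi firmware.

```python
# Bit positions of the `vcgencmd get_throttled` bitmask (Raspberry Pi firmware).
THROTTLE_BITS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> list[str]:
    """Decode a line such as 'throttled=0x50005' into human-readable flags."""
    value = int(output.strip().split("=", 1)[1], 16)  # base 16 accepts the 0x prefix
    return [label for bit, label in sorted(THROTTLE_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    print(decode_throttled("throttled=0x50005"))
```

If `throttled=0x0` is reported after the Pi has been running for a while, no under-voltage has been seen since boot. Any of the "has occurred" bits point towards the PSU or cable.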

My main 2.4 GHz Wi-Fi router is on channel 11, so, assuming the default Zigbee channel 15 is in use, I can’t see an issue. There aren’t even any neighbours’ devices (that my phone can see) lower down the band.
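The channel reasoning above can be checked with a little arithmetic. 2.4 GHz Wi-Fi channel n is centred on 2407 + 5n MHz, and Zigbee (IEEE 802.15.4) channel k, for k from 11 to 26, is centred on 2405 + 5(k − 11) MHz. The sketch treats a Wi-Fi channel as a flat 22 MHz block and a Zigbee channel as 2 MHz. Both are simplifying assumptions that ignore spectral side lobes, so a "no overlap" result only means the main lobes are clear of each other.

```python
def wifi_band(channel: int, width_mhz: float = 22.0) -> tuple[float, float]:
    """Approximate occupied band (MHz) of a 2.4 GHz Wi-Fi channel 1-13."""
    centre = 2407 + 5 * channel
    return centre - width_mhz / 2, centre + width_mhz / 2


def zigbee_band(channel: int) -> tuple[float, float]:
    """Occupied band (MHz) of an IEEE 802.15.4 channel 11-26, about 2 MHz wide."""
    centre = 2405 + 5 * (channel - 11)
    return centre - 1, centre + 1


def overlaps(wifi_channel: int, zigbee_channel: int) -> bool:
    """True if the two main lobes share any spectrum."""
    w_lo, w_hi = wifi_band(wifi_channel)
    z_lo, z_hi = zigbee_band(zigbee_channel)
    return w_lo < z_hi and z_lo < w_hi


if __name__ == "__main__":
    print(wifi_band(11), zigbee_band(15), overlaps(11, 15))
```

Wi-Fi channel 11 spans roughly 2451 to 2473 MHz, and Zigbee channel 15 sits at 2425 MHz, well below it. Zigbee channels 21 to 24 are the ones that would collide with this router.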

FWIW this is the signature of the dongle:

{
  "node_descriptor": "NodeDescriptor(logical_type=<LogicalType.Coordinator: 0>, complex_descriptor_available=0, user_descriptor_available=0, reserved=0, aps_flags=0, frequency_band=<FrequencyBand.Freq2400MHz: 8>, mac_capability_flags=<MACCapabilityFlags.AllocateAddress|RxOnWhenIdle|MainsPowered|FullFunctionDevice|AlternatePanCoordinator: 143>, manufacturer_code=0, maximum_buffer_size=80, maximum_incoming_transfer_size=160, server_mask=11265, maximum_outgoing_transfer_size=160, descriptor_capability_field=<DescriptorCapability.NONE: 0>, *allocate_address=True, *is_alternate_pan_coordinator=True, *is_coordinator=True, *is_end_device=False, *is_full_function_device=True, *is_mains_powered=True, *is_receiver_on_when_idle=True, *is_router=False, *is_security_capable=False)",
  "endpoints": {
    "1": {
      "profile_id": 260,
      "device_type": "0x0400",
      "in_clusters": [
        "0x0000",
        "0x0006",
        "0x000a",
        "0x0019",
        "0x0501"
      ],
      "out_clusters": [
        "0x0001",
        "0x0020",
        "0x0500",
        "0x0502"
      ]
    },
    "2": {
      "profile_id": 49246,
      "device_type": "0x0820",
      "in_clusters": [
        "0x0000"
      ],
      "out_clusters": []
    }
  },
  "manufacturer": "Texas Instruments",
  "model": "CC1352/CC2652, Z-Stack 3.30+ (build 20210708)",
  "class": "zigpy_znp.zigbee.device.ZNPCoordinator"
}
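The `mac_capability_flags` numbers in the two signatures are bitmasks: 128 for the switch and 143 for the dongle. The sketch decodes them using the IEEE 802.15.4 capability bit positions. The member names mirror those printed by zigpy in the signatures, except `SecurityCapable` for bit 6, which appears in neither signature and is my label.

```python
import enum


class MACCapabilityFlags(enum.IntFlag):
    """IEEE 802.15.4 MAC capability bits (names as printed in the signatures above)."""

    AlternatePanCoordinator = 0b0000_0001
    FullFunctionDevice = 0b0000_0010
    MainsPowered = 0b0000_0100
    RxOnWhenIdle = 0b0000_1000
    SecurityCapable = 0b0100_0000  # assumed label; not present in either signature
    AllocateAddress = 0b1000_0000


def decode(value: int) -> set[str]:
    """Return the names of the capability bits set in value."""
    flags = MACCapabilityFlags(value)
    return {member.name for member in MACCapabilityFlags if member in flags}


if __name__ == "__main__":
    print("switch:", sorted(decode(128)))
    print("dongle:", sorted(decode(143)))
```

The switch sets only `AllocateAddress`. It does not claim to be mains powered or to keep its receiver on, which matches `is_end_device=True` and `is_router=False` in its signature, so it joins as an end device rather than as a router.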

As mentioned, I do not think it is the power supply. That idea came from my misunderstanding your “crash” as HA being down, and not only the Zigbee network.

Apart from interference from USB 3/Wi-Fi, the firmware is my best guess.

I do not think it is that complicated if you have a Windows PC. It looks very complicated, but if you have ever used the command line, you will be finished in 15 minutes.

Go for version 20220219, as this is the latest stable version.
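The coordinator's model string in the dongle signature carries the firmware build as a date, e.g. `CC1352/CC2652, Z-Stack 3.30+ (build 20210708)`. You can therefore check whether you are behind the suggested 20220219 with a plain integer comparison. In the sketch below, the helper names are hypothetical and the recommended build is simply the one suggested in this thread.

```python
import re
from typing import Optional

RECOMMENDED_BUILD = 20220219  # build suggested in this thread


def coordinator_build(model: str) -> Optional[int]:
    """Extract the YYYYMMDD build number from a Z-Stack coordinator model string."""
    match = re.search(r"build (\d{8})", model)
    return int(match.group(1)) if match else None


def needs_upgrade(model: str, recommended: int = RECOMMENDED_BUILD) -> bool:
    """True if the model string reports a build older than the recommended one."""
    build = coordinator_build(model)
    # YYYYMMDD integers sort chronologically, so < means "older".
    return build is not None and build < recommended


if __name__ == "__main__":
    model = "CC1352/CC2652, Z-Stack 3.30+ (build 20210708)"
    print(coordinator_build(model), needs_upgrade(model))
```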

Follow the good explanation from Hedda in comment 2 on this link.

Skip part 4, as this is only for backup/restore. You only have 3 devices, so it is much easier to start over if it fails.

The driver in step 1 is the fourth line on this download page.

Agreed, then the Wi-Fi and Zigbee channels will not interfere. Remember USB 3 interference: use a USB 2 port and the USB cable extender you have. You mention the coordinator is in a “separate enclosure”, so why not put it outside an enclosure? I have mine on a 2 m cable extender, hidden in a plant 1½ m from everything else electrical.

I need to be careful in my positioning of devices as I have curious children…

Will look further into firmware flashing.

The crash is HA: the whole thing is unresponsive (web interface, app, etc.). I have to power-cycle the RPi to get it running again. Obviously, as Zigbee is run from the RPi, that goes down/wrong too…

Written communication can go wrong 😏 I did misunderstand this.

If all of HA is down, then it is not the Zigbee network or interference from outside radios…

What does your overall HA setup look like? Is it updated as new versions come out? Is it running from an SD card or an SSD? Have you looked in the logs? You mention it has been stable, so were there any other changes besides the switch?

No problem, I appreciate your time on this…

The HA setup is mostly default, updates applied whenever the notification says so, as of this morning:

Home Assistant 2022.11.4
Supervisor 2022.11.2
Operating System 9.3
Frontend 20221108.0 - latest

There have been updates since this issue started which have been installed to no effect.

The UI is basic, without anything fancy: a few views and very little automation. The system is in its infancy at the moment, and the only “smart” equipment connected is as follows:

Tado central heating controls
Internet Router
Shelly EM power meter
A pair of IP cameras (they have their own issues, but they’re minor, and have been consistent from the start)
The Zigbee co-ordinator with switch and lamp as mentioned.

Prior to adding the MoesGo switch, the system was rock solid with no crashes, and the only errors were warnings thrown up by the IP cameras (some sort of ONVIF message; I’ll start another thread on that at some point).

The (relevant) log extract is in the original post as an edit. I believe it’s relevant, but I cannot say for certain, as the logs appear to be cleared on restart (certainly the ones available through the interface).

With a relatively simple setup like yours, it should work…

Do you use an SD card? There are a lot of comments about HA being unstable if the card is not of adequate quality. I ran on an SD card for the first 12 months of my HA time with no issues, then moved from an RPi to an Intel-based machine, hence a built-in SSD. Also make sure the SD card has plenty of free space, due to the way it stores changes.

The power supply could be the problem, although a Sonoff dongle and nothing else will work on the standard power supply (I ran this for a long time, using the same Sonoff dongle, on firmware 20220219).

There are some lines about the Sonoff coordinator (zigpy) in the logs, so it might still be a FW issue. I’m not sure how ZHA handles the FW; I have been using Z2M.

Regarding logs, there are 2 log files in the config directory. Try having a look in them, though I’m not sure you will find anything.
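Going through those files by hand is tedious, so a quick tally of warnings and errors per logger can show what was happening just before a crash. The sketch assumes that the files follow the line format in the extract earlier in this thread (`date time LEVEL (thread) [logger] message`). It also assumes that one is the current `home-assistant.log` and the other holds the previous run (its exact name has varied between releases, so check your config directory). Traceback lines do not match the pattern and are skipped.

```python
import re
import sys
from collections import Counter
from typing import Iterable, Tuple

# Start of a log entry, e.g.
# "2022-11-23 14:20:01.105 ERROR (MainThread) [zigpy.application] Couldn't start application"
ENTRY = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+) \((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\]")


def count_problems(
    lines: Iterable[str], levels: Tuple[str, ...] = ("WARNING", "ERROR")
) -> Counter:
    """Count log entries per (level, logger), ignoring levels not listed."""
    counts: Counter = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_log_problems.py <log file> [<log file> ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for (level, logger), n in count_problems(handle).most_common():
                print(f"{path}: {n:5d} {level:7s} {logger}")
```

A logger that suddenly dominates the count (here, the ONVIF and zigpy entries) is a good first place to look.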

I never scrimp on SD cards, so to be honest, that would be one of the last things I check. It’s a 64 GB card and is 10% used at the moment.
I’ve got a branded PSU on order from CPC, more as a backup, than much else.

Earlier I dismissed the ONVIF messages I was getting as a ‘look at it later’ kinda thing, as they were warnings, not errors. However, I did look into them briefly, and they appear to be related to a memory leak. I have been tracking memory usage over the past hour or so, and it does creep up to the maximum over the course of 45 minutes.
I’ve disabled and removed the IP cameras and the ONVIF integration to see if that improves the stability. :crossed_fingers:
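A leak like this shows up as a steady linear climb, so a handful of memory readings is enough to project when the Pi will run out. The sketch fits a least-squares line through (minute, percent used) samples and solves for the limit. The sample values in the usage line are made up to match the roughly 45-minute climb described above; they are not measurements from this system.

```python
from typing import Sequence


def minutes_until_full(
    minutes: Sequence[float], used_percent: Sequence[float], limit: float = 100.0
) -> float:
    """Fit used = intercept + slope * minutes by least squares; solve used == limit.

    The result is measured from minute 0 of the samples. Needs at least two distinct times.
    """
    n = len(minutes)
    mean_x = sum(minutes) / n
    mean_y = sum(used_percent) / n
    sxx = sum((x - mean_x) ** 2 for x in minutes)
    if sxx == 0:
        raise ValueError("need samples taken at two or more different times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(minutes, used_percent))
    slope = sxy / sxx  # percentage points per minute
    if slope <= 0:
        return float("inf")  # memory is not growing, so no projected exhaustion
    intercept = mean_y - slope * mean_x
    return (limit - intercept) / slope


if __name__ == "__main__":
    # Hypothetical readings: 40% at start, 60% after 15 min, 80% after 30 min.
    print(round(minutes_until_full([0, 15, 30], [40, 60, 80]), 1))
```

With a real leak, the projection should roughly match the time between crashes. Removing the suspect integration should flatten the slope.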

Update:
Removing the ONVIF integration fixed the crashing, and the system has been stable and responding properly for the past few days. Result :slight_smile: