Many ZHA failures per day, HUSBZB-1 70+ devices

EDIT: Cross-posted on Reddit. https://www.reddit.com/r/homeassistant/comments/r5bjxa/many_zha_failures_per_day_husbzb1_70_devices/

Okay, so here’s my boggle:

In the last few months, the stability of my ZHA network has drastically decreased. I’ve gone from a very, very responsive ZHA network to a fairly laggy and failure-prone one. I’ve tried to isolate problems, but every day I have dozens of “message send failure” entries in my log, each representing a light bulb that failed to turn on the first time it was asked to, whether by an automation or some other action.

Most failures occur when ZHA commands are issued by an automation: Home Assistant gets the motion detection from the sensor and issues commands to turn on multiple bulbs (I have lots of can lights). One or more can lights will almost always fail to come on. I don’t see this behavior with my lights on ZHA buttons, but those send far fewer commands, far less often.

Now I’ve combatted this problem by writing a Node-RED flow (I installed Node-RED for this single purpose) that reads the ZHA log and then issues retries for any failures it detects. It isn’t perfect, but now about 95% of the time, my bulbs come on… eventually. It still has problems, as my flow doesn’t know what color or brightness was asked for, just that an on or off was issued, so all it can do is retry turning on or off. So it’s still rather problematic, and the delay of all the bulbs coming on at vastly different times is also annoying.
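To make the limitation concrete, here is a minimal sketch (in Python rather than Node-RED) of a retry that remembers the full command instead of just "on" or "off". Everything in it is hypothetical: `RetryingSender`, the `send` callable, and the parameter names are placeholders for illustration, not part of ZHA, Home Assistant, or my actual flow. The idea is only that the retry replays exactly what was originally requested.

```python
import time
from typing import Any, Callable, Dict


class RetryingSender:
    """Hypothetical helper: stores the last requested command per entity
    and replays that exact command when a send fails."""

    def __init__(
        self,
        send: Callable[[str, Dict[str, Any]], bool],  # assumed: returns True on success
        max_attempts: int = 3,
        delay_s: float = 0.5,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.send = send
        self.max_attempts = max_attempts
        self.delay_s = delay_s
        self.sleep = sleep
        # entity_id -> parameters of the most recent request (brightness, color, ...)
        self.last_requested: Dict[str, Dict[str, Any]] = {}

    def turn_on(self, entity_id: str, **params: Any) -> bool:
        # Remember the full request so a retry can reuse brightness/color.
        self.last_requested[entity_id] = dict(params)
        return self._send_with_retry(entity_id)

    def _send_with_retry(self, entity_id: str) -> bool:
        params = self.last_requested[entity_id]
        for attempt in range(1, self.max_attempts + 1):
            if self.send(entity_id, params):
                return True
            if attempt < self.max_attempts:
                # Pause between attempts so retries do not pile onto the network.
                self.sleep(self.delay_s)
        return False
```

The `send` callable is injected so the sketch stays independent of any particular API; in a real setup it would wrap whatever actually issues the light command and reports success or failure.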

Information about my configuration:

  • I’m using a HUSBZB-1
  • I’m using HA in docker
  • I have around 70 devices, the vast majority are bulbs or hardwired devices (routers?), not sensors
  • I had a Philips Hue Hub, which has been powered off and all bulbs migrated to ZHA.
  • I had a SmartThings hub, which has been powered off and all devices were migrated to HA in some fashion.
  • My 2.4 GHz Wi-Fi is running on channel 1 currently, which shouldn’t conflict (much?) with ZHA running on Zigbee channel 15, according to the spectrum guides I followed. I’ve also tried channel 11 on the Wi-Fi to no avail.
  • My HUSBZB-1 is on a USB extension cable, that is 4 ft long, and well away from any interference that I can discern.
  • I’ve tried adding some bulbs through some of my zigbee plugs, to see if that helps, and it didn’t change anything.
  • I’ve tried adding ZHA groups, but unfortunately I don’t use all the bulbs in the same way in every scene, so groups hinder me further. I only tried this because I was thinking that maybe I was sending too many commands, to too many devices at once.
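As a sanity check on the channel choice above, the nominal frequency spans can be worked out directly. This is a rough sketch using the standard center-frequency formulas (Zigbee channels 11 to 26 at 2405 + 5 × (ch − 11) MHz, about 2 MHz wide; Wi-Fi channels 1 to 13 at 2407 + 5 × ch MHz). The 20 MHz Wi-Fi width is an assumption: 802.11b is nominally 22 MHz, 40 MHz channels are wider still, and real transmitters leak energy outside these nominal edges, so "no overlap" here does not guarantee no interference. The function names are just illustrative.

```python
from typing import Tuple

Span = Tuple[float, float]  # (low edge, high edge) in MHz


def zigbee_span_mhz(channel: int) -> Span:
    """Nominal span of a 2.4 GHz Zigbee channel (11-26), about 2 MHz wide."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11-26")
    center = 2405 + 5 * (channel - 11)
    return (center - 1.0, center + 1.0)


def wifi_span_mhz(channel: int, width_mhz: float = 20.0) -> Span:
    """Nominal span of a 2.4 GHz Wi-Fi channel (1-13); width is an assumption."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch covers Wi-Fi channels 1-13 only")
    center = 2407 + 5 * channel
    return (center - width_mhz / 2, center + width_mhz / 2)


def overlaps(a: Span, b: Span) -> bool:
    """True if the two nominal spans share any bandwidth."""
    return a[0] < b[1] and b[0] < a[1]


if __name__ == "__main__":
    zigbee = zigbee_span_mhz(15)
    for wifi_channel in (1, 11):
        wifi = wifi_span_mhz(wifi_channel)
        print(f"Zigbee 15 {zigbee} vs Wi-Fi {wifi_channel} {wifi}: "
              f"overlap={overlaps(zigbee, wifi)}")
```

On these nominal numbers, Zigbee channel 15 (2424 to 2426 MHz) sits just above Wi-Fi channel 1 (2402 to 2422 MHz) and well below Wi-Fi channel 11 (2452 to 2472 MHz), which is consistent with the spectrum guides, though the margin next to channel 1 is only a couple of MHz.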

So my question is this: should I buy a new controller? The new Sonoff Zigbee 3? That’s the only variable I haven’t isolated, that I can think of. Am I running up against some limit for this USB device? Are my Zigbee or HA databases getting too large (around 1 GB now)?

Here’s an example of one of the zigbee failures in my logs: HA ZHA Failure Log - Pastebin.com