ZHA Network Freezing Up - MAX_MESSAGE_LIMIT_REACHED

Hi all,
I’ve been having issues on and off with my ZHA network freezing up and becoming unresponsive to any commands. So far I’m trying to debug the issue myself but I need to understand what the best tools are for debugging, resources around logs and log filtering/pattern recognition. Basically I need to get an understanding of how all the pros do it.

So far I’ve worked out the basics of enabling debug logs for my integration. That part is fine: I know how to access them, or even follow them live with a tail -f command.

I’ve also been able to find a useful custom flex table that shows the details of my ZHA devices in one place, so I can associate the hex IDs with the IDs listed in the logs.
However, past all this it still feels like I can’t get full transparency into what’s happening behind the scenes. The messages I get are “MAX_MESSAGE_LIMIT_REACHED”, which implies there are messages on a queue somewhere that are not being cleared down or processed? I’m not sure about the specifics of how this works with ZHA, but my impression is that moving to Zigbee2MQTT might give me more visibility/control. Is this right?

Whilst looking at the logs I’m also not seeing any obvious culprit that might be spamming requests. It’s not clear what the message limits are, or what the current state of a message queue might be. I’m also not sure how I would go about finding any devices that are holding the network up in other ways.

I have a pastebin here from where the “MAX_MESSAGE_LIMIT_REACHED” logs started. This was at 5 AM, which I also find odd, as I don’t have any automations running overnight. I’m also not sure how many debug log lines in a given amount of time is normal.
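
For anyone wanting to put numbers on "how many log lines is normal" and "which device is chatty", here is a minimal Python sketch. It assumes the default Home Assistant log prefix (a timestamp like 2024-01-31 05:00:12.345 at the start of each line) and that device network addresses appear in the text as 4-digit hex values such as 0x1a2b. Both are assumptions about your log, and the function names are made up for illustration. Note that cluster IDs (for example 0x0006) have the same shape, so they will show up in the second count too.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: lines start with "YYYY-MM-DD HH:MM:SS", as in the default HA log format.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")
# Assumption: device network addresses are written as exactly 4 hex digits after "0x".
HEX_ID = re.compile(r"\b0x[0-9A-Fa-f]{4}\b")


def lines_per_minute(lines: Iterable[str]) -> Counter:
    """Count timestamped log lines per minute to spot bursts of traffic."""
    counts: Counter = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:  # traceback continuation lines have no timestamp and are skipped
            counts[match.group(1)] += 1
    return counts


def hex_id_counts(lines: Iterable[str]) -> Counter:
    """Count how often each 4-digit hex value is mentioned (case-insensitive)."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(h.lower() for h in HEX_ID.findall(line))
    return counts
```

To use it, open the log file once per function (for example `with open("home-assistant.log") as f: print(lines_per_minute(f).most_common(10))`) and compare the busiest minutes with a quiet period. The hex values at the top of `hex_id_counts` can then be matched against the device table mentioned above.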

If anyone could offer some help or insight on how I can tackle this issue with better tools or knowledge it would be much appreciated.

Could you give us more details about your system? Coordinator, number of routers, number of end devices etc? Presumably you follow all the standard advice about having the stick on the end of a USB cable, powered hub and so on.

You might also look at your channel usage (from download diagnostics on the ZHA integration page). It ought to look something like this:

    "energy_scan": {
      "11": 52.75969252664325,
      "12": 84.164247274957,
      "13": 85.82097888710312,
      "14": 55.9836862725909,
      "15": 91.05606689948522,
      "16": 3.6632469452765037,
      "17": 19.00785284282869,
      "18": 9.713248103580147,
      "19": 1.0256846852618655,
      "20": 59.15797905332195,
      "21": 4.15070068297423,
      "22": 10.914542804728702,
      "23": 10.914542804728702,
      "24": 0.9017765778954641,
      "25": 1.0256846852618655,
      "26": 70.89933442360993
    }
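
If you want to rank the channels rather than eyeball them, a small Python sketch like the one below would do it. It assumes you have already loaded the "energy_scan" object from the diagnostics download into a dict, and that a lower number means a quieter channel. The function name is hypothetical, not part of ZHA.

```python
from typing import Dict, List, Tuple


def quietest_channels(energy_scan: Dict[str, float], top: int = 3) -> List[Tuple[int, float]]:
    """Return the `top` channels with the lowest energy reading.

    Assumption: lower values mean less interference on that channel.
    Ties are broken by channel number so the result is deterministic.
    """
    ranked = sorted(
        ((int(channel), energy) for channel, energy in energy_scan.items()),
        key=lambda pair: (pair[1], pair[0]),
    )
    return ranked[:top]


# A few of the values from the scan above, used as example input.
example_scan = {
    "15": 91.05606689948522,
    "16": 3.6632469452765037,
    "19": 1.0256846852618655,
    "24": 0.9017765778954641,
    "25": 1.0256846852618655,
}
best = quietest_channels(example_scan)
```

With the example values, `best` lists channels 24, 19 and 25 in that order. Bear in mind that a scan is only a snapshot, so it is worth repeating it at different times of day before choosing a channel.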

Do you make much use of Zigbee groups? They can sometimes flood the network if they are overused - though probably not at 5am…

If the max message error is coming from Zigbee I’m not sure switching to Z2M will make much difference - the network will be the same whichever integration you use.

Lots of good advice here:

Sure thing!
I’m using a SkyConnect with a Home Assistant Green running the latest version of HA OS. And yes, the USB dongle is on the end of the provided extender and away from the WiFi hub.

I actually have no Zigbee groups at all, I saw that was something people were talking about in this situation, so I made sure to check that this wasn’t something I was doing.

My network is relatively small, I’m only running a total of 25 Zigbee “Devices”. I’d say about half are battery powered sensors or buttons, the other half are a mix of IKEA plugs, a couple of repeaters and wired wall switches for our existing lighting (acting as end devices, not routers).


(Routers in white)

I have 2 dedicated repeaters:
a Sonoff ZBDongle-P flashed with router firmware and a TRADFRI signal repeater.
I also have 3 devices that act as repeaters but aren’t solely dedicated to it:
2 TRADFRI control outlets and a VINDSTYRKA (which all report as routers).

I was actually not aware of channel usage, thanks! Yeah I can see that it’s looking pretty cramped, I’m in an apartment block so I’d imagine this is expected?

Is there something I can do here to optimise the channels I’m using, or is that down to the devices I purchase? Would this also be a risky thing to change if I were to look down this route?

Thanks, I’ll definitely take a look at this guide!
At this point I’m pretty confused about whether this is down to a device flooding the network and, if so, why it wouldn’t recover automatically after a certain point.

I’ve read that Zigbee2MQTT has the option to set a “Debounce” value, meaning it could potentially reduce noise from over-zealous devices that check in very frequently. Would this be something worth looking into, do you think?

All your channels look quite noisy - you’re right, typical apartment block - so I don’t think that would help much, but for reference, it’s done in ZHA and your devices should follow the coordinator to the new channel you set.

I would try the simplest thing first. You only have five routers. It may be that the max message limit is the result not of the number of messages, but of the small number of routes across the network. Try adding three or four repeaters - those little USB ones are not expensive - and see if that improves things.

I don’t know much about Z2M, I’m afraid. Perhaps someone else could chip in.

It is actually less useful than you think: it limits the MQTT messages sent, not the Zigbee traffic.

Hey @sol3uk,
I came across this thread as I am experiencing the same problem, and there is not really a lot of information to be found on the net regarding this issue.
Were you able to solve or mitigate it?
I am running HA on a Raspi 4 with an SSD, and the SkyConnect dongle is at the end of a 1m extender cable.
My Zigbee network consists of around 45 devices, 10 of them (including the ZHA coordinator) acting as routers.
My Raspi has lots of resources left and the radio network is not overly crowded.

However, I repeatedly get this error every few days:
zigpy.exceptions.DeliveryError: Failed to enqueue message after 3 attempts: <EmberStatus.MAX_MESSAGE_LIMIT_REACHED: 114>
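
To check whether "every few days" follows a pattern, the times of each occurrence can be pulled out of the log. This is a rough Python sketch under two assumptions: log records start with a timestamp like 2024-01-31 05:00:12.345 (the default Home Assistant format), and traceback lines such as the one above carry no timestamp of their own, so they are attributed to the most recent timestamped line. The function names are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

ERROR_MARKER = "MAX_MESSAGE_LIMIT_REACHED"
# Assumption: the first 23 characters of a log record look like "2024-01-31 05:00:12.345".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def error_times(lines: Iterable[str]) -> List[datetime]:
    """Return the timestamp of the log record each error marker belongs to."""
    times: List[datetime] = []
    current: Optional[datetime] = None
    for line in lines:
        try:
            current = datetime.strptime(line[:23], TIMESTAMP_FORMAT)
        except ValueError:
            pass  # continuation line (e.g. a traceback): keep the previous timestamp
        if ERROR_MARKER in line and current is not None:
            times.append(current)
    return times


def gaps_between(times: List[datetime]) -> List[timedelta]:
    """Time elapsed between consecutive occurrences."""
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

Passing an open log file to `error_times` and the result to `gaps_between` shows whether the failures cluster around a particular time of day or interval, which can help narrow down what else is happening on the network at that moment.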

If I restart HA everything is ok again.
Any ideas / solutions?

You need to add many more Zigbee Router devices, read and try to follow all these tips (which include some recommendations for dedicated Zigbee Router devices);

Hi!
Sorry for the late reply, I’ve only just seen this. Yeah, I think I’ve managed to resolve this issue and haven’t experienced it since.
All I needed to do was move my routers around so they are more spaced out around the house, and reconfigure a few routers and nearby devices so they realise they can connect to a nearer router with a stronger signal.
My map looks more like this now: only 1 extra router, but the routers are more spaced out and sit between the other Zigbee devices.