Constant MQTT device disconnections (socket error)

#1

Hello guys! I wonder if someone could help me out.

For the past couple of weeks, I have been getting constant disconnects of devices connected to my MQTT broker. These are the logs:

1544184814: Client casarfbridge has exceeded timeout, disconnecting.
1544184814: Socket error on client casarfbridge, disconnecting.
1544184815: Client banheirosuite has exceeded timeout, disconnecting.
1544184815: Socket error on client banheirosuite, disconnecting.
1544184815: New connection from 192.168.1.185 on port 1883.
[INFO] found DVES_USER on local database
1544184815: New client connected from 192.168.1.185 as casarfbridge (c1, k10, u'DVES_USER').
1544184816: New connection from 192.168.1.136 on port 1883.
1544184816: New client connected from 192.168.1.136 as banheirosuite (c1, k15).
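
For anyone reading these broker lines: on a Mosquitto "New client connected" line, the flags in parentheses are the clean-session flag (c), the keepalive in seconds (k) and, if present, the username (u). Under the MQTT spec the broker drops a client when it hears nothing for one and a half times the keepalive, which is what "has exceeded timeout" reports. Below is a minimal Python sketch that decodes such a line, assuming the exact log format shown above. The function and field names are made up for illustration and are not part of Mosquitto.

```python
import re
from datetime import datetime, timezone

# Matches lines such as:
# 1544184815: New client connected from 192.168.1.185 as casarfbridge (c1, k10, u'DVES_USER').
CONNECT_RE = re.compile(
    r"^(?P<ts>\d+): New client connected from (?P<ip>\S+) as (?P<client>\S+) "
    r"\(c(?P<clean>\d), k(?P<keepalive>\d+)(?:, u'(?P<user>[^']*)')?\)\.$"
)


def parse_connect_line(line):
    """Decode one 'New client connected' line, or return None if it does not match."""
    m = CONNECT_RE.match(line.strip())
    if m is None:
        return None
    keepalive = int(m.group("keepalive"))
    return {
        # The leading number is a Unix timestamp; shown here in UTC.
        "time": datetime.fromtimestamp(int(m.group("ts")), tz=timezone.utc),
        "ip": m.group("ip"),
        "client": m.group("client"),
        "clean_session": m.group("clean") == "1",
        "keepalive_s": keepalive,
        # MQTT rule: the broker may disconnect after 1.5 x keepalive of silence.
        "timeout_after_s": keepalive * 1.5,
        "username": m.group("user"),  # None when no username was logged
    }


if __name__ == "__main__":
    line = ("1544184815: New client connected from 192.168.1.185 "
            "as casarfbridge (c1, k10, u'DVES_USER').")
    print(parse_connect_line(line))
```

With k10 the broker only waits about 15 seconds for traffic from the client, so a short stall on either side is enough to trigger the timeout.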

This is my setup:

  • Hassio running on a RPI 3B over Ethernet with latest update (83.3)
  • Asus RT-AC68U with latest firmware on the same network as Hassio
  • About 40 devices connected over WiFi (2.4 GHz), mostly ESP8266 (Sonoffs, MagicLEDs, Broadlinks)
  • MQTT broker using integrations

The strange thing is that a device that gets disconnected from the broker never gets disconnected from the WiFi. If I check the DHCP leases, all devices have been connected for days.

This is the log I get on the device:

12:14:48 MQT: Attempting connection...
12:14:48 MQT: Connect failed to 192.168.1.12:1883, rc -2. Retry in 10 sec
12:14:59 MQT: Attempting connection...
12:15:01 MQT: Connected
12:15:01 MQT: tele/banheirosuite/LWT = Online (retained)
12:15:01 MQT: cmnd/banheirosuite/POWER = 

I really don’t believe there is a network problem, because if I ping the same device for hours, not a single packet gets lost. The WiFi signal here is very strong and comes from a reasonably good router.

Any ideas?

Thanks a lot!


#2

Hi @Schneider, looking at the error, I think the RC (reason code) is 2.

You can see other codes here: https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_7.5.0/com.ibm.mq.tro.doc/q039390_.htm

Looking at the first log, are you always using the same client ID? Is it casarfbridge? Is it banheirosuite?

My suggestion is to add a random number after this client ID in order to always have a new Client ID when you need to reconnect.

Maybe there is more to this problem :face_with_raised_eyebrow: but I think this is a good place to start.
Hope this helps you.


#3

Thanks a lot for your reply! I really appreciate it :slight_smile:

I’ve never looked at this error code, this is new information for me!

All devices have different client IDs. This is an example:

Do you know how to add a random number every time it tries to reconnect, using a readable client ID, so I can identify it in the logs?

Thanks again!


#4

Just a quick update: I’ve manually changed the client ID on all devices to something different, since it was the exact same name as the topic. Maybe this could help regarding the reason code 2.

I will monitor and let you know tomorrow.

Thanks again!


#5

Hi, the client IDs need to be different, and that part is OK. But I now see that in your settings you can’t change the client ID by adding a random value, because I understand that you are using Sonoff-Tasmota, where the Client field only accepts a fixed string.

I think the connection is made in these lines of the firmware code:

Other users have the same problem, as you can see here:
https://github.com/arendst/Sonoff-Tasmota/issues/4555

So in this situation your MQTT broker needs a “decent” way to handle disconnects and reconnects without problems. Take care with how clean session is handled.

Sometimes the configuration of the MQTT broker affects the configuration of the clients and the way they reconnect.

If you cannot manage the broker correctly, you will have problems.
The quick, bad and ugly solution is to “always” (that is, on every reconnection) use a client ID + random number. This requires modifying the firmware.
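
To illustrate the "client ID + random number" idea while keeping the ID readable in the logs, here is a minimal Python sketch. It is not Tasmota code. The helper name `make_client_id`, the underscore separator and the 23-character default (the length every MQTT 3.1.1 broker must accept) are assumptions chosen for the example. It would suit a custom client such as a Python script, not the stock firmware.

```python
import secrets


def make_client_id(base, suffix_bytes=3, max_len=23):
    """Return a readable client ID such as 'banheirosuite_a1b2c3'.

    base         -- human-readable device name (hypothetical example input)
    suffix_bytes -- number of random bytes; each byte becomes two hex characters
    max_len      -- conservative length limit; many brokers accept longer IDs
    """
    suffix = secrets.token_hex(suffix_bytes)
    # Leave room for the separator and the suffix, trimming the name if needed.
    keep = max_len - len(suffix) - 1
    trimmed = base[:keep].rstrip("_")
    return f"{trimmed}_{suffix}"


if __name__ == "__main__":
    # Call this again before every reconnect so the broker never sees a stale ID.
    print(make_client_id("banheirosuite"))
```

A new ID on every connect only makes sense with clean sessions (the c1 in the broker log), since the broker cannot match a persistent session to an ID that keeps changing.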

Related information about this question and interesting links are:

Conclusion:

  • Review the configuration of your broker. Report this to the firmware team.

You are welcome :slightly_smiling_face:


#6

Hello sir!

Amazing information, very useful, thanks a lot!

I will take the time to read everything and try to learn something from it.

For now, changing the client ID to something different from the topic seems to have solved the problem. It has been hours with no disconnects in the log. I just love this screen:

image

Very nice! Thanks a lot again and have a great weekend!


#7

:smiley: nice! Enjoy.


#8

Hello again @fquinto, how are you today?

Sorry to bother you again, my friend, but last night the errors and disconnects started again. Could you please take a look at these logs?

This is the log found on the device, with no RC code whatsoever:

It just disconnects from and reconnects to the MQTT broker. The WiFi connection is solid, with no disconnection at all, as I can see in the Asus router logs.

This problem may not seem important, but for me it kind of is. I get so many unavailable events, MQTT error logs, and a messed-up history on devices that reconnect, showing the wrong time of the last state change.

I really appreciate your help!

Thanks a lot!


#9

Hi! I think the Mosquitto log is incomplete. The full log is needed in order to help you.

Maybe it is related to: Mosquito MQTT update v3 broke my hassio


#10

It’s hard to help you because
and if we could see more of
Without that then it is
so the truncation is not
Seeing all of the log would
then we get the whole picture.

:wink:


#11

Thanks a lot @fquinto and @123! I am really busy these next few days, I will read more and figure out how to get better logs for debugging. Have a great week!


#12

I am getting the exact same disconnects in the MQTT add-on log. At first I thought it was a Sonoff issue, but I also have a PC running a Python script and that disconnected too. Just letting you know you are not the only one seeing this. In Tasmota I also see error 2, and I am using a unique client ID for each device. I am using the Mosquitto add-on v4 and HA 83.3. The WiFi has not dropped.


#13

Same here, but didn’t have time to dig into it.


#14

Hello guys!

I’ve messed with dozens of settings and configurations on my router and HA. Changed the firmware of my Asus Router to the Merlin version and still no luck.

Sometimes they remain connected for hours, and suddenly a bunch of them disconnect from the MQTT broker and reconnect immediately. Here is a log from one of my devices that just disconnected:

12:05:40 MQT: tele/controle_garagem_casa/STATE = {"Time":"2018-12-14T12:05:40","Uptime":"0T15:53:21","Vcc":2.790,"POWER":"OFF","Wifi":{"AP":1,"SSId":"Antonio e Roberta","RSSI":70,"APMac":"10:7B:44:C1:E6:78"}}
12:10:41 MQT: tele/controle_garagem_casa/STATE = {"Time":"2018-12-14T12:10:40","Uptime":"0T15:58:21","Vcc":2.774,"POWER":"OFF","Wifi":{"AP":1,"SSId":"Antonio e Roberta","RSSI":72,"APMac":"10:7B:44:C1:E6:78"}}
12:15:40 MQT: tele/controle_garagem_casa/STATE = {"Time":"2018-12-14T12:15:40","Uptime":"0T16:03:21","Vcc":2.788,"POWER":"OFF","Wifi":{"AP":1,"SSId":"Antonio e Roberta","RSSI":70,"APMac":"10:7B:44:C1:E6:78"}}
12:16:26 MQT: Attempting connection...
12:16:26 MQT: Connected
12:16:26 MQT: tele/controle_garagem_casa/LWT = Online (retained)
12:16:26 MQT: cmnd/controle_garagem_casa/POWER = 
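
One thing that can be read from these telemetry lines is whether the device rebooted: the Uptime field keeps counting as long as the firmware stays up. A rough Python sketch that parses the STATE lines above and compares uptimes follows. It assumes the "days T hh:mm:ss" uptime format seen in this log, and the function names are invented for the example.

```python
import json
import re

# Matches device console lines such as:
# 12:05:40 MQT: tele/controle_garagem_casa/STATE = {...json...}
STATE_RE = re.compile(r"^\S+ MQT: (?P<topic>tele/\S+/STATE) = (?P<payload>\{.*\})$")


def uptime_seconds(uptime):
    """Convert an uptime string like '0T15:53:21' (days T hh:mm:ss) to seconds."""
    days, clock = uptime.split("T")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days) * 86400 + hours * 3600 + minutes * 60 + seconds


def parse_state_line(line):
    """Extract topic, uptime, RSSI and Vcc from one STATE line, or return None."""
    m = STATE_RE.match(line.strip())
    if m is None:
        return None
    data = json.loads(m.group("payload"))
    return {
        "topic": m.group("topic"),
        "uptime_s": uptime_seconds(data["Uptime"]),
        "rssi": data["Wifi"]["RSSI"],
        "vcc": data["Vcc"],
    }


def rebooted_between(earlier, later):
    """A device that restarted reports a smaller uptime than it did before."""
    return later["uptime_s"] < earlier["uptime_s"]
```

Feeding it the first and last STATE lines above shows the uptime still increasing, so the device itself did not restart in that interval. The STATE message that follows the reconnect would be needed to say the same about the reconnect itself.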

Here is the MQTT broker log:

1544790665: Saving in-memory database to /data/mosquitto.db.
1544792466: Saving in-memory database to /data/mosquitto.db.
1544794267: Saving in-memory database to /data/mosquitto.db.
1544796068: Saving in-memory database to /data/mosquitto.db.
1544797869: Saving in-memory database to /data/mosquitto.db.
1544799671: Saving in-memory database to /data/mosquitto.db.
1544800593: Client sonoffbridge_casa_sala has exceeded timeout, disconnecting.
1544800593: Socket error on client sonoffbridge_casa_sala, disconnecting.
1544800594: Client d1_casa_cozinha_lava_loucas has exceeded timeout, disconnecting.
1544800594: Socket error on client d1_casa_cozinha_lava_loucas, disconnecting.
1544800595: New connection from 192.168.1.146 on port 1883.
[INFO] found DVES_USER on local database
1544800595: New client connected from 192.168.1.146 as d1_casa_cozinha_lava_loucas (c1, k10, u'DVES_USER').
1544800595: New connection from 192.168.1.72 on port 1883.
[INFO] found DVES_USER on local database
1544800595: New client connected from 192.168.1.72 as sonoffbridge_casa_sala (c1, k10, u'DVES_USER').
1544800595: Client d1_casa_controle_garagem has exceeded timeout, disconnecting.
1544800595: Socket error on client d1_casa_controle_garagem, disconnecting.
1544800596: New connection from 192.168.1.118 on port 1883.
1544800596: New client connected from 192.168.1.118 as d1_casa_controle_garagem (c1, k15).
1544801472: Saving in-memory database to /data/mosquitto.db.
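
To make a log like this easier to read, the "has exceeded timeout" lines can be counted per client and grouped into bursts, which shows at a glance whether several devices drop within the same few seconds. This is a minimal Python sketch under the assumption that the log format is exactly as above. The function names and the 5-second burst window are arbitrary choices for the example.

```python
import re
from collections import Counter
from datetime import datetime, timezone

TIMEOUT_RE = re.compile(r"^(\d+): Client (\S+) has exceeded timeout, disconnecting\.$")


def find_timeouts(log_text):
    """Return (per-client counts, list of (UTC time, client)) for timeout lines."""
    per_client = Counter()
    events = []
    for line in log_text.splitlines():
        m = TIMEOUT_RE.match(line.strip())
        if m is None:
            continue  # skip saves, new connections, [INFO] lines, etc.
        when = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
        per_client[m.group(2)] += 1
        events.append((when, m.group(2)))
    return per_client, events


def group_bursts(events, window_s=5):
    """Group events whose gap to the previous event is at most window_s seconds."""
    groups = []
    for when, client in sorted(events):
        if groups and (when - groups[-1][-1][0]).total_seconds() <= window_s:
            groups[-1].append((when, client))
        else:
            groups.append([(when, client)])
    return groups
```

Run against the excerpt above, it puts the three timeouts into a single burst spanning two seconds, which matches the "a bunch of them at once" pattern described in this post.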

There is no disconnection from the WiFi whatsoever, which is very odd.

Any luck for any of you?

Thanks!


#15

The issue stopped for me after upgrading to 84.2. I have had 1 disconnect since 84.3. It appears to be the same Sonoff device. I may upgrade Tasmota to 6.4, but it’s not causing me huge issues. Are you still having trouble?


#16

Hello! Yes, many disconnects. I’ve updated to version 84.3 and flashed a couple of D1 Minis with Tasmota 6.4, then immediately reverted to 6.21 after the devices started losing WiFi connectivity every 30 seconds. Version 6.21 seems to be a bit more stable. Besides that, there are still many disconnects from Tasmotised Sonoffs (version 5.14).


#17

I tested a Sonoff with 6.4.1 today and it was disconnecting constantly. I think the disconnects were WiFi and therefore MQTT. I have reverted back to 6.3.0.16 and it is OK again.

I noticed 6.4.1 upgrades the core to 2.4 and I think that is the issue. There are reports that core 2.5 is ok but I am not trying again now.


#18

Hey guys, I was able to fix my custom devices by setting WiFi.setSleepMode(WIFI_NONE_SLEEP);
My devices also use core 2.4 atm.


#19

I started getting a lot of disconnects with 6.4.0. I never had issues before.
I ended up compiling my own Tasmota firmware with core 2.5. I also ran a reset 1 command on the Sonoff to clear out the configuration, and it then picked up the defaults in my compiled firmware. It’s been 12 hours now with no disconnects. I also did sleep 0.
Got the info from the Wiki here


#20

Does sleep 0 turn off sleep? I tried that with core 2.4, and that was the main reason I had to do a cable flash.