Constant MQTT devices disconnections (socket error)

As a next step I tried the firmware binaries from http://thehackbox.org/tasmota/020500/. Because of the MQTT error I made a custom build with an MQTT keepalive of 30 seconds. This changed the log a bit:

1547307857: Received PINGREQ from sonoff-workbench
1547307857: Sending PINGRESP to sonoff-workbench
1547307867: Received PINGREQ from sonoff-workbench
1547307867: Sending PINGRESP to sonoff-workbench
1547307877: Received PINGREQ from sonoff-workbench
1547307877: Sending PINGRESP to sonoff-workbench
1547307887: Received PINGREQ from sonoff-workbench
1547307887: Sending PINGRESP to sonoff-workbench
1547307897: Received PINGREQ from sonoff-workbench
1547307897: Sending PINGRESP to sonoff-workbench
1547307908: Received PINGREQ from sonoff-workbench
1547307908: Sending PINGRESP to sonoff-workbench
1547307919: Client sonoff-workbench already connected, closing old connection.
1547307919: Socket error on client sonoff-workbench, disconnecting.
1547307919: New client connected from 192.168.130.33 as sonoff-workbench (c1, k10, u'sensors').
1547307919:     tele/sonoff-workbench/LWT
1547307919: Sending CONNACK to sonoff-workbench (0, 0)
1547307919: Sending PUBLISH to homeassistant (d0, q0, r0, m0, 'tele/sonoff-workbench/LWT', ... (7 bytes))
1547307919: Received PUBLISH from sonoff-workbench (d0, q0, r1, m0, 'tele/sonoff-workbench/LWT', ... (6 bytes))
1547307919: Sending PUBLISH to homeassistant (d0, q0, r0, m0, 'tele/sonoff-workbench/LWT', ... (6 bytes))
1547307919: Received PUBLISH from sonoff-workbench (d0, q0, r0, m0, 'cmnd/sonoff-workbench/POWER', ... (0 bytes))
1547307919: Received SUBSCRIBE from sonoff-workbench

This time the MQTT server doesn’t abort the connection because of a timeout (because the timeout is longer now). Instead the Sonoff opens a new connection (because the old one died silently). My WiFi isn’t flaky in general. There is no reason for delays >10 seconds. As a next step I tried the SDK 3.0.0 / STAGE build. This doesn’t solve the issue either.
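
For anyone who wants to check their own Mosquitto log for late pings, here is a minimal Python sketch. It assumes the log lines look exactly like the excerpt above; the function name `ping_gaps` is made up for this example.

```python
import re

# Matches Mosquitto log lines such as
# "1547307857: Received PINGREQ from sonoff-workbench"
PING_RE = re.compile(r"^(\d+): Received PINGREQ from (\S+)$")


def ping_gaps(log_lines, client_id):
    """Return the gaps in seconds between consecutive PINGREQs from client_id."""
    stamps = []
    for line in log_lines:
        match = PING_RE.match(line.strip())
        if match and match.group(2) == client_id:
            stamps.append(int(match.group(1)))
    # Pair each timestamp with the next one and take the difference.
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Lines copied from the log excerpt above (PINGRESP lines are ignored).
sample = """\
1547307857: Received PINGREQ from sonoff-workbench
1547307857: Sending PINGRESP to sonoff-workbench
1547307867: Received PINGREQ from sonoff-workbench
1547307877: Received PINGREQ from sonoff-workbench
1547307887: Received PINGREQ from sonoff-workbench
1547307897: Received PINGREQ from sonoff-workbench
1547307908: Received PINGREQ from sonoff-workbench
""".splitlines()

gaps = ping_gaps(sample, "sonoff-workbench")
print(gaps)
```

A gap that is noticeably larger than the keepalive shown in the `k…` field of the connect line would point at delayed or lost packets between the device and the broker.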

As a last step I’m using the Core 2.3 build that @Bieniu suggested. This is the same Core as my previous setup with Tasmota 5.12. Hopefully it will solve the issue. This is a nice write-up for troubleshooting: https://github.com/arendst/Sonoff-Tasmota/wiki/Troubleshooting#wifi-issues-arduino-core-versions-and-expressif-sdk

After upgrading to the Mosquitto addon v4, I was getting lots of socket errors and disconnects in the Mosquitto log. I did have mqtt: and a bunch of settings in the configuration.yaml file left over from Mosquitto v1. I also have each MQTT switch specified in that file.

I read that I needed to comment out those sections, save, check and restart. After doing so, my switches checked in and showed in the MQTT log, but did not show up in the Overview anymore.

So then I read about the Integrations in the Configuration menu. There was an option called “MQTT”, and when I clicked on it, a popup dialog asked for IP, port, username and password, as well as whether I wanted automatic “Discovery”, to which I said yes.

After another restart, my switches showed up again in the Overview and functioned correctly.
If I go back to MQTT Integrations, it reports “This integration has no devices”, which I have read only covers devices that were “Discovered”. Since all mine are in the configuration.yaml file, that made sense.

I am using Sonoff switches flashed with Tasmota firmware. I can’t turn on MQTT_HOST_DISCOVERY in their firmware because my wife just bought an iRobot e6 from Costco, and it has its own little local MQTT broker in it to talk to its phone app. That makes the MQTT settings of all my Sonoff switches get overwritten with the iRobot’s IP address, and they obviously lose connection to HASS.IO’s Mosquitto broker. So no auto discovery for me!

Anyway, I am still getting socket errors and disconnects:

1547321982: New client connected from 192.168.1.33 as Sonoff-071A07 (c1, k60, u'mqttusername').
1547322598: Socket error on client Sonoff-BD34A2, disconnecting.
1547322599: New connection from 192.168.1.30 on port 1883.
[INFO] found mqttusername on local database
1547322599: New client connected from 192.168.1.30 as Sonoff-BD34A2 (c1, k60, u'mqttusername').
1547322657: New connection from 192.168.1.30 on port 1883.
[INFO] found mqttusername on local database
1547322658: Client Sonoff-BD34A2 already connected, closing old connection.
1547322658: Client Sonoff-BD34A2 disconnected.
1547322658: New client connected from 192.168.1.30 as Sonoff-BD34A2 (c1, k60, u'mqttusername').

I am also getting some weird connections like:

1547321294: New client connected from 192.168.1.180 as 0a5595e8-e627-4aa8-880c-e51a560b182b (c1, k60, u'mqttusername').

The part after the “as” is undecipherable and I never had anything look like that with v1.

Need help with fixing the socket errors.

When I look at the table listing issues with different cores, 2.3 has some gotchas as well. I’m using a Fritzbox and also Mesh. I switched from using auto channel on my WiFi. So from the notes, it seems as if 2.5 is the best core for me to use. Having said that, I had 100% stability with the 2.3 core in firmware 6.3, so go figure.

Even though I get ‘dropouts’, they seem to last only for a second, and I get maybe one every few hours. I’d have to be pretty unlucky to get a dropout in the middle of an automation action, so I think I can live with it for now, and I guess they will eventually fix it.

I’m using Core 2.3.0 now. No issues for 12 hours.

I have the identical issue with some R1 Basics (I upgraded them all to R2 boards and the issue went away in all but one) and with a T3 switch, which I cannot resolve. The switch is now running:
Core/SDK Version 2_4_2/2.2.1(cfd48f3)
Program Version 6.4.1(sonoff) (was on 5.12 with same issue)

When checking the console log on the switch, you see:
17:14:27 MQT: Attempting connection…
17:14:28 MQT: Connected
and when checking the Mosquitto log:
1547828049: Client DVES_9876A2 has exceeded timeout, disconnecting.
1547828049: Socket error on client DVES_9876A2, disconnecting.
1547828050: New connection from 192.168.1.149 on port 1883.
[INFO] found DVES_USER on local database
1547828050: New client connected from 192.168.1.149 as DVES_9876A2 (c1, k10, u'DVES_USER').
1547828061: Client DVES_B0EEA1 has exceeded timeout, disconnecting.
1547828061: Socket error on client DVES_B0EEA1, disconnecting.

Any ideas on how to debug this would be appreciated.
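
One way to start debugging is to count how often each client gets dropped and for which reason. The following is a rough Python sketch, assuming the Mosquitto messages are worded as in the excerpts in this thread; `tally_disconnects` and the event labels are names invented for the example.

```python
import re
from collections import Counter

# Message patterns taken from the log excerpts in this thread.
EVENT_PATTERNS = {
    "socket_error": re.compile(r"Socket error on client (\S+), disconnecting\."),
    "timeout": re.compile(r"Client (\S+) has exceeded timeout, disconnecting\."),
    "duplicate_connect": re.compile(
        r"Client (\S+) already connected, closing old connection\."
    ),
}


def tally_disconnects(log_lines):
    """Count disconnect-related events per (client_id, event) pair."""
    counts = Counter()
    for line in log_lines:
        for event, pattern in EVENT_PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), event)] += 1
    return counts


sample = [
    "1547828049: Client DVES_9876A2 has exceeded timeout, disconnecting.",
    "1547828049: Socket error on client DVES_9876A2, disconnecting.",
    "1547828050: New connection from 192.168.1.149 on port 1883.",
    "1547828061: Client DVES_B0EEA1 has exceeded timeout, disconnecting.",
    "1547828061: Socket error on client DVES_B0EEA1, disconnecting.",
]

counts = tally_disconnects(sample)
for (client, event), number in sorted(counts.items()):
    print(client, event, number)
```

Run over a full day of log, a tally like this makes it easier to see whether one device stands out or whether all devices are affected about equally.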

http://thehackbox.org/tasmota/release/020300/

Upgrade minimal, then the normal release and your problems should be gone.

Hello sir. Could you please elaborate on your suggestion? I’ve stated that I was using version 6.4.1 with core 2.3 and am still having problems. Thanks.

Hoping this gets resolved as I have these same issues.

I have the same problem on some of my Sonoff S20s with 6.2.1 and 2_3_0 (and some versions before that). I haven’t found a solution yet.
The web server of the Tasmota device is not available until I reboot my Raspberry Pi or reconnect the switch.

Those are not even related… Tasmota/Sonoff will work even if the Pi is off so this isn’t making sense to me unless you have a WLAN issue with the Pi swamping the network…

I relented yesterday and compiled 6.4.1.12 with Core 2.3.0 and I have not had any device dropout for 24 hours…

“Those are not even related…”

I’m sure they are. Those constant connection errors seem to cause a malfunction on the Tasmota device.

BTW: I’m using Hassbian with a Mosquitto server. The problem remains the same.

You could turn off the Pi and still be able to go to the web interface for Tasmota. They are not related in any way.

It is possible the Pi is flooding the network though.

Hello everyone.

I’ve found that the problem is due to wifi instabilities, packet loss.

I’ve managed to “fix” the problem by changing a couple of parameters in the Tasmota firmware ino: how long to wait until the next MQTT check and the keep_alive interval. I compiled and updated the firmware OTA, and everything seems to be a lot more stable, with no more socket errors in the log.

I’m using firmware 6.3 with core 2.3 and the PubSubClient MQTT library.

I changed these lines in the PubSubClient.h file:

// MQTT_KEEPALIVE : keepAlive interval in Seconds
// Keepalive timeout for default MQTT Broker is 10s
#ifndef MQTT_KEEPALIVE
#define MQTT_KEEPALIVE 45
#endif

// MQTT_SOCKET_TIMEOUT: socket timeout interval in Seconds
#ifndef MQTT_SOCKET_TIMEOUT
#define MQTT_SOCKET_TIMEOUT 60
#endif
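
As a rough illustration of why a longer keepalive tolerates a lossy link better: the MQTT 3.1.1 specification lets a broker drop a client once it has heard nothing for one and a half times the keepalive the client announced. The Python sketch below just does that arithmetic for keepalive values that come up in this thread; `broker_deadline` is a name made up for the example, and a particular broker may enforce the limit slightly differently.

```python
def broker_deadline(keepalive_s):
    """Seconds of silence after which a broker may drop the client.

    Uses the 1.5 x keepalive rule from the MQTT 3.1.1 specification.
    """
    return keepalive_s * 1.5


# Keepalive values mentioned in this thread, in seconds.
for keepalive in (10, 15, 45):
    print(f"keepalive {keepalive}s -> broker may disconnect after "
          f"{broker_deadline(keepalive)}s of silence")
```

With a keepalive of 10 seconds, a single ping that is delayed or lost for a few seconds can already push the client past that limit, while 45 seconds leaves much more room.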

Here is a default firmware compiled with the above settings: https://mega.nz/#!jQF3xKra!zEx9YInJoyTaOnihdo04oFCn2hNfdYASK9k39n1u2Yw

Hope it helps.

Thanks.

Updated to the following version with your adjustments to PubSubClient.h:

Program Version 6.4.1(sonoff)
Build Date & Time 2019-01-31T22:15:23
Core/SDK Version 2_4_1/2.2.1(cfd48f3)
Uptime 1T23:31:01

I also switched from fixed MQTT settings to MQTT discovery and changed the IP address. I suspect the latter is the real reason, but there have been no disconnects for nearly 2 days.

I’ve got another problematic device. I’ll check whether it helps there too.

Can you also share your bin file?
Thank you.

I was running 6.4.1 on 8 Sonoffs (S20, Touch, Basic, T1 2 gang) and had a lot of these socket errors (some devices more than others, I couldn’t really find a pattern).
I tried:

  • changing the IP addresses of my Sonoffs
  • changing the username/password of the MQTT host
  • adding a list of local users in MQTT config for the devices with most errors
  • changing the client name (DVES_xxxx )
  • erasing all flash using esptool.py
  • changing sleep settings

None of this made any difference. I uploaded @Schneider’s firmware yesterday, and so far there are no more socket errors. I will check over the next few days. :pray:

@stanvv Please check whether you were already using core 2.3 previously, too.

I agree. Core 2.3.0 fixed mine as well.

These socket errors were bothering me as well. I decided that reverting to an older core didn’t seem like a good fix and that using @Schneider’s fix would be the most beneficial.

I ended up modifying the PubSubClient.h the same way… BUT…
After trying to get a STATUS6 message from the switch I noticed that it was still sending a keep alive of 15 seconds!

After a grep search for more keep alive defines, I noticed that sonoff_post.h also had a MQTT_KEEPALIVE 15

After resetting the device and realizing it wasn’t in the CFG, I rebuilt the firmware, and I’ve had no socket errors since, even with an RSSI of 60.

Just for clarification, I did not modify the code in any other way: no user settings, I just changed the keepalive definitions. My configurations are pretty generic on my Sonoff Basic V2s.
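
For those without grep at hand, the search for every place that defines the keepalive can be sketched in Python. This is only an illustration: `find_keepalive_defines` is a made-up helper, and the list of file extensions is an assumption about what a firmware source tree contains.

```python
import os
import re

# Matches lines such as "#define MQTT_KEEPALIVE 15" and captures the value.
DEFINE_RE = re.compile(r"^\s*#\s*define\s+MQTT_KEEPALIVE\s+(\S+)")

# Assumed set of source file types worth scanning.
SOURCE_SUFFIXES = (".h", ".hpp", ".ino", ".cpp")


def find_keepalive_defines(root):
    """Return sorted (path, line number, value) tuples for each MQTT_KEEPALIVE define."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(SOURCE_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    match = DEFINE_RE.match(line)
                    if match:
                        hits.append((path, lineno, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Scan the current directory; point this at the firmware source checkout.
    for path, lineno, value in find_keepalive_defines("."):
        print(f"{path}:{lineno}: MQTT_KEEPALIVE {value}")
```

If more than one file shows up, the value that ends up in the build depends on include order and the `#ifndef` guards, so it is worth changing all of them consistently.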

Schneider’s fix IS using the older 2.3.0 core or didn’t you notice that?
