[SOLVED] ESPHome DHCP client problems, temporary solution with manual fixed IP

Hi all,
On a Wemos D1 mini (ESP8266) test device I’m experiencing continuous disconnections every 2-3 minutes, even with optimal WiFi signal strength.
The device gets its IP from the DHCP server, a rock-solid OpenWrt.
The built-in LED flashes only while disconnected, as it should; otherwise it stays off.
ESPHome is 1.14.2, just updated.

Here is a log excerpt:

...
[13:31:21][I][mqtt:162]: Connecting to MQTT...
[13:31:21][I][mqtt:202]: MQTT Connected!
ERROR Error while reading incoming messages: Error while receiving data: [Errno 104] Connection reset by peer
WARNING Disconnected from API: Error while receiving data: [Errno 104] Connection reset by peer
INFO Connecting to d1mini_test1.local:6053 (172.16.2.128)
WARNING Couldn't connect to API (Timeout while waiting for message response!). Trying to reconnect in 1 seconds
INFO Connecting to d1mini_test1.local:6053 (172.16.2.128)
INFO Successfully connected to d1mini_test1.local
[13:34:21][I][mqtt:162]: Connecting to MQTT...
[13:34:21][I][mqtt:202]: MQTT Connected!
ERROR Error while reading incoming messages: Error while receiving data: [Errno 104] Connection reset by peer
WARNING Disconnected from API: Error while receiving data: [Errno 104] Connection reset by peer
INFO Connecting to d1mini_test1.local:6053 (172.16.2.128)
WARNING Couldn't connect to API (Timeout while waiting for message response!). Trying to reconnect in 1 seconds
INFO Connecting to d1mini_test1.local:6053 (172.16.2.128)
INFO Successfully connected to d1mini_test1.local
[13:36:21][I][mqtt:162]: Connecting to MQTT...
[13:36:21][I][mqtt:202]: MQTT Connected!
...

And here is the ESPHome d1mini_test1.yaml, it’s quite basic:

substitutions:
  device_name: d1mini_test1

esphome:
  name: $device_name
  platform: ESP8266
  board: d1_mini

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

logger:

api:
  password: !secret default_password

ota:
  password: !secret default_password

web_server:

mqtt:
  broker: !secret mqtt_broker
  birth_message:
    topic: ${device_name}/birth
    payload: hi
  will_message:
    topic: ${device_name}/will
    payload: bye

# Wemos D1 Mini builtin LED
status_led:
  pin:
    number: D4
    inverted: true

Any hint?
Thanks,
Piero

These are the MQTT messages sent by this device:

$ mosquitto_sub -v -F "%I %t\t%p" -h 172.16.1.1 -t '#'
2019-11-09T19:59:10+0100 d1mini_test1/debug	[I][mqtt:202]: MQTT Connected!
2019-11-09T19:59:10+0100 d1mini_test1/birth	hi
2019-11-09T19:59:10+0100 d1mini_test1/will	bye
2019-11-09T19:59:10+0100 d1mini_test1/status	offline
2019-11-09T20:01:27+0100 d1mini_test1/will	bye
2019-11-09T20:02:27+0100 d1mini_test1/debug	[I][mqtt:202]: MQTT Connected!
2019-11-09T20:02:28+0100 d1mini_test1/birth	hi
2019-11-09T20:06:27+0100 d1mini_test1/will	bye
2019-11-09T20:07:27+0100 d1mini_test1/debug	[I][mqtt:202]: MQTT Connected!
2019-11-09T20:07:28+0100 d1mini_test1/birth	hi
2019-11-09T20:11:27+0100 d1mini_test1/will	bye
2019-11-09T20:12:27+0100 d1mini_test1/debug	[I][mqtt:202]: MQTT Connected!
2019-11-09T20:12:28+0100 d1mini_test1/birth	hi

Hope it helps.

And this is the corresponding ESPHome dash log:

Use the api or mqtt.

Not both. Read the big red box here:

Thanks @tom_l, but I get the same behavior with mqtt: and web_server: disabled.

My native language isn’t English, but as I understand it the big red box essentially says “and”, not “or”.
Reading that box, I understand that one can use both MQTT and the API; api: needs to be disabled only if HA does not use the native API (HA isn’t connected through the API), to avoid the no-API-client-connected timeout. As you can see below, the device successfully connects to HA.

So that isn’t my case, as I currently use the API to communicate with HA, and MQTT only to communicate with the MQTT clients I write (these are critical and should not depend on HA availability, see “Someone has a plain C (not python) example of ESPhome API client?”).

In addition, the wiki says the API-not-connected reboot happens every 5 minutes, not every couple of minutes as in my case, so I suppose this is another issue.
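
One way to rule the API reboot timeout in or out would be to disable it and see whether the drops continue. This is only a minimal sketch, assuming the reboot_timeout option of the api: component (0s should disable the reboot); the password line is taken from the config above:

```yaml
api:
  password: !secret default_password
  # Assumption: 0s disables the reboot that is triggered
  # when no API client has been connected for a while.
  reboot_timeout: 0s
```

If the disconnections continue unchanged with this setting, the reboot timeout is probably not the cause.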

[21:15:17][D][api.connection:579]: Client 'Home Assistant 0.101.3 ([redacted])' connected successfully!
[21:15:17][I][app:100]: ESPHome version 1.14.2 compiled on Nov  9 2019, 21:14:38
[21:15:17][C][status_led:019]: Status LED:
[21:15:17][C][status_led:020]:   Pin: GPIO2 (Mode: OUTPUT, INVERTED)
[21:15:17][C][wifi:409]: WiFi:
[21:15:17][C][wifi:277]:   SSID: [redacted]
[21:15:17][C][wifi:278]:   IP Address: 172.16.2.128
[21:15:17][C][wifi:280]:   BSSID: [redacted]
[21:15:17][C][wifi:281]:   Hostname: 'd1mini_test1'
[21:15:17][C][wifi:285]:   Signal strength: -50 dB ▂▄▆█
[21:15:17][C][wifi:289]:   Channel: 7
[21:15:17][C][wifi:290]:   Subnet: 255.255.0.0
[21:15:17][C][wifi:291]:   Gateway: [redacted]
[21:15:17][C][wifi:292]:   DNS1: [redacted]
[21:15:17][C][wifi:293]:   DNS2: [redacted]
[21:15:17][C][logger:175]: Logger:
[21:15:17][C][logger:176]:   Level: DEBUG
[21:15:17][C][logger:177]:   Log Baud Rate: 115200
[21:15:17][C][logger:178]:   Hardware UART: UART0
[21:15:17][C][ota:029]: Over-The-Air Updates:
[21:15:17][C][ota:030]:   Address: d1mini_test1.local:8266
[21:15:17][C][ota:032]:   Using Password.
[21:15:17][C][api:095]: API Server:
[21:15:17][C][api:096]:   Address: d1mini_test1.local:6053
ERROR Error while reading incoming messages: Error while receiving data: [Errno 104] Connection reset by peer
WARNING Disconnected from API: Error while receiving data: [Errno 104] Connection reset by peer
INFO Connecting to d1mini_test1.local:6053 (172.16.2.128)
WARNING Couldn't connect to API (Timeout while waiting for message response!). Trying to reconnect in 1 seconds
INFO Connecting to d1mini_test1.local:6053 (172.16.2.128)
INFO Successfully connected to d1mini_test1.local
ERROR Error while reading incoming messages: Error while receiving data: [Errno 104] Connection reset by peer
WARNING Disconnected from API: Error while receiving data: [Errno 104] Connection reset by peer
INFO Connecting to d1mini_test1.local:6053 (172.16.2.128)
WARNING Couldn't connect to API (Timeout while waiting for message response!). Trying to reconnect in 1 seconds
INFO Connecting to d1mini_test1.local:6053 (172.16.2.128)
INFO Successfully connected to d1mini_test1.local
ERROR Error while reading incoming messages: Error while receiving data: [Errno 104] Connection reset by peer
WARNING Disconnected from API: Error while receiving data: [Errno 104] Connection reset by peer
INFO Connecting to d1mini_test1.local:6053 (172.16.2.128)
WARNING Couldn't connect to API (Timeout while waiting for message response!). Trying to reconnect in 1 seconds
INFO Connecting to d1mini_test1.local:6053 (172.16.2.128)
INFO Successfully connected to d1mini_test1.local

The device connects to Wifi and to HA via API, then the problems begin.

Thanks,
Piero

My native language isn’t English,

Mine is and I am ashamed to say you are correct. Sorry.


I am seeing similar behaviour since upgrading ESPhome to 1.14.2 and HASS.io 0.101.3.

This is only using the api and these devices were rock solid before upgrading.

Unfortunately I upgraded both at about the same time so do not know which one may be the problem. In addition, I’m also seeing some ESPHome devices in the ESPHome panel as disconnected (status red) but they are connected - they are working and the log function will connect and deliver the logs from them in real time.

Not sure what to do to correct this. Is it possible to downgrade ESPHome?

Richard

I should have added:

  • the two that are dropping off WiFi are NodeMCU devices and have excellent WiFi signal
  • the one that is showing as disconnected in the ESPHome screen but is working is a Sonoff switch, flashed with ESPHome - it was also rock solid before
  • the logs from the ESPHome screen do not show anything untoward

Hi Richard,

Please try adding “fast_connect: true” and “reboot_timeout: 45min” to your wifi config. To speed up the WiFi connection process you could also give your ESP a static IP. Maybe this will stop the scanning, redialing and reconnecting?

Example (adapt the IPs to your infrastructure, play around with the reboot timeout):

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  fast_connect: true
  reboot_timeout: 45min
  manual_ip:
    static_ip: 10.10.10.10
    gateway: 10.10.10.1
    subnet: 255.255.255.0
    dns1: 10.10.10.1
    dns2: 1.1.1.1

Crossing my fingers this solves it.

Trying to narrow down this problem with the community help before opening a github issue.

Using the Wifi config shown in the OP:

  • The Wemos replies normally to ping while the API loses connection, so maybe the device networking (WiFi and/or IP) “disconnects” for less than 1 second.
  • The problem appears less frequently with 1.13.6 (HASS on a Raspi4-4GB) than with 1.14.2 (HASS on a beefy i7). So 1.13.6 isn’t totally immune here.
  • With 1.13.6 the status LED always stays off, while with 1.14.2 the status LED blinks slowly for as long as the API connection is lost.
  • The Wemos issues a DHCP request just before the HA server reconnects to its ESPHome API server.
  • This should not be a Wemos power supply problem, as this Wemos D1 mini is powered via USB with 100nF+470uF capacitors to stabilize its onboard 3.3V.
  • The WiFi LAN connectivity is optimal and there isn’t any problem with other LAN devices (except with one PiZero CAM far out in the backyard). The AP is 3 meters away from this Wemos (it reports -43dB), and both are in the same room.
  • This board is a Wemos D1 Mini Pro v1.1.0. The problem doesn’t change whether I define board: d1_mini_pro or board: d1_mini and, as @RGN01 pointed out, it isn’t limited to Wemos boards.
  • fast_connect: true and reboot_timeout: 45min don’t solve the problem.

The manual fixed IP does the magic trick! Thanks @andilge, so this seems to be a DHCP client issue; I will report it on GitHub.
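
For reference, a sketch of what the workaround could look like on the wifi: section from the OP. The static IP and subnet are the values shown in the log above; the gateway and DNS addresses are redacted there, so 172.16.1.1 (the MQTT broker address from the mosquitto_sub command) is only an assumption and has to be adapted:

```yaml
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    static_ip: 172.16.2.128   # the address previously obtained via DHCP (from the log)
    subnet: 255.255.0.0       # as reported in the log
    gateway: 172.16.1.1       # assumption: redacted in the log, adapt to your router
    dns1: 172.16.1.1          # assumption: redacted in the log
```

With manual_ip set, the device should no longer depend on its DHCP client, which is the part that appears to misbehave here.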

Piero

Mine were on fixed IPs, so that much wasn’t common. I’ve updated the settings, adding fast_connect and reboot_timeout (45min) as recommended.

wifi:
  ssid: !secret wifi_name
  password: !secret wifi_password
  domain: !secret wifi_domain
  fast_connect: true
  reboot_timeout: !secret wifi_reboottimeout

  manual_ip:
    static_ip: 192.168.0.193
    gateway: !secret wifi_gateway
    subnet: !secret wifi_subnet
    dns1: !secret wifi_dns01
    dns2: !secret wifi_dns02

Unfortunately this has NOT fixed my issue. They are still dropping out - as is one PIR linked via Deconz.

This is all new and the result of a recent change / update I think.

@RGN01 did you check with ping whether it’s a WiFi connectivity or an IP issue?

@RGN01

There are strong hints that this has to do with DHCP leases in ESPHome 1.14 onward (the situation right now).

If you assign the static IP from the other side, with a static lease in your router, and comment out “manual_ip” in the ESPHome wifi config, does this make things better for you?

This is not the solution, but it might give further indications.
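
As a rough sketch of that test, based on the wifi config posted above: manual_ip is commented out so the device falls back to DHCP and should receive the address reserved in the router. The static lease itself is configured on the router and is not shown here.

```yaml
wifi:
  ssid: !secret wifi_name
  password: !secret wifi_password
  domain: !secret wifi_domain
  fast_connect: true
  reboot_timeout: !secret wifi_reboottimeout

  # Commented out for this test only, so the address comes from
  # the router's static lease instead of the device config.
  # manual_ip:
  #   static_ip: 192.168.0.193
  #   gateway: !secret wifi_gateway
  #   subnet: !secret wifi_subnet
  #   dns1: !secret wifi_dns01
  #   dns2: !secret wifi_dns02
```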

I also tried fixed reservations (the same addresses) but that does not seem to help. Will retry this.

This shows what is happening https://drive.google.com/file/d/1Sk7tcxcMr8eFgcGvHVMkYt1fDMAQexmH/view?usp=sharing

The two NodeMCUs affected are in different rooms, with different WiFi access points, all sharing a gigabit ethernet network on a managed switch showing no errors so this is unlikely to be the network.

An ongoing ping to both devices is not showing breaks when the interruptions occur.

@PieBru looking at this issue from the other side, this could be something in OpenWRT’s odhcpd as well. We see that the DHCP client and server don’t play well together, but the cause is still unknown. Is your OpenWRT up to date? Can you give some more details about the odhcpd you use?

To help, I don’t use OpenWRT - my network devices are all Draytek Vigor managed devices (which may use OpenWRT under the covers, I guess!)

I unfortunately won’t be able to do any more on this for a few hours - family time needed, I’m told! :)

@andilge
Other IoT MQTT clients (e.g. Espurna-dev with custom modules I developed) don’t show problems or frequent disconnects from the broker, which is hosted by the same OpenWRT router.

OpenWRT:

Model: TP-Link TL-WDR3600 v1
Architecture: Atheros AR9344 rev 2
Firmware Version: OpenWrt 18.06.4 r7808-ef686b7292 / LuCI openwrt-18.06 branch (git-19.190.55614-35357e4)
Kernel Version: 4.9.184

Installed packages (dhcp)

odhcp6c 2018-07-14-67ae6a71-15
odhcpd-ipv6only 1.15-3

Thanks for your support.
Piero

Similar to mine; I can confirm this odhcpd should do fine.

@RGN01
This hint from Otto didn’t solve my case, but to narrow it down further you may try building with Arduino 2.4.2, as Otto suggested here: https://github.com/esphome/issues/issues/838#issuecomment-552183232
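
A minimal sketch of how that could look, assuming the arduino_version option of the esphome: section is what the linked comment refers to (the other keys are from the config in the OP; adapt name and board to your device):

```yaml
esphome:
  name: $device_name
  platform: ESP8266
  board: d1_mini
  # Assumption: pins the Arduino core used for the build to 2.4.2
  arduino_version: 2.4.2
```

After changing this, the firmware has to be rebuilt and flashed again for the setting to take effect.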