ESPHome device won't connect to WiFi after upgrading to 1.20.4

Hi all,

My home mesh has started to fail, so I am in the process of installing a new Ubiquiti UniFi setup. One of the advantages of doing this is you can create additional WiFi networks per AP, which means you can manually spread your IOT nodes between APs instead of having them clump onto a specific one, as they are wont to do.

So this means I am having to reflash my devices because of the new SSID. But I am running into the same problem I had when upgrading from 1.13.6 to 1.14.1: the nodes no longer connect to the WiFi. Here’s what I did so far.

First, I took a new Sonoff Basic R3, and flashed it manually as a test, on my bench. It worked OK.

Encouraged, I flashed one of the Sonoff Basic R2 1.0 modules that I have as a light switch. It failed to connect to the network, so I pulled it out of the wall and tried to flash it manually. It accepts the bin, i.e. it’s programmable, but it won’t connect to the network.

I tried flashing it with the Test bin I had put onto the R3. Still no dice.

Then I went into my backup folder and found an old bin from 1.13.6, and flashed that onto the Sonoff Basic R2 1.0. It worked, the device connects to the old network.

So on the plus side, I still have a working device, but I’m not able to segregate it off onto its own IOT network. Also, I am very afraid to flash the rest of my devices, as some of them are extremely hard to reach (up in the roof), and I am a fat old man with no wish to die, yet.

I remember with the 1.14.1 upgrade there was some signal strength setting you could try to make the device connect, which sometimes worked and sometimes didn’t. Does anyone remember what that was?

Alternatively, and preferably, is there some setting in 1.20.4 that solves this problem?

Thank you.

EDIT: I remembered the setting, it was:

output_power: 17.5db

However, that did not solve the problem.
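
For anyone unsure where that setting lives, here is a minimal sketch of how it would sit in the wifi: block. The SSID and password values are placeholders (assumptions, not taken from this thread), and as noted above, lowering the power did not fix the problem in this case.

```yaml
# Sketch only: ssid/password are hypothetical placeholders.
wifi:
  ssid: "your-iot-ssid"
  password: "your-wifi-password"
  # Reduce TX power from the ESP8266 default; this was the 1.14.1-era workaround.
  output_power: 17.5dB
```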


After much searching and experimentation, and much manual flashing, I have found that I can get it to connect to WiFi by changing the log level to INFO:


logger:
  level: INFO

This is a workable solution as I really only need full debug on one or two devices which I just won’t upgrade.

Leaving this thread here in case anyone else struggles with this.


Hello, same problem after updating to 1.20.4 :frowning:
None of my ESP32s connect anymore! I will try your trick, thank you.

I think it’s a memory issue. I have several nodes that would not connect after upgrading.
I discovered that by commenting out ap: and captive_portal:, everything works again.

#  # Enable fallback hotspot (captive portal) in case wifi connection fails
#  ap:
#    ssid: "Wemos_Switch Fallback Hotspot"
#    password: "Oi136ZHGIzON"
#
#captive_portal:

Hi @stevemann, thanks for posting your solution, I’m sure that will help some people.

In my case, the devices I am updating were still running 1.13.6, which was the last stable version before the WiFi issues in 1.14.1. After those issues occurred, I rolled everything back to 1.13.6.

This being the case, my devices do not have the captive portal code in the yaml. So your solution wouldn’t work for me.

But it makes me think - maybe newer Sonoff devices based on the ESP 8285 have more RAM than the older ESP8266 ones? I think you might be on to something.

Anyway, I have been slowly working my way through my list of Sonoff devices, reflashing with level: INFO wherever I remember that the device is version 1 Sonoff. On the plus side, I am mostly up to date now.

I am, however, not touching my Sonoff iFan02s. The old YAML does not validate under 1.20 and I am too scared to update it, because manually flashing those was a bitch.


Same here, I do not have ap: or captive_portal: in my configuration.

It’s OK for me, it works with logger: level: INFO.
Thank you :slight_smile:

I had four minis and a couple of Basics behind light switches. Every one had to be manually updated. My Sonoff Basics are the older 8266 version.

I use logger: level: DEBUG, which, if I read the manual correctly, should use more memory than INFO:
“Please note that the global log level determines what log messages are saved in the binary.”
(Do I read this correctly?)

Just for info, I tested this theory just now. In this yaml file I only changed the logger level. The board is a Wemos D1 Mini with 4 MB of flash.

logger: level: VERBOSE
RAM:   [=====     ]  47.6% (used 39024 bytes from 81920 bytes)     2736 bytes more
Flash: [====      ]  41.4% (used 432712 bytes from 1044464 bytes)  52,796 bytes more

.......... BASELINE ..........
logger: level: DEBUG
Checking size /data/office_light/.pioenvs/office_light/firmware.elf
RAM:   [====      ]  44.3% (used 36288 bytes from 81920 bytes)     BASELINE
Flash: [====      ]  37.1% (used 379916 bytes from 1023984 bytes)

logger: level: INFO
RAM:   [====      ]  43.6% (used 35708 bytes from 81920 bytes)     580 bytes less
Flash: [====      ]  36.5% (used 374012 bytes from 1023984 bytes)  5904 bytes less.

logger: level: WARN
RAM:   [====      ]  43.6% (used 35692 bytes from 81920 bytes)     596 bytes less
Flash: [====      ]  36.5% (used 373320 bytes from 1023984 bytes)  6596 bytes less

This tells me two things. One, that the logger level does impact flash usage, and two, if 6K of flash is the difference between working and not working, then this is scary close to the next update going over the line. It certainly makes me much more wary of performing updates in the future.

From my experiment, if you drop back to WARN, you will save another 692 bytes. (YMMV).

I spent the morning going over my ESPHome devices, specifically to review and redefine WiFi settings. First, they were all assigned to an IOT SSID. But what’s making ESPHome life sooo much better is:

power_save_mode: NONE

It seems the ESP8266 likes to disable WiFi to save power. That doesn’t help when trying to update/install new firmware, or poll the device. It also makes it impossible to see them on a scan of your local LAN subnet.
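
A minimal sketch of where this option goes, assuming an otherwise ordinary wifi: block. The SSID and password are placeholders, not values from this thread.

```yaml
# Sketch only: ssid/password are hypothetical placeholders.
wifi:
  ssid: "your-iot-ssid"
  password: "your-wifi-password"
  # Keep the radio awake at all times (no WiFi power saving).
  power_save_mode: NONE
```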

Hi @FredTheFrog, thanks for posting what’s working for you.

I consulted the documentation, and it says that power_save_mode: NONE is the default value for esp8266 chips. I can’t think why explicitly setting something that will be set automatically should make a difference - except perhaps your devices are newer Sonoffs with ESP8285 … but then you wouldn’t have the problem anyway. AARGH. I dunno.

But anyway, I’m glad it’s working for you, and now people have something else to try.

Hey @stevemann, that’s exactly why I can’t face updating my iFan02s. There are 6 of them, and every one is in the roof. They’re staying on 1.13.6, and will just have to connect to the main house SSID. It’s not ideal, but, it will have to do.

I think you may be right. If that’s true, then we may have no choice but to revert to Tasmota for older devices. Tasmota still has a lite version.

It’s either don’t update anymore, or Tasmota, or replace the switches with newer ones, but frankly I can’t afford to.

On first thought it doesn’t make sense to me that changing the log level would fix wifi connectivity, but given the number of issues I’ve seen about connectivity issues with v1.20, I expect there really is something here.

It’s possible that wifi connectivity doesn’t work because the RAM is exhausted, as wifi is one of the first things to run during boot that needs a significant amount of RAM. However, it’s strange that setting the log level to DEBUG would cause RAM usage to increase over INFO, as on the ESP8266 log strings are stored in the flash, and there’s no memory reserved for log messages. I can’t think of any reason why increased flash usage would cause WiFi connectivity to fail (there’s nothing that uses the flash there).

A complicating factor here is that I can’t reproduce this issue myself, even with a “fat” binary:

RAM:   [=====     ]  49.9% (used 40888 bytes from 81920 bytes)
Flash: [====      ]  43.7% (used 456940 bytes from 1044464 bytes)

If anyone still has a device that fails and that can be easily flashed (i.e. it’s not in the roof :)), I’d appreciate the following input:

  • Does it work with either v1.18 or v1.19? I’ve seen some reports that one of these versions introduced the problem. If so, what’s the size of binaries from those versions?
  • Which integrations are you using? Full YAML would be even better.
  • Can you post the serial logs while and after it fails to connect to wifi? There should be information about why it fails to connect there.

Hi @oxan, thanks for taking an interest in this. I agree that it’s mystifying, but it’s also not the first time it’s happened.

Are you involved in ESPHome’s development? If so, I might be prepared to drag one out of the wall for you and run some tests.

The answers to your questions that I can answer are as follows.

  1. I don’t know if it works with 1.18 or 1.19 because my upgrade path was 1.13.6 to 1.20.4 for the problem nodes. I did have 1.15.? running for a while, but I only ever used it with newer hardware, or with D1 Minis, which seem fine with 1.20.4.

  2. The problem nodes are all basic light switches, with a touch sensor wired to a free GPIO to provide a physical on/off switch which is required for WAF. They use binary_sensor, output, light, and switch.

  3. I don’t know how to acquire serial logs. I compile my firmware binaries in the Home Assistant plugin interface, download them through the browser to my Windows PC, and then manually flash with an FTDI interface. If what you are asking is possible with the FTDI interface, I’ll gladly have a go for you, if it will help the community, but you might need to talk me through it.

An example yaml. Note that this device works now, but if logging is changed to DEBUG, it will not connect to WiFi:


esphome:
  name: son002_masterbedroom
  platform: ESP8266
  board: esp01_1m

wifi:
  ssid: [IOT SSID]
  password: [PASSWORD]
  manual_ip:
    static_ip: 192.168.1.76
    gateway: 192.168.1.1
    subnet: 255.255.255.0
    dns1: 192.168.1.103    
  fast_connect: true

# Enable logging
logger:
  level: INFO

# Enable Home Assistant API
api:

ota:

binary_sensor:
  - platform: gpio
    id: button
    pin:
      number: GPIO14
      mode: INPUT_PULLUP
      inverted: True
    on_press:
      - light.toggle: light1

output:
  - platform: gpio
    pin: GPIO12
    id: relay

light:
  - platform: binary
    id: light1
    name: "Light - Master Bedroom"
    output: relay

switch:
  - platform: restart
    name: "Reboot son002_masterbedroom"

I had the same problem on one of my devices too, with the only difference that it could connect to wifi but could not connect to the Home Assistant API. Setting logger level to INFO helped. Thanks!

PS. It looks like setting logger: baud_rate: 0 (to disable logging over UART) also helps.
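
Written out as a block, the two logger workarounds mentioned in this thread would look roughly like the sketch below. Whether you need one or both is not established here; combining them is only an assumption for illustration.

```yaml
logger:
  # Workaround 1: compile in only INFO and above instead of DEBUG.
  level: INFO
  # Workaround 2: disable logging over UART (logs remain available over the network).
  baud_rate: 0
```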

I also have an ESP8266-01 (1 MB) and an older Sonoff Basic on the bench and would be willing to test.
Could the problem be skipping intermediate upgrades? (From 1.13.6 to 1.20.4, for example?)

I doubt it.

I did pretty much the exact same thing about 2 weeks ago - I went from 1.13.6-dev to 1.20.0 - and everything I have has no connectivity issues.

I have an eclectic mix of original Sonoff Basic, Sonoff R2, Sonoff iFan02, Shelly1 and a bunch of NodeMCU (running various sensors - DHT, HX711, Dallas, HC-SR04) devices and everything still works after the update.

Hi @DeeBeeKay, thanks for your response.

Yes, I’m involved in ESPHome development, see here. The last couple of PRs were specifically aimed at a potential out-of-RAM problem that might be causing this (but I’m not yet convinced that is actually the cause).

Regarding your questions:

  1. I understand, but it’d be useful to know, so we can pinpoint what exactly broke it.
  2. Thanks.
  3. Ah, that might be a bit more complicated. The best solution would be if you could connect the devices to the PC running the ESPHome plugin (I think the FTDI interface should work), and then view the logs through the plug-in interface. Unfortunately I’m not super familiar with how the HA plug-in works. Maybe someone else could help.

In the meantime, I’ve created this GitHub issue for discussion about this topic (you’re not the only one hitting it): “Bootloop/wifi connection issues with illegal instruction exception, mostly on Sonoff devices”, Issue #2309 in esphome/issues on GitHub.

Additionally, you don’t have an arduino_version in your config by any chance?

I guess I don’t know what “plugin” you are referring to but my ESPHome dashboard has a button for viewing the logs:

[Screenshot: ESPHome dashboard]

Are you using a different dashboard?

Hi @finity, I have that same dashboard, but it only shows logs for devices that are already attached to WiFi. Since the problem is devices not connecting to WiFi, it is less helpful.

My understanding is that if a device were connected via USB to the machine running ESPHome, more log information would be available, and we’d be able to see what the device was doing up until the point of failure, and after.

The “plugin” I am referring to is because I am an old, and forget the difference between a plugin and an addon. I meant that I am running ESPHome as an addon for Home Assistant, as opposed to running it standalone.

Hi @oxan, thanks for your continued investigation of this.

I’m not aware of any arduino_version configuration option. I’m not using one anywhere in my yaml files, and it does not appear as a pre-launch configuration option in the Home Assistant addon (plugin). Basically I wouldn’t know where to set this. Sorry I can’t be more helpful.
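
For reference, in the 1.x config format this option, if present, would appear under the esphome: block, roughly as in the sketch below. The name, platform and board are copied from the example yaml earlier in the thread; the value shown is only an illustrative assumption, not a recommended fix.

```yaml
esphome:
  name: son002_masterbedroom
  platform: ESP8266
  board: esp01_1m
  # Optional; when this line is absent, ESPHome picks its own default framework version.
  # "recommended" is shown here purely as an example value.
  arduino_version: recommended
```

If no such line exists in any of your yaml files, then the option is not set.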

Thank you for creating the GitHub issue, I will keep an eye on that topic.