Can't stop ESPHome restarting with weak wifi

Resurrecting this thread as I never managed to get to the bottom of this last year and ended up closing the pool for winter :thinking:

Now we are in the middle of the pool season and the issue has started to occur again regularly this week.

I tried to capture logs using the esphome logs abc.yaml > log_file command. But I’m not sure how useful it is as I seem to miss the restart (due to lack of connection?)

[14:32:38][D][sensor:093]: 'Pool Pump Uptime Sensor': Sending state 7902.55518 s with 0 decimals of accuracy
[14:34:46][D][sensor:093]: 'Pool Pump Wifi Signal Sensor': Sending state -65.00000 dBm with 0 decimals of accuracy
[14:34:46][D][sensor:093]: 'Pool Pump Uptime Sensor': Sending state 7962.56104 s with 0 decimals of accuracy
[14:34:46][D][sensor:093]: 'Pool Pump Wifi Signal Sensor': Sending state -65.00000 dBm with 0 decimals of accuracy
[14:34:46][D][sensor:093]: 'Pool Pump Uptime Sensor': Sending state 8022.55713 s with 0 decimals of accuracy
[14:36:46][D][sensor:093]: 'Pool Pump Wifi Signal Sensor': Sending state -65.00000 dBm with 0 decimals of accuracy
[14:46:40][D][sensor:093]: 'Pool Pump Uptime Sensor': Sending state 533.29999 s with 0 decimals of accuracy
[14:47:36][D][sensor:093]: 'Pool Pump Wifi Signal Sensor': Sending state -64.00000 dBm with 0 decimals of accuracy
[14:47:40][D][sensor:093]: 'Pool Pump Uptime Sensor': Sending state 593.29498 s with 0 decimals of accuracy

Is there another method to capture logs so I can try to diagnose these restarts once and for all? ESPHome is running on a Sonoff 4CH device.

Yes, for this kind of problem OTA logs are not helpful. You need the local serial logs to get a clue about what is causing these unwanted restarts.

You might also want to increase the log level in case the default (DEBUG) isn’t verbose enough.
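
A minimal sketch of raising the log level, assuming the standard logger component. VERBOSE is only an illustrative choice, one step above the default:

```yaml
logger:
  # DEBUG is the default; VERBOSE and VERY_VERBOSE log progressively more detail
  level: VERBOSE
```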

As a side note, you might also try to narrow down the problem by deploying some debug sensors.

The reset_reason in particular might be helpful, as it tells you why a restart/reset happened once the device is back up :bulb:
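
A rough sketch of what those debug sensors could look like, assuming the ESPHome debug component and its text sensor platform. The entity names and the update interval are placeholders, not taken from anyone's actual config:

```yaml
debug:
  # How often the debug values are refreshed (illustrative value)
  update_interval: 5s

text_sensor:
  - platform: debug
    device:
      # Chip, flash and SDK details, plus reset info after a crash
      name: "Device Info"
    reset_reason:
      # Why the device last restarted, e.g. power on, exception, software reset
      name: "Reset Reason"
```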

Thanks @orange-assistant for the continued support. I did a bit more investigation and added the debug and restart config as suggested. Since this is a Sonoff device, it is not straightforward to connect a serial logger but I will do that as a next step if there is still not sufficient info here to diagnose the cause.

The entities show i) device info inc. restart reason ii) uptime and iii) wifi signal strength. The gaps in the graphs correlate to periods where HA is reporting that the device is unavailable. I previously assumed that these dropouts were due to poor wifi signal strength, but I can now see that in this specific example the wifi signal strength was actually stronger during the unstable periods than the stable ones. So perhaps the issue is causing the wifi dropouts rather than the other way around.

From the graphs I see a long period of stability followed by a period of instability during which the device rebooted several times. The restart reasons reported from the OTA logs are sometimes exception 28 and sometimes exception 9.

 Reset Info: Fatal exception:28 flag:2 (Exception) epc1:0x40236a5b epc2:0x00000000 epc3:0x00000000 excvaddr:0x0000000e depc:0x00000000
[02:13:27][D][text_sensor:064]: 'Device Info': Sending state '2023.6.4|Flash: 1024kB Speed:40MHz Mode:DOUT|Chip: 0x00c9238b|SDK: 2.2.2-dev(38a443e)|Core: 3.0.2|Boot: 31|Mode: 1|CPU: 80|Flash: 0x00144051|Reset: Exception|Fatal exception:28 flag:2 (Exception) epc1:0x40236a5b epc2:0x00000000 epc3:0x00000000 excvaddr:0x'
[02:13:27][D][text_sensor:064]: 'Reset Reason': Sending state 'Exception'

[06:06:13][D][debug:254]: Reset Reason: Exception
[06:06:13][D][debug:255]: Reset Info: Fatal exception:9 flag:2 (Exception) epc1:0x4023b407 epc2:0x00000000 epc3:0x00000000 excvaddr:0x696817ad depc:0x00000000
[06:06:13][D][text_sensor:064]: 'Device Info': Sending state '2023.6.4|Flash: 1024kB Speed:40MHz Mode:DOUT|Chip: 0x00c9238b|SDK: 2.2.2-dev(38a443e)|Core: 3.0.2|Boot: 31|Mode: 1|CPU: 80|Flash: 0x00144051|Reset: Exception|Fatal exception:9 flag:2 (Exception) epc1:0x4023b407 epc2:0x00000000 epc3:0x00000000 excvaddr:0x6'

Does this shed any light on the issue? Or do I need to solder on some header pins to read the serial logs?

Do you have more than one AP in range/installed?

Looks like something fatal. :grimacing:

Have you tried setting output_power to a lower value to see if that makes the device more stable?

  • output_power (Optional, string): The amount of TX power for the WiFi interface from 8.5dB to 20.5dB. Default for ESP8266 is 20dB, 20.5dB might cause unexpected restarts.
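
For illustration only, lowering the TX power might look like the following. The 17dB value is an assumption picked from inside the documented range, not a recommended setting:

```yaml
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  # Illustrative value below the ESP8266 default of 20dB
  output_power: 17dB
```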

Yes, I have Google Wifi and it seems that ESPHome had connected to a weaker access point. To remove this as a factor I have now set up a new, single wifi access point in the garage with a different SSID.

Since doing this I have had 5 days of trouble free running but unfortunately today the resets started again. My observations are:

  1. At 12.44 the reported wifi signal strength to the new access point dropped from around -50 to -60.
  2. First outage occurred at 12.51 for ~40 seconds
  3. Device reconnected at 12.52 for around 5 minutes.
  4. Second outage occurred at 12.57 to 13.29 (33 minutes!)
  5. Device crashes with fatal exception 9 two mins later at 13.31
  6. Device is reported “unavailable” until it reconnects at 13.39
  7. Device crashes again at 13.44 with fatal exception 28
  8. Device back up and running at 13.45

Interestingly, I have set up another ESPHome device next to the troublesome device and so far it’s been stable.

Thanks, I’ll play with that setting next and see if it changes anything.

Interesting. So with the same yaml configs (obviously different names) you get a different mileage?

Are they the same esp’s (dev boards)? Maybe even same batch? Or different boards/devices?

Often the most crucial factor is the antenna design, and that was messed up on the ESP-12E modules:


Source: WiFi module-The Difference Between ESP-12E and ESP-12F

Hi,
I have the same problem with MQTT: it reboots at random times even with good WiFi coverage. My config file:

Showing your esphome yaml may help and perhaps a picture of the circuit. What about the power supply?

The board is this one https://www.aliexpress.com/item/4000026433011.html
I’ve flashed ESPHome on it following their docs. It’s powered by the 24V output of a 5KWh solar inverter, so I guess the supply should be stable. The problem is that I can reproduce this if I reboot the router or a nearby wifi repeater.

If you are using this with HA you probably don’t need the web server. It uses a lot of memory and the 8266 doesn’t have much. Try removing:

web_server:
  port: 80

I also suggest setting a static IP, gateway and subnet.
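
A sketch of a static IP setup using the manual_ip option of the wifi component. All three addresses are placeholders and need to match your own network:

```yaml
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    # Placeholder addresses, adjust to your network
    static_ip: 192.168.1.50
    gateway: 192.168.1.1
    subnet: 255.255.255.0
```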

That makes sense. Does the same apply to captive_portal? It would be nice to keep it for easy upgrades.

The ESPHome documentation warns you about the web server but not the captive portal. Leaving out bits of YAML is just part of the process of elimination for this sort of problem.

I thought so but nope. The other ESP device has now started resetting with the same fatal exception when wifi is poor. Today it (pool-doser) experienced two restarts whilst the original device (pool-pump) experienced none… but more on that below…

They are both Sonoff devices and interestingly both seem to have the same ESP8285 chip. I found these photos online which match my models and revisions:


Anyway, since yesterday I think I may have made a breakthrough. On both devices I had the fallback AP and captive portal enabled (although I don’t recall ever using them).

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  ap:
    ssid: "Pool-Pump Fallback Hotspot"
    password: "my-secret-password"

captive_portal:

I had an idea that the crashes were occurring when the wifi dropouts triggered the ESPHome AP/Captive Portal. Yesterday morning I removed the AP config from one device (pool-pump) and left it on the other (pool-doser). Today I have had two reboots on the doser device and none on the pump… so far :sweat_smile:

I will need another week of running to test my hypothesis properly. But if I am correct, I wonder if there is a bug/issue with the AP/Captive Portal when running on ESP8285 chips in at least two Sonoff models :skull_and_crossbones:

On your spare Sonoff you could run Tasmota at the same time to see if it is more stable.

The captive portal is indeed a little heavy on the esp82xx because internally it probably needs the web server component to work :point_down:

Please note that enabling this component will take up a lot of memory and may decrease stability, especially on ESP8266.
Web Server Component — ESPHome

To get a bit more in-depth information about memory usage (especially now that you have the 2 devices behaving differently) you can use the debug sensor, which reports the heap memory of your ESPs!
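
As a sketch, heap monitoring with the debug sensor platform could look like this on an ESP8266/ESP8285. The entity names and the interval are made up for illustration:

```yaml
debug:
  update_interval: 5s

sensor:
  - platform: debug
    free:
      # Free heap in bytes
      name: "Heap Free"
    fragmentation:
      # Heap fragmentation metric (ESP8266 only)
      name: "Heap Fragmentation"
    block:
      # Largest contiguous free block in bytes
      name: "Heap Max Block"
```

Comparing these values between the stable and the unstable device may show whether one of them is running short of memory before it crashes.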

And is your yaml very complex, with a lot of components, or is it rather simple?

Both devices have been running with 100% uptime since disabling the fallback AP and captive portal. Even during periods of poor wifi.

So it really does look like the root cause was that the captive portal is too much for the ESP8285 chip in these Sonoff devices.
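
For reference, the change amounts to roughly the following, based on the wifi block posted earlier in the thread with the fallback AP and captive portal taken out:

```yaml
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  # The "ap:" block (fallback hotspot) has been removed

# The "captive_portal:" entry has been removed as well
```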

Rather simple. Basically the following yaml with a couple of scheduled automations (time:) and the captive portal.

thx for the info.
I have a similar issue with an ESP32 and weak wifi.
It reboots every ~4h.

Did you leave “reboot_timeout: 0s” set on api & wifi?

Now I am trying to deactivate the fallback AP, captive portal, web server & reboot timeout.

I know the ESP32 is more powerful, but I don’t know what else to try.

Yes. I set reboot_timeout to 0 but it didn’t help.
Enable debug: to see the restart reason and determine if the restarts are expected or unexpected.
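
For anyone following along, disabling the automatic reboots looks roughly like this. It is a sketch that assumes the stock api and wifi components, where 0s turns the timeout off:

```yaml
api:
  # Do not reboot when no API client (e.g. HA) is connected
  reboot_timeout: 0s

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  # Do not reboot when the wifi connection is lost
  reboot_timeout: 0s
```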

thx for the hint,
I have now activated the debug component as described here:

I hope there is some useful information in the reset reason after the issue occurs, and not just “Software Reset CPU” :slight_smile:

There is just a PowerOnReset in the debug logs.
Also, I was connected to the device over wifi with live logs and saw NO exception or anything. The log just stops!

Here is a possible workaround (set static IP instead of using DHCP): Help with esphome error/rebooting every 2-3 hours - #4 by pOpY

Will post updates in the other thread.