ESPHome issues when network goes down

I’ve been building a prototype automated dust collection system for the ‘makerspace’ where I volunteer. It’s a fun project, but I’ve run into a lot of bugs whenever the network goes down… and I expect it to go down fairly regularly. (It’s an old industrial building with electrical issues, the wifi goes down fairly regularly, etc.)
Whenever I shut down my test network, the process of getting everything back online is a PAIN.
Typically only about half the devices will come back online without issues. The other half - either they require reauthorization (getting the api key for 30 different devices is fun) or they just disappear and have to be re-added from scratch (at which point once done, half the time they pop up again as a newly discovered device so I have to delete the other one).
Is this just standard? I understand that for many the expectation is that this will happen never or very very seldom…
What can I do to avoid these problems?
Thanks in advance…

Well I have never had issues like those in the 5 years I have been using esphome. Mine always reconnect without issue.

Tell us more about how they connect to WiFi, static addresses if any and how they are configured etc.

Sample of yaml code would also be useful.

Never had that kind of issues when an ESPHome device was powered off/on…
I definitely never had to reenter the api key.

I have ~50 ESPHome devices and DO get that randomly/occasionally (always on different ones) and honestly have no idea why, but it’s not an issue of the WiFi going down. I even have that pop up for a few devices that don’t even have API encryption set up. Luckily it’s not super-often, and deleting the affected device and then re-adding it in HA always solves it right away. I also noticed when we lost power a while back that I had several do it at the same time (we rarely lose power, so no idea if that would happen regularly).

I have pretty rock-solid WiFi (in an older neighborhood, so obviously the spectrum is pretty loaded but I’ve picked the best frequencies available) with multiple high-quality APs and no funny networking going on. It’s always confounded me.

esphome:
  name: bandsaw_relay_controller
  friendly_name: Bandsaw Relay Controller

esp8266:
  board: esp01_1m

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "QgOSM3wDSgPDgT7y+6lak1bjrsbV/I0zaGEPU5NogNs="

ota:
  - platform: esphome
    password: "9d555612adfa9260128506145e7003c8"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case Wi-Fi connection fails
  ap:
    ssid: "Relay-Controller"
    password: "NxSzo3OkQ2bg"

captive_portal:

# Define the relay
switch:
  - platform: gpio
    pin: GPIO5
    name: "Relay"
    id: relay

# Optional logging or control commands
sensor:
  - platform: uptime
    name: "Relay Uptime"

How is the wifi signal for them? Do you keep them in metal enclosures?

I would ditch the API encryption and use static IPs. And from what you’ve described, I’m kind of wondering how important WiFi even is?
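For reference, a static address in ESPHome goes under manual_ip in the wifi: block. This is only a rough sketch: the 192.168.1.x addresses are made up and need to match the actual network, ideally outside the router's DHCP pool. Dropping the encryption: block means API traffic is unencrypted on the LAN, so treat that part as a trade-off rather than a fix.

```yaml
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  # Hypothetical addresses: adjust to the real network
  manual_ip:
    static_ip: 192.168.1.50
    gateway: 192.168.1.1
    subnet: 255.255.255.0

# Home Assistant API with no encryption: block (plain-text on the LAN)
api:
```

If the encryption key is removed on a device that is already paired, the integration in HA will most likely need to be set up again once.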

Have you considered just wiring the dust collection system to the things it is managing dust for? Like when the saw turns on, the relay gets turned on too.

Physical buttons/switches would probably be useful, or create a “master” ESP node that uses 433 MHz RF to control the individual dust collection systems, and then you don’t need to rely on an unreliable WiFi network. Everything can still be wireless, just with 433 MHz…
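As a rough sketch of the local-control idea, the relay can be driven by an input on the same board, so it keeps working with no network at all. The input pin (GPIO4), the wiring (a switch or current-sensing contact that pulls the pin to ground while the saw runs) and the 5s run-on delay are all assumptions; `relay` is the switch id from the config posted above.

```yaml
binary_sensor:
  - platform: gpio
    # Hypothetical input: contact closes to ground while the saw is running
    pin:
      number: GPIO4
      mode:
        input: true
        pullup: true
      inverted: true
    name: "Saw Running"
    filters:
      # Keep the collector running briefly after the saw stops
      - delayed_off: 5s
    on_press:
      - switch.turn_on: relay
    on_release:
      - switch.turn_off: relay
```

These automations run on the device itself. Note that with api: configured, the device still reboots after the API's reboot_timeout (15min by default) if Home Assistant never connects; setting that to 0s disables the reboot.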

I am considering it. Wondering if I could still use esphome & microcontrollers, just wired ones. What wired microcontrollers are there I could use? I’ve been using Wemos D1 clones to handle the switching and a raspberry pi as the hub.
In a room about 30x50 (with DCs directly below in the basement) I could just wire everything. So… RJ45 connectors? USB on hubs? Interesting ideas.

This might just be a case of not knowing how an ESPHome device reacts to a WiFi loss.
Your ESPHome devices are probably up before the WiFi is available, and when they cannot connect to a WiFi network within a set time limit (ap_timeout, default 1min), they start up the fallback AP and the webserver.
This is probably where you detect that the devices have not reconnected to your WiFi network, which is now online again.

The ESPHome device will then sit in this state, without a WiFi connection and acting as an AP, for a set time limit (reboot_timeout, default 15min) before it restarts and tries to boot up and reconnect to the WiFi network.

If you do not want this feature, then remove the ap: section and the keys under it.
If webserver is not used either, then remove the captive_portal key too, since it is just taking up memory then and quite a lot of it.
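For illustration, the wifi: section from the posted config with the fallback hotspot stripped out would look roughly like this. reboot_timeout is written out at its default value only to make the behaviour visible; it can be left out or adjusted.

```yaml
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  # No ap: block, so the device never switches to the fallback hotspot.
  # After this long without a WiFi connection the device restarts and
  # tries again (15min is the default).
  reboot_timeout: 15min

# captive_portal: is removed too, since it only serves the fallback AP
```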

thanks good info…

Do you mean Esphome within HA or just Esphome? You can go either route, they’re just a bit different.

Have you checked your wifi signal yet to see if that’s the problem?

wifi is fine here - router is two feet away. The problem is while I’m building the test network, I shut it down every night. And every morning at least one device requires re-configuring or re-authentication or is just not working or not there at all.
I’m using esphome within HA running on raspberry pi.

If by reconfiguring you mean inputting the SSID and password in its webserver, then your YAML code is buggy.
Restarting the device should be enough.

Really??

Buggy in what way? What sorts of bugs would cause that problem when the OP is using a default config and only added 1 gpio switch to that config…

You think that bug might be in his switch configuration then???

The only other possibility would be that you said a bunch of nonsense and tossed in some programmer lingo to make it sound like that rambling nonsense was coming from a person with expertise. Also, answering someone’s question by telling them their “code is buggy” is about as helpful as suggesting that using sandpaper instead of toilet paper would clear up that rash, which is of 0.0 help to anyone.

Duplicate DHCP server on the same network? This comment makes me wonder.

Edit to add: Along those same lines of thinking, is there a second WiFi access point with the same login credentials as the test network?

Makerspace … lots of moderately heavy machinery ? Plenty of electrical & radio frequency interference ?

And which wifi goes down fairly regularly ? Is there a previous wifi network in the space, which was already unreliable … even before you added your test network ?

Physical distance to the router is not proof that there is no interference. A dealer once told me that they took a quality Wi-Fi router to a client’s premises, and with a laptop on the table right next to the router they couldn’t get a clear signal to make a connection. Another client was an ex-telephone technician who cleaned all the phone wires and connectors in his home, and his Wi-Fi improved significantly. I find the most frustrating thing about Wi-Fi is that we can’t see what might be causing problems; we can only make guesses and look at the results.

Thanks Wally, this reminded me of my repeated frustrations trying to add a new PC into my Windows home network, fiddling away but it just wouldn’t connect … finally learnt to setup the IP information then go make a cup of coffee. That 10-15 minutes made all the difference.

Geoffrey, are you turning all devices on at the same time; or powering the wi-fi router on, then giving it several minutes to activate before turning on the ESPHome devices ?

If the connection information inserted in the YAML code is lost on a disconnect and he has to recompile and reflash the device, then something is buggy.

ESPHome is flashed as an appliance.
If it fails reboot it and the code in the YAML should act as the last time it booted. It should not change.

If the device is not connecting to the WiFi, because the configuration set in the YAML code is wrong and you need to manually set it through the captive portal with each boot, then that is your bug.

The problem appears to be connected to Home Assistant. It might be, but probably isn’t, something related to Wi-Fi credentials or network name. I don’t use HA much, but I have heard of people having problems with a single device after something about it changes, and sometimes they need to delete and re-add the device to get it working again.

I do have direct experience with power failures and IoT devices that can struggle. My Tasmota devices would sometimes never recover the time after the Internet returned. I traced it to an issue with SNTP and eventually solved it by creating my own SNTP server. It can be very frustrating debugging an issue like this because it is typically inconsistent.

I would start by making sure the devices are actually connected to the network and go from there. There are a lot of layers, so validating each one consistently is critical to resolving this.
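One way to check this from the device side is to expose its own view of the connection as entities. A minimal sketch using the wifi_signal sensor and wifi_info text sensor platforms; the entity names are placeholders, and the sensor: entry would be merged into the existing sensor: list rather than added as a second key.

```yaml
sensor:
  # Signal strength in dBm as seen by the device
  - platform: wifi_signal
    name: "Relay WiFi Signal"
    update_interval: 60s

text_sensor:
  # Which network and access point the device actually joined
  - platform: wifi_info
    ip_address:
      name: "Relay IP Address"
    ssid:
      name: "Relay Connected SSID"
    bssid:
      name: "Relay Connected BSSID"
```

The BSSID in particular shows which access point a device joined, which can help rule a second AP with the same credentials in or out.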

Nothing so complicated. I’m building and testing these devices before moving them to a different location. The ip addresses will change. But I want to make sure these are working reliably here at home on my existing network… and they are not.

donburch:
The system has not been moved to the makerspace so there are no interference issues… yet.
The iot network is in my office at home, running on my home wifi router, which is very stable.
Typically I have tea in the morning, sit down at my desk, and turn on the raspberry pi and connect the devices.
Sometimes I’ll wait an hour before checking.

Unfortunately, you have provided no actionable information. This might help: How to help us help you - or How to ask a good question

In particular, look at the examples of good and bad questions. Also use the code tags (three back ticks on a line to start and also to end) to provide YAML and logs.

Your problem statement is unsolvable by anyone in its current form. Since no one else can see what you see, you will need to be the eyes and ears of everyone here who is trying to play detective. You clearly have thought some things are unimportant (like the fact that you are testing at home, where conditions are likely different from the makerspace) and left them out, but some of them are likely important.

Intermittent problems are usually hard to solve when you’re on-site, but pretty much impossible when you get inconsistent and inaccurate data about troubleshooting steps.

Start with more details of your whole setup and the environment it is actually in now, and add in a clear problem statement. Something like “when the power goes off for 30s and then comes back on, some but not all of my ESPHome devices no longer show up in HA (are they gone-gone or just not available? That might need more description, since I don’t typically use HA, I use MQTT), and here are the logs from HA and the problem device”.

If you provide details like that you should get the help you need.
