I actually have 3 of these boards. One of them randomly misbehaves in this way: at some point during the night an automation triggers an OFF->ON transition for one of its relays… and this is the last logged message from that board.
The day after, when I open the box where it’s installed and check the power, I find a stable 5V DC input. The Wi-Fi signal didn’t change. The only way to recover the board is to power-cycle it.
I suspect that the OFF->ON transition is producing EMI that is too noisy for the ESP32, or that the Chinese board I’m using is cheap enough that they didn’t put a good-enough diode next to the activation coil of the relay, so there’s some spike on the power supply that is causing e.g. the ESP32 to brown out.
My main question though is the following one: I read in a number of places that the ESPHome firmware should have a software watchdog that should reset the board every 15s after disconnecting from HA.
Why do I need to manually power-cycle the board?
@Karosm implies that the device being switched is causing serious electrical noise which causes the ESP to get “stuck”, or, in the other case, that there is too much load on the power supply. You should be able to emulate these conditions to confirm that that is what is happening.
If it is the former, then you’ll have to figure out how to reduce its noise.
If it is the latter, with a magnifying glass, look for cold solder joints (gray, not shiny, solder) or lifted leads and resolder appropriately.
A third possibility: While I use Tasmota as opposed to ESPHome, I had a similar situation where the configuration and/or code got into a funky state (couldn’t connect to MQTT broker). I had to reflash the code and reset the configuration. Restoring the configuration from backup did not fix my problem, but it was fixed when I manually restored each attribute. Be aware, if this fixes the problem, your ESP device has begun a death spiral.
Oh yeah. Does ESPHome log to the serial port? Connect to your board with your USB-to-TTL serial adapter. Use a terminal emulator (I use PuTTY) on your computer to connect to the COM port that is your USB device. With luck, you’ll be able to see what is happening on failure.
I see that you have to use an external power supply. Is this supplying power to all your boards? Does this power supply have sufficient capacity for all boards? Is the connection tight? Put another electrolytic capacitor across the DC input terminal.
So the OFF->ON transition happens on a single relay while only one other relay is already ON.
The clean contacts of the relays themselves are connected to the 220V mains and turn on a 5 W resistive load (it’s a wax motor), so nothing too scary apparently.
I will unmount the board tomorrow and look for cold solder joints. Maybe that’s the issue. I might also put some extra capacitance between 5V and GND to see if that makes any difference. The power supply is powering only that board and is rated 1A, so it’s 5W… Driving the coil of a relay should take less than 50 mA in theory, so there should be enough headroom…
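To put rough numbers on that headroom, here is a minimal sketch of the steady-state current budget. The 1 A rating and the 50 mA per coil come from the figures above; the 500 mA allowance for ESP32 Wi-Fi transmit bursts is an assumption, not something measured on this board, and the helper name `supply_headroom_ma` is made up for illustration.

```python
def supply_headroom_ma(supply_ma, relays_on, coil_ma, esp_peak_ma):
    """Return the spare current (mA) on the 5 V rail; negative means overloaded."""
    load_ma = relays_on * coil_ma + esp_peak_ma
    return supply_ma - load_ma


SUPPLY_MA = 1000    # from the post: 5 V supply rated 1 A
COIL_MA = 50        # from the post: "less than 50 mA" per coil, used as an upper bound
ESP_PEAK_MA = 500   # ASSUMPTION: allowance for ESP32 Wi-Fi bursts, not measured

for relays_on in (2, 8):
    spare = supply_headroom_ma(SUPPLY_MA, relays_on, COIL_MA, ESP_PEAK_MA)
    print(f"{relays_on} relays on: {spare} mA spare")
```

This only covers steady-state current. It says nothing about switching transients or about how far a cheap supply sags below its rated output, so a positive number here does not rule out a power problem.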
ESPHome does log to the serial port, yes; the board is just installed in an awkward location, so it’s not trivial to attach the USB-to-serial cable…
Let’s say though it’s the ESP brownout detection that’s kicking in… Is a manual power cycle the only possible way out?
Is there some hardware watchdog that can reboot the board on its own instead?
The WiFi configuration has a similar function. The defaults for both are 15 minutes.
But if it’s as reproducible as you’re implying, definitely follow the other folks’ recommendations for getting logs when it happens and checking the board for issues.
Just wanted to report here some more tests I’ve been doing over the last couple of days.
First: I connected a USB-to-serial adapter to the board, in situ (to match the exact operating conditions), and discovered that when I trigger all 8 relays at once, the serial port gets “disconnected” 100% of the time.
Most of the time (but not always!) the board remains operational (the ESP32 is still connected to Home Assistant over Wi-Fi) but the serial port dies (I’ve been using https://web.esphome.io/ and it says something like “serial port disconnected”, can’t remember the exact wording). I read this as: the simultaneous switching of multiple relays causes enough power-supply noise or EMI to confuse the CH340G USB-to-serial chip, which then drops the USB/serial connection.
I’ve never been able to see the “brownout detector tripped” or something like that on the serial.
Second: I’m now running a test with a more powerful AC-to-DC power supply that feeds 5V and up to 2-3 A to the board (unfortunately the power supply doesn’t state its exact maximum wattage, but its transformer is large and heavy enough for me to estimate at least 20 W).
As soon as I have time I will repeat the tests, triggering all 8 relays at once with the serial port connected, after attaching this new power supply.
I have yet to inspect the board for cold solder joints.
It’s been just a few days (3-4 days) but I’m confident in saying that all my troubles were due to the power supply not providing enough power.
Since I replaced it with a bigger power supply, the board has never shown the same behavior again (stuck and needing a power cycle to get back to life).