ESPHome not suited for mission critical applications?

I have been looking into using ESPHome with Sonoff devices to control a heating element that I would consider mission critical. ESPHome looks like a good option since you can run the automation on the device itself instead of relying on Home Assistant and a WiFi connection. However, after reading the docs, especially the part about the “reboot_timeout” parameter for the WiFi component, I found this statement: “note that the low level IP stack currently seems to have issues with WiFi where a full reboot is required to get the interface back working”.

This doesn’t give me much confidence in the system. If a reboot happens at the same time as an automation should run, doesn’t that mean that the automation will be cancelled?

And surely if this is a well-known bug, it should be easy to find and fix?

Get to it then!

If you design an HA automation correctly, it shouldn’t face that issue.

And what is the correct way to design a reliable HA automation?

I have run ESPHome on mission critical devices for the past 2 years. Never had an issue.

I sure would like to but I don’t have any Arduino hardware and it would probably take me a month to learn the ins and outs of Arduino and ESPHome programming before I could start to debug this bug, and I just don’t have that time available just to get a working scheduling system. I would probably just look into other HA systems and some other hardware instead.

But you want someone else to do it for you?

Well, anything that relies on auto-generated code is definitely not suited for critical stuff. If you want something more precise and optimal, you should code it yourself.

It is definitely suited for the usual DIYer who used to spend days or weeks of coding to get simple things done, versus 5 minutes nowadays thanks to ESPHome.

Check if the state is unavailable, and if it is, wait until it has finished rebooting.
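
As a rough illustration of that suggestion, here is a minimal Home Assistant automation sketch. The entity names (`sensor.outside_temperature`, `switch.pipe_heater`), the 2 °C threshold, and the 5 minute timeout are all assumptions for the example, not something taken from this thread.

```yaml
# Sketch only: entity IDs, threshold, and timeout are hypothetical.
automation:
  - alias: "Turn on pipe heater when freezing (waits for node)"
    trigger:
      - platform: numeric_state
        entity_id: sensor.outside_temperature
        below: 2
    action:
      # Block here while the ESPHome node is rebooting or offline.
      - wait_template: "{{ not is_state('switch.pipe_heater', 'unavailable') }}"
        timeout: "00:05:00"
        # Stop the sequence if the node never came back within the timeout.
        continue_on_timeout: false
      - service: switch.turn_on
        target:
          entity_id: switch.pipe_heater
```

This still depends on the network and on Home Assistant being up, so it only covers the short-reboot case.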

Yes, I have a number of lights and fans that it should work very well for, where it doesn’t matter if something stops once in a while. I am not trying to be disrespectful here; I fully understand this is an open source project and it is what it is. I am just trying to understand how others have dealt with things like this before, especially since I don’t see anything like a retry counter or system failure notifications in automations, so I am curious how people are handling situations like these.

Yes, although if this is called from HA it depends on the wireless connection working, and there is also a possible race condition if you happen to check a microsecond before the reboot.

@kristjanbjarni do you have solid WiFi? Because this only matters if you don’t.
If you do, set the timeout to 0 and never worry about it again.
If you don’t have solid WiFi, then you should address that before you try to craft a “mission critical” project that is dependent on WiFi. I would also suggest, if it’s a critical application, that you set up a monitor for the device in case it does go offline unexpectedly for a period of time.

Use wired if it is that critical.

Do you need “real time” control of the heating element; that is, with bounded delays in the control loop? Can your system survive a few seconds of interruption, where the heating element is turned off (or left turned on) when it should be in the other state? If you can quantify these sorts of bounds and can live with a few seconds of interruption, you might be OK.

Of course, if the WiFi network fails (or the wired network), or Home Assistant crashes, or the host fails… there are lots of failure modes to consider. If it is a safety issue when the heater is left on too long, you probably want to include some strictly electrical mechanism in case the software control fails. At a minimum, a thermal fuse that interrupts power to the heater. Maybe some watchdog timer that drops the relay controlling the heating element if it’s been energized too long.
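
For what it’s worth, a software approximation of that watchdog idea can be sketched in ESPHome itself. This is not the strictly electrical mechanism recommended above and does not replace a thermal fuse. The pin (GPIO12, typical for a Sonoff Basic relay), the IDs, and the 2 hour limit are assumptions.

```yaml
# Sketch only: pin, IDs, and the 2h limit are hypothetical placeholders.
script:
  - id: heater_max_on_time
    mode: restart            # a new turn-on restarts the countdown
    then:
      - delay: 2h
      - switch.turn_off: heater_relay

switch:
  - platform: gpio
    pin: GPIO12
    id: heater_relay
    name: "Heater"
    on_turn_on:
      - script.execute: heater_max_on_time   # start the max-on-time countdown
    on_turn_off:
      - script.stop: heater_max_on_time      # cancel it when the relay drops
```

Because this runs on the same microcontroller as the rest of the firmware, it shares that firmware’s failure modes, which is why a hardware cutoff is still worth having.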

I wouldn’t think of this as an ESPHome problem; in my experience, Home Assistant will have a longer interruption when it restarts than ESPHome and you’ll want some mechanism to deal with that, as well. Or just more generally in the face of a variety of system faults.

This comment has been in the ESPHome docs for a long time. I am unsure whether it is still true.

This is actually a heating wire element for an outdoor water pipe, to make sure that it doesn’t freeze solid when it’s cold. What I was thinking about was to automate it with an outside temperature sensor. This doesn’t need to be in real time; there would just be cutoff temperatures for on and off, and a 10 to 30 minute delay should not be a big deal. What is critical is that it must turn on when it’s freezing, otherwise there is a chance of the pipe bursting and needing to be replaced.

What I will probably end up doing is to have a switch with a temperature sensor that handles this, with a default state of on when rebooting, and then have a monitor in HA that watches the switch and notifies me if there is a discrepancy.
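
A discrepancy monitor along those lines might look roughly like the following Home Assistant automation. The entity names, the 2 °C threshold, the 15 minute window, and the use of `notify.notify` are assumptions for the sketch.

```yaml
# Sketch only: entity IDs, threshold, and notifier are hypothetical.
automation:
  - alias: "Pipe heater discrepancy alert"
    trigger:
      - platform: template
        # True when it is freezing (or the sensor is unreadable, which
        # falls back to 0) and the heater is not reported as on.
        value_template: >
          {{ states('sensor.outside_temperature') | float(0) < 2
             and not is_state('switch.pipe_heater', 'on') }}
        for:
          minutes: 15
    action:
      - service: notify.notify
        data:
          message: "Pipe heater is not on although it is freezing outside."
```

The `not is_state(..., 'on')` check also matches `unavailable`, so a node that has dropped off the network during a cold spell raises the same alert.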

You could still do it with ESPHome and make it completely independent from HA automations: set reboot_timeout to 0s so it never reboots when the connection is lost (the api component is shown below; the wifi component has its own reboot_timeout option), and add a temperature sensor to your device so it turns on and off based on the temperature reading.

api:
  password: !secret api_passwd
  reboot_timeout: 0s
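
To flesh that out a bit, here is a sketch of what the on-device version could look like. It assumes a DS18B20 on GPIO14 read through the `one_wire` and `dallas_temp` components of recent ESPHome releases (older releases used a `dallas` hub instead), a relay on GPIO12, and 2 °C / 5 °C thresholds. All of those are placeholders, and the `esphome:` and board sections are left out.

```yaml
# Sketch only: pins, IDs, and thresholds are hypothetical.
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  reboot_timeout: 0s        # do not reboot when WiFi is unavailable

api:
  reboot_timeout: 0s        # do not reboot when no API client is connected

one_wire:
  - platform: gpio
    pin: GPIO14

switch:
  - platform: gpio
    pin: GPIO12
    id: heater_relay
    name: "Pipe heater"
    restore_mode: ALWAYS_ON # come up heating after any reboot

sensor:
  - platform: dallas_temp
    name: "Pipe temperature"
    update_interval: 60s
    on_value_range:
      - below: 2.0          # freezing risk: turn the heater on
        then:
          - switch.turn_on: heater_relay
      - above: 5.0          # comfortably warm: turn the heater off
        then:
          - switch.turn_off: heater_relay
```

The gap between the two thresholds acts as hysteresis so the relay doesn’t chatter around a single cutoff. This sketch does not handle a failed sensor reading, so a monitor on the HA side is still a good idea.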

Or use the good old method of protecting outside plumbing: leave the tap at the end of the pipe slightly open so it drips every few seconds and the water doesn’t have time to freeze completely.

I’m not sure if anyone has mentioned this, but you could disable the rebooting.
(Wow our replies happened at the same time)

Simply have an automation that notifies you if the node is offline for x minutes.
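
Something like this, as a minimal sketch. The entity name, the 10 minute window, and `notify.notify` are assumptions; use whatever entity your node exposes and whichever notifier you have set up.

```yaml
# Sketch only: entity ID, duration, and notifier are hypothetical.
automation:
  - alias: "Notify when pipe heater node is offline"
    trigger:
      - platform: state
        entity_id: switch.pipe_heater
        to: "unavailable"
        for:
          minutes: 10
    action:
      - service: notify.notify
        data:
          message: "Pipe heater node has been unavailable for 10 minutes."
```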