Decoupling ESPHome device identity from hardware (idea for LTS-style stability)

TL;DR: I want LTS-style stability for my home automation. That doesn’t really exist in the HA/ESPHome world, so I’m experimenting with a setup where ESPHome devices get stable logical identities via MQTT. Posting early to get architectural feedback.


Hi all,

I’m working on a small open-source experiment called ESPro. This is not a product announcement and I’m not looking for users yet — I’m explicitly looking for feedback on the idea and architecture.

The problem that got me here

I run about a dozen ESPHome nodes for outdoor sensors and switches. Yes, I run indoor Sonoff hardware outdoors. I’m a cheap bastard. The assumption was simple: replacing a failed board would be a 15-minute job, so overall it’d be cheaper than buying outdoor-rated gear. Ideally I’d keep a stack of pre-flashed spares and just swap one in.

That assumption was wrong.

In ESPHome + HA, firmware and entity IDs are tightly coupled to hardware. When a board dies, you don’t just replace hardware — you rebuild identity. In practice, every failure costs about an hour: flash a new board, fix entity IDs, repair automations, update dashboards.

Last week it got worse. An old ESPHome config wouldn’t compile against current ESPHome anymore. That forced an ESPHome update, which pulled in Home Assistant changes, which then broke other things. A dead board turned into half a day of digital janitor work I never asked for.

What I actually want

I’ve been running Home Assistant for 8+ years. Feature-wise, I’ve been good for at least four. I don’t need new integrations or UI tweaks every month. I want stability.

“If it ain’t broke, don’t fix it.”

Security updates? Absolutely. But I want to choose when to upgrade, not be forced into a chain reaction because a sensor died in the rain.

What I’m really after is an LTS mindset for home automation. That doesn’t exist in the HA/ESPHome ecosystem — and I get why. The ecosystem optimizes for features and ease of use. That’s fine. It’s just not what I personally need anymore.

A different path

I think LTS-style stability can exist alongside the current ecosystem, not as a replacement. The key is looser coupling, with MQTT as a stable boundary.

┌───────────────────────────────────────────────────────────────────┐
│                   MQTT Bus (logical identities)                   │
└───────┬─────────────────┬─────────────────┬─────────────────┬─────┘
        │                 │                 │                 │
    ┌───▼───┐       ┌─────▼─────┐     ┌─────▼─────┐     ┌─────▼─────┐
    │ ESPro │       │zigbee2mqtt│     │   Home    │     │   your    │
    │daemon │       │           │     │ Assistant │     │   tools   │
    └───┬───┘       └─────┬─────┘     └───────────┘     └───────────┘
        │                 │
    ESPHome            Zigbee
   Native API          devices

HA stays the UI and automation engine. ESPHome stays ESPHome. Other tools can come and go. Things aren’t welded together.

First step: ESPro

The most painful issue for me is device identity. In HA, identity is effectively tied to hardware. Replace the board, lose the identity.

ESPro adds a small layer that maps logical device names (garden_pump) to physical devices (pump-aabbcc). Home Assistant only ever sees the logical name. Board dies? Flash a new one, rebind, done.
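
To make the mapping concrete, here is a minimal sketch of such a registry. It is not ESPro's actual code: the `Registry` class, its method names and the `espro/` topic prefix are all assumptions made up for illustration. The point is only that the MQTT topic is derived from the logical name, so a rebind changes nothing on the Home Assistant side.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Registry:
    """Maps logical names (what HA sees) to physical device ids (what is on the network)."""

    bindings: Dict[str, str] = field(default_factory=dict)

    def bind(self, logical: str, physical: str) -> None:
        # Refuse to bind one physical board to two different logical names.
        for name, device in self.bindings.items():
            if device == physical and name != logical:
                raise ValueError(f"{physical} is already bound to {name}")
        # Binding an existing logical name again replaces the old board.
        self.bindings[logical] = physical

    def resolve(self, logical: str) -> str:
        return self.bindings[logical]

    def topic(self, logical: str, suffix: str) -> str:
        # Hypothetical topic layout: it depends only on the logical name.
        return f"espro/{logical}/{suffix}"


registry = Registry()
registry.bind("garden_pump", "pump-aabbcc")
topic_before = registry.topic("garden_pump", "state")

# Board dies: rebind the same logical name to the replacement board.
registry.bind("garden_pump", "pump-ddeeff")
topic_after = registry.topic("garden_pump", "state")
```

After the rebind, `resolve("garden_pump")` points at the new board while the topic stays the same, which is the whole idea.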

Conceptually it’s similar to what zigbee2mqtt does with device naming — but applied to ESPHome’s native API instead of Zigbee.

Current state:

  • CLI prototype
  • Discovery and registry working
  • No daemon yet (next step)

This is not meant to replace ESPHome or Home Assistant. Native ESPHome integration can happily coexist for devices you don’t want to manage this way.

Feedback I’m looking for

  • How would you design a home automation architecture that supports LTS-style deployments?
  • Critical review: why this might be a bad idea

Repo: https://github.com/sjev/espro

I’m posting this early on purpose. I’d much rather hear “this won’t work” now than after sinking a lot more time into it.

I think there is definitely a need for LTS within the HA ecosystem.

I think this will become (if it isn’t already) an obstacle to HA becoming ubiquitous among non-tinkerers and potential ‘service providers’. Not just for ESP.

A 6-month or annual LTS cadence seems sufficient.

I don’t know how I would design such LTS style, but I agree that being able to easily/simply decouple the physical from the logical is a good first step. (and again, not just for ESP)

I think this is a great idea and hope you get the feedback and support needed to achieve your goals.

edit: I do want to point out that LTS and the decoupling of physical/logical are not really the same or mutually exclusive goals.

In my eyes, hardware replacement is a different can of worms from the desire for the base or stack to remain in a stable state.

What about reliability? For me it would cost thousands of dollars and hundreds of hours if my irrigation control silently stopped triggering the valve relays during summer while I’m away from home…
On the other hand, most electronics work just fine outdoors if you protect them from water, excessive heat, insects, rodents, etc., and “outdoor-rated” gear often just means an IP-rated enclosure.

That’s obvious, and can be solved by not compiling an old config with a new ESPHome.

You’re right. LTS and physical/logical decoupling are not the same thing, and I didn’t make that clear in my first post.

This is how it fits together in my head:

  • Long-term stability requires splitting the system into independent subsystems that communicate over stable interfaces.
  • That enables localized updates: per subsystem, per container, without forcing full-stack upgrades.
  • A first step in that direction is hardware abstraction. That’s what ESPro is exploring.

So ESPro is not “the LTS solution”. It’s a step towards something that could eventually be LTS-able.

I’m starting this conversation early on purpose. The goal is to discuss possible paths toward that long-term goal, not to pretend I already have it solved.

I absolutely agree with you that an irrigation system should be much more robust. It all comes down to a risk × impact calculation. In my case, all of my outdoor switches operate something non-critical, like garden lights. The impact of one failing is a minor annoyance at worst.
Professionally I build automation systems, and when high reliability is required I use something like this: ROX Automation | Robotics Made Easy.
BTW, I considered potting the Sonoff devices, but I know that potting will detune the Wi-Fi antenna, so I decided against it.

The problem is that I currently need to compile a new firmware build for every new (replacement) device. The one that failed was 4 years old. I did not have an old version of the ESPHome compiler, so I had to fix the YAML first.
None of this would be required if I just had a pre-flashed “switch” device that I could commission in place of the old one.

Of course, most of us (me included) here apply that to everything. Most furious guys use some $2 unbranded tuya switches from AE on their mains wiring to switch 16A load…

You can use pre-compiled firmware (if you saved it) or an ESPHome installation with an old version. You are not limited to just the newest one, and old versions are available.

I swapped one of my esphome boards this morning. I did it like this.

Unplug the device. Go into the DHCP server and release the IP address.
Create a bin file by compiling the old firmware.
Load the new bin onto the new hardware.
Go to the HA integrations page, where it asks me to update the authorisation. After doing this, it asks if this is a replacement for the old board. I say yes, and it overwrites the old setup with the new hardware ID stuff.
Why do you need this? You only need spare boards; there is no need to keep flashed boards in stock.

Great that this works for you — genuinely. For a single device on a well-maintained setup, that workflow is perfectly valid.

I’m coming from a different angle though: I’m trying to reason about what a professional-grade replacement workflow should look like, not just what works for an experienced user.

In your flow, a failed device still means touching three systems:

  • the network layer (DHCP / IP reuse → implicit identity binding),
  • the device (rebuilding and flashing firmware),
  • Home Assistant (re-auth, replacement, entity rebinding).

That’s manageable as a power user, but it’s far from what installers usually accept in the field.

In most industrial or building-automation systems, replacement looks more like:
install device → pair → commission.
No firmware rebuilds, no DHCP juggling, no cross-system coordination. Ideally you touch one system.

I also want to pre-flash hardware in batches and keep shelf-ready spares. That’s normal in professional contexts, and it doesn’t fit a “compile on failure” model.

So yes, your approach works in your context and probably for the majority of current HA / ESPHome users. But I wonder how many users wish for something better. I do, in any case.

So I’m trying to push the workflow toward something closer to installer-grade: predictable, repeatable, and with hardware replacement reduced to a single, well-defined step.

A theoretical dedicated ‘replace device’ function in HA might be very useful. If it can work easily and consistently across all kinds of devices, it could be an alternative to implementing a whole abstraction layer.

Yes, that would be very useful. I think that ideally, when a device is unavailable and an unknown device of the same type is discovered, the user should have an option to simply click “replace by ”. That is the UX side.
Because of the great number of integrations and technologies used, an abstraction layer (or multiple ones) would be required behind the scenes to make this possible.
I’ll dive a bit deeper into HA architecture, but if someone has ideas on how to create something like this, very interested.
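
As a starting point for discussion, the matching rule behind such a “replace” prompt could be sketched like this. Nothing here is Home Assistant's or ESPro's real API: the `Device` fields, the model strings and the `replacement_candidates` function are hypothetical names, and “same type” is assumed to mean “same model string”.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Device:
    device_id: str                      # physical id, e.g. derived from the MAC
    model: str                          # assumption: "same type" == same model string
    online: bool
    logical_name: Optional[str] = None  # None means not commissioned yet


def replacement_candidates(devices: List[Device]) -> Dict[str, List[str]]:
    """For each offline, commissioned device, list online uncommissioned devices of the same model."""
    offline = [d for d in devices if d.logical_name is not None and not d.online]
    unbound = [d for d in devices if d.logical_name is None and d.online]
    return {
        dead.logical_name: [new.device_id for new in unbound if new.model == dead.model]
        for dead in offline
    }


devices = [
    Device("pump-aabbcc", "sonoff-basic", online=False, logical_name="garden_pump"),
    Device("pump-ddeeff", "sonoff-basic", online=True),   # fresh spare, same model
    Device("lamp-112233", "sonoff-mini", online=True),    # fresh spare, different model
]
candidates = replacement_candidates(devices)
```

Here `candidates` offers only the spare of the same model for `garden_pump`. The UI would then need a single confirmation click to turn a candidate into a binding.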

Maybe move away from a physical device that is so intimately bound to ESPHome and get one that uses a higher-level protocol such as MQTT, where you configure the replacement device to send the same MQTT packets as the device it replaces? Home Assistant (at a higher OSI layer that isn’t touched) never knows, with Mosquitto at the lower OSI level still sending it the same data. Mosquitto doesn’t know either, as it gets the same MQTT data arriving from the new device. It happens at a lower layer, where it should.

ESPHome is very powerful: it cuts across many layers of the OSI model, and is always changing to support new devices and fix errors. This can also be a drawback, as you have found.

In your install device → pair → commission model, the MQTT configuration should be the thing you need to maintain to achieve your goals. Your network configuration depends on mDNS or futzing around anyway - nothing different there.

Leave everything else on the other OSI layers alone. No LTS releases needed.

Look at Tasmota. The MQTT configuration is done from the web interface (and can be done programmatically too). No need to recompile the underlying firmware (that also happens to change frequently) just to update the MQTT parameters. You plug your new device in, attach to the default 192.168.4.1 address and configure everything from the web pages shown in a browser, or just cut and paste a pre-configured script into the console page like I do, similar to the one I ran to back up the original configuration when it was deployed. Same as YAML for ESPHome, come to think of it. You do have a well-maintained backup library for all your ESPHome devices, I hope? Both the source YAML and the generated firmware from the compiler, ready to flash to a replacement chip/SoC?

Your hardware abstraction is just the bottom layer of the OSI model, already a well established concept.

This is my largest gripe with ESPHome. Look at Z-Wave JS UI (ZUI) for a solution. But this does not require a full rebuild. It is in the API(?) used between HA and ESPHome, or between HA and ZUI. I do not know the low-level details of how HA and ZUI interact, but if you use it you will see that the entity ID is based entirely on the device name. If you replace a device, you only need to give it the same name in ZUI and HA will substitute it. ESPHome is really similar, but because of how the YAML files are organized this is not obvious to users. Additionally, quirks such as ESPHome adopting a default name for the YAML file make the ESPHome organization messy even if you try to enforce a naming/organizational scheme.

I feel that in this instance it would be more useful to push ESPHome in the proper direction than to put effort toward something new.

And ESPHome can use MQTT instead of the API if you want.
Just don’t use the API if you feel MQTT is more reliable long term.

That is what we love about ESPHome: it doesn’t just push across the OSI layers, it absolutely bulldozes them! The power to do that at whim is often needed, as the other abstraction layers often don’t exist, are poorly documented, or (sadly, most often) don’t actually work.

With enormous power comes wisdom, organisation, and responsibility grasshopper.

And verified backups you can revert to…

I agree with the abstraction: freezing identity at the MQTT layer makes sense.

Where I differ is the workflow bar.

I looked at Tasmota. Conceptually it’s one system, but in practice replacement still means connecting to an AP, opening a web UI, and reconfiguring things. That’s already too awkward for me.

I also don’t want to touch firmware that’s working. Reflashing everything just to improve replacement later isn’t acceptable once a system is deployed.

The bar I’m aiming for is simple: a device dies, you plug in a replacement, and someone non-technical (my girlfriend, eventually) can finish the job without YAML, firmware rebuilds, or docs. My GF should be able to do that :wink:

I understand that point. I did consider working on ESPHome itself, even forking it.

I decided against it because long term I plan to phase out ESPHome devices in favor of Zigbee. With the current Sonoff range plus zigbee2mqtt, that’s simply a better fit for my use case than ESP8266 + Wi-Fi.

That said, I still need to support my existing ESPHome devices for many years. ESPro is meant to plug that gap, with minimal required effort.

What does your girlfriend do when she loses her phone?

Please explain all steps in detail.

Step one. She buys a new phone.
Step two. She turns it on.
There is no step 3 - it just works, just like the old one.

Sorry, the real world is different, unless her old phone was made from bakelite and has chrome finger stops for dialling.

She has to know about SIM cards, even their sizes, or pseudo ones. Passwords, not just to her phone, but to all the apps and clouds she is connected to. How to track and remotely wipe the old phone. How to restore. Face recognition. Fingerprints. Even then she probably lost some contacts that were saved to the SIM card, and photos on the MicroSD card you installed or that weren’t synchronised yet. Complicated? Painful? Millions do it, daily.

Sounds similar to your replacement IOT device. Plug it in, reconfigure it by restoring from backup, and it just works, as before.

Even the ROX unit you use as an example needs to have its address configured somehow. DIP switch, remote programmer, something. Your John Deere tractor, with a field technician helicoptered in?

You do do backups of all your critical devices, don’t you? Ones you have verified you can restore from? How much is downtime worth? Do you have a field programmer? Able to access your device firmware files conveniently? Their configuration parameters also? Why not?

Vatican bank motto: Jesus saves, so should you.

Backlog command. Paired with the Status command. Often overlooked. Extremely powerful. Can even be invoked from the device url.

Imagine a QR code sticker on each device with the correct parameters already defined, able to be executed on a fresh device reachable at the default 192.168.4.1 IP address? Print a kiss or rose on the sticker, just to make your girlfriend smile when she scans it.

Connect to the default SSID, scan the QR code on your phone and press the execute button. Don’t forget to move the configuration label to your replacement device. A bit like Matter devices, but girlfriend friendly? You could even pay an Uber driver to do it for you, and spend quality time with your girlfriend instead.
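
A rough sketch of what such a sticker could encode: Tasmota accepts commands over HTTP via `/cm?cmnd=...`, and `Backlog` chains several commands separated by semicolons, so the whole restore can be a single URL. The `backlog_url` helper, the broker address and the topic value below are made up for illustration. The setting names (`MqttHost`, `MqttPort`, `Topic`) are Tasmota commands, but check the command list for your firmware version before relying on them.

```python
from urllib.parse import quote


def backlog_url(device_ip: str, settings: dict) -> str:
    """Build one Tasmota web-command URL that applies all settings via Backlog."""
    # Backlog takes a list of "Command value" pairs separated by ';'.
    commands = "; ".join(f"{name} {value}" for name, value in settings.items())
    # Percent-encode everything, including spaces and semicolons.
    return f"http://{device_ip}/cm?cmnd={quote('Backlog ' + commands, safe='')}"


# Assumption: the fresh device is in AP mode and answers on 192.168.4.1.
url = backlog_url(
    "192.168.4.1",
    {"MqttHost": "10.0.0.2", "MqttPort": 1883, "Topic": "garden_pump"},
)
print(url)
```

The resulting string is what would go into the QR code. Generating the QR image itself needs a third-party library and is left out here. Note that Wi-Fi and MQTT passwords would end up readable on the sticker if you include them, so that is a trade-off to decide on.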

This is exactly my point. Phones work because replacement is a first-class workflow with a single logical identity and guided restore. Hardware is disposable; the user is walked through the process.

ESPHome + HA evolved differently. They grew organically, and hardware replacement simply wasn’t a concern early on. That’s understandable.

My take is that we’ve reached the point where this is a real problem now, and the workflow hasn’t caught up yet.

For ROX ICU the replacement workflow is explicit and guided: put the device on the CAN bus and it appears as ‘uncommissioned’. Commission a CAN ID with a tool and you’re done: no firmware change or backup required.
CANopen supports this model well, which is exactly the kind of commissioning workflow I’m missing on the ESPHome side.

Great point about a QR-based, easy workflow — I like it. And even more, I like the idea of a smiling girlfriend. Happy wife, happy life :wink: