ESPHome Intercom Native
Full-duplex audio intercom, PBX-lite calls, Voice Assistant, Micro Wake Word and Home Assistant softphone
Hey everyone
I want to introduce ESPHome Intercom Native, an ESPHome + Home Assistant project for building local full-duplex audio intercom devices.
Repository:
The goal is simple: turn ESP32 audio boards into real local intercom endpoints that can call each other, call Home Assistant, receive calls from Home Assistant, and
coexist with Voice Assistant, Micro Wake Word, media playback and touchscreen UIs.
This is not just a doorbell demo anymore. The project now behaves like a small PBX-lite system for ESPHome devices.
What It Can Do
With supported hardware, you can build devices that support:
- full-duplex intercom audio
- ESP-to-ESP calling
- ESP-to-Home Assistant calling
- Home Assistant-to-ESP calling
- Lovelace browser/app softphone
- TCP and UDP audio transports
- direct peer-to-peer calls where possible
- Home Assistant bridging/routing where needed
- echo-clean microphone stream for Voice Assistant and Micro Wake Word
- barge-in while TTS or media playback is active
- Acoustic Echo Cancellation
- Espressif Audio Front-End processing
- Voice Assistant
- Micro Wake Word
- media player playback
- LVGL touchscreen interfaces
- Do Not Disturb
- call decline / busy / timeout / hangup reasons
- protocol-aware phonebook
- mobile notification answer / decline flow
The simple use case is still simple: one ESP at the door and one Home Assistant dashboard card.
The larger model exists so that the simple use case is clean, reliable and extensible instead of being a pile of special cases.
PBX-lite Call Model
Every ESP device is treated as an independent phone extension.
Each device can:
- originate a call
- receive a call
- answer
- decline
- hang up
- expose its current call state
- expose the selected destination
- report why a call ended
Home Assistant is also treated as a peer in the same system. It can act as a softphone, a bridge, or an optional central router.
This makes it possible to support both direct ESP-to-ESP calls and Home Assistant-routed calls without changing the user-facing model.
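To make the extension model a bit more concrete, here is a minimal Python sketch of a per-device call state machine. It is only an illustration of the idea, not the project's actual code: the state names, event names and the `Extension` class are all hypothetical.

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    OUTGOING = "outgoing"   # we originated a call and wait for the peer
    RINGING = "ringing"     # a peer is calling us
    IN_CALL = "in_call"     # audio is flowing


# Allowed (state, event) -> next state transitions for one extension.
# Event names are hypothetical and chosen only for this sketch.
TRANSITIONS = {
    (CallState.IDLE, "originate"): CallState.OUTGOING,
    (CallState.IDLE, "incoming"): CallState.RINGING,
    (CallState.OUTGOING, "peer_answered"): CallState.IN_CALL,
    (CallState.OUTGOING, "peer_declined"): CallState.IDLE,
    (CallState.OUTGOING, "peer_busy"): CallState.IDLE,
    (CallState.OUTGOING, "timeout"): CallState.IDLE,
    (CallState.OUTGOING, "hangup"): CallState.IDLE,
    (CallState.RINGING, "answer"): CallState.IN_CALL,
    (CallState.RINGING, "decline"): CallState.IDLE,
    (CallState.RINGING, "timeout"): CallState.IDLE,
    (CallState.IN_CALL, "hangup"): CallState.IDLE,
}


class Extension:
    """One phone extension: an ESP device or the Home Assistant softphone."""

    def __init__(self, name):
        self.name = name
        self.state = CallState.IDLE
        self.destination = None   # currently selected destination
        self.end_reason = None    # why the last call ended

    def handle(self, event, destination=None):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not valid in state {self.state.value}")
        if event == "originate":
            self.destination = destination
        if event in ("originate", "incoming"):
            self.end_reason = None  # a new call starts, clear the old reason
        self.state = TRANSITIONS[key]
        if self.state is CallState.IDLE:
            # The event that brought us back to idle doubles as the end reason.
            self.end_reason = event
        return self.state


door = Extension("door")
door.handle("originate", destination="kitchen")
door.handle("peer_declined")
```

After the two calls at the end, `door` is back in the idle state and its `end_reason` records that the peer declined. Because every endpoint follows the same table, an ESP device and the Home Assistant softphone can be treated the same way by whatever is displaying or routing the call.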
Home Assistant Integration
The project includes a custom Home Assistant integration: intercom_native.
It provides:
- TCP listener
- UDP socket manager
- browser WebSocket audio for the Lovelace card
- protocol-aware phonebook sensor
- call / answer / decline / hangup services
- call forwarding
- TCP ↔ UDP bridge logic
- call state tracking
- reason propagation
- mobile notification answer flow
The Lovelace card can work in two ways:
- as a mirror of an ESP device when the ESP is calling another ESP
- as a Home Assistant softphone when the selected destination is Home Assistant
This means the same card can be used both for monitoring ESP calls and for real browser/app audio calls.
Mobile Answer Flow
ESP-originated calls can be answered from the Home Assistant Companion app.
The intended flow is:
- ESP starts a call
- Home Assistant sends a mobile notification
- Answer opens the dashboard with the intercom card
- the card requests microphone access
- full-duplex browser/app audio starts
- Decline calls the native decline service
This makes the system usable as a practical door/intercom endpoint, not only as a dashboard experiment.
Audio Stack
The audio stack was heavily reworked in 2026.6.0.
The old custom duplex path has been replaced by the new esp_audio_stack.
The new stack is built around native Espressif / ESP-IDF audio components:
- esp_driver_i2s for official I2S channel ownership
- esp_codec_dev for codec-backed boards
- gmf_io / io_codec_dev for codec IO
- esp_audio_effects for audio format conversion
- esp-sr for Acoustic Echo Cancellation
- gmf_ai_audio / esp_gmf_afe_manager for the full Audio Front-End pipeline
This moves the project closer to Espressif’s native audio ecosystem and reduces the amount of custom low-level audio code that has to be maintained inside the
project.
Echo-clean Microphone Path and Barge-in
One of the most important parts of the project is that the microphone stream exposed to the application layer can be echo-cleaned.
This means that Voice Assistant, Micro Wake Word and intercom consumers do not have to listen to the raw microphone while the device speaker is playing.
Instead, with the AEC / AFE profiles, the stack uses the speaker reference to remove the device playback from the captured microphone signal.
In practice, this makes these use cases possible:
- wake word can keep listening while music or TTS is playing
- the user can interrupt the assistant while it is speaking
- Voice Assistant receives mostly the user's voice, not its own speaker output
- Micro Wake Word can be retriggered without having to fully stop playback first
- intercom audio is cleaner because local speaker echo is reduced before it reaches the network/application layer
This is especially important for real full-duplex devices: the microphone is not just “recording audio”, it is feeding an application-ready stream designed to
coexist with playback.
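For readers who have not seen echo cancellation before, the idea of "using the speaker reference" can be sketched with a textbook NLMS adaptive filter. This is a toy illustration in pure Python and is not what esp-sr or the AFE pipeline does internally; the function name, tap count and step size are assumptions chosen for the demo.

```python
import math
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the speaker echo from the mic signal.

    mic: captured samples (near-end voice plus echo of the speaker)
    ref: samples that were sent to the speaker (the speaker reference)
    Returns the echo-reduced samples.
    """
    weights = [0.0] * taps    # running estimate of the speaker-to-mic echo path
    history = [0.0] * taps    # most recent reference samples, newest first
    cleaned = []
    for mic_sample, ref_sample in zip(mic, ref):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate   # what remains after removing echo
        power = sum(x * x for x in history) + eps
        step = mu * error / power            # normalized LMS update step
        weights = [w + step * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Toy room: the speaker signal reaches the mic 2 samples late at half volume.
rng = random.Random(0)
speaker = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
mic = [0.0, 0.0] + [0.5 * s for s in speaker[:-2]]   # pure echo, nobody talking
cleaned = nlms_echo_cancel(mic, speaker)

print("echo level before:", rms(mic[-500:]))
print("echo level after: ", rms(cleaned[-500:]))
```

In this toy setup the filter learns the delay and gain of the simulated room, so the residual in `cleaned` becomes much smaller than the echo in `mic`. A real device also has to deal with a near-end talker, noise and a changing room, which is where the full AEC / AFE components earn their keep.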
AEC and AFE Profiles
The project now has clearer audio profiles.
esp_aec
Lightweight Acoustic Echo Cancellation.
Best suited for:
- intercom-only devices
- Generic ESP32-S3 builds
- smaller flash layouts
- users who want echo cancellation without the full AFE cost
esp_afe
Full Espressif Audio Front-End pipeline.
It can provide:
- Acoustic Echo Cancellation
- Noise Suppression
- Automatic Gain Control
- Voice Activity Detection
- Speech Enhancement / Blind Source Separation on supported dual-mic boards
This profile is heavier than esp_aec, but it is the better direction for more capable voice devices.
Supported Audio Layouts
The maintained profiles cover several hardware shapes:
- single-bus codec boards
- single-bus no-codec boards
- dual-bus MEMS mic + I2S amplifier boards
- ES8311 codec boards
- ES7210 + ES8311 TDM boards
- dual-mic AFE boards
- Generic ESP32-S3 AEC profiles
- larger full AFE profiles
- ESP32-P4 touchscreen profiles
Codec-backed devices use esp_codec_dev.
No-codec devices use official esp_driver_i2s channels directly.
TCP and UDP Transports
The project supports two intercom transports:
- TCP: framed signaling and audio over one reliable connection
- UDP: raw PCM audio plus separate framed control channel
Default ports:
- TCP signaling/audio: 6054
- UDP audio: 6054
- UDP control: 6055
TCP and UDP devices can coexist on the same network.
If two peers use the same transport, they can call directly.
If transports differ, Home Assistant can bridge the call.
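The direct-versus-bridged rule can be written down in a few lines. The sketch below is only an illustration: `plan_route`, the route labels and the `DEFAULT_PORTS` dictionary are hypothetical names, while the port numbers are the defaults listed above.

```python
# Default ports as listed in this post; the dictionary keys are hypothetical.
DEFAULT_PORTS = {
    "tcp": 6054,          # TCP signaling and audio on one connection
    "udp_audio": 6054,    # raw PCM audio over UDP
    "udp_control": 6055,  # framed control channel over UDP
}

VALID_TRANSPORTS = {"tcp", "udp"}


def plan_route(caller_transport, callee_transport):
    """Return "direct" when both peers share a transport, else "ha_bridge"."""
    for transport in (caller_transport, callee_transport):
        if transport not in VALID_TRANSPORTS:
            raise ValueError(f"unknown transport: {transport!r}")
    if caller_transport == callee_transport:
        return "direct"
    return "ha_bridge"   # Home Assistant bridges TCP and UDP peers


print(plan_route("tcp", "tcp"))
print(plan_route("udp", "tcp"))
```

The point of keeping the decision this small is that the user never has to think about it: the same contact can be reached directly or through Home Assistant, depending only on the two transports involved.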
Protocol-aware Phonebook
Home Assistant publishes one logical phonebook, and ESP devices normalize it locally depending on their transport.
This allows the same user-facing contact list to support:
- TCP peers
- UDP peers
- Home Assistant softphone peer
- direct calls
- bridged calls
- optional HA-as-PBX routing
The result is much easier to reason about than maintaining separate transport-specific phonebooks.
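As a rough picture of what local normalization could look like, here is a small Python sketch. The data shapes (`Contact`, `LocalEntry`), the `"ha"` transport marker and the `normalize` function are assumptions made for this example and do not describe the real phonebook sensor format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    """One entry of the logical phonebook (hypothetical shape)."""
    name: str
    host: str
    transport: str  # "tcp", "udp", or "ha" for the Home Assistant softphone


@dataclass(frozen=True)
class LocalEntry:
    """What one device keeps after normalizing for its own transport."""
    name: str
    target_host: str
    mode: str  # "direct", "bridged" or "softphone"


def normalize(phonebook, own_name, own_transport, ha_host):
    entries = []
    for contact in phonebook:
        if contact.name == own_name:
            continue  # a device does not list itself
        if contact.transport == "ha":
            entries.append(LocalEntry(contact.name, ha_host, "softphone"))
        elif contact.transport == own_transport:
            entries.append(LocalEntry(contact.name, contact.host, "direct"))
        else:
            # Different transport: reach the peer through the HA bridge.
            entries.append(LocalEntry(contact.name, ha_host, "bridged"))
    return entries


phonebook = [
    Contact("door", "192.168.1.10", "tcp"),
    Contact("garage", "192.168.1.12", "tcp"),
    Contact("kitchen", "192.168.1.11", "udp"),
    Contact("Home Assistant", "192.168.1.2", "ha"),
]
for entry in normalize(phonebook, "door", "tcp", ha_host="192.168.1.2"):
    print(entry)
```

In this example the TCP door device ends up with a direct entry for the garage, a bridged entry for the UDP kitchen device, and a softphone entry for Home Assistant, all derived from the same shared list.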
Maintained Device Direction
Current maintained focus:
- Waveshare ESP32-S3 Audio Board
- Spotpear Ball v2
- Generic ESP32-S3 single-bus audio boards
- Generic ESP32-S3 dual-bus intercom boards
- Waveshare ESP32-P4 Touch profiles
Generic YAMLs are meant as starting points.
Pins, codec wiring and board-specific details still need to match your hardware.
Ready-to-use YAMLs
The repository includes ready-made YAML profiles for:
- intercom-only devices
- full-experience AEC devices
- full-experience AFE devices
- TCP transport
- UDP transport
- single-bus boards
- dual-bus boards
- selected touchscreen boards
Users can download a maintained YAML and let ESPHome fetch packages, assets and external components directly from the repository.
Current Release
Current release: 2026.6.0
Minimum versions:
- ESPHome 2026.5.x or newer
- Home Assistant Core 2026.5.0 or newer
The 2026.6.0 release includes:
- the new Espressif-based audio stack
- Intercom Native integration polish
- versioned Lovelace card loading
- better unavailable-device handling
- safer browser softphone cleanup
- mobile notification answer flow
- Generic AEC / AFE profile split
- improved runtime and memory behavior
Release notes:
Getting Started
Repository:
Recommended path:
- Install the Home Assistant integration through HACS.
- Restart Home Assistant.
- Choose the YAML closest to your board.
- Configure your pins, secrets and peer names.
- Compile with ESPHome.
- Flash the device.
- Add the Lovelace card to your dashboard.
If you are using custom YAMLs from older versions, start from a current maintained YAML and reapply your board-specific changes.
Project Status
The project is actively maintained and tested on real hardware.
The current focus is no longer “can this work at all?”, but making the stack cleaner, more maintainable and closer to native ESP-IDF audio behavior.
Hardware audio is still hardware audio: correct pins, codec wiring, PSRAM availability, flash size and board layout matter. But the maintained profiles are now built
around a much more solid architecture than the early versions.
Testing reports, hardware-specific fixes, board confirmations and feature requests are welcome.
