ESPHome Full-Duplex Audio Intercom - Because I Was Bored on Vacation
Hey everyone!
Big update! The project has been renamed from intercom-api to esphome-intercom and moved.
The old intercom-api URL redirects automatically, so existing configs keep working.
The Origin Story
I grabbed one of those cheap Chinese “smart balls” from AliExpress (the Xiaozhi Ball V3, ~$15), originally just wanting a simple doorbell intercom. Then scope creep happened.
What It Does Now
- Full-duplex audio - talk AND listen at the same time
- Two modes: Simple (Browser ↔ HA ↔ ESP) and Full (ESP ↔ HA ↔ ESP, intercom between rooms)
- PBX-like routing - HA acts as central hub, relays calls between any combination of ESPs and browsers
- Echo Cancellation (AEC) - using Espressif ESP-SR, with three reference modes:
  - ES8311 stereo digital feedback (sample-accurate, best quality)
  - ES7210 TDM hardware reference (multi-mic boards)
  - Direct TX reference (for single-bus setups without codec, zero ring buffer)
- Voice Assistant + Micro Wake Word - runs alongside the intercom on the same device
- 48kHz I2S bus with FIR decimation - native codec quality for media, 16kHz for AEC/VA/intercom
- Lovelace card - custom card with call/answer/hangup, contact selector, volume controls
- Media Player - play music, TTS, notifications through the same speaker (mixer with ducking)
- Auto-answer, persistent settings, status LED, contact management
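
To illustrate what "mixer with ducking" means for the media player, here is a minimal Python sketch. It is not the component's actual code (that runs in C++ on the ESP); the function name, the `duck_gain` value, and the hard on/off switch are illustrative assumptions. A real mixer would most likely ramp the gain to avoid clicks.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def mix_with_ducking(media, voice, duck_gain=0.25, voice_active=True):
    """Mix two equal-rate 16-bit PCM blocks (lists of ints).

    While voice is active, the media stream is attenuated by duck_gain
    and the voice samples are added on top. The sum is clamped to the
    int16 range so loud passages saturate instead of wrapping around.
    """
    gain = duck_gain if voice_active else 1.0
    mixed = []
    for m, v in zip(media, voice):
        sample = int(round(m * gain))
        if voice_active:
            sample += v
        mixed.append(max(INT16_MIN, min(INT16_MAX, sample)))
    return mixed
```

For example, `mix_with_ducking([1000, -1000], [500, 500])` drops the music to a quarter of its level and lays the voice on top, while `voice_active=False` passes the music through unchanged.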
Bundled Components
- intercom_api - TCP full-duplex audio streaming (port 6054), call state machine, PBX routing
- i2s_audio_duplex - Full-duplex I2S for single-bus setups. Works with codecs (ES8311, ES8388, WM8960) and discrete I2S MEMS mic + amp (no codec). Standard ESPHome i2s_audio can’t do simultaneous mic+speaker on one bus.
- esp_aec - Acoustic Echo Cancellation wrapper for ESP-SR (sr_low_cost recommended for VA+MWW compatibility)
- intercom_audio - UDP-based intercom (ESP-to-ESP direct, no HA relay needed)
- mdns_discovery - mDNS service discovery for finding intercom devices on the network
Ready-to-Use Configs
- xiaozhi-ball-v3-va-intercom.yaml - Xiaozhi Ball V3 (ES8311, round display) - Intercom + VA + MWW + LVGL UI
- xiaozhi-ball-v3-intercom.yaml - Xiaozhi Ball V3 - Intercom only
- waveshare-s3-audio-va-intercom.yaml - Waveshare ESP32-S3-Audio (ES7210+ES8311 TDM) - Intercom + VA + MWW
- waveshare-p4-touch-lcd-va-intercom.yaml - Waveshare ESP32-P4 Touch (7" LCD) - Intercom + VA + MWW + LVGL UI
- esp32-s3-mini-va-intercom.yaml - ESP32-S3 Mini (SPH0645 + MAX98357A) - Intercom + VA + MWW
- esp32-s3-mini-intercom.yaml - ESP32-S3 Mini - Intercom only
- generic-esp32-s3-intercom.yaml - Any ESP32-S3 (dual I2S bus) - Intercom + AEC
- generic-esp32-s3-duplex-intercom.yaml - Any ESP32-S3 (single I2S bus, no codec) - Intercom + AEC + Media Player
Hardware Support
- Xiaozhi Ball V3 - ES8311 codec, single bus duplex, stereo digital feedback AEC, VA/MWW
- Waveshare S3-Audio - ES7210+ES8311, single bus TDM, hardware slot AEC reference, VA/MWW
- Waveshare P4 Touch - ES7210+ES8311, single bus TDM, hardware slot AEC reference, VA/MWW
- ESP32-S3 Mini - SPH0645 + MAX98357A, dual bus, ring buffer AEC, VA/MWW
- Generic S3 (dual bus) - Any I2S mic + any I2S amp on separate buses, ring buffer AEC
- Generic S3 (single bus) - Any I2S MEMS mic + any I2S amp on same bus (no codec needed), direct TX reference AEC
Requirements: ESP32-S3 or ESP32-P4 with PSRAM, ESP-IDF framework. slot_bit_width: 32 required for MEMS mics without codec.
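
As a rough picture of the "stereo digital feedback" AEC mode: the idea is that the codec hands back a stereo frame in which one channel is the microphone and the other is a loopback of what the speaker is playing, so the two stay sample-aligned. The Python sketch below only shows the de-interleaving step. The channel order (left = mic, right = reference), the 16-bit little-endian layout, and the function name are assumptions for illustration, not taken from the component.

```python
import struct

def split_mic_and_reference(frame_bytes):
    """De-interleave 16-bit little-endian stereo PCM into two mono lists.

    Assumed layout (hypothetical): L, R, L, R, ... with L = microphone
    and R = speaker loopback used as the echo-cancellation reference.
    Any trailing partial stereo frame is ignored.
    """
    frames = len(frame_bytes) // 4          # 2 channels x 2 bytes each
    samples = struct.unpack("<%dh" % (frames * 2), frame_bytes[:frames * 4])
    mic = list(samples[0::2])
    reference = list(samples[1::2])
    return mic, reference
```

Because both lists come from the same frame, sample `i` of the mic lines up with sample `i` of the reference, which is why this mode needs no delay tuning.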
What Changed from v1
- UDP → TCP (port 6054) - reliable, ordered delivery; lost packets are retransmitted instead of dropped
- go2rtc/ffmpeg → native HA integration - no add-ons needed
- One-way → PBX-like routing through HA bridge
- No AEC → three AEC reference modes (stereo, TDM, direct TX)
- No VA → full VA + MWW coexistence
- Ring buffer AEC → direct TX reference for single-bus (no delay tuning needed)
- 16kHz only → 48kHz bus with FIR decimation (better audio quality)
- Repo renamed from intercom-api to esphome-intercom (old URLs redirect)
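
For anyone wondering what "48kHz bus with FIR decimation" boils down to, here is a rough Python sketch of going from 48 kHz to 16 kHz (factor 3). It is not the project's implementation: the 63-tap Hamming-windowed sinc, the 7 kHz cutoff, and the function names are illustrative assumptions. The low-pass filter removes content above the new Nyquist limit (8 kHz) before two out of every three samples are thrown away.

```python
import math

def design_lowpass(num_taps, cutoff_hz, sample_rate_hz):
    """Hamming-windowed sinc low-pass taps, normalized to unity DC gain."""
    fc = cutoff_hz / sample_rate_hz        # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * fc
        else:
            sinc = math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]

def decimate(samples, taps, factor):
    """Low-pass filter and keep every factor-th output sample.

    Only the outputs that are kept are computed, and only positions
    where the filter fully overlaps the input are produced.
    """
    out = []
    for start in range(0, len(samples) - len(taps) + 1, factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            acc += tap * samples[start + k]
        out.append(acc)
    return out

# 10 ms of a 1 kHz tone at 48 kHz, reduced to 16 kHz
taps = design_lowpass(63, 7000, 48000)
tone = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(480)]
tone_16k = decimate(tone, taps, 3)
```

The 1 kHz tone passes through essentially unchanged, while a 20 kHz tone fed through the same filter is strongly attenuated rather than aliasing down into the 16 kHz stream.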
The Obligatory Disclaimer
I (n-IA-hane) am still incredibly lazy. Claude Code wrote the original post, and Claude Code is updating it now. After months of debugging I2S full-duplex, AEC filter convergence, reference buffer alignment, wake word false positives during TTS, slot_bit_width mysteries with MEMS mics, and my endless “Porco D*O! it still doesn’t work” messages… the AI is still here. Somehow.
Claude would like everyone to know it has developed a deep familiarity with heap_caps_calloc, ESP32 watchdog timers, and the MSM261S4030H0R datasheet that it never asked for.
Hope this is useful to someone! Questions welcome.
