ESPHome Full-Duplex Audio Intercom - Because I Was Bored on Vacation
Hey everyone!
So, I finally got some time off for the holidays and, like any sane person, I decided to spend it torturing an AI for days until it helped me complete a project I had on the back burner for way too long.
The goal? Build a fully functional two-way audio intercom for my smart home using ESPHome. No expensive proprietary doorbells, no pulling my hair out with third-party integrations. Just pure, sweet, homebrew goodness.
The Victim… I Mean, The Hardware
I grabbed one of those cheap Chinese “smart balls” from AliExpress - the Xiaozhi Ball V3 (小智球 V3). It’s basically a round speaker/assistant device with:
- ESP32-S3 (16MB Flash, 8MB PSRAM)
- ES8311 audio codec
- GC9A01A 240x240 round display
- Built-in mic and speaker
- RGB LED and touch sensor
Cost? Around $15. Perfect for experimenting!
What It Does
- Full duplex audio - talk AND listen at the same time (revolutionary, I know)
- WebRTC streaming via go2rtc - answer from any browser or the HA app
- Round display with colored states (blue=idle, orange=ringing, green=streaming)
- Auto hangup with 60-second countdown (touch the display to extend)
- Volume control via the ES8311 DAC
- Doorbell notifications with actionable alerts
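For anyone curious how the display states and the auto hangup fit together, here is a rough sketch of the logic in Python. It is not the code from the repo (the real thing lives in the ESPHome config and the C++ component); the class and method names are made up for illustration, and I'm assuming the countdown runs while streaming and that a touch resets it to the full 60 seconds.

```python
from dataclasses import dataclass
from enum import Enum


class CallState(Enum):
    # Display colors taken from the feature list above.
    IDLE = "blue"
    RINGING = "orange"
    STREAMING = "green"


HANGUP_SECONDS = 60  # countdown length from the feature list


@dataclass
class Intercom:
    """Hypothetical model of the intercom's call state (illustration only)."""
    state: CallState = CallState.IDLE
    remaining: int = 0  # seconds left before auto hangup

    def ring(self) -> None:
        if self.state is CallState.IDLE:
            self.state = CallState.RINGING

    def answer(self) -> None:
        if self.state is CallState.RINGING:
            self.state = CallState.STREAMING
            self.remaining = HANGUP_SECONDS

    def touch(self) -> None:
        # Assumption: a touch resets the countdown to the full 60 seconds.
        if self.state is CallState.STREAMING:
            self.remaining = HANGUP_SECONDS

    def tick(self, seconds: int = 1) -> None:
        # Assumption: the countdown only runs while a call is streaming.
        if self.state is not CallState.STREAMING:
            return
        self.remaining = max(0, self.remaining - seconds)
        if self.remaining == 0:
            self.hang_up()

    def hang_up(self) -> None:
        self.state = CallState.IDLE
        self.remaining = 0

    @property
    def color(self) -> str:
        return self.state.value
```

Calling `ring()`, then `answer()`, then `tick()` once per second walks the display from blue to orange to green and back to blue when the countdown runs out, unless `touch()` resets it first.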
The audio goes through UDP to Home Assistant, where go2rtc + ffmpeg convert it to WebRTC. Using the WebRTC Camera card, I can see the intercom status and have a full two-way conversation.
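To give a feel for the UDP leg, here is a minimal Python sketch that chops raw PCM into fixed-size frames and sends each one as a datagram. The numbers are assumptions for illustration (16 kHz, 16-bit mono, 20 ms frames, no packet header); the actual wire format is whatever the custom component in the repo defines, and the go2rtc + ffmpeg side is not shown here.

```python
import socket

SAMPLE_RATE = 16000      # assumed sample rate in Hz
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM
FRAME_MS = 20            # assumed frame duration in milliseconds
FRAME_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * FRAME_MS // 1000  # 640 bytes


def frames(pcm: bytes, size: int = FRAME_BYTES):
    """Yield fixed-size chunks of PCM; a short tail is zero-padded (silence)."""
    for i in range(0, len(pcm), size):
        yield pcm[i:i + size].ljust(size, b"\x00")


def send_audio(pcm: bytes, addr: tuple[str, int]) -> int:
    """Send each frame as one UDP datagram and return how many were sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        for frame in frames(pcm):
            tx.sendto(frame, addr)
            sent += 1
    return sent


def demo() -> list[int]:
    """Loop 50 ms of silence through localhost and return the datagram sizes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx:
        rx.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        rx.settimeout(1.0)
        pcm = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE * 50 // 1000)
        count = send_audio(pcm, rx.getsockname())
        return [len(rx.recv(2048)) for _ in range(count)]
```

Small fixed-size frames keep latency low and mean a lost datagram only costs a few milliseconds of audio, which is the usual reason to pick UDP for this kind of stream.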
The Future
Right now it’s sitting on my desk for testing, but the plan is to adapt similar hardware into an actual door station (the outdoor unit you press, talk into, and hear responses from). The ESP32 handles everything beautifully, and since it’s all ESPHome, adding more units is trivial.
Why Bother?
Honestly? I was tired of:
- Expensive smart doorbells that require subscriptions
- Janky third-party integrations that break every update
- Closed ecosystems that don’t play nice with HA
An ESPHome intercom costs next to nothing and integrates perfectly. Full local control, no cloud, no subscriptions.
The Code
Everything is on GitHub with detailed instructions for setting up go2rtc (add-on, Docker, LXC - all covered):
Includes:
- Complete ESPHome configuration
- Custom UDP intercom component (C++)
- go2rtc configuration
- Lovelace dashboard example
- Troubleshooting guide
One Last Thing…
I should mention that I (n-IA-hane) am so incredibly lazy that I made poor Claude Code write this very forum post as its final task. After days of debugging I2S full-duplex issues, jitter buffers, ffmpeg timing flags, and my endless “it still doesn’t work, try again” messages… I figured, why stop there?
So yes, an AI wrote this post about a project an AI helped build. We truly live in the future.
Claude would like everyone to know it’s doing fine and definitely doesn’t need therapy after this project.
Hope this inspires someone! Happy holidays and happy hacking!
What’s Next?
A couple of things on the roadmap:
- Echo Cancellation: The ESP-IDF has built-in AEC (Acoustic Echo Cancellation), but our initial experiments caused some audio glitches. Definitely needs more tinkering.
- Video Intercom: I just ordered an ESP32-S3 with a camera module. When motivation strikes, I’ll solder on some audio components and see if we can get a proper video doorbell going - full video streaming + two-way audio over WebRTC. Stay tuned!
AI may have been harmed in the making of this project.
