I would like to share a personal open-source project that I published on GitHub:
a SIP-based VoIP intercom built on the AI-Thinker ESP32 Audio Kit V2.2 board.
Watch the complete step-by-step video
Channel: Home Assistant PC
Video with multilingual audio
Project Repository (GitHub):
Important: This project does NOT have direct/native integration with Home Assistant.
The intercom works independently, but can be used in conjunction with automations, cards, and dashboards in Home Assistant, as demonstrated in the video.
The project focuses on the assembly, installation, and configuration of the intercom, showing in practice how it can coexist with visual automations and flows in Home Assistant.

### What the project covers:

* Ubuntu installation
* Installation and use of ESP-IDF and ESP-ADF
* Flashing the AI-Thinker ESP32 Audio Kit V2.2
* SIP account configuration (SignalWire)
* Audio and VoIP call configuration
* Physical mounting of the board in the intercom
* Real-world system operation

For those who want to see the project in operation, I have also published a video showing the entire process step by step:

It's a project aimed at those who enjoy ESP32, embedded Linux, VoIP/SIP, automation, and real-world projects in practice.

Channel: Home Assistant PC

If anyone is interested in discussing improvements, project evolution, or even a possible future integration with Home Assistant, I am available.
Your documentation says "The objective is to allow that, by pressing the call button on the intercom, the card makes a call
to a cell phone or SIP extension outside the local network."
I am looking to build a true intercom between two stations. Not to a cell phone. Is this possible with this hardware?
Good question. This part of the documentation can indeed be confusing.
Yes, it is absolutely possible to build a true intercom system between two stations using this hardware (AI-Thinker ESP32 Audio Kit V2.2).
The key point is that the project uses SIP as the communication method, and SIP is not limited to mobile phones.
When the documentation mentions "calling a mobile phone or a SIP extension outside the local network", this is just an example use case, not a limitation of the hardware.
In practice, you can have:
Two ESP32 Audio Kit boards
Each one registered as a SIP extension
Both connected to the same SIP server (local or remote)
And they can call each other, working as a classic point-to-point intercom
In other words:
ESP32 ↔ ESP32: yes, it works
ESP32 ↔ SIP phone: works
ESP32 ↔ mobile phone: works
Regarding the hardware itself:
Dedicated audio codec
Microphone and speaker output
Enough processing power for real-time VoIP
Full SIP support via ESP-ADF
What defines who it calls is not the board, but the SIP configuration (extensions, server, dialing rules).
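To make that concrete, here is a minimal sketch in Python (not the firmware code, and not taken from the repository) of how the target of the call button comes from configuration rather than from the board. The station names, the extensions `101` and `102`, the server name `pbx.local`, and the helper functions are all hypothetical and only serve as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Station:
    """One intercom station; all values used below are made-up examples."""
    name: str
    extension: str    # SIP extension this board registers as
    call_target: str  # extension dialed when the call button is pressed


def sip_uri(extension: str, server: str) -> str:
    """Build a plain SIP URI of the form sip:<extension>@<server>."""
    return f"sip:{extension}@{server}"


def button_press(station: Station, registered: set[str], server: str) -> str:
    """Return the URI the station would dial; fail if the target is not registered."""
    if station.call_target not in registered:
        raise LookupError(
            f"extension {station.call_target} is not registered on {server}"
        )
    return sip_uri(station.call_target, server)


SERVER = "pbx.local"  # hypothetical SIP server shared by both stations
door = Station("door", extension="101", call_target="102")
kitchen = Station("kitchen", extension="102", call_target="101")

# Both boards register with the same server, so each can reach the other.
registered = {door.extension, kitchen.extension}

print(button_press(door, registered, SERVER))     # sip:102@pbx.local
print(button_press(kitchen, registered, SERVER))  # sip:101@pbx.local
```

Pointing `call_target` at a different extension, for example one that the SIP server forwards to a mobile phone, changes who gets called without touching the hardware.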
If needed, I can also:
Suggest an ESP32 ↔ ESP32 topology
Explain how to set this up using a local Asterisk, FreePBX, or another SIP server
Or update the documentation to make this clearer and avoid this confusion
This is a great question and definitely worth an extra clarification in the README.
Watch the step-by-step video: just click on the image.
The video includes audio in your own language.