AI-Thinker ESP32 Audio Kit V2.2 (ESP32-A1S): I Transformed an Intercom into AI (From Scratch)

I would like to share a personal open-source project that I published on GitHub, using the AI-Thinker ESP32 Audio Kit V2.2 board applied to a VoIP intercom system based on SIP.

Watch the complete step-by-step video

Channel: Home Assistant PC
Video with multilingual audio

:open_file_folder: Project Repository (GitHub):

:warning: Important: This project does NOT have direct/native integration with Home Assistant.

The intercom works independently, but can be used in conjunction with automations, cards, and dashboards in Home Assistant, as demonstrated in the video.

The project focuses on the assembly, installation, and configuration of the intercom, showing in practice how it can coexist with visual automations and flows in Home Assistant.

### :wrench: What the project covers:

* Ubuntu installation
* Installation and use of ESP-IDF and ESP-ADF
* Flashing the AI-Thinker ESP32 Audio Kit V2.2
* SIP account configuration (SignalWire)
* Audio and VoIP call configuration
* Physical mounting of the board in the intercom
* Real-world system operation

For those who want to see the project in operation, I have also published a video showing the entire process step by step:

It’s a project aimed at those who enjoy ESP32, embedded Linux, VoIP/SIP, automation, and real-world projects in practice.

Channel: Home Assistant PC

If anyone is interested in discussing improvements, project evolution, or even a possible future integration with Home Assistant, I am available.

Your documentation says “The objective is to allow that, by pressing the call button on the intercom, the card makes a call to a cell phone or SIP extension outside the local network.”

I am looking to build a true intercom between two stations. Not to a cell phone. Is this possible with this hardware?


Good question, and this part of the documentation can indeed be confusing.

Yes, it is absolutely possible to build a true intercom system between two stations using this hardware (AI-Thinker ESP32 Audio Kit V2.2).
The key point is that the project uses SIP as the communication method, and SIP is not limited to mobile phones.

When the documentation mentions “calling a mobile phone or a SIP extension outside the local network”, this is just an example use case, not a limitation of the hardware.

In practice, you can have:

  • Two ESP32 Audio Kit boards
  • Each one registered as a SIP extension
  • Both connected to the same SIP server (local or remote)
  • And they can call each other, working as a classic point-to-point intercom
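As a rough illustration of the setup in the list above, here is a minimal Python sketch that builds one registration string per station. Everything in it is an assumption made for illustration: the station names, the extensions `101` and `102`, the passwords, the server address, and the helper `registration_uri`. The `transport://user:pass@server:port` shape is modeled on the URI that the ESP-ADF VoIP example accepts, as far as I know, so check the project repository for the exact setting it uses.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Station:
    """One intercom station registered as a SIP extension (hypothetical values)."""
    name: str
    extension: str
    password: str


def registration_uri(station: Station, server: str, port: int = 5060,
                     transport: str = "udp") -> str:
    """Build a transport://user:pass@server:port string for one station.

    The user and password are percent-encoded so that characters such as
    '@' or ':' cannot break the URI structure.
    """
    user = quote(station.extension, safe="")
    secret = quote(station.password, safe="")
    return f"{transport}://{user}:{secret}@{server}:{port}"


# Hypothetical two-station setup sharing one local SIP server.
SERVER = "192.168.1.10"
door = Station("door", "101", "door-secret")
kitchen = Station("kitchen", "102", "kitchen-secret")

for station in (door, kitchen):
    print(station.name, registration_uri(station, SERVER))
```

Each board would be flashed with its own string, while the server address stays the same for both.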

In other words:

  • :telephone_receiver: ESP32 ↔ ESP32 → yes, it works
  • :telephone_receiver: ESP32 ↔ SIP phone → works
  • :telephone_receiver: ESP32 ↔ mobile phone → works

Regarding the hardware itself:

  • Dedicated audio codec
  • Microphone and speaker output
  • Enough processing power for real-time VoIP
  • Full SIP support via ESP-ADF

What defines who it calls is not the board, but the SIP configuration (extensions, server, dialing rules).
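To make that point concrete, here is a small Python sketch of a dialing rule table, where the call target is looked up from configuration instead of being fixed by the board. The extensions, the server address, the `DIAL_PLAN` table, and the `call_target` helper are all hypothetical. In a real deployment this mapping would normally live in the SIP server's dial plan or in the firmware configuration, not in a script like this.

```python
from typing import Mapping, Optional

# Hypothetical dial plan: which extension each station's call button rings.
DIAL_PLAN: Mapping[str, str] = {
    "101": "102",  # door station rings the kitchen station
    "102": "101",  # kitchen station rings the door station
}


def call_target(caller_ext: str, server: str,
                plan: Mapping[str, str] = DIAL_PLAN) -> Optional[str]:
    """Return the SIP URI the caller should dial, or None if no rule exists."""
    callee = plan.get(caller_ext)
    if callee is None:
        return None
    return f"sip:{callee}@{server}"


print(call_target("101", "192.168.1.10"))
```

Changing the intercom from "call a mobile phone" to "call the other station" is then only a change to the table, with identical hardware on both ends.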

If needed, I can also:

  • Suggest an ESP32 ↔ ESP32 topology
  • Explain how to set this up using a local Asterisk, FreePBX, or another SIP server
  • Or update the documentation to make this clearer and avoid this confusion

This is a great question and definitely worth an extra clarification in the README :+1:

Watch the step-by-step video: just click on the image.
The video includes audio in your own language.