🔔 ESPHome Full-Duplex Audio Intercom

Just posting to say that this is an outstanding component, and it’s already integrated into my doorbell project to replace my Ring pair. I’m currently using one ESP32-S3 to drive the camera and another to drive the audio and controls, though I’ll try to bring it all back into a single device in the near future. I didn’t use go2rtc; I’m just streaming the mic directly from the board. Many thanks!


@meconiotech Can you add ESP32P4 to the check in esp_aec/__init__.py?
I can’t send you a PR because GitHub won’t let me fork your repo directly, since I already have it forked via @will_santana's fork.

@gtjoseph, I’ve merged your PR.

I’ll find some time to test it too before submitting the PR to @meconiotech


@meconiotech you just read my mind about dynamic AEC. Disabling it when nothing is playing or being announced would save some resources, but it’s still very important to have it enabled during playback and announcements to improve mic detection.

I’d extend the mic component by adding actions like aec_start and aec_stop.
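
A rough sketch of how such actions might be wired up. The action names `microphone.aec_start` and `microphone.aec_stop` are hypothetical (they only illustrate the proposal and do not exist in the component), and the ids are placeholders. `on_announcement`, `on_play` and `on_idle` are, as far as I know, the standard media player triggers.

```yaml
# Hypothetical sketch: microphone.aec_start / microphone.aec_stop are NOT
# existing actions, they stand in for the proposed extension.
media_player:
  - platform: speaker
    id: external_media_player
    # ... media_pipeline / announcement_pipeline as usual ...
    on_announcement:
      - microphone.aec_start: i2s_mics   # AEC on while an announcement plays
    on_play:
      - microphone.aec_start: i2s_mics   # AEC on while media plays
    on_idle:
      - microphone.aec_stop: i2s_mics    # nothing playing, so free the CPU
```

The point of hanging this off the media player state is that AEC only costs CPU while there is actually a reference signal to cancel.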

Really excited about this project and the possibility of finally ditching my Echos for good.

Another upgrade that would be awesome is the ability to run the speaker at 48000 and delegate the resampling to the speaker mixer component, so voice calls and announcements could use lower sample rates while media plays at its best.

I don’t know if the sampling rate MUST be equal down at the hardware level, but there’s no problem in the config. Maybe that’s because it assumes separate buses. But as far as I know, the clock pins in the ESP32 I2S are virtual, and only the data in and out pins have to be physical (for obvious reasons). So maybe allowing separate mic and speaker configurations in the i2s_audio_duplex component could enable this feature.
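
To make the idea concrete, here is a sketch of what split rates could look like. The `sample_rate` keys on the microphone and speaker platforms are the ones used in configs in this thread, but whether `i2s_audio_duplex` would actually work with different values on a shared bus is exactly the open question, so treat this as a wish-list config and not something known to work. All ids are placeholders.

```yaml
i2s_audio_duplex:
  id: i2s_duplex
  # pins omitted for brevity

microphone:
  - platform: i2s_audio_duplex
    i2s_audio_duplex_id: i2s_duplex
    id: i2s_mics
    sample_rate: 16000      # voice pipeline and AEC stay at 16 kHz

speaker:
  - platform: i2s_audio_duplex
    i2s_audio_duplex_id: i2s_duplex
    id: i2s_audio_speaker
    sample_rate: 48000      # ASSUMPTION: speaker side allowed to differ from the mic

  - platform: mixer
    id: mixer_speaker_id
    output_speaker: i2s_audio_speaker
    source_speakers:
      - id: announcement_spk_mixer_input
      - id: media_spk_mixer_input

  # The resampler speakers bring each stream up to the output rate
  - platform: resampler
    id: announcement_spk_resampling_input
    output_speaker: announcement_spk_mixer_input
  - platform: resampler
    id: media_spk_resampling_input
    output_speaker: media_spk_mixer_input
```

With this layout, a 16 kHz announcement would be upsampled before mixing while 48 kHz media would pass through untouched.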

By the way, I’m thinking of buying one of these for testing:

https://pt.aliexpress.com/item/1005010271067848.html

Is this the one you’re using, @gtjoseph?

I’ve merged your PR and it tested just fine.

The voice assist pipeline works perfectly, but it breaks the media pipeline as shown below:

I’ll try to work on it, but my work schedule is quite full today.

Hi, my experience:

DIY
DAC MAX98357
Mic INMP441


Intercom-Mini config, except for the changes here:

esp_aec:
  id: aec_processor
  sample_rate: 16000
  filter_length: 8     # 4 = 64ms tail (good balance of quality vs CPU)
  mode: VOIP_HIGH_PERF  # Optimized for real-time voice, lower CPU than default

Pretty good audio quality.
Echo cancellation is good, too.

Spotpear Ball V2


Bad audio quality.
The mic crackles even when no one is speaking.
I think it’s the device.

Nope, I’m using these right now…

Hmmm. I’m not sure why it would have broken the media pipeline, unless you’re doing something in the YAML like stopping the speaker but not restarting it. I can play both media and announcements from HA.

Here’s the YAML config I’ve been testing with…

i2s_audio_duplex:
  id: i2s_duplex
  i2s_lrclk_pin: ${i2s_lrclk_pin}
  i2s_bclk_pin: ${i2s_bclk_pin}
  i2s_mclk_pin: ${i2s_mclk_pin}
  i2s_din_pin: ${i2s_din_pin}
  i2s_dout_pin: ${i2s_dout_pin}
  sample_rate: ${sample_rate}

audio_dac:
  - platform: es8311
    id: es8311_dac
    i2c_id: ${dac_i2c_bus_id}
    bits_per_sample: 16bit
    sample_rate: ${sample_rate}

microphone:
  - platform: i2s_audio_duplex
    i2s_audio_duplex_id: i2s_duplex
    id: ${i2s_microphone_id}
    sample_rate: ${sample_rate}

speaker:
  - platform: i2s_audio_duplex
    i2s_audio_duplex_id: i2s_duplex
    id: ${i2s_speaker_id}
    sample_rate: ${sample_rate}
    audio_dac: es8311_dac
    num_channels: 1

  - platform: mixer
    id: mixing_speaker
    output_speaker: ${i2s_speaker_id}
    source_speakers:
      - id: announcement_mixing_input
        timeout: never
      - id: media_mixing_input
        timeout: never

  - platform: resampler
    id: announcement_resampling_speaker
    output_speaker: announcement_mixing_input

  - platform: resampler
    id: media_resampling_speaker
    output_speaker: media_mixing_input

media_player:
  - platform: speaker
    name: ${media_player_name}
    id: ${media_player_id}
    task_stack_in_psram: true
    codec_support_enabled: true
    media_pipeline:
      speaker: media_resampling_speaker
      num_channels: 1
      format: FLAC     # FLAC is the least processor intensive codec
      sample_rate: ${sample_rate}
    announcement_pipeline:
      speaker: announcement_resampling_speaker
      format: FLAC     # FLAC is the least processor intensive codec
      num_channels: 1  # Stereo audio is unnecessary for announcements
      sample_rate: ${sample_rate}  # Supported by Music Assistant
    files:
      - id: mute_switch_on_sound
        file: ${mute_switch_on_sound_file}
      - id: mute_switch_off_sound
        file: ${mute_switch_off_sound_file}

I can’t open the link; it says it’s not available in my country.
I found an interesting device; maybe it’s the one you linked.
It’s called:
Color: JC4880P443C-I-W-Y

@tomcat I saw this device today (or a similar one),
but the one I saw had no mic or speaker (despite having a speaker header),
so it was not suitable for a voice assistant.

The one I posted is this one:

It’s called ESP32-P4-WIFI6-Touch-LCD-3.4C

3.4 inch (4 inch available) round capacitive screen, dual mics and a reasonably sized speaker included
With a little work and a nice 3D-printed case, it would be a perfect replacement for an Echo Dot/Spot
I have no hopes of replacing an Echo Show, and I don’t care about the video features, but some visual feedback and interaction is good
With

Ah OK, I saw that a few days ago too.
But it’s very expensive, ~75€.

The one I posted is around 30€, and the JC4880P443C-I-W version without enclosure and camera is around 24€. Both prices include a 3€ discount code.
Both seem to have a mic on board and a speaker connector.
Maybe it’s also possible to connect an INMP441 to it on the free GPIOs.
The size is very nice for a wall panel or a doorbell, I guess.
The speaker has to go somewhere, but that should be possible with a 3D-printed case.

Really, the JC4880P443C-I-W is way more reasonably priced. The rectangular format would be easier to work with, too.

The only cons I see are the single mic versus the dual mic + hardware AEC on the one I posted, and the lack of a speaker out of the box.

It seems great for an “in-wall mount”, like in a 4x2 box, but may suffer for voice assistance.

PS: I could find the round one for about 60 euros, but that’s still way pricier than the JC.

Here is a config with audio and I guess this one:

is for that device too.
So to me it looks usable as a voice assistant.
I’ll give it a try because I guess it could be a nice control panel too (main reason).
Win-win if both work :smiley:

Why do you want dual mics and hardware AEC?
Was the INMP441 not working for you?
I have it inside this enclosure:

(But with the mic input facing the front, not the back as in his pictures.)
At least the quality is much better than that of the Spotpear Ball V2.
I’ve just played around a bit with the voice assistant, but not that much so far.
But judging by the quality I can hear over the mic when using it as an intercom, it should be good enough.

For intercom, the INMP441 works fine, but it could do better.

My focus, as stated before, is not intercom. That’s meconiotech’s goal.
My goal is to replace my Echo devices with something that can work locally and has a better interface than the declining Echos. They are getting dumber over time, and the promised Alexa+, besides being paywalled, isn’t available outside the US.

In my experience these past weeks with the Spotpear V2 and its INMP441, voice detection is OK to bad, requiring some shouting or retries. And if you plan to use it as a media playback device, AEC is quite important, so if it can be delegated to hardware, all the better. Multi-mic arrays usually do way better at noise suppression too. Echo devices do so much better because of this. But their low prices come from Jeff’s deep pockets and economies of scale.
Don’t get me wrong, I’m still amazed by the Spotpear’s performance, especially in its price range. With the stock firmware it’s absolutely amazing and so fluid. But of course, it could do better ($$$).

I’ll probably integrate meconiotech’s intercom into my stack because drop-in calls are just amazing. Add cameras like the one on the hardware you posted and the sky is the limit, but my main goal now is effortless voice detection and media integration.

I understand.
I had the same experience with the DIY voice assistant but wasn’t sure about the reason.
Anyway, the Ball was much more disappointing for me. I’m sending it back.
This could be interesting too:

~23-25 € @ Ali

Despite the shouting and poor sound quality (both expected at that price), it really impressed me.

As I said, with the stock firmware and the MCP connection to HA activated, it can handle my house and requests WAY better than any of my many Echo devices (I have 4 different versions).

What bugs me is the cloud part. It’s not all bad, but I don’t want to be stuck with it.
I use Perplexity and Eleven Labs in my pipeline now, but that’s MY choice.

The only major bugs I’ve found were the lack of media playback (with that speaker, almost a bonus) and the inability to interrupt a response by calling it again.
Upon further inspection, the latter turned out to be caused by the single-bus design (despite the hardware having 2). The only project that could handle that issue was meconiotech’s, and here we are.

And I guess, once this is all done, it should be merged into ESPHome main, because it enables so much functionality and unlocks a lot of power on such cheap devices.

I think so too, you know? As far as I know, the V2 and V3 I have aren’t that different. On my V3, this problem doesn’t occur.

Don’t mean to pollute this thread but…

@tomcat I have 3 of those Waveshares…

I just turned them upside down. :slight_smile:

Camera, display and audio all work fine.

There’s a thread for them here…

After further investigation, my problem seems to be the media player not sending max_bit_depth to Music Assistant.

Adding bits_per_sample to the speaker settings makes the player stream indefinitely, but there’s still no sound and the same error at the end.
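
For reference, the change described above would look roughly like this. The value `16bit` is an assumption (the post does not say which depth was set), and the other keys simply mirror the speaker configs used elsewhere in this thread.

```yaml
speaker:
  - platform: i2s_audio_duplex
    id: i2s_audio_speaker
    i2s_audio_duplex_id: i2s_duplex
    sample_rate: 16000
    bits_per_sample: 16bit   # ASSUMED value, added while debugging
    audio_dac: es8311_dac
    num_channels: 1
```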

The speaker, mixer, media player, mww and va configs seem to be OK.
Voice replies (TTS) and wake word detection work 100% of the time after @gtjoseph’s PR was merged. Only the music stream from Music Assistant is buggy.

speaker:
  - platform: i2s_audio_duplex
    id: i2s_audio_speaker
    i2s_audio_duplex_id: i2s_duplex
    sample_rate: 16000
    audio_dac: es8311_dac
    num_channels: 1

  - platform: mixer
    id: mixer_speaker_id
    output_speaker: i2s_audio_speaker
    source_speakers:
      - id: announcement_spk_mixer_input
        timeout: never
      - id: media_spk_mixer_input
        timeout: never

  - platform: resampler
    id: announcement_spk_resampling_input
    output_speaker: announcement_spk_mixer_input

  - platform: resampler
    id: media_spk_resampling_input
    output_speaker: media_spk_mixer_input
  

media_player:
  - platform: speaker
    name: None
    id: external_media_player
    task_stack_in_psram: true
    #codec_support_enabled: true
    volume_initial: 70%
    media_pipeline:
      speaker: media_spk_resampling_input
      num_channels: 1
      format: FLAC
      sample_rate: 16000
    announcement_pipeline:
      speaker: announcement_spk_resampling_input
      format: FLAC
      sample_rate: 16000
      num_channels: 1  # S3 Box only has one output channel

micro_wake_word:
  id: mww
  microphone: i2s_mics
  stop_after_detection: false
  models:
    - alexa
  on_wake_word_detected:
    - if:
        condition:
          voice_assistant.is_running:
        then:
          voice_assistant.stop:
          # Stop any other media player announcement
        else:
          - if:
              condition:
                media_player.is_announcing:
              then:
                - media_player.stop:
                    announcement: true
              else:
              # Start the voice assistant
                - voice_assistant.start:
                    wake_word: !lambda return wake_word;

voice_assistant:
  id: va
  microphone: i2s_mics
  media_player: external_media_player
  #speaker: announcement_spk_resampling_input
  micro_wake_word: mww
  #noise_suppression_level: 2
  use_wake_word: false
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  on_client_connected:
    - micro_wake_word.start:
  on_client_disconnected:
    - voice_assistant.stop:

Can you guys tell me the versions you’re running?

I suspect a release bug, especially in ESPHome Builder or Music Assistant (but add-ons are hard to roll back on HA).

I’m on:
Core: 2026.1.3
Supervisor: 2026.01.1
HAOS: 17.0
ESPHome Builder: 2026.1.2
Music Assistant: 2.7.5

Installation method Home Assistant OS
Core 2026.1.3
Supervisor 2026.01.1
Operating System 15.2
ESPHome 2026.1.2
Music Assistant 2.8.0b9, but I don’t really use it
