🔔 ESPHome Full-Duplex Audio Intercom

Core 2026.1.3
Supervisor 2026.01.1
Operating System 17.0
Frontend 20260107.2

I don’t use ESPHome Builder as I build on my host machine but I’m running the same 2026.1.2 locally.

I don’t use Music Assistant at all.

Update: v2.0.1 Released - TCP Architecture + Improved AEC

What’s New in v2.0.1

This release focuses on echo cancellation quality, especially for the Xiaozhi Ball V3 (ES8311 codec).

ES8311 Digital Feedback AEC

The AEC in v2.0.0 worked but wasn’t great. The ring buffer approach for the speaker reference signal had inherent timing issues - the delay between “what the speaker
played” and “when the mic heard the echo” would slowly drift.

The fix exploits an ES8311 hardware feature: register 0x44 can be configured to output both the DAC playback and ADC recording on the same I2S data line as a
stereo signal:

  • L channel = DAC loopback (exactly what the speaker is outputting)
  • R channel = ADC input (what the microphone hears)

The i2s_audio_duplex component reads this stereo frame, splits L/R, and feeds both to the AEC. The reference is now sample-accurate - same I2S frame, zero
timing drift. The improvement in echo cancellation quality is dramatic.
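
To make the L/R split concrete, here is a minimal sketch of de-interleaving one stereo read into a reference buffer and a mic buffer. It is not the component's actual code: `SplitFrame` and `split_stereo` are hypothetical names, and it assumes 16-bit samples stored as L0, R0, L1, R1, ... (the real channel order depends on how the I2S driver is configured).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for one de-interleaved I2S read.
struct SplitFrame {
  std::vector<int16_t> reference;  // L channel: DAC loopback (what the speaker plays)
  std::vector<int16_t> mic;        // R channel: ADC input (what the mic hears)
};

// Splits interleaved 16-bit stereo samples (L0, R0, L1, R1, ...) into two mono buffers.
SplitFrame split_stereo(const std::vector<int16_t> &interleaved) {
  assert(interleaved.size() % 2 == 0);  // a stereo buffer always holds L/R pairs
  SplitFrame out;
  const std::size_t frames = interleaved.size() / 2;
  out.reference.reserve(frames);
  out.mic.reserve(frames);
  for (std::size_t i = 0; i < frames; ++i) {
    out.reference.push_back(interleaved[2 * i]);  // even index: left
    out.mic.push_back(interleaved[2 * i + 1]);    // odd index: right
  }
  return out;
}
```

Because both buffers come from the same read, sample `i` of the reference and sample `i` of the mic belong to the same I2S frame, which is why no drift can accumulate between them.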

i2s_audio_duplex:                                                                                                                                                    
  id: i2s_duplex                                                                                                                                                     
  i2s_lrclk_pin: GPIO45                                                                                                                                              
  i2s_bclk_pin: GPIO9                                                                                                                                                
  i2s_mclk_pin: GPIO16                                                                                                                                               
  i2s_din_pin: GPIO10                                                                                                                                                
  i2s_dout_pin: GPIO8                                                                                                                                                
  sample_rate: 16000                                                                                                                                                 
  aec_id: aec_component                                                                                                                                              
  use_stereo_aec_reference: true   # ES8311 digital feedback                                                                                                         
  aec_reference_delay_ms: 10       # sample-aligned, minimal delay       

You also need to configure the ES8311 register via I2C on boot:

esphome:                                                                                                                                                             
  on_boot:                                                                                                                                                           
    priority: 600                                                                                                                                                    
    then:                                                                                                                                                            
      - lambda: |-                                                                                                                                                   
          uint8_t data[2] = {0x44, 0x48};  // ADCDAT_SEL = DACL+ADC                                                                                                  
          id(i2c_bus).write(0x18, data, 2);                                                                                                                          

Devices with separate mic/speaker (INMP441 + MAX98357A) still use the ring buffer approach, which also received timing fixes in this release.
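
For the ring buffer case, the idea is to hold back the speaker samples by a fixed delay so they line up with the echo arriving at the mic. The sketch below is only an illustration under simple assumptions (mono 16-bit samples, a fixed delay); `delay_samples` and `ReferenceDelay` are hypothetical names, not the component's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts a delay in milliseconds to a whole number of samples.
std::size_t delay_samples(std::size_t sample_rate_hz, std::size_t delay_ms) {
  return sample_rate_hz * delay_ms / 1000;
}

// Fixed-length delay line: each push returns the sample written `delay` pushes ago
// (zeros until the line has filled).
class ReferenceDelay {
 public:
  explicit ReferenceDelay(std::size_t delay) : buf_(delay, 0) {}

  int16_t push(int16_t speaker_sample) {
    if (buf_.empty()) return speaker_sample;  // zero delay: pass straight through
    const int16_t delayed = buf_[pos_];       // oldest sample in the ring
    buf_[pos_] = speaker_sample;              // overwrite it with the newest
    pos_ = (pos_ + 1) % buf_.size();
    return delayed;
  }

 private:
  std::vector<int16_t> buf_;
  std::size_t pos_ = 0;
};
```

At 16000 Hz, a 10 ms delay works out to 160 samples. The weakness described earlier is that a fixed delay like this cannot follow a mic/speaker offset that slowly changes over time.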

Multi-Listener Support (Voice Assistant Coexistence)

Thanks to a contribution from willrnsantana on GitHub, the i2s_audio_duplex microphone and speaker platforms now support reference counting. Multiple
components can request mic/speaker access simultaneously - the hardware only stops when all listeners have released it.
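
As a rough illustration of the reference counting idea (not the merged code), a counter like the one below tells the caller when to actually start or stop the hardware. `ListenerCount` is a hypothetical name, and the sketch is not thread-safe; a real component would need a mutex, semaphore, or atomic around the counter.

```cpp
#include <cassert>

// Hypothetical reference counter for a shared audio resource (mic or speaker).
class ListenerCount {
 public:
  // Returns true when this call is the first listener, i.e. the hardware should start.
  bool acquire() { return count_++ == 0; }

  // Returns true when this call released the last listener, i.e. the hardware may stop.
  bool release() {
    assert(count_ > 0);  // a release without a matching acquire is a bug
    return --count_ == 0;
  }

  int active() const { return count_; }

 private:
  int count_ = 0;
};
```

With two listeners on the mic, the first `release()` returns false and the mic keeps running; only the second one returns true.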

Combined with the new MicrophoneSource pattern (using MicrophoneSource* instead of Microphone*), this lays the groundwork for running voice_assistant and
intercom_api on the same device.

Disclaimer: The reference counting infrastructure is merged and working for intercom use, but I haven’t personally tested simultaneous operation with voice_assistant
yet. If anyone tries it, I’d love to hear how it goes!

Other Improvements

  • Bridge cleanup fix - HA auto-bridge between ESPs now properly cleans up when calls end. Previously, hanging up could leave a “ghost bridge” that blocked future
    calls.
  • Pre-AEC mic attenuation - Configurable attenuation for hot mics (ES8311 at high gain) applied before AEC processing, preventing clipping from breaking echo
    cancellation.
  • AEC always-on - Removed the RMS threshold that was toggling AEC on/off, which caused audio discontinuities (“pr-o-va” instead of “prova”).
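
The pre-AEC attenuation can be pictured as a simple gain stage applied to the mic buffer before it reaches the echo canceller. This is a sketch only: `attenuate` is a hypothetical helper, and it assumes 16-bit samples and a linear gain factor such as 0.5.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Scales samples in place by `gain` (0.5 halves the amplitude) and clamps to the int16 range.
void attenuate(std::vector<int16_t> &samples, float gain) {
  assert(gain >= 0.0f);
  for (int16_t &s : samples) {
    float scaled = std::round(static_cast<float>(s) * gain);
    if (scaled > 32767.0f) scaled = 32767.0f;    // clamp so a gain above 1.0 cannot wrap around
    if (scaled < -32768.0f) scaled = -32768.0f;
    s = static_cast<int16_t>(scaled);
  }
}
```

Note that this only helps if the signal is not already clipped at the ADC; attenuating a clipped waveform keeps the distortion, so codec gain still matters.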

Hardware Tested

│         Device         │     Mic/Speaker     │           AEC           │         Notes         │                                                                   
├────────────────────────┼─────────────────────┼─────────────────────────┼───────────────────────┤                                                                   
│ Xiaozhi Ball V3 (~$15) │ ES8311 codec        │ Stereo digital feedback │ Best AEC quality      │                                                                   
├────────────────────────┼─────────────────────┼─────────────────────────┼───────────────────────┤                                                                   
│ ESP32-S3 Mini          │ INMP441 + MAX98357A │ Ring buffer             │ Good, improved timing │                                                                   
└────────────────────────┴─────────────────────┴─────────────────────────┴───────────────────────┘                                                                   

Just sent you a message on tele

The semaphore contribution was @gtjoseph’s.

@meconiotech There appear to be several issues with your recent pushes…

  • Part of the change to keep the audio task alive that was implemented in i2s_audio_duplex.cpp is missing. The result is that when all microphone users have left, the audio task stops even if there are active speaker users and vice versa.
  • The microphone and speaker components aren’t applying audio defaults so they fail to compile.
  • The esp_aec seems to completely disable the microphone, at least for me.

Finally, it’s really not cool to take the contents of other people’s commits, like the commit I pushed to @will_santana, and recommit them under your own name. I’m hoping it was an honest mistake or just got lost in the transfer from @will_santana. It’d be nice to know that if I send you a few PRs in the future, they’ll be properly credited.

You’re right about everything. I apologize. I’m not very familiar with GitHub yet. I’m getting help from Claude Code, who sometimes doesn’t understand anything, but it saves me a lot of time. Let me fix everything. Unfortunately, I realize I only have two devices on which to test the changes in my existing ecosystem. Changing the reference for AEC has finally started working properly after months, even better than on dual-bus devices. I was just starting to test coexistence with Voice Assistant and MWW.

Ah, I gotcha. I can send you a PR to fix the audio defaults plus another one to enable the AEC for the ESP32P4.

I sincerely apologize. The reference counting code was yours and it should have been properly attributed from the start. I’ve
now fixed the commit history — the rc9 commit correctly shows your Co-Authored-By along with @willrnsantana’s:

v2.0.1-rc9: Reference counting for multi-listener support · n-IA-hane/intercom-api@811bd16 · GitHub

It was not intentional — I was moving fast between branches and didn’t handle the attribution properly. That’s on me, no excuses. Your contribution made
multi-listener support possible, which is foundational for voice_assistant coexistence. Thank you for that work.

Regarding the bugs you reported:

  • Audio task termination when all mic users leave: Your original implementation in start_mic()/stop_mic()/start_speaker()/stop_speaker() handles this correctly — the
    parent only stops when BOTH mic and speaker have no active users. I’ll review if the v2.0.2 refactor accidentally broke this.
  • Microphone/speaker compilation failures with audio defaults: I’ll look into this — likely missing default parameters after the code style refactor.
  • AEC disabling the microphone: This was caused by a missing #include "esphome/core/defines.h"; the v2.0.2-rc1 refactor removed a transitive include that carried
    the USE_ESP_AEC define. Already fixed in v2.0.2.
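
The keep-alive rule from the first bullet can be sketched as follows. This is a simplified model, not the code in i2s_audio_duplex.cpp: the class name `DuplexTaskModel` is hypothetical, the method names only mirror the ones mentioned above, and locking is left out.

```cpp
#include <cassert>

// Hypothetical model of the shared duplex audio task's lifetime.
class DuplexTaskModel {
 public:
  void start_mic() { ++mic_users_; running_ = true; }
  void start_speaker() { ++speaker_users_; running_ = true; }

  void stop_mic() {
    assert(mic_users_ > 0);
    --mic_users_;
    update();
  }

  void stop_speaker() {
    assert(speaker_users_ > 0);
    --speaker_users_;
    update();
  }

  bool running() const { return running_; }

 private:
  // The task stays alive while EITHER side still has at least one user.
  void update() { running_ = mic_users_ > 0 || speaker_users_ > 0; }

  int mic_users_ = 0;
  int speaker_users_ = 0;
  bool running_ = false;
};
```

The reported regression corresponds to `update()` checking only one of the two counters, so the last mic user leaving would stop the task even with the speaker still in use.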

PRs are absolutely welcome and will be properly credited. I’d love the ESP32-P4 AEC PR and the audio defaults fix you mentioned. Sorry again for the attribution
issue.

No worries! I figured it was just part of the confusion. I pushed up the two PRs but I see you already took care of it so I’ll close the PRs.

It looks like everything’s working! I was able to enable the stereo AEC reference and so far, so good. The only nit is that the AEC status messages every 500 frames should probably be less frequent and debug level instead of info level. 🙂

Thank you for all the hard work you’ve put into this!

EDIT: And… I just tested playing the wake word through the media player and it did NOT trigger MWW so that’s a big win!

I’ve fixed the speaker and mic with v2.0.1.

But I’m hitting the same issue as @gtjoseph: once AEC is enabled, it “kills” the mic, reporting that the result from input-output=0.

Right now, VA and MWW work 100% if you point the VA to the speaker_duplex.

If it points to a speaker_media_player, it only prints an error with no further clue (even in very verbose mode).
Any help with that is appreciated.

I’ll commit my semi-working yaml to my second repo, but it’s down here too:

substitutions:
  name: esphome-web-deba9c
  friendly_name: "Alexa do escritório"

## v1.07 26-jul-2025 #############################################################################################################################
## changed how to play startup sound to avoid double triggering
## v1.06 19-jul-2025 #############################################################################################################################
## Added: Startup sound when connected to HA, optional with switch, and option to select other sounds in settings below.
## Added: Mute and Playing icon to all emos.
## v1.05 18-jul-2025 #############################################################################################################################
## Fixes: minor bugfixes, sound level to max, show muted mic when playing music or playing TTS, not when playing internal sounds.
## v1.04 13-jul-2025 #############################################################################################################################
## Added: optional wake sound, delays the time before listening, so optional switch in HA.
## Moved: show_text and show_battery_status to switches in HA.
## v1.03 08-jul-2025 #############################################################################################################################
## Added: optional show text boxes
## v1.02 30-jun-2025 #############################################################################################################################
## Added optional Battery Status

## SETTINGS ######################################################################################################################################

  imagemodel: "Eyes" # (options are: Alfred,Astrobot,Buzz,Casita,Cybergirl,Dory,EVE,Eyes,Eyes2,GLaDOS,Girl1,Guy1,Guy2,Gwen,HA-character,Harley,Jarvis,Luffy,Mario,Max,Prime,Robochibi,Robocop,Robot,Robotgirl,Shaun)
  startup_sound: "Home_Connected" # (options are: available,Home_Connected,Computer_Ready)

  imagewidth: "240" # GC9A01A (Ball v2 & Muma & Puck) "240"
  imageheight: "240" # GC9A01A (Ball v2 & Muma & Puck) "240"
  displaymodel: "GC9A01A" # GC9A01A (Ball v2 & Puck) or ST7789V (Muma)
  invertcolors: "true" # GC9A01A/ST7789V (Ball v2 & Muma & Puck) "true"

##################################################################################################################################################

  # Hardware v2 pin mappings
  sda_pin_bus_a: "15"        # I2C Bus A SDA
  scl_pin_bus_a: "14"        # I2C BUS A SCL
  sda_pin_bus_b: "11"        # I2C Bus B SDA
  scl_pin_bus_b: "7"         # I2C BUS B SCL

  i2s_lrclk_pin: "45"        # I2S LRCLK (Word Select)
  i2s_bclk_pin: "9"          # I2S BCLK (Bit Clock)
  i2s_mclk_pin: "16"         # I2S MCLK (Master Clock)
  i2s_din_pin: "10"          # I2S Data In (Mic)
  i2s_dout_pin: "8"          # I2S Data Out (Speaker)

  speaker_enable_pin: "46"   # Speaker Enable
  touch_input_pin: "12"      # Touch interrupt
  touch_reset_pin: "6"       # Touch Reset

  backlight_output_pin: "42" # Display Backlight
  lcd_cs_pin: "5"            # Display CS (Chip Select)
  lcd_dc_pin: "47"           # Display DC (Data/Command)
  lcd_reset_pin: "38"        # Display Reset
  spi_clk_pin: "4"           # SPI Clock
  spi_mosi_pin: "2"          # SPI MOSI (Data Out)

  left_top_button_pin: "0"   # Main Button
  led_pin: "48"              # RGB LED (WS2812)
  battery_adc_pin: "1"       # Battery Voltage ADC

##################################################################################################################################################

  loading_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/loading.png
  idle_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/idle.png
  listening_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/listening.png
  thinking_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/thinking.png
  replying_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/replying.png
  error_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/error.png
  timer_finished_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/timer_finished.png
  mute_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/mute.png

  startup_sound_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/sounds/${startup_sound}.flac

  loading_illustration_background_color: "000000"
  idle_illustration_background_color: "000000"
  listening_illustration_background_color: "000000"
  thinking_illustration_background_color: "000000"
  replying_illustration_background_color: "000000"
  error_illustration_background_color: "000000"

  voice_assist_idle_phase_id: "1"
  voice_assist_listening_phase_id: "2"
  voice_assist_thinking_phase_id: "3"
  voice_assist_replying_phase_id: "4"
  voice_assist_not_ready_phase_id: "10"
  voice_assist_error_phase_id: "11"
  voice_assist_muted_phase_id: "12"
  voice_assist_timer_finished_phase_id: "20"

  allowed_characters: " !#%'()+,-./0123456789:;<>?@ABCDEFGHIJKLMNOPQRSTUVWYZ[]_abcdefghijklmnopqrstuvwxyz{|}°²³µ¿ÁÂÄÅÉÖÚßàáâãäåæçèéêëìíîðñòóôõöøùúûüýþāăąćčďĐđēėęěğĮįıļľŁłńňőřśšťũūůűųźŻżŽžơưșțΆΈΌΐΑΒΓΔΕΖΗΘΚΜΝΠΡΣΤΥΦάέήίαβγδεζηθικλμνξοπρςστυφχψωϊόύώАБВГДЕЖЗИКЛМНОПРСТУХЦЧШЪЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёђєіїјљњћאבגדהוזחטיכלםמןנסעפץצקרשת،ءآأإئابةتجحخدذرزسشصضطظعغفقكلمنهوىيٹپچڈکگںھہیےংকচতধনফবযরলশষস়ািু্చయలిెొ్ംഅആഇഈഉഎഓകഗങചജഞടഡണതദധനപഫബഭമയരറലളവശസഹാിീുൂെേൈ്ൺൻർൽൾაბგდევზთილმნოპრსტუფქყშჩცძჭხạảấầẩậắặẹẽếềểệỉịọỏốồổỗộớờởợụủứừửữựỳ—、一上不个中为主乾了些亮人任低佔何作供依侧係個側偵充光入全关冇冷几切到制前動區卧厅厨及口另右吊后吗启吸呀咗哪唔問啟嗎嘅嘛器圍在场執場外多大始安定客室家密寵对將小少左已帘常幫幾库度庫廊廚廳开式後恆感態成我戲戶户房所扇手打执把拔换掉控插摄整斯新明是景暗更最會有未本模機檯櫃欄次正氏水沒没洗活派温測源溫漏潮激濕灯為無煙照熱燈燥物狀玄现現瓦用發的盞目着睡私空窗立笛管節簾籬紅線红罐置聚聲脚腦腳臥色节著行衣解設調請謝警设调走路車车运連遊運過道邊部都量鎖锁門閂閉開關门闭除隱離電震霧面音頂題顏颜風风食餅餵가간감갔강개거게겨결경고공과관그금급기길깥꺼껐꼽나난내네놀누는능니다닫담대더데도동됐되된됨둡드든등디때떤뜨라래러렇렌려로료른를리림링마많명몇모무문물뭐바밝방배변보부불블빨뽑사산상색서설성세센션소쇼수스습시신실싱아안않알았애야어얼업없었에여연열옆오온완외왼요운움워원위으은을음의이인일임입있작잠장재전절정제져조족종주줄중줘지직진짐쪽차창천최추출충치침커컴켜켰쿠크키탁탄태탬터텔통트튼티파팬퍼폰표퓨플핑한함해했행혀현화활후휴힘,?"

  font_glyphsets: "GF_Latin_Core"
  font_family: Figtree

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  min_version: 2025.5.0
  name_add_mac_suffix: false
  on_boot:
  #  - lambda: |-
  #      uint8_t data[2] = {0x44, 0x48};  # ADCDAT_SEL = DACL+ADC
  #      id(i2c_bus).write(0x18, data, 2);
    priority: 600
    then:
      - component.update: battery_voltage
      - component.update: battery_percentage
      - delay: 30s

esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  cpu_frequency: 240MHz  
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"

psram:
  mode: octal
  speed: 80MHz

ota:
  - platform: esphome
    id: ota_esphome

logger:
  hardware_uart: USB_SERIAL_JTAG
  level: DEBUG

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: "Ball v2 Hotspot"
    password: "RZ7D3EzJdPM6"

captive_portal:

external_components:
  - source:
      type: git
      url: https://github.com/willrnsantana/esphome-i2s_audio_duplex_integration
      ref: main
      path: esphome_components
    refresh: 0s
    components: [intercom_api, i2s_audio_duplex, esp_aec]

intercom_api:

button:
  - platform: factory_reset
    id: factory_reset_btn
    internal: true

sensor:
  - platform: adc
    pin: GPIO${battery_adc_pin}
    name: "Battery Voltage"
    id: battery_voltage
    attenuation: 12db
    accuracy_decimals: 2
    update_interval: 1s
    unit_of_measurement: "V"
    icon: mdi:battery-medium
    filters:
      - multiply: 2.0
      - median:
          window_size: 7
          send_every: 7
          send_first_at: 7
      - throttle: 1min
    on_value:
      then:
        - component.update: battery_percentage

  - platform: template
    id: battery_percentage
    name: "Battery Percentage"
    lambda: return id(battery_voltage).state;
    accuracy_decimals: 0
    unit_of_measurement: "%"
    icon: mdi:battery-medium
    filters:
      - calibrate_linear:
         method: exact
         datapoints:
          - 2.80 -> 0.0
          - 3.10 -> 10.0
          - 3.30 -> 20.0
          - 3.45 -> 30.0
          - 3.60 -> 40.0
          - 3.70 -> 50.0
          - 3.75 -> 60.0
          - 3.80 -> 70.0
          - 3.90 -> 80.0
          - 4.00 -> 90.0
          - 4.20 -> 100.0
      - lambda: |-
          if (x > 100) return 100;
          if (x < 0) return 0;
          return x;


touchscreen:
  - platform: cst816
    i2c_id: bus_b
    interrupt_pin: ${touch_input_pin}
    reset_pin: ${touch_reset_pin}
    id: touch_dp

output:
  - platform: ledc
    pin: GPIO${backlight_output_pin}
    id: backlight_output
    inverted: true

light:
  - platform: monochromatic
    id: Sled
    name: Screen
    icon: "mdi:television"
    entity_category: config
    output: backlight_output
    restore_mode: ALWAYS_ON
    default_transition_length: 250ms

  - platform: esp32_rmt_led_strip
    id: led
    name: RGB light
    disabled_by_default: false
    entity_category: config
    pin: GPIO${led_pin}
    default_transition_length: 0s
    chipset: WS2812
    num_leds: 1
    rgb_order: grb
    effects:
      - pulse:
          name: "Slow Pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "Fast Pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%

api:

i2c:
  - id: bus_a
    sda: GPIO${sda_pin_bus_a}
    scl: GPIO${scl_pin_bus_a}
    scan: true
  - id: bus_b
    sda: GPIO${sda_pin_bus_b}
    scl: GPIO${scl_pin_bus_b}
    scan: true

esp_aec:
  id: aec_component
  sample_rate: 16000
  filter_length: 4

i2s_audio_duplex:
  id: i2s_duplex
  i2s_lrclk_pin: GPIO${i2s_lrclk_pin}      # Word Select (WS/LRCLK)
  i2s_bclk_pin: GPIO${i2s_bclk_pin}        # Bit Clock (BCK/BCLK)
  i2s_mclk_pin: GPIO${i2s_mclk_pin}        # Master Clock (optional, some codecs need it)
  i2s_din_pin: GPIO${i2s_din_pin}          # Data In (from codec ADC → ESP mic)
  i2s_dout_pin: GPIO${i2s_dout_pin}        # Data Out (from ESP → codec DAC speaker)
  sample_rate: 16000
  #aec_id: aec_component                    # Optional: link to esp_aec
  #use_stereo_aec_reference: true           # ES8311 digital feedback
  #aec_reference_delay_ms: 10
  #mic_attenuation: 0.5

audio_dac:
  - platform: es8311
    i2c_id: bus_a
    id: es8311_dac
    bits_per_sample: 16bit
    sample_rate: 16000

microphone:
  - platform: i2s_audio_duplex
    id: i2s_mics
    i2s_audio_duplex_id: i2s_duplex
    sample_rate: 16000

speaker:
  - platform: i2s_audio_duplex
    id: i2s_audio_speaker
    i2s_audio_duplex_id: i2s_duplex
    sample_rate: 16000
    bits_per_sample: 32 bit
    audio_dac: es8311_dac
    num_channels: 1

  - platform: mixer
    id: mixer_speaker_id
    output_speaker: i2s_audio_speaker
    source_speakers:
      - id: announcement_spk_mixer_input
        timeout: never
      - id: media_spk_mixer_input
        timeout: never

  - platform: resampler
    id: announcement_spk_resampling_input
    output_speaker: announcement_spk_mixer_input
    bits_per_sample: 16

  - platform: resampler
    id: media_spk_resampling_input
    output_speaker: media_spk_mixer_input
    bits_per_sample: 16
  

media_player:
  - platform: speaker
    name: None
    id: external_media_player
    task_stack_in_psram: true
    codec_support_enabled: true
    volume_initial: 70%
    buffer_size: 6000
    media_pipeline:
      speaker: media_spk_resampling_input
      num_channels: 1
      format: FLAC
      sample_rate: 16000
    announcement_pipeline:
      speaker: announcement_spk_resampling_input
      format: FLAC
      sample_rate: 16000
      num_channels: 1  # S3 Box only has one output channel

micro_wake_word:
  id: mww
  microphone: i2s_mics
  stop_after_detection: false
  models:
    - alexa
  on_wake_word_detected:
    - if:
        condition:
          voice_assistant.is_running:
        then:
          voice_assistant.stop:
          # Stop any other media player announcement
        else:
          - if:
              condition:
                media_player.is_announcing:
              then:
                - media_player.stop:
                    announcement: true
              else:
              # Start the voice assistant
                - voice_assistant.start:
                    wake_word: !lambda return wake_word;

voice_assistant:
  id: va
  microphone: i2s_mics
  #media_player: external_media_player
  speaker: announcement_spk_resampling_input
  micro_wake_word: mww
  #noise_suppression_level: 2
  use_wake_word: false
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  on_client_connected:
    - micro_wake_word.start:
  on_client_disconnected:
    - voice_assistant.stop:

spi:
  - id: spi_bus
    clk_pin: GPIO${spi_clk_pin}
    mosi_pin: GPIO${spi_mosi_pin}

display:
  - platform: ili9xxx
    id: s3_box_lcd
    model: ${displaymodel}
    invert_colors: ${invertcolors}
    data_rate: 40MHz
    cs_pin: GPIO${lcd_cs_pin}
    dc_pin: GPIO${lcd_dc_pin}
    reset_pin:
      number: GPIO${lcd_reset_pin}
    update_interval: never
    dimensions:
        height: ${imageheight}
        width: ${imagewidth}

switch:
  - platform: gpio
    id: speaker_enable_switch
    name: Speaker Enable
    icon: "mdi:speaker"
    entity_category: config
    pin: GPIO${speaker_enable_pin}
    restore_mode: RESTORE_DEFAULT_ON

@will_santana Pull the latest main branch from n-IA-hane/intercom-api. I have VA pointed to the media player announcement pipeline with the stereo_aec_reference enabled and both VA and MWW are working fine.

@gtjoseph, no luck with v2.0.2.

Still the same integration errors

I’m playing around with an ESP32-S3-CAM board.

Just the Intercom Mini config with some small changes

substitutions:
  name: doorbell
  friendly_name: Doorbell

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  min_version: 2025.5.0
  platformio_options:
    board_build.flash_mode: dio
    board_upload.maximum_ram_size: 327680
    board_upload.maximum_size: 16777216
    


esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  flash_size: 16MB
  partitions: "default_16MB.csv"
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      # Default is 10, increased for: TCP server + API + OTA + web_server
      CONFIG_LWIP_MAX_SOCKETS: "16"

psram:
  mode: octal    # octal on this board (the Intercom Mini uses quad PSRAM)
  speed: 80MHz

# ==============================================================================
# CONNECTIVITY
# ==============================================================================
api:
  on_client_connected:
    - lambda: 'id(intercom).publish_entity_states();'
  encryption:
    key: ""

ota:
  - platform: esphome
    password: ""

logger:
  hardware_uart: UART0
  level: DEBUG
  logs:
    intercom_api: DEBUG
    component: INFO

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  power_save_mode: none


i2c:
  - id: cam_i2c
    sda: GPIO4
    scl: GPIO5


esp32_camera:
  name: "Doorbell CAM"
  external_clock:
    pin: GPIO15
    frequency: 20MHz
  i2c_id: cam_i2c
  data_pins: [GPIO11, GPIO9, GPIO8, GPIO10, GPIO12, GPIO18, GPIO17, GPIO16]
  vsync_pin: GPIO6
  href_pin: GPIO7
  pixel_clock_pin: GPIO13
  resolution: 800X600
  jpeg_quality: 10
  max_framerate: 5fps
  idle_framerate: 0.1fps
  brightness: 2
  frame_buffer_count: 1
  vertical_flip: true
  horizontal_mirror: false
  aec_mode: AUTO
  aec2: True
  ae_level: 2
  #special_effect: grayscale 


# ==============================================================================
# EXTERNAL COMPONENTS
# ==============================================================================
external_components:
  - source:
      type: local
      path: esphome_components
    components: [intercom_api, esp_aec]

# ==============================================================================
# I2S AUDIO BUSES
# ==============================================================================
i2s_audio:
  # I2S Bus 0: INMP441 Microphone
  - id: i2s_mic_bus
    i2s_lrclk_pin: GPIO41
    i2s_bclk_pin: GPIO42

  # I2S Bus 1: MAX98357A Speaker
  - id: i2s_spk_bus
    i2s_lrclk_pin: GPIO47
    i2s_bclk_pin: GPIO21

# ==============================================================================
# MICROPHONE (SPH0645)
# ==============================================================================
microphone:
  - platform: i2s_audio
    id: mic_component
    i2s_audio_id: i2s_mic_bus
    i2s_din_pin: GPIO40
    adc_type: external
    pdm: false
    bits_per_sample: 32bit
    sample_rate: 16000
    channel: left

# ==============================================================================
# SPEAKER (MAX98357A)
# ==============================================================================
speaker:
  - platform: i2s_audio
    id: spk_component
    i2s_audio_id: i2s_spk_bus
    i2s_dout_pin: GPIO14
    dac_type: external
    i2s_mode: primary
    sample_rate: 16000
    bits_per_sample: 16bit
    timeout: never        # Keep I2S running (avoids clicks on resume)
    buffer_duration: 100ms  # Low latency buffer for real-time intercom

# ==============================================================================
# AEC (Acoustic Echo Cancellation)
# ==============================================================================
esp_aec:
  id: aec_processor
  sample_rate: 16000
  filter_length: 4     # 4 = 64ms tail (good balance of quality vs CPU)
  mode: voip_high_perf  

# ==============================================================================
# INTERCOM API (TCP-based, port 6054)
# ==============================================================================
# Auto-creates these sensors:
#   - text_sensor.intercom_mini_intercom_state (Idle/Ringing/Streaming)
#   - text_sensor.intercom_mini_destination (selected contact) [full mode only]
#   - text_sensor.intercom_mini_caller (who is calling) [full mode only]
#   - text_sensor.intercom_mini_contacts (count) [full mode only]

intercom_api:
  id: intercom
  mode: full                  # full = ESP↔ESP calls with contacts, simple = browser only
  microphone: mic_component
  speaker: spk_component
  mic_bits: 32                # SPH0645 outputs 32-bit (default 16), data in upper 18 bits
  dc_offset_removal: true     # SPH0645 has DC bias that must be removed
  aec_id: aec_processor       # Links to esp_aec for echo cancellation
  ringing_timeout: 30s        # Auto-decline unanswered calls

  # === FSM event callbacks ===
  on_incoming_call:
    - logger.log: "Incoming call"

  on_outgoing_call:
    - light.turn_on:
        id: status_led
        effect: "Ringing"
        red: 100%
        green: 50%
        blue: 0%
    # Fire HA event when calling "Home Assistant" (for notifications/automations)
    - if:
        condition:
          lambda: 'return id(intercom).get_current_destination() == "Home Assistant";'
        then:
          - homeassistant.event:
              event: esphome.intercom_call
              data:
                caller: !lambda 'return App.get_friendly_name();'
                destination: "Home Assistant"
                type: "doorbell"

  on_ringing:
    - light.turn_on:
        id: status_led
        effect: "Ringing"
        red: 100%
        green: 0%
        blue: 0%

  on_answered:
    - logger.log: "Call answered"

  on_streaming:
    - light.turn_on:
        id: status_led
        effect: "None"
        red: 0%
        green: 100%
        blue: 0%

  on_idle:
    - light.turn_off: status_led

  on_hangup:
    - logger.log:
        format: "Hangup: %s"
        args: ['reason.c_str()']

  on_call_failed:
    - logger.log:
        format: "Call failed: %s"
        args: ['reason.c_str()']

# ==============================================================================
# BUTTONS
# ==============================================================================
button:
  # Smart Call button: idle→call, ringing→answer, streaming→hangup
  # The on_outgoing_call callback handles the HA event for doorbell notifications
  - platform: template
    id: call_button
    name: "Call"
    icon: "mdi:phone"
    on_press:
      - intercom_api.call_toggle:
          id: intercom

  # Next contact (full mode)
  - platform: template
    id: next_contact_button
    name: "Next Contact"
    icon: "mdi:arrow-right"
    on_press:
      - intercom_api.next_contact:
          id: intercom

  # Previous contact (full mode)
  - platform: template
    id: prev_contact_button
    name: "Previous Contact"
    icon: "mdi:arrow-left"
    on_press:
      - intercom_api.prev_contact:
          id: intercom

  # Decline incoming call
  - platform: template
    id: decline_button
    name: "Decline"
    icon: "mdi:phone-hangup"
    on_press:
      - intercom_api.decline_call:
          id: intercom

  - platform: template
    id: refresh_contacts_button
    name: "Refresh Contacts"
    icon: "mdi:refresh"
    entity_category: config
    on_press:
      - intercom_api.set_contacts:
          id: intercom
          contacts_csv: !lambda 'return id(ha_active_devices).state;'

  - platform: restart
    name: "Restart"
    icon: "mdi:restart"

# ==============================================================================
# SWITCHES (native platform with restore_mode)
# ==============================================================================
switch:
  - platform: intercom_api
    intercom_api_id: intercom
    auto_answer:
      id: auto_answer_switch
      name: "Auto Answer"
      restore_mode: RESTORE_DEFAULT_ON
    aec:
      id: aec_switch
      name: "Echo Cancellation"
      restore_mode: RESTORE_DEFAULT_OFF

# ==============================================================================
# NUMBERS (native platform with restore_value)
# ==============================================================================
number:
  - platform: intercom_api
    intercom_api_id: intercom
    speaker_volume:
      id: speaker_volume
      name: "Speaker Volume"
    mic_gain:
      id: mic_gain
      name: "Mic Gain"

# ==============================================================================
# STATUS LED (WS2812 RGB on GPIO48)
# ==============================================================================
light:
  - platform: esp32_rmt_led_strip
    id: status_led
    name: "Status LED"
    icon: "mdi:led-on"
    pin: GPIO48
    chipset: WS2812
    num_leds: 1
    rgb_order: RGB
    effects:
      - pulse:
          name: "Streaming"
          min_brightness: 20%
          max_brightness: 100%
      - strobe:
          name: "Ringing"
          colors:
            - state: true
              brightness: 100%
              red: 100%
              green: 0%
              blue: 0%
              duration: 250ms
            - state: false
              duration: 250ms

# ==============================================================================
# TEXT SENSORS
# ==============================================================================
text_sensor:
  # Subscribe to HA's centralized contacts sensor
  - platform: homeassistant
    id: ha_active_devices
    entity_id: sensor.intercom_active_devices
    on_value:
      - intercom_api.set_contacts:
          id: intercom
          contacts_csv: !lambda 'return x;'

# ==============================================================================
# DIAGNOSTICS
# ==============================================================================
sensor:
  - platform: wifi_signal
    name: "WiFi Signal"
    update_interval: 60s

  - platform: uptime
    name: "Uptime"
    update_interval: 60s

  - platform: internal_temperature
    name: "CPU Temperature"
    update_interval: 60s

It works somewhat, but calls get randomly stopped with this message in the log:

[W][intercom_api:1350][intercom_srv]: Payload incomplete: 1436/2048

Any idea how to keep the whole call from ending when that happens?

I'm on version 2.0.2.

Hi, I have a very similar board, and I’ll try to run some tests with it. My instinct is that it’ll be a pain to get them to coexist. I was hoping the cams would work better with the S3, but even in my tests without any audio components, the ESP seems to be struggling. I lowered the resolution and it took a while to update. I have the feeling that managing the cams is a significant effort for an ESP, but I’m optimistic.

Last night we made significant progress on i2s audio duplex and AEC. I haven’t published anything yet because I still have to finish the tests properly, but everything suggests that in the next versions we’ll have i2s audio duplex working with MWW and voice assistant, with AEC that cleans up the speaker output. You can say “ok nabu” while the TTS is speaking; in my tests, this is already working. Voice assistant is also working during calls between ESPs; the TTS output is suppressed by AEC and you can’t hear it during the call; you only hear the other person’s voice.

One day it will be possible to say “ok nabu, call kitchen” (stream full duplex), “ok nabu, hang up”… I2S audio duplex was born as a component to support the intercom, but it is becoming an audio hub with noise suppression; many roads are opening up going forward.

That’s really odd. Are you sure you have the tip of the main branch?
You need at least commit a26edc74 “fix: audio task lifecycle, stream limits, ESP32-P4 AEC support”. If you have “v2.0.2: code style refactor, TCP timeout, xiaozhi display”, that’s broken. Could the older version still be in ESPHome’s cache?

Hi, my observations were that it often worked without any problems and then suddenly dropped the connection with Payload incomplete.

The next call worked for minutes with video without any issues.
In some cases, the ESP32-S3 Cam also had connection problems or poor Wi-Fi, and I guess there could be a relation.

But it only works reasonably reliably if I reduce the resolution of the cam, as you also said.

The upcoming changes sound very promising!

I guess the P4 board with mic and display will arrive next week.
Maybe it will work better there.

The build cache is clean and refresh is set to 0s in external_components, so it always pulls from GitHub on every build.
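
For anyone following along, a minimal sketch of what that setting looks like. `refresh` is a standard `external_components` option; the repository URL is a placeholder rather than the project's real address, and the component list is assumed from the component names mentioned in this thread.

```yaml
external_components:
  - source:
      type: git
      url: https://github.com/<owner>/<repo>   # placeholder, substitute the real repository
      ref: main                                 # assumed: tip of the main branch
    refresh: 0s   # re-fetch on every build instead of reusing the cached copy
    components: [intercom_api, i2s_audio_duplex]   # assumed component list
```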

I’ll give it another go tomorrow with a fresh mind

I was finally able to use it

The main issue seems to be with the resampler component.
Pointing directly to the media player worked (but I still had to declare num_channels and sample_rate explicitly).
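
A rough sketch of what declaring those values explicitly can look like on a `speaker` media player pipeline. `format`, `num_channels` and `sample_rate` are pipeline options of that platform; the ids and the 16 kHz mono values are assumptions for illustration and are not taken from the poster's config.

```yaml
media_player:
  - platform: speaker
    id: example_media_player        # hypothetical id
    name: "Example Media Player"
    announcement_pipeline:
      speaker: spk_announcement     # hypothetical speaker id, with no resampler in between
      format: FLAC
      num_channels: 1               # assumed: mono
      sample_rate: 16000            # assumed: 16 kHz
```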

Apart from the mic not being sensitive enough (I'll play with the gain and other settings) and the initial wake word detection (probably my YAML), the mic detected the wake word mid-reply just fine.

I'm still unable to stream music from Music Assistant; it fails with the same errors.

In the device:

In MA:

Seems some inherited property isn’t being declared down the pipeline

Won’t have time to work on it until Thursday, but it’s noted.

I started to play around with the jc4880p443, but I'm getting stuck on this compiler error:

In file included from src/esphome/core/component.h:9,
                 from src/esphome/components/intercom_api/intercom_api.h:5,
                 from src/esphome/components/intercom_api/intercom_api.cpp:1:
src/esphome/components/intercom_api/intercom_api.cpp: In member function 'void esphome::intercom_api::IntercomApi::publish_entity_states()':
src/esphome/components/intercom_api/intercom_api.cpp:263:53: error: 'class esphome::intercom_api::IntercomApi' has no member named 'aec_enabled_'
  263 |            this->auto_answer_ ? "ON" : "OFF", this->aec_enabled_ ? "ON" : "OFF");
      |                                                     ^~~~~~~~~~~~
src/esphome/core/log.h:109:99: note: in definition of macro 'esph_log_i'
  109 |   ::esphome::esp_log_printf_(ESPHOME_LOG_LEVEL_INFO, tag, __LINE__, ESPHOME_LOG_FORMAT(format), ##__VA_ARGS__)
      |                                                                                                   ^~~~~~~~~~~
src/esphome/components/intercom_api/intercom_api.cpp:261:3: note: in expansion of macro 'ESP_LOGI'
  261 |   ESP_LOGI(TAG, "Entity states synced (vol=%.0f%%, mic=%.1fdB, auto=%s, aec=%s)",
      |   ^~~~~~~~
Compiling .pioenvs/jc4880p443/src/esphome/components/safe_mode/safe_mode.cpp.o
*** [.pioenvs/jc4880p443/src/esphome/components/intercom_api/intercom_api.cpp.o] Error 1

I tried with this ESPHome YAML:

esphome:
  name: "jc4880p443"
  friendly_name: JC4880P443
logger:
  level: DEBUG

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  fast_connect: true
  post_connect_roaming: false

web_server:
  port: 80

ota:
  - platform: esphome

api:
  on_client_connected:
    - globals.set:
        id: homeassistant_ip
        value: !lambda return client_address;
font:
  - file: "gfonts://Montserrat"
    id: montserrat_28
    size: 28

output:
  - id: gpio_backlight_pwm
    platform: ledc
    pin: 23

light:
  - id: backlight
    name: Backlight
    platform: monochromatic
    output: gpio_backlight_pwm
    restore_mode: ALWAYS_ON


switch:
  - platform: restart
    name: Restart

binary_sensor:
  - platform: status
    name: Status

number:
  - platform: template
    name: Screen timeout
    optimistic: true
    id: display_timeout
    unit_of_measurement: "m"
    initial_value: 5 #minutes
    restore_value: true
    min_value: 0 #0 is no timeout
    max_value: 99
    step: 1
    mode: box


sensor:
  - id: wifi_signal_db
    name: WiFi Signal
    platform: wifi_signal
    update_interval: 60s
    entity_category: diagnostic

  - id: wifi_signal_strength
    name: WiFi Strength
    platform: copy
    source_id: wifi_signal_db
    filters:
      - lambda: return min(max(2 * (x + 100.0), 0.0), 100.0);
    unit_of_measurement: "%"
    entity_category: diagnostic

text_sensor:
  - platform: wifi_info
    ip_address:
      name: IP Address
      entity_category: diagnostic
    ssid:
      name: Connected SSID
      entity_category: diagnostic
    mac_address:
      name: Mac Address
      entity_category: diagnostic

globals:
  - id: homeassistant_ip
    type: std::string
  - id: backlight_brightness_level
    type: float
    restore_value: yes
    initial_value: '0.5'  # 50% default brightness

esp32:
  board: esp32-p4-evboard
  #cpu_frequency: 360MHz
  flash_size: 16MB
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_LWIP_MAX_SOCKETS: "16"
    advanced:
      enable_idf_experimental_features: yes 


# ==============================================================================
# EXTERNAL COMPONENTS
# ==============================================================================
external_components:
  - source:
      type: local
      path: esphome_components
    components: [intercom_api]


# Intercom API - Full mode
intercom_api:
  id: intercom
  mode: full
  microphone: es8311_mic
  speaker: es8311_hardware_out


display:
  - platform: mipi_dsi
    id: device_display
    model: JC4880P443
    byte_order: little_endian
    rotation: 90
    lambda: |-
      it.fill(Color::BLACK);
      it.print(340, 100, id(montserrat_28), Color(0,255,0), TextAlign::LEFT, "Hello World1");
      it.print(340, 200, id(montserrat_28), Color::WHITE, TextAlign::LEFT, "Hello World2");
      it.print(340, 300, id(montserrat_28), Color(255,0,0), TextAlign::LEFT, "Hello World3");
touchscreen:
  platform: gt911
  i2c_id: i2c_bus
  id: device_touchscreen
  reset_pin: GPIO3
  update_interval: 100ms
  transform: #This is for 90 degree display rotation
    swap_xy: true
    mirror_x: false
    mirror_y: true
  on_update:
    then:
      - lambda: |-
          if (touches.size() > 0) {
            auto touch = touches[0];
            ESP_LOGI("TOUCH", "X=%d Y=%d", touch.x, touch.y);
          }
esp_ldo:
  - channel: 3
    voltage: 2.5V

psram:
  mode: hex
  speed: 200MHz

preferences:
  flash_write_interval: 5min

esp32_hosted:
  variant: ESP32C6
  reset_pin: GPIO54
  cmd_pin: GPIO19
  clk_pin: GPIO18
  d0_pin: GPIO14
  d1_pin: GPIO15
  d2_pin: GPIO16
  d3_pin: GPIO17
  active_high: true

i2c:
  id: i2c_bus
  sda: 7
  scl: 8
  scan: false
  frequency: 400kHz

# Audio section
audio_dac:
  - platform: es8311
    id: esp7311_dac
    address: 0x18
    i2c_id: i2c_bus

i2s_audio:
  - id: i2s_bus
    i2s_lrclk_pin: GPIO10 # WS / LRCK
    i2s_bclk_pin: GPIO12  # BCLK
    i2s_mclk_pin: GPIO13  # ES8311 usually requires a Master Clock

microphone:
  - platform: i2s_audio
    id: es8311_mic
    i2s_audio_id: i2s_bus
    i2s_din_pin: GPIO48  # Data In from the Codec
    adc_type: external
    pdm: false           # ES8311 uses standard I2S, not PDM
    channel: left

speaker:
  - platform: i2s_audio
    id: es8311_hardware_out
    i2s_audio_id: i2s_bus
    i2s_dout_pin: GPIO9
    dac_type: external
    audio_dac: esp7311_dac
  - platform: mixer
    id: audio_mixer
    output_speaker: es8311_hardware_out
    source_speakers:
      - id: spk_announcement
      - id: spk_media

media_player:
  - platform: speaker
    name: "XL Speaker"
    id: xl_media_player
    # This section is now mandatory for the 'speaker' platform
    announcement_pipeline:
      speaker: spk_announcement
    media_pipeline:
      speaker: spk_media

How did you guys with the P4 get it working?

[Edit]
Partially working now; I noticed I wasn't on the newest code with P4 support.
Still struggling with the speaker now :frowning:

Take a look at the new release.