Core 2026.1.3
Supervisor 2026.01.1
Operating System 17.0
Frontend 20260107.2
I don’t use ESPHome Builder as I build on my host machine but I’m running the same 2026.1.2 locally.
I don’t use Music Assistant at all.
This release focuses on echo cancellation quality, especially for the Xiaozhi Ball V3 (ES8311 codec).
The AEC in v2.0.0 worked but wasn’t great. The ring buffer approach for the speaker reference signal had inherent timing issues: the delay between “what the speaker played” and “when the mic heard the echo” would slowly drift.
The fix exploits an ES8311 hardware feature: register 0x44 can be configured to output both the DAC playback and the ADC recording on the same I2S data line as a stereo signal.
The i2s_audio_duplex component reads this stereo frame, splits L/R, and feeds both to the AEC. The reference is now sample-accurate: it arrives in the same I2S frame as the mic signal, so there is no timing drift. The improvement in echo cancellation quality is dramatic.
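The L/R split described above can be pictured with a small sketch. This is not the component's actual code: `SplitFrames` and `split_stereo` are hypothetical names, and which channel carries the DAC loopback versus the ADC signal is an assumption that should be checked against the codec setup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not taken from i2s_audio_duplex.
// Holds the two mono streams recovered from one interleaved stereo buffer.
struct SplitFrames {
  std::vector<int16_t> left;   // assumed here: DAC loopback (AEC reference)
  std::vector<int16_t> right;  // assumed here: ADC (microphone)
};

// Splits interleaved 16-bit stereo samples (L0, R0, L1, R1, ...) into two
// mono buffers. Sample i of `left` and sample i of `right` come from the
// same I2S frame, which is what makes the reference sample-aligned.
inline SplitFrames split_stereo(const int16_t *interleaved, size_t num_frames) {
  assert(interleaved != nullptr || num_frames == 0);
  SplitFrames out;
  out.left.reserve(num_frames);
  out.right.reserve(num_frames);
  for (size_t i = 0; i < num_frames; ++i) {
    out.left.push_back(interleaved[2 * i]);
    out.right.push_back(interleaved[2 * i + 1]);
  }
  return out;
}
```

Because both channels come out of the same buffer, no separate delay estimate is needed between them; the AEC would receive `left[i]` and `right[i]` as a matched pair.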
i2s_audio_duplex:
  id: i2s_duplex
  i2s_lrclk_pin: GPIO45
  i2s_bclk_pin: GPIO9
  i2s_mclk_pin: GPIO16
  i2s_din_pin: GPIO10
  i2s_dout_pin: GPIO8
  sample_rate: 16000
  aec_id: aec_component
  use_stereo_aec_reference: true  # ES8311 digital feedback
  aec_reference_delay_ms: 10      # sample-aligned, minimal delay
You also need to configure the ES8311 register via I2C on boot:
esphome:
  on_boot:
    priority: 600
    then:
      - lambda: |-
          uint8_t data[2] = {0x44, 0x48};  // ADCDAT_SEL = DACL+ADC
          id(i2c_bus).write(0x18, data, 2);
Devices with separate mic/speaker (INMP441 + MAX98357A) still use the ring buffer approach, which also received timing fixes in this release.
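For readers unfamiliar with the ring buffer approach, here is a minimal sketch of a fixed-delay reference line. It is an illustration under assumptions, not the component's implementation: `ReferenceDelayLine` and `delay_ms_to_samples` are hypothetical names. It also shows why a fixed delay can only model a constant speaker-to-mic latency; if the real latency changes, the delayed reference no longer lines up with the echo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts a delay in milliseconds to a sample count.
// Example: 10 ms at 16000 Hz is 160 samples.
inline size_t delay_ms_to_samples(uint32_t delay_ms, uint32_t sample_rate_hz) {
  assert(sample_rate_hz > 0);
  return static_cast<size_t>(delay_ms) * sample_rate_hz / 1000;
}

// Hypothetical fixed-delay line for the speaker reference signal.
class ReferenceDelayLine {
 public:
  explicit ReferenceDelayLine(size_t delay_samples)
      : buf_(delay_samples + 1, 0), delay_(delay_samples) {}

  // Stores one speaker sample and returns the sample that was pushed
  // `delay_samples` calls earlier (0 until the line has filled).
  int16_t push(int16_t speaker_sample) {
    buf_[write_] = speaker_sample;
    const size_t read = (write_ + buf_.size() - delay_) % buf_.size();
    const int16_t delayed = buf_[read];
    write_ = (write_ + 1) % buf_.size();
    return delayed;
  }

 private:
  std::vector<int16_t> buf_;
  size_t delay_;
  size_t write_{0};
};
```

The delayed sample would be handed to the AEC as the reference for the mic sample captured at the same moment.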
Multi-Listener Support (Voice Assistant Coexistence)
Thanks to a contribution from willrnsantana, the i2s_audio_duplex microphone and speaker platforms now support reference counting. Multiple components can request mic/speaker access simultaneously; the hardware only stops when all listeners have released it.
Combined with the new MicrophoneSource pattern (using MicrophoneSource* instead of Microphone*), this lays the groundwork for running voice_assistant and intercom_api on the same device.
Disclaimer: The reference counting infrastructure is merged and working for intercom use, but I haven’t personally tested simultaneous operation with voice_assistant yet. If anyone tries it, I’d love to hear how it goes!
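The reference counting idea can be sketched as follows. This is a generic illustration, not the merged code: `RefCountedHardware` and its callbacks are hypothetical, and the sketch is single-threaded for brevity (real firmware with multiple tasks would need a mutex or atomic counter).

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sketch: starts the hardware for the first listener and
// stops it only when the last listener has released it.
class RefCountedHardware {
 public:
  RefCountedHardware(std::function<void()> start_hw, std::function<void()> stop_hw)
      : start_hw_(std::move(start_hw)), stop_hw_(std::move(stop_hw)) {}

  // Called by a component that wants the mic or speaker running.
  void acquire() {
    if (listeners_++ == 0) start_hw_();
  }

  // Called when that component no longer needs it.
  void release() {
    assert(listeners_ > 0);
    if (--listeners_ == 0) stop_hw_();
  }

  int listeners() const { return listeners_; }

 private:
  std::function<void()> start_hw_;
  std::function<void()> stop_hw_;
  int listeners_{0};
};
```

With this shape, an intercom call ending while a wake word listener is still active would only decrement the count, leaving the microphone running.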
Other Improvements
Hardware Tested
│ Device │ Mic/Speaker │ AEC │ Notes │
├────────────────────────┼─────────────────────┼─────────────────────────┼───────────────────────┤
│ Xiaozhi Ball V3 (~$15) │ ES8311 codec │ Stereo digital feedback │ Best AEC quality │
├────────────────────────┼─────────────────────┼─────────────────────────┼───────────────────────┤
│ ESP32-S3 Mini │ INMP441 + MAX98357A │ Ring buffer │ Good, improved timing │
└────────────────────────┴─────────────────────┴─────────────────────────┴───────────────────────┘
Links
Just sent you a message on tele
The semaphore work was @gtjoseph’s contribution
@meconiotech There appear to be several issues with your recent pushes…
Finally, it’s really not cool to take the contents of other people’s commits, like the commit I pushed to @will_santana, and recommit them under your own name. I’m hoping it was an honest mistake or just got lost in the transfer from @will_santana. It’d be nice to know that if I send you a few PRs in the future, they’ll be properly credited.
You’re right about everything. I apologize. I’m not very familiar with GitHub yet. I’m getting help from Claude Code, which sometimes doesn’t understand anything, but it saves me a lot of time. Let me fix everything. Unfortunately, I realize I only have two devices on which to test the changes in my existing ecosystem. After months, changing the AEC reference has finally got echo cancellation working properly, even better than on dual-bus devices. I was just starting to test coexistence with Voice Assistant and MWW.
Ah, I gotcha. I can send you a PR to fix the audio defaults plus another one to enable the AEC for the ESP32P4.
I sincerely apologize. The reference counting code was yours and it should have been properly attributed from the start. I’ve now fixed the commit history: the rc9 commit correctly shows your Co-Authored-By along with @willrnsantana’s:
v2.0.1-rc9: Reference counting for multi-listener support · n-IA-hane/intercom-api@811bd16 · GitHub
It was not intentional. I was moving fast between branches and didn’t handle the attribution properly. That’s on me, no excuses. Your contribution made multi-listener support possible, which is foundational for voice_assistant coexistence. Thank you for that work.
Regarding the bugs you reported:
PRs are absolutely welcome and will be properly credited. I’d love the ESP32-P4 AEC PR and the audio defaults fix you mentioned. Sorry again for the attribution issue.
No worries! I figured it was just part of the confusion. I pushed up the two PRs but I see you already took care of it so I’ll close the PRs.
It looks like everything’s working! I was able to enable the stereo AEC reference and so far, so good. The only nit is that the AEC status messages every 500 frames should probably be less frequent and at debug level instead of info level.
Thank you for all the hard work you’ve put into this!
EDIT: And… I just tested playing the wake word through the media player and it did NOT trigger MWW so that’s a big win!
I’ve fixed the speaker and mic with v2.0.1,
but I suffer from the same issue as @gtjoseph: once AEC is enabled, it “kills” the mic, saying the result from input-output=0.
Right now, VA and MWW work 100% if you point the VA to the speaker_duplex.
If it points to a speaker_media_player, it only prints an error with no further clue (even in very verbose mode).
Any help with that is appreciated.
I’ll commit my semi-working YAML to my second repo, but it’s down here too:
substitutions:
name: esphome-web-deba9c
friendly_name: "Alexa do escritório"
## v1.07 26-jul-2025 #############################################################################################################################
## changed how to play startup sound to avoid double triggering
## v1.06 19-jul-2025 #############################################################################################################################
## Added: Startup sound when connected to HA, optional with switch, and option to select other sounds in settings below.
## Added: Mute and Playing icon to all emos.
## v1.05 18-jul-2025 #############################################################################################################################
## Fixes: minor bugfixes, sound level to max, show muted mic when playing music or playing TTS, not when playing internal sounds.
## v1.04 13-jul-2025 #############################################################################################################################
## Added: optional wake sound, delays the time before listening, so optional switch in HA.
## Moved: show_text and show_battery_status to switches in HA.
## v1.03 08-jul-2025 #############################################################################################################################
## Added: optional show text boxes
## v1.02 30-jun-2025 #############################################################################################################################
## Added optional Battery Status
## SETTINGS ######################################################################################################################################
imagemodel: "Eyes" # (options are: Alfred,Astrobot,Buzz,Casita,Cybergirl,Dory,EVE,Eyes,Eyes2,GLaDOS,Girl1,Guy1,Guy2,Gwen,HA-character,Harley,Jarvis,Luffy,Mario,Max,Prime,Robochibi,Robocop,Robot,Robotgirl,Shaun)
startup_sound: "Home_Connected" # (options are: available,Home_Connected,Computer_Ready)
imagewidth: "240" # GC9A01A (Ball v2 & Muma & Puck) "240"
imageheight: "240" # GC9A01A (Ball v2 & Muma & Puck) "240"
displaymodel: "GC9A01A" # GC9A01A (Ball v2 & Puck) or ST7789V (Muma)
invertcolors: "true" # GC9A01A/ST7789V (Ball v2 & Muma & Puck) "true"
##################################################################################################################################################
# Hardware v2 pin mappings
sda_pin_bus_a: "15" # I2C Bus A SDA
scl_pin_bus_a: "14" # I2C BUS A SCL
sda_pin_bus_b: "11" # I2C Bus B SDA
scl_pin_bus_b: "7" # I2C BUS B SCL
i2s_lrclk_pin: "45" # I2S LRCLK (Word Select)
i2s_bclk_pin: "9" # I2S BCLK (Bit Clock)
i2s_mclk_pin: "16" # I2S MCLK (Master Clock)
i2s_din_pin: "10" # I2S Data In (Mic)
i2s_dout_pin: "8" # I2S Data Out (Speaker)
speaker_enable_pin: "46" # Speaker Enable
touch_input_pin: "12" # Touch interrupt
touch_reset_pin: "6" # Touch Reset
backlight_output_pin: "42" # Display Backlight
lcd_cs_pin: "5" # Display CS (Chip Select)
lcd_dc_pin: "47" # Display DC (Data/Command)
lcd_reset_pin: "38" # Display Reset
spi_clk_pin: "4" # SPI Clock
spi_mosi_pin: "2" # SPI MOSI (Data Out)
left_top_button_pin: "0" # Main Button
led_pin: "48" # RGB LED (WS2812)
battery_adc_pin: "1" # Battery Voltage ADC
##################################################################################################################################################
loading_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/loading.png
idle_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/idle.png
listening_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/listening.png
thinking_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/thinking.png
replying_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/replying.png
error_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/error.png
timer_finished_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/timer_finished.png
mute_illustration_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/images/${imagemodel}/${imagewidth}x${imageheight}/mute.png
startup_sound_file: https://github.com/RealDeco/xiaozhi-esphome/raw/main/sounds/${startup_sound}.flac
loading_illustration_background_color: "000000"
idle_illustration_background_color: "000000"
listening_illustration_background_color: "000000"
thinking_illustration_background_color: "000000"
replying_illustration_background_color: "000000"
error_illustration_background_color: "000000"
voice_assist_idle_phase_id: "1"
voice_assist_listening_phase_id: "2"
voice_assist_thinking_phase_id: "3"
voice_assist_replying_phase_id: "4"
voice_assist_not_ready_phase_id: "10"
voice_assist_error_phase_id: "11"
voice_assist_muted_phase_id: "12"
voice_assist_timer_finished_phase_id: "20"
allowed_characters: " !#%'()+,-./0123456789:;<>?@ABCDEFGHIJKLMNOPQRSTUVWYZ[]_abcdefghijklmnopqrstuvwxyz{|}°²³µ¿ÁÂÄÅÉÖÚßàáâãäåæçèéêëìíîðñòóôõöøùúûüýþāăąćčďĐđēėęěğĮįıļľŁłńňőřśšťũūůűųźŻżŽžơưșțΆΈΌΐΑΒΓΔΕΖΗΘΚΜΝΠΡΣΤΥΦάέήίαβγδεζηθικλμνξοπρςστυφχψωϊόύώАБВГДЕЖЗИКЛМНОПРСТУХЦЧШЪЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёђєіїјљњћאבגדהוזחטיכלםמןנסעפץצקרשת،ءآأإئابةتجحخدذرزسشصضطظعغفقكلمنهوىيٹپچڈکگںھہیےংকচতধনফবযরলশষস়ািু্చయలిెొ్ംഅആഇഈഉഎഓകഗങചജഞടഡണതദധനപഫബഭമയരറലളവശസഹാിീുൂെേൈ്ൺൻർൽൾაბგდევზთილმნოპრსტუფქყშჩცძჭხạảấầẩậắặẹẽếềểệỉịọỏốồổỗộớờởợụủứừửữựỳ—、一上不个中为主乾了些亮人任低佔何作供依侧係個側偵充光入全关冇冷几切到制前動區卧厅厨及口另右吊后吗启吸呀咗哪唔問啟嗎嘅嘛器圍在场執場外多大始安定客室家密寵对將小少左已帘常幫幾库度庫廊廚廳开式後恆感態成我戲戶户房所扇手打执把拔换掉控插摄整斯新明是景暗更最會有未本模機檯櫃欄次正氏水沒没洗活派温測源溫漏潮激濕灯為無煙照熱燈燥物狀玄现現瓦用發的盞目着睡私空窗立笛管節簾籬紅線红罐置聚聲脚腦腳臥色节著行衣解設調請謝警设调走路車车运連遊運過道邊部都量鎖锁門閂閉開關门闭除隱離電震霧面音頂題顏颜風风食餅餵가간감갔강개거게겨결경고공과관그금급기길깥꺼껐꼽나난내네놀누는능니다닫담대더데도동됐되된됨둡드든등디때떤뜨라래러렇렌려로료른를리림링마많명몇모무문물뭐바밝방배변보부불블빨뽑사산상색서설성세센션소쇼수스습시신실싱아안않알았애야어얼업없었에여연열옆오온완외왼요운움워원위으은을음의이인일임입있작잠장재전절정제져조족종주줄중줘지직진짐쪽차창천최추출충치침커컴켜켰쿠크키탁탄태탬터텔통트튼티파팬퍼폰표퓨플핑한함해했행혀현화활후휴힘,?"
font_glyphsets: "GF_Latin_Core"
font_family: Figtree
esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  min_version: 2025.5.0
  name_add_mac_suffix: false
  on_boot:
    # - lambda: |-
    #     uint8_t data[2] = {0x44, 0x48}; # ADCDAT_SEL = DACL+ADC
    #     id(i2c_bus).write(0x18, data, 2);
    priority: 600
    then:
      - component.update: battery_voltage
      - component.update: battery_percentage
      - delay: 30s
esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  cpu_frequency: 240MHz
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
psram:
  mode: octal
  speed: 80MHz
ota:
  - platform: esphome
    id: ota_esphome
logger:
  hardware_uart: USB_SERIAL_JTAG
  level: DEBUG
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: "Ball v2 Hotspot"
    password: "RZ7D3EzJdPM6"
captive_portal:
external_components:
  - source:
      type: git
      url: https://github.com/willrnsantana/esphome-i2s_audio_duplex_integration
      ref: main
      path: esphome_components
    refresh: 0s
    components: [intercom_api, i2s_audio_duplex, esp_aec]
intercom_api:
button:
  - platform: factory_reset
    id: factory_reset_btn
    internal: true
sensor:
  - platform: adc
    pin: GPIO${battery_adc_pin}
    name: "Battery Voltage"
    id: battery_voltage
    attenuation: 12db
    accuracy_decimals: 2
    update_interval: 1s
    unit_of_measurement: "V"
    icon: mdi:battery-medium
    filters:
      - multiply: 2.0
      - median:
          window_size: 7
          send_every: 7
          send_first_at: 7
      - throttle: 1min
    on_value:
      then:
        - component.update: battery_percentage
  - platform: template
    id: battery_percentage
    name: "Battery Percentage"
    lambda: return id(battery_voltage).state;
    accuracy_decimals: 0
    unit_of_measurement: "%"
    icon: mdi:battery-medium
    filters:
      - calibrate_linear:
          method: exact
          datapoints:
            - 2.80 -> 0.0
            - 3.10 -> 10.0
            - 3.30 -> 20.0
            - 3.45 -> 30.0
            - 3.60 -> 40.0
            - 3.70 -> 50.0
            - 3.75 -> 60.0
            - 3.80 -> 70.0
            - 3.90 -> 80.0
            - 4.00 -> 90.0
            - 4.20 -> 100.0
      - lambda: |-
          if (x > 100) return 100;
          if (x < 0) return 0;
          return x;
touchscreen:
  - platform: cst816
    i2c_id: bus_b
    interrupt_pin: ${touch_input_pin}
    reset_pin: ${touch_reset_pin}
    id: touch_dp
output:
  - platform: ledc
    pin: GPIO${backlight_output_pin}
    id: backlight_output
    inverted: true
light:
  - platform: monochromatic
    id: Sled
    name: Screen
    icon: "mdi:television"
    entity_category: config
    output: backlight_output
    restore_mode: ALWAYS_ON
    default_transition_length: 250ms
  - platform: esp32_rmt_led_strip
    id: led
    name: RGB light
    disabled_by_default: false
    entity_category: config
    pin: GPIO${led_pin}
    default_transition_length: 0s
    chipset: WS2812
    num_leds: 1
    rgb_order: grb
    effects:
      - pulse:
          name: "Slow Pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "Fast Pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%
api:
i2c:
  - id: bus_a
    sda: GPIO${sda_pin_bus_a}
    scl: GPIO${scl_pin_bus_a}
    scan: true
  - id: bus_b
    sda: GPIO${sda_pin_bus_b}
    scl: GPIO${scl_pin_bus_b}
    scan: true
esp_aec:
  id: aec_component
  sample_rate: 16000
  filter_length: 4
i2s_audio_duplex:
  id: i2s_duplex
  i2s_lrclk_pin: GPIO${i2s_lrclk_pin}  # Word Select (WS/LRCLK)
  i2s_bclk_pin: GPIO${i2s_bclk_pin}    # Bit Clock (BCK/BCLK)
  i2s_mclk_pin: GPIO${i2s_mclk_pin}    # Master Clock (optional, some codecs need it)
  i2s_din_pin: GPIO${i2s_din_pin}      # Data In (from codec ADC → ESP mic)
  i2s_dout_pin: GPIO${i2s_dout_pin}    # Data Out (from ESP → codec DAC speaker)
  sample_rate: 16000
  #aec_id: aec_component               # Optional: link to esp_aec
  #use_stereo_aec_reference: true      # ES8311 digital feedback
  #aec_reference_delay_ms: 10
  #mic_attenuation: 0.5
audio_dac:
  - platform: es8311
    i2c_id: bus_a
    id: es8311_dac
    bits_per_sample: 16bit
    sample_rate: 16000
microphone:
  - platform: i2s_audio_duplex
    id: i2s_mics
    i2s_audio_duplex_id: i2s_duplex
    sample_rate: 16000
speaker:
  - platform: i2s_audio_duplex
    id: i2s_audio_speaker
    i2s_audio_duplex_id: i2s_duplex
    sample_rate: 16000
    bits_per_sample: 32 bit
    audio_dac: es8311_dac
    num_channels: 1
  - platform: mixer
    id: mixer_speaker_id
    output_speaker: i2s_audio_speaker
    source_speakers:
      - id: announcement_spk_mixer_input
        timeout: never
      - id: media_spk_mixer_input
        timeout: never
  - platform: resampler
    id: announcement_spk_resampling_input
    output_speaker: announcement_spk_mixer_input
    bits_per_sample: 16
  - platform: resampler
    id: media_spk_resampling_input
    output_speaker: media_spk_mixer_input
    bits_per_sample: 16
media_player:
  - platform: speaker
    name: None
    id: external_media_player
    task_stack_in_psram: true
    codec_support_enabled: true
    volume_initial: 70%
    buffer_size: 6000
    media_pipeline:
      speaker: media_spk_resampling_input
      num_channels: 1
      format: FLAC
      sample_rate: 16000
    announcement_pipeline:
      speaker: announcement_spk_resampling_input
      format: FLAC
      sample_rate: 16000
      num_channels: 1  # S3 Box only has one output channel
micro_wake_word:
  id: mww
  microphone: i2s_mics
  stop_after_detection: false
  models:
    - alexa
  on_wake_word_detected:
    - if:
        condition:
          voice_assistant.is_running:
        then:
          voice_assistant.stop:
        # Stop any other media player announcement
        else:
          - if:
              condition:
                media_player.is_announcing:
              then:
                - media_player.stop:
                    announcement: true
              else:
                # Start the voice assistant
                - voice_assistant.start:
                    wake_word: !lambda return wake_word;
voice_assistant:
  id: va
  microphone: i2s_mics
  #media_player: external_media_player
  speaker: announcement_spk_resampling_input
  micro_wake_word: mww
  #noise_suppression_level: 2
  use_wake_word: false
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  on_client_connected:
    - micro_wake_word.start:
  on_client_disconnected:
    - voice_assistant.stop:
spi:
  - id: spi_bus
    clk_pin: GPIO${spi_clk_pin}
    mosi_pin: GPIO${spi_mosi_pin}
display:
  - platform: ili9xxx
    id: s3_box_lcd
    model: ${displaymodel}
    invert_colors: ${invertcolors}
    data_rate: 40MHz
    cs_pin: GPIO${lcd_cs_pin}
    dc_pin: GPIO${lcd_dc_pin}
    reset_pin:
      number: GPIO${lcd_reset_pin}
    update_interval: never
    dimensions:
      height: ${imageheight}
      width: ${imagewidth}
switch:
  - platform: gpio
    id: speaker_enable_switch
    name: Speaker Enable
    icon: "mdi:speaker"
    entity_category: config
    pin: GPIO${speaker_enable_pin}
    restore_mode: RESTORE_DEFAULT_ON
@will_santana Pull the latest main branch from n-IA-hane/intercom-api. I have VA pointed to the media player announcement pipeline with the stereo_aec_reference enabled and both VA and MWW are working fine.
@gtjoseph, no luck with v2.0.2.
Still the same integration errors.
I’m playing around with an ESP32-S3-CAM board.
Just the Intercom Mini config with some small changes:
substitutions:
  name: doorbell
  friendly_name: Doorbell
esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  min_version: 2025.5.0
  platformio_options:
    board_build.flash_mode: dio
    board_upload.maximum_ram_size: 327680
    board_upload.maximum_size: 16777216
esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  flash_size: 16MB
  partitions: "default_16MB.csv"
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      # Default is 10, increased for: TCP server + API + OTA + web_server
      CONFIG_LWIP_MAX_SOCKETS: "16"
psram:
  mode: octal  # Mini has quad PSRAM
  speed: 80MHz
# ==============================================================================
# CONNECTIVITY
# ==============================================================================
api:
  on_client_connected:
    - lambda: 'id(intercom).publish_entity_states();'
  encryption:
    key: ""
ota:
  - platform: esphome
    password: ""
logger:
  hardware_uart: UART0
  level: DEBUG
  logs:
    intercom_api: DEBUG
    component: INFO
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  power_save_mode: none
i2c:
  - id: cam_i2c
    sda: GPIO4
    scl: GPIO5
esp32_camera:
  name: "Doorbell CAM"
  external_clock:
    pin: GPIO15
    frequency: 20MHz
  i2c_id: cam_i2c
  data_pins: [GPIO11, GPIO9, GPIO8, GPIO10, GPIO12, GPIO18, GPIO17, GPIO16]
  vsync_pin: GPIO6
  href_pin: GPIO7
  pixel_clock_pin: GPIO13
  resolution: 800X600
  jpeg_quality: 10
  max_framerate: 5fps
  idle_framerate: 0.1fps
  brightness: 2
  frame_buffer_count: 1
  vertical_flip: true
  horizontal_mirror: false
  aec_mode: AUTO
  aec2: True
  ae_level: 2
  #special_effect: grayscale
# ==============================================================================
# EXTERNAL COMPONENTS
# ==============================================================================
external_components:
  - source:
      type: local
      path: esphome_components
    components: [intercom_api, esp_aec]
# ==============================================================================
# I2S AUDIO BUSES
# ==============================================================================
i2s_audio:
  # I2S Bus 0: INMP441 Microphone
  - id: i2s_mic_bus
    i2s_lrclk_pin: GPIO41
    i2s_bclk_pin: GPIO42
  # I2S Bus 1: MAX98357A Speaker
  - id: i2s_spk_bus
    i2s_lrclk_pin: GPIO47
    i2s_bclk_pin: GPIO21
# ==============================================================================
# MICROPHONE (SPH0645)
# ==============================================================================
microphone:
  - platform: i2s_audio
    id: mic_component
    i2s_audio_id: i2s_mic_bus
    i2s_din_pin: GPIO40
    adc_type: external
    pdm: false
    bits_per_sample: 32bit
    sample_rate: 16000
    channel: left
# ==============================================================================
# SPEAKER (MAX98357A)
# ==============================================================================
speaker:
  - platform: i2s_audio
    id: spk_component
    i2s_audio_id: i2s_spk_bus
    i2s_dout_pin: GPIO14
    dac_type: external
    i2s_mode: primary
    sample_rate: 16000
    bits_per_sample: 16bit
    timeout: never          # Keep I2S running (avoids clicks on resume)
    buffer_duration: 100ms  # Low latency buffer for real-time intercom
# ==============================================================================
# AEC (Acoustic Echo Cancellation)
# ==============================================================================
esp_aec:
  id: aec_processor
  sample_rate: 16000
  filter_length: 4  # 4 = 64ms tail (good balance of quality vs CPU)
  mode: voip_high_perf
# ==============================================================================
# INTERCOM API (TCP-based, port 6054)
# ==============================================================================
# Auto-creates these sensors:
# - text_sensor.intercom_mini_intercom_state (Idle/Ringing/Streaming)
# - text_sensor.intercom_mini_destination (selected contact) [full mode only]
# - text_sensor.intercom_mini_caller (who is calling) [full mode only]
# - text_sensor.intercom_mini_contacts (count) [full mode only]
intercom_api:
  id: intercom
  mode: full                # full = ESP↔ESP calls with contacts, simple = browser only
  microphone: mic_component
  speaker: spk_component
  mic_bits: 32              # SPH0645 outputs 32-bit (default 16), data in upper 18 bits
  dc_offset_removal: true   # SPH0645 has DC bias that must be removed
  aec_id: aec_processor     # Links to esp_aec for echo cancellation
  ringing_timeout: 30s      # Auto-decline unanswered calls
  # === FSM event callbacks ===
  on_incoming_call:
    - logger.log: "Incoming call"
  on_outgoing_call:
    - light.turn_on:
        id: status_led
        effect: "Ringing"
        red: 100%
        green: 50%
        blue: 0%
    # Fire HA event when calling "Home Assistant" (for notifications/automations)
    - if:
        condition:
          lambda: 'return id(intercom).get_current_destination() == "Home Assistant";'
        then:
          - homeassistant.event:
              event: esphome.intercom_call
              data:
                caller: !lambda 'return App.get_friendly_name();'
                destination: "Home Assistant"
                type: "doorbell"
  on_ringing:
    - light.turn_on:
        id: status_led
        effect: "Ringing"
        red: 100%
        green: 0%
        blue: 0%
  on_answered:
    - logger.log: "Call answered"
  on_streaming:
    - light.turn_on:
        id: status_led
        effect: "None"
        red: 0%
        green: 100%
        blue: 0%
  on_idle:
    - light.turn_off: status_led
  on_hangup:
    - logger.log:
        format: "Hangup: %s"
        args: ['reason.c_str()']
  on_call_failed:
    - logger.log:
        format: "Call failed: %s"
        args: ['reason.c_str()']
# ==============================================================================
# BUTTONS
# ==============================================================================
button:
  # Smart Call button: idle→call, ringing→answer, streaming→hangup
  # The on_outgoing_call callback handles the HA event for doorbell notifications
  - platform: template
    id: call_button
    name: "Call"
    icon: "mdi:phone"
    on_press:
      - intercom_api.call_toggle:
          id: intercom
  # Next contact (full mode)
  - platform: template
    id: next_contact_button
    name: "Next Contact"
    icon: "mdi:arrow-right"
    on_press:
      - intercom_api.next_contact:
          id: intercom
  # Previous contact (full mode)
  - platform: template
    id: prev_contact_button
    name: "Previous Contact"
    icon: "mdi:arrow-left"
    on_press:
      - intercom_api.prev_contact:
          id: intercom
  # Decline incoming call
  - platform: template
    id: decline_button
    name: "Decline"
    icon: "mdi:phone-hangup"
    on_press:
      - intercom_api.decline_call:
          id: intercom
  - platform: template
    id: refresh_contacts_button
    name: "Refresh Contacts"
    icon: "mdi:refresh"
    entity_category: config
    on_press:
      - intercom_api.set_contacts:
          id: intercom
          contacts_csv: !lambda 'return id(ha_active_devices).state;'
  - platform: restart
    name: "Restart"
    icon: "mdi:restart"
# ==============================================================================
# SWITCHES (native platform with restore_mode)
# ==============================================================================
switch:
  - platform: intercom_api
    intercom_api_id: intercom
    auto_answer:
      id: auto_answer_switch
      name: "Auto Answer"
      restore_mode: RESTORE_DEFAULT_ON
    aec:
      id: aec_switch
      name: "Echo Cancellation"
      restore_mode: RESTORE_DEFAULT_OFF
# ==============================================================================
# NUMBERS (native platform with restore_value)
# ==============================================================================
number:
  - platform: intercom_api
    intercom_api_id: intercom
    speaker_volume:
      id: speaker_volume
      name: "Speaker Volume"
    mic_gain:
      id: mic_gain
      name: "Mic Gain"
# ==============================================================================
# STATUS LED (WS2812 RGB on GPIO48)
# ==============================================================================
light:
  - platform: esp32_rmt_led_strip
    id: status_led
    name: "Status LED"
    icon: "mdi:led-on"
    pin: GPIO48
    chipset: WS2812
    num_leds: 1
    rgb_order: RGB
    effects:
      - pulse:
          name: "Streaming"
          min_brightness: 20%
          max_brightness: 100%
      - strobe:
          name: "Ringing"
          colors:
            - state: true
              brightness: 100%
              red: 100%
              green: 0%
              blue: 0%
              duration: 250ms
            - state: false
              duration: 250ms
# ==============================================================================
# TEXT SENSORS
# ==============================================================================
text_sensor:
  # Subscribe to HA's centralized contacts sensor
  - platform: homeassistant
    id: ha_active_devices
    entity_id: sensor.intercom_active_devices
    on_value:
      - intercom_api.set_contacts:
          id: intercom
          contacts_csv: !lambda 'return x;'
# ==============================================================================
# DIAGNOSTICS
# ==============================================================================
sensor:
  - platform: wifi_signal
    name: "WiFi Signal"
    update_interval: 60s
  - platform: uptime
    name: "Uptime"
    update_interval: 60s
  - platform: internal_temperature
    name: "CPU Temperature"
    update_interval: 60s
It works somehow, but calls get randomly stopped with this message in the log:
[W][intercom_api:1350][intercom_srv]: Payload incomplete: 1436/2048
Any idea how to avoid ending the whole call when that happens?
I’m on version 2.0.2.
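For context on that log line: it reports 1436 of an expected 2048 bytes, which is consistent with a read returning before the whole payload has arrived. I don't know how intercom_api reads its socket, so the following is only a generic sketch of the usual "read until the full payload is in" pattern for stream sockets. `ReadFn` and `read_exact` are hypothetical names, and the reader is assumed to block until at least one byte is available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical reader: returns the number of bytes read (> 0),
// 0 if the peer closed the stream, or a negative value on error.
using ReadFn = std::function<long(uint8_t *dst, size_t max_len)>;

// Keeps reading until exactly `len` bytes have arrived.
// Returns false if the stream closes or errors before that.
inline bool read_exact(const ReadFn &read_some, uint8_t *dst, size_t len) {
  assert(dst != nullptr || len == 0);
  size_t got = 0;
  while (got < len) {
    const long n = read_some(dst + got, len - got);
    if (n <= 0) return false;  // closed or failed before the payload completed
    got += static_cast<size_t>(n);
  }
  return true;
}
```

A short read is then treated as "not finished yet" rather than as a failed frame; only a closed or failed stream ends the transfer. With a non-blocking socket the loop would also need to handle a "would block" result, which this sketch leaves out.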
Hi, I have a very similar board. I’ll try to run some tests with that. So, based on my instincts, it’ll be a pain to get them to coexist. I was hoping the cams would work better with the S3, but even in my tests without any audio components, the ESP seems to be struggling. I lowered the resolution, took a while to update, but I have the feeling that managing the cams for an ESP is a significant effort, but I’m optimistic. Last night we made significant progress on i2s audio duplex and AEC. I haven’t published anything yet because I still have to finish the tests properly, but everything suggests that in the next versions we’ll have i2s audio duplex working with MWW and voice assistant with AEC that cleans up the speaker output. You can say “ok nabu” while the TTS is speaking; in my tests, this is already working. Voice assistant is also working even during calls between ESPs; the TTS output is suppressed by AEC and you can’t hear it during the call; you only hear the other person’s voice. One day it will be possible to say ok nabu, call kitchen, (stream full duplex), ok nabu, hangup… I2s audio duplex was born as a component to support intercom but it is becoming an audio hub with noise suppression, many roads are opening up going forward.
That’s really odd. Are you sure you have the tip of the main branch?
You need at least commit a26edc74 “fix: audio task lifecycle, stream limits, ESP32-P4 AEC support”. If you have “v2.0.2: code style refactor, TCP timeout, xiaozhi display”, that’s broken. Could the older version still be in ESPHome’s cache?
Hi, my observation was that it often worked without any problems and then suddenly dropped the connection with “Payload incomplete”.
The next call worked for minutes with video without any issues.
In some cases, the ESP32-S3 Cam also had connection problems or poor Wi-Fi, and I guess the two could be related.
But it only works almost reliably if I reduce the resolution of the cam, as you also said.
The upcoming changes sound very promising!
I guess the P4 board with mic and display will arrive next week.
Maybe it works better there.
Clean build cache and refresh set to 0s in external_components so it always pulls from github every build
I’ll give it another go tomorrow with a fresh mind
I was finally able to use it
The main issue seems to be with the resampler component
Pointing directly to the media player worked (but I still had to declare num_channels and sample_rate explicitly)
Apart from the mic not being sensitive enough (I’ll play with the gain and other settings) and the initial wake word detection (probably my YAML), the mic detected the wake word mid-reply just fine
Still unable to stream music from music assistant with the same errors
In the device:
In MA:
Seems some inherited property isn’t being declared down the pipeline
Won’t have time to work on it until Thursday, but it’s noted
I started to play around with the jc4880p443 but I’m getting stuck on this compiler error:
In file included from src/esphome/core/component.h:9,
from src/esphome/components/intercom_api/intercom_api.h:5,
from src/esphome/components/intercom_api/intercom_api.cpp:1:
src/esphome/components/intercom_api/intercom_api.cpp: In member function 'void esphome::intercom_api::IntercomApi::publish_entity_states()':
src/esphome/components/intercom_api/intercom_api.cpp:263:53: error: 'class esphome::intercom_api::IntercomApi' has no member named 'aec_enabled_'
263 | this->auto_answer_ ? "ON" : "OFF", this->aec_enabled_ ? "ON" : "OFF");
| ^~~~~~~~~~~~
src/esphome/core/log.h:109:99: note: in definition of macro 'esph_log_i'
109 | ::esphome::esp_log_printf_(ESPHOME_LOG_LEVEL_INFO, tag, __LINE__, ESPHOME_LOG_FORMAT(format), ##__VA_ARGS__)
| ^~~~~~~~~~~
src/esphome/components/intercom_api/intercom_api.cpp:261:3: note: in expansion of macro 'ESP_LOGI'
261 | ESP_LOGI(TAG, "Entity states synced (vol=%.0f%%, mic=%.1fdB, auto=%s, aec=%s)",
| ^~~~~~~~
Compiling .pioenvs/jc4880p443/src/esphome/components/safe_mode/safe_mode.cpp.o
*** [.pioenvs/jc4880p443/src/esphome/components/intercom_api/intercom_api.cpp.o] Error 1
I tried with this ESPHome YAML:
esphome:
  name: "jc4880p443"
  friendly_name: JC4880P443
logger:
  level: DEBUG
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  fast_connect: true
  post_connect_roaming: false
web_server:
  port: 80
ota:
  - platform: esphome
api:
  on_client_connected:
    - globals.set:
        id: homeassistant_ip
        value: !lambda return client_address;
font:
  - file: "gfonts://Montserrat"
    id: montserrat_28
    size: 28
output:
  - id: gpio_backlight_pwm
    platform: ledc
    pin: 23
light:
  - id: backlight
    name: Backlight
    platform: monochromatic
    output: gpio_backlight_pwm
    restore_mode: ALWAYS_ON
switch:
  - platform: restart
    name: Restart
binary_sensor:
  - platform: status
    name: Status
number:
  - platform: template
    name: Screen timeout
    optimistic: true
    id: display_timeout
    unit_of_measurement: "m"
    initial_value: 5  #minutes
    restore_value: true
    min_value: 0  #0 is no timeout
    max_value: 99
    step: 1
    mode: box
sensor:
  - id: wifi_signal_db
    name: WiFi Signal
    platform: wifi_signal
    update_interval: 60s
    entity_category: diagnostic
  - id: wifi_signal_strength
    name: WiFi Strength
    platform: copy
    source_id: wifi_signal_db
    filters:
      - lambda: return min(max(2 * (x + 100.0), 0.0), 100.0);
    unit_of_measurement: "%"
    entity_category: diagnostic
text_sensor:
  - platform: wifi_info
    ip_address:
      name: IP Address
      entity_category: diagnostic
    ssid:
      name: Connected SSID
      entity_category: diagnostic
    mac_address:
      name: Mac Address
      entity_category: diagnostic
globals:
  - id: homeassistant_ip
    type: std::string
  - id: backlight_brightness_level
    type: float
    restore_value: yes
    initial_value: '0.5'  # 50% default brightness
esp32:
  board: esp32-p4-evboard
  #cpu_frequency: 360MHz
  flash_size: 16MB
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_LWIP_MAX_SOCKETS: "16"
    advanced:
      enable_idf_experimental_features: yes
# ==============================================================================
# EXTERNAL COMPONENTS
# ==============================================================================
external_components:
  - source:
      type: local
      path: esphome_components
    components: [intercom_api]
# Intercom API - Simple mode (browser only)
intercom_api:
  id: intercom
  mode: full
  microphone: es8311_mic
  speaker: es8311_hardware_out
display:
  - platform: mipi_dsi
    id: device_display
    model: JC4880P443
    byte_order: little_endian
    rotation: 90
    lambda: |-
      it.fill(Color::BLACK);
      it.print(340, 100, id(montserrat_28), Color(0,255,0), TextAlign::LEFT, "Hello World1");
      it.print(340, 200, id(montserrat_28), Color::WHITE, TextAlign::LEFT, "Hello World2");
      it.print(340, 300, id(montserrat_28), Color(255,0,0), TextAlign::LEFT, "Hello World3");
touchscreen:
  platform: gt911
  i2c_id: i2c_bus
  id: device_touchscreen
  reset_pin: GPIO3
  update_interval: 100ms
  transform: #This is for 90 degree display rotation
    swap_xy: true
    mirror_x: false
    mirror_y: true
  on_update:
    then:
      - lambda: |-
          if (touches.size() > 0) {
            auto touch = touches[0];
            ESP_LOGI("TOUCH", "X=%d Y=%d", touch.x, touch.y);
          }
esp_ldo:
  - channel: 3
    voltage: 2.5V
psram:
  mode: hex
  speed: 200MHz
preferences:
  flash_write_interval: 5min
esp32_hosted:
  variant: ESP32C6
  reset_pin: GPIO54
  cmd_pin: GPIO19
  clk_pin: GPIO18
  d0_pin: GPIO14
  d1_pin: GPIO15
  d2_pin: GPIO16
  d3_pin: GPIO17
  active_high: true
i2c:
  id: i2c_bus
  sda: 7
  scl: 8
  scan: false
  frequency: 400kHz
# Audio section
audio_dac:
  - platform: es8311
    id: esp7311_dac
    address: 0x18
    i2c_id: i2c_bus
i2s_audio:
  - id: i2s_bus
    i2s_lrclk_pin: GPIO10 # WS / LRCK
    i2s_bclk_pin: GPIO12 # BCLK
    i2s_mclk_pin: GPIO13 # ES8311 usually requires a Master Clock
microphone:
  - platform: i2s_audio
    id: es8311_mic
    i2s_audio_id: i2s_bus
    i2s_din_pin: GPIO48 # Data In from the Codec
    adc_type: external
    pdm: false # ES8311 uses standard I2S, not PDM
    channel: left
speaker:
  - platform: i2s_audio
    id: es8311_hardware_out
    i2s_audio_id: i2s_bus
    i2s_dout_pin: GPIO9
    dac_type: external
    audio_dac: esp7311_dac
  - platform: mixer
    id: audio_mixer
    output_speaker: es8311_hardware_out
    source_speakers:
      - id: spk_announcement
      - id: spk_media
media_player:
  - platform: speaker
    name: "XL Speaker"
    id: xl_media_player
    # This section is now mandatory for the 'speaker' platform
    announcement_pipeline:
      speaker: spk_announcement
    media_pipeline:
      speaker: spk_media
How did you guys with the P4 get it working?
[Edit]
Partially working; I noticed I wasn't on the newest code with P4 support.
Still struggling with the speaker now.
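Since the problem above came from an outdated local copy of the component, one option is to let ESPHome fetch it from the repository instead of using a local path. This is a sketch only: the URL is a placeholder (the repository address is not given in this post) and the `main` branch name is an assumption.

```yaml
external_components:
  - source:
      type: git
      url: https://github.com/<owner>/<repo>   # placeholder, replace with the component's repository
      ref: main                                # assumed branch name
    components: [intercom_api]
    refresh: 0s                                # re-fetch on every build instead of using the cached copy
```

With a `type: local` source, ESPHome simply uses whatever files are in the folder, so the copy has to be updated by hand after each release.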
Take a look at the new release.