Hi everyone,
Disclaimer: I’m not an expert on XMOS or audio DSP, but I did try hard to run experiments and debug this issue, with help from Claude Code. If I’ve misunderstood something, please correct me!
I’m working on custom firmware for the Home Assistant Voice Preview Edition (VPE) device, and I’ve hit a wall trying to get the XMOS acoustic echo cancellation (AEC) to work. I’d really appreciate any insights from those who have experience with this hardware.
Background
I’m building a voice assistant that uses LiveKit for real-time communication instead of the standard Home Assistant voice pipeline. My goal is to enable full-duplex conversation - the ability to speak while the assistant is talking (barge-in capability).
After studying the official ESPHome reference implementation carefully, I realized it doesn’t actually use the XMOS AEC. Instead, it uses temporal separation (half-duplex):
| Approach | ESPHome Reference |
|---|---|
| Full-duplex with AEC | No |
| Half-duplex (don’t listen while speaking) | Yes |
| Barge-in with new commands | No |
| “Stop” interrupt during TTS | Yes (special wake word) |
The voice assistant state machine ensures STT is not running while TTS is playing:
    Wake Word → Listen → Process → Speak → (done) → Wake Word
                ↑                      │
                └── NOT simultaneous ──┘
This works fine for their use case, but I need true AEC for full-duplex operation.
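The half-duplex gating described above can be sketched as a small state machine in C. This is only an illustration of the behavior in the table and diagram; the state and event names are hypothetical and are not taken from the ESPHome source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states mirroring the diagram; names are illustrative only. */
typedef enum {
    VA_WAIT_WAKE_WORD,
    VA_LISTENING,
    VA_PROCESSING,
    VA_SPEAKING
} va_state_t;

/* Hypothetical events that move the assistant between states. */
typedef enum {
    EV_WAKE_WORD,
    EV_END_OF_SPEECH,
    EV_RESPONSE_READY,
    EV_TTS_DONE,
    EV_STOP_WORD
} va_event_t;

/* Advance the state machine. Events that do not apply in the current
 * state are ignored, so a wake word during playback does nothing
 * (no barge-in), while the stop word ends playback early. */
va_state_t va_next(va_state_t s, va_event_t e)
{
    switch (s) {
    case VA_WAIT_WAKE_WORD:
        return e == EV_WAKE_WORD ? VA_LISTENING : s;
    case VA_LISTENING:
        return e == EV_END_OF_SPEECH ? VA_PROCESSING : s;
    case VA_PROCESSING:
        return e == EV_RESPONSE_READY ? VA_SPEAKING : s;
    case VA_SPEAKING:
        return (e == EV_TTS_DONE || e == EV_STOP_WORD) ? VA_WAIT_WAKE_WORD : s;
    }
    assert(false && "unknown state");
    return s;
}

/* STT consumes mic audio only while listening. */
bool va_stt_active(va_state_t s) { return s == VA_LISTENING; }

/* The speaker plays TTS only while speaking. */
bool va_tts_active(va_state_t s) { return s == VA_SPEAKING; }
```

Because `va_stt_active` and `va_tts_active` are never true in the same state, speech-to-text never hears the device's own playback, which is why this design can get by without echo cancellation.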
My Setup
Hardware
- Voice Preview Edition device
- XMOS XU316 chip (running voice processing firmware v1.3.1)
- TI AIC3204 DAC (I2C address 0x18)
- ESP32-S3 as controller
Audio Path
    Input:  Physical Mics → XMOS (AEC→IC→NS→AGC) → I2S RX → ESP32
                             ↑
                             │ AEC Reference (?)
                             │
    Output: ESP32 → I2S TX → XMOS → AIC3204 DAC → Amp → Speaker
My Implementation
I have a custom ESP-IDF firmware (not ESPHome) that:
- Initializes I2C bus for XMOS and DAC control
- Resets XMOS via GPIO4 (HIGH→LOW), waits 3 seconds for boot
- Configures XMOS pipeline stages via I2C:
  - Channel 0: AGC (stage 4)
  - Channel 1: NS (stage 3)
- Initializes AIC3204 DAC with standard configuration
- Sets up I2S:
  - TX (to speaker): 48kHz, 32-bit stereo, slave mode, GPIO 7/8/10
  - RX (from mic): 16kHz, 32-bit stereo, slave mode, GPIO 13/14/15
- Enables amplifier via GPIO38
This matches the ESPHome reference exactly - same firmware version, same pipeline stages, same I2S configuration, same GPIOs.
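As a sanity check on the two I2S configurations, here is a minimal C sketch of the bit-clock arithmetic. It assumes standard I2S framing where each slot is exactly as wide as the sample (32 bits) with no padding; the function name `i2s_bclk_hz` is illustrative and not part of ESP-IDF.

```c
#include <assert.h>
#include <stdint.h>

/* Bit clock for standard I2S framing: one slot per channel per frame,
 * with the slot width equal to the sample width (an assumption). */
uint32_t i2s_bclk_hz(uint32_t sample_rate_hz, uint32_t bits_per_slot, uint32_t slots)
{
    assert(sample_rate_hz > 0 && bits_per_slot > 0 && slots > 0);
    return sample_rate_hz * bits_per_slot * slots;
}
```

With the settings listed above, `i2s_bclk_hz(48000, 32, 2)` gives 3,072,000 Hz for TX and `i2s_bclk_hz(16000, 32, 2)` gives 1,024,000 Hz for RX, so the two buses run at a 3:1 clock ratio.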
The Problem
AEC is not working. When I play audio through the speaker, the microphone picks it up and it passes through the entire XMOS pipeline without being canceled.
Experiments I’ve Tried
Test 1: Pipeline Stage Analysis
I created a test module that:
- Reconfigures XMOS output channels to different pipeline stages
- Plays a 3-second speech sample through the speaker
- Captures stereo audio from XMOS and streams via UDP
- Analyzes in Audacity
Result: With both channels set to AGC, both clearly contain the speaker audio. The echo is NOT being canceled.
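To put a number on this rather than judging by ear in Audacity, something like the following host-side C sketch could be run on the captured audio. It computes the squared normalized cross-correlation between the played reference and one captured channel over a range of delays. A value near 1 means the capture is essentially a delayed, scaled copy of the speaker signal (no cancellation); a value near 0 means very little of it remains. The function name, the float sample format, and the assumption that both signals have already been brought to the same sample rate are illustrative and not part of any XMOS or ESP-IDF API.

```c
#include <assert.h>
#include <stddef.h>

/* Squared normalized cross-correlation between the played reference and a
 * captured channel, maximized over delays 0..max_lag (capture lagging the
 * reference). Returns a value in [0, 1] and optionally reports the delay
 * (in samples) at which the maximum was found. */
double echo_peak_xcorr2(const float *ref, const float *mic, size_t n,
                        size_t max_lag, size_t *best_lag)
{
    assert(ref && mic && n > 0 && max_lag < n);
    double best = 0.0;
    size_t best_l = 0;

    for (size_t lag = 0; lag <= max_lag; ++lag) {
        double dot = 0.0, e_ref = 0.0, e_mic = 0.0;
        for (size_t i = 0; i + lag < n; ++i) {
            double r = ref[i];
            double m = mic[i + lag];
            dot += r * m;
            e_ref += r * r;
            e_mic += m * m;
        }
        /* Skip lags where either signal is silent to avoid dividing by zero. */
        if (e_ref > 0.0 && e_mic > 0.0) {
            double c2 = (dot * dot) / (e_ref * e_mic);
            if (c2 > best) {
                best = c2;
                best_l = lag;
            }
        }
    }
    if (best_lag) {
        *best_lag = best_l;
    }
    return best;
}
```

Running this once per pipeline stage would give a simple way to compare stages: if AEC were working, the score for the post-AEC output should drop well below the score for a raw mic signal.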
Test 2: Pre-AGC Stage Levels
Tested NS (stage 3) vs AGC (stage 4) output.
Result: Pre-AGC stages output at very low levels, but when amplified in Audacity, they contain the SAME uncanceled speaker audio. AGC just amplifies it - AEC is not removing it.
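The level difference between stages can also be measured directly instead of amplifying by hand. The sketch below is a minimal, hypothetical helper that assumes the captures have been converted to 16-bit PCM; it returns the linear power ratio between two buffers. A ratio of 100 corresponds to 20 dB, since dB = 10·log10(power ratio).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean-square power of a block of 16-bit PCM samples. */
double mean_square(const int16_t *samples, size_t n)
{
    assert(samples && n > 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double s = samples[i];
        acc += s * s;
    }
    return acc / (double)n;
}

/* Linear power ratio of buffer a over buffer b (for example AGC output
 * over NS output). Returns 0 if b is completely silent. */
double power_ratio(const int16_t *a, const int16_t *b, size_t n)
{
    double pb = mean_square(b, n);
    return pb > 0.0 ? mean_square(a, n) / pb : 0.0;
}
```

If the ratio between the AGC and NS captures stays roughly constant whether or not the speaker is playing, that supports the reading that AGC is only applying gain and nothing upstream is removing the echo.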
Test 3: Sample Rate Matching
Changed mic I2S from 16kHz to 48kHz to match speaker I2S, in case XMOS needed matching rates for AEC reference routing.
Result: No effect. Reverted.
Test 4: Init Order Changes
Tried initializing I2S TX before XMOS reset (like ESPHome does with its priority system), in case XMOS needs to see active I2S data during boot.
Result: Broke audio playback. I2S TX in slave mode doesn’t work without XMOS providing clocks.
Test 5: I2S TX Restart
Added a restart of I2S TX channel after full initialization.
Result: No effect.
Test 6: XMOS I2C Interface
Verified I can communicate with XMOS via I2C:
- Successfully read firmware version (servicer 240)
- Successfully read/write pipeline stages (servicer 241)
- VNR readings work
Result: Basic I2C communication works, but there’s no AEC-specific control interface exposed.
Test 7: VNR Monitoring
Read Voice-to-Noise Ratio before and after playing audio.
Result: VNR increased from 1 to 16 during playback - XMOS IS detecting signal activity. But this “voice” is the speaker audio being picked up by the mic, not being canceled.
What I’ve Verified Matches ESPHome
| Parameter | My Implementation | ESPHome Reference |
|---|---|---|
| XMOS Firmware | 1.3.1 | 1.3.1 |
| Pipeline ch0 | AGC (4) | AGC (4) |
| Pipeline ch1 | NS (3) | NS (3) |
| I2S TX rate | 48kHz | 48kHz |
| I2S TX format | 32-bit stereo, slave | 32-bit stereo, slave |
| I2S TX GPIOs | 7 (BCLK), 8 (WS), 10 (DOUT) | Same |
| I2S RX rate | 16kHz | 16kHz |
| I2S RX format | 32-bit stereo, slave | 32-bit stereo, slave |
| I2S RX GPIOs | 13 (BCLK), 14 (WS), 15 (DIN) | Same |
| XMOS I2C addr | 0x42 | 0x42 |
| DAC I2C addr | 0x18 | 0x18 |
Questions for the Community
- Is AEC actually enabled in the XMOS firmware on VPE? Since ESPHome doesn’t use it, maybe it’s disabled or not configured by default?
- Are there additional XMOS registers that need to be configured to enable AEC? The ESPHome code only writes pipeline stage registers - is there something else needed?
- Is there something special about how the I2S TX reference signal needs to be routed internally in XMOS? The audio reaches the DAC fine, but maybe it’s not being routed to the AEC module?
- Has anyone successfully used XMOS AEC on this device? I’d love to see working code or configuration.
My Hypothesis (After Analyzing XMOS Firmware Source with Claude Code)
After reading through the XMOS sln_voice firmware source code, here’s what I suspect might be happening:
- AEC is enabled by default - The appconfAUDIO_PIPELINE_SKIP_AEC flag is 0 in app_conf.h, so AEC should be running.
- Only two I2C servicers exist - Resource ID 0xF0 (240) for DFU and 0xF1 (241) for configuration. There is no runtime AEC control interface exposed via I2C - you can’t enable/disable or tune AEC at runtime.
- AEC reference routing might be the issue - The firmware appears to expect the AEC reference signal via a separate I2S INPUT stream, not by internally tapping its own output to the DAC. From src/ffva/src/main.c:

      if (aec_ref_source == appconfAEC_REF_I2S) {
          // Reference comes from I2S INPUT on Tile 1
      }

If this is correct, AEC would require the VPE hardware to route the speaker audio BACK to XMOS as an I2S input - which may not exist on this board.
If this hypothesis is correct, it would explain why ESPHome uses half-duplex instead of AEC - the hardware simply doesn’t support it.
Can anyone confirm or deny this? Does VPE have an internal loopback for AEC reference, or is the speaker I2S routed directly to the DAC?
Summary
I’ve spent significant time trying to get AEC working, but no luck. The XMOS chip responds to commands, the audio path works (speaker plays, mic captures), but echo cancellation simply doesn’t happen.
Any help, pointers to documentation, or suggestions would be greatly appreciated. I’m happy to run additional tests or share more details about my implementation.
Thanks in advance!