Custom ESP32-S3 Voice Assistant (INMP441 + MAX98357A): I2S Bus Ownership and Playback Collisions

I think the ESP32 Voice Assistant topic is worth revisiting now that the ESPHome Speaker Mixer has become the standard approach.

Following tutorials from @tpage, @lmatter and others, I built my own ESP32-S3-based Voice Assistant using the Speaker Mixer, a microphone, wake word detection, and Voice Assistant.


I repurposed a HAS-useless Xiaomi Gateway :smiley:

The issue I'm facing is that the ESP32-S3 SuperMini only has a single I2S controller available for my setup. The microphone is always active by default, continuously capturing audio for wake word detection. Whenever audio playback starts (WAV files, announcements, TTS responses, or media playback), the microphone should stop and release the I2S bus so the speaker can take ownership of it.

The problem is that I haven't found a reliable way to receive a "pre-playback" event. Triggers such as on_play, on_turn_on, or on_state seem to occur after the speaker pipeline has already started allocating resources, which is too late. At that point, the I2S bus is still owned by the microphone and I get allocation/collision errors.
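
For playback that I trigger myself, the handover can at least be forced by the caller, which is what the Test Speaker WAV button in my config does. Below is a sketch of the same idea exposed to Home Assistant. It assumes a user-defined API action (play_announcement is a hypothetical name) that automations would call instead of media_player.play_media, and it reuses the audio_enter_speaker_mode script from my config. It does nothing for TTS responses or for media started from the media player entity itself, so it is only a partial workaround.

```yaml
# Would be merged into the existing api: block of the config.
api:
  actions:
    # Hypothetical action; with device_name hall-speaker it should appear
    # in Home Assistant as esphome.hall_speaker_play_announcement.
    - action: play_announcement
      variables:
        media_url: string
      then:
        # Release the microphone side first and block until that script ends.
        - script.execute: audio_enter_speaker_mode
        - script.wait: audio_enter_speaker_mode
        # Only now hand the URL to the media player.
        - media_player.play_media:
            id: va_mediaplayer
            media_url: !lambda return media_url;
            announcement: true
```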

My ideal flow would be:

  1. Stop wake word detection.
  2. Stop Voice Assistant if running.
  3. Stop microphone capture.
  4. Wait for the I2S RX channel to be released.
  5. Start speaker playback.
  6. When playback finishes, stop the speaker.
  7. Re-enable microphone capture.
  8. Restart wake word detection.
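
Written out as an ESPHome script, the flow above might look roughly like this. It is an untested sketch: play_with_i2s_handover is a made-up script name, the timeouts are arbitrary, and I am assuming that microphone.is_capturing turning false and speaker.is_stopped turning true are good enough proxies for the I2S channel being released, which is exactly the part I am unsure about.

```yaml
# Would be added to the existing script: list of the config.
script:
  - id: play_with_i2s_handover   # hypothetical name
    mode: single
    parameters:
      media_url: string
    then:
      # 1-2. Stop wake word detection and any running Voice Assistant session.
      - micro_wake_word.stop:
      - if:
          condition:
            voice_assistant.is_running:
          then:
            - voice_assistant.stop:
      # 3. Stop microphone capture.
      - microphone.stop_capture: va_mic
      # 4. Wait for capture to end rather than sleeping for a fixed time.
      #    Assumption: "not capturing" implies the RX side has been released.
      - wait_until:
          condition:
            not:
              microphone.is_capturing: va_mic
          timeout: 2s
      # 5. Start speaker playback.
      - media_player.play_media:
          id: va_mediaplayer
          media_url: !lambda return media_url;
          announcement: true
      # 6. Wait for playback to begin (bounded), then for the player to go
      #    idle again, then stop the hardware speaker behind the mixer.
      - wait_until:
          condition:
            not:
              media_player.is_idle: va_mediaplayer
          timeout: 5s
      - wait_until:
          condition:
            media_player.is_idle: va_mediaplayer
      - speaker.stop: va_speaker_hw
      - wait_until:
          condition:
            speaker.is_stopped: va_speaker_hw
          timeout: 2s
      # 7-8. Re-enable microphone capture and restart wake word detection.
      - microphone.capture: va_mic
      - micro_wake_word.start:
```

This only covers playback that goes through the script (called with script.execute, passing id: play_with_i2s_handover and a media_url). TTS responses and media started by Home Assistant would still bypass it, which is why I am looking for a proper pre-playback hook.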

Has anyone successfully implemented this with an I2S bus shared between microphone and speaker on an ESP32-S3 SuperMini or a similar board?

Is there an existing pattern, component hook, or recommended architecture for handling I2S ownership transitions before audio playback begins?

Hardware

| Component | Model |
|---|---|
| MCU | ESP32-S3 SuperMini |
| Microphone | INMP441 I2S MEMS Microphone |
| Amplifier | MAX98357A I2S DAC/Amplifier |
| Speaker | 4Ω / 3W Speaker |
| RGB LED | RGB LED Strip |
| Button | GPIO Push Button |

Pinout

| Function | GPIO |
|---|---|
| I2S LRCLK / WS | GPIO8 |
| I2S BCLK / SCK | GPIO9 |
| INMP441 SD (Data Out) | GPIO7 |
| MAX98357A DIN (Data In) | GPIO10 |
| Push Button | GPIO6 |
| RGB LED Red | GPIO11 |
| RGB LED Green | GPIO12 |
| RGB LED Blue | GPIO13 |

Audio Topology

              ESP32-S3 SuperMini

            Shared I2S Control Bus
              GPIO8 (LRCLK / WS)
              GPIO9 (BCLK  / SCK)

                       │
        ┌──────────────┴──────────────┐
        │                             │
        ▼                             ▼

   INMP441 Microphone           MAX98357A Amplifier
       GPIO7 (SD)               GPIO10 (DIN)

       I2S RX Data               I2S TX Data

Software Stack

  • ESPHome 2026.5

  • Home Assistant Voice Assistant

  • Speaker Mixer

  • micro_wake_word

  • Shared I2S bus for microphone and speaker

  • Wake word always active while idle

  • Speaker used for:

    • TTS responses
    • Announcement pipeline
    • WAV playback
    • Media playback

I'm including my ESPHome configuration below in case it helps others reproduce the issue.

substitutions:
  device_name: hall-speaker
  friendly_name: Hall Speaker


esphome:
  name: ${device_name}
  friendly_name: ${friendly_name}
  on_boot:
    priority: 600.0
    then:
      - speaker.stop: va_speaker_hw
      - microphone.stop_capture: va_mic

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: ""

  on_client_connected:
    then:
      - light.turn_on:
          id: light1
          brightness: 25%
          red: 0%
          green: 50.9%
          blue: 98.8%
          effect: "Soft Breath"
      - delay: 800ms
      - light.turn_off: light1

ota:
  - platform: esphome
    password: ""

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  power_save_mode: none
  manual_ip:
    static_ip: 192.168.1.51
    gateway: 192.168.1.1
    subnet: 255.255.255.0
    dns1: 1.1.1.1
    dns2: 8.8.8.8

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "${friendly_name} Hotspot"
    password: ""

network:
  enable_high_performance: false   # Avoid aggressive high-performance networking to save RAM

captive_portal:

web_server:

binary_sensor:
  - platform: gpio
    name: "Button 1"
    id: button_1
    pin:
      number: GPIO6
      mode:
        input: true
        pullup: true
      inverted: true
    on_press:
      - if:
          condition:
            switch.is_on: mute_mic_sw
          then:
            - switch.turn_off: mute_mic_sw
          else:
            - switch.turn_on: mute_mic_sw

output:
  - platform: ledc
    id: output_led_red
    pin: GPIO11
    frequency: 1220 Hz

  - platform: ledc
    id: output_led_green
    pin: GPIO12
    frequency: 1220 Hz

  - platform: ledc
    id: output_led_blue
    pin: GPIO13
    frequency: 1220 Hz

light:
  - platform: rgb
    id: light1
    name: "LED bar"
    red: output_led_red
    green: output_led_green
    blue: output_led_blue
    default_transition_length: 500ms
    gamma_correct: 0
    restore_mode: ALWAYS_OFF
    effects:
      - pulse:
          name: "Soft Breath"
          min_brightness: 0%
          max_brightness: 25%
          transition_length:
            on_length: 1s
            off_length: 1500ms
          update_interval: 2s

button:
  - platform: restart
    name: "Reinicio ESP32 Media Player" # Spanish for "Restart ESP32 Media Player"

  - platform: template
    name: "Test Speaker WAV"
    on_press:
      - script.execute: audio_enter_speaker_mode
      - script.wait: audio_enter_speaker_mode
      - media_player.play_media:
          id: va_mediaplayer
          media_url: "http://192.168.1.101:8123/local/beep.wav"
          announcement: true

switch:
  - platform: template
    id: mute_mic_sw
    name: "Mute microphone"
    optimistic: true
    on_turn_on:
      - script.execute: mute_mic

    on_turn_off:
      - script.execute: unmute_mic

script:
  - id: audio_enter_speaker_mode
    mode: single
    then:
      - logger.log: "[AUDIO] Entering speaker mode: stopping wake word, VA and microphone"
      - micro_wake_word.stop:
      - voice_assistant.stop:
      - delay: 300ms
      - microphone.stop_capture: va_mic
      - delay: 500ms # Give the I2S RX side time to release before the speaker starts using TX.

  - id: audio_leave_speaker_mode
    mode: single
    then:
      - logger.log: "[AUDIO] Leaving speaker mode: stopping hardware speaker and restoring microphone if allowed"
      - speaker.stop: va_speaker_hw # Stop the real I2S hardware speaker, not the mixer.
      - delay: 700ms # Give the I2S TX side time to release before restoring RX/microphone.

      - if:
          condition:
            and:
              - switch.is_off: mute_mic_sw
              - not:
                  voice_assistant.is_running:
              - not:
                  media_player.is_playing: va_mediaplayer
          then:
            - logger.log: "[AUDIO] Restoring microphone capture + wake word"
            - microphone.capture: va_mic
            - delay: 250ms
            - micro_wake_word.start:
          else:
            - logger.log: "[AUDIO] Microphone restore skipped: muted, VA running or media still playing"

  - id: mute_mic
    mode: single
    then:
      - logger.log: "[MIC] Muting microphone"
      - micro_wake_word.stop:
      - voice_assistant.stop:
      - delay: 300ms
      - microphone.stop_capture: va_mic
      - light.turn_on:
          id: light1
          red: 100%
          green: 30%
          blue: 18%
          brightness: 20%
          transition_length: 800ms

  - id: unmute_mic
    mode: single
    then:
      - logger.log: "[MIC] Unmuting microphone"
      - light.turn_off:
          id: light1
          transition_length: 300ms

      - script.execute: audio_leave_speaker_mode

i2s_audio:
  - id: i2s
    i2s_lrclk_pin: GPIO8
    i2s_bclk_pin: GPIO9

microphone:
  - platform: i2s_audio
    id: va_mic
    i2s_audio_id: i2s
    adc_type: external
    i2s_din_pin: GPIO7
    channel: left
    pdm: false

speaker:
  - platform: i2s_audio
    id: va_speaker_hw
    i2s_audio_id: i2s
    dac_type: external
    i2s_dout_pin: GPIO10
    channel: mono
    bits_per_sample: 16bit
    sample_rate: 16000

  - platform: mixer
    id: va_speaker
    output_speaker: va_speaker_hw
    source_speakers:
      - id: va_speaker_announcement
      - id: va_speaker_media

media_player:
  - platform: speaker
    id: va_mediaplayer
    name: "Corridor Speaker"
    buffer_size: 50000
    announcement_pipeline:
      speaker: va_speaker_announcement
      format: WAV # Ask Home Assistant to transcode to a low-cost WAV stream
      # format: FLAC
      num_channels: 1
      sample_rate: 16000
    media_pipeline:
      speaker: va_speaker_media
      format: WAV
      num_channels: 1
      sample_rate: 16000
    on_turn_on:
      then:
        - logger.log: "[MEDIA] Media Player turned on"
        - script.execute: audio_enter_speaker_mode
        - script.wait: audio_enter_speaker_mode
    on_play:
      then:
        - logger.log: "Media playback started."

    on_idle:
      then:
        - logger.log: "[MEDIA] Playback finished"
        - delay: 5s
        - if:
            condition:
              media_player.is_idle: va_mediaplayer
            then:
              - media_player.turn_off: va_mediaplayer
            else:
              - logger.log: "[MEDIA] Media Player not idle, not turning off"

    on_turn_off:
      then:
        - logger.log: "[MEDIA] Media Player turned off"
        - script.execute: audio_leave_speaker_mode

micro_wake_word:
  id: my_micro_wake_word
  vad:
    model: github://esphome/micro-wake-word-models/models/v2/vad.json
  models:
    - model: github://esphome/micro-wake-word-models/models/v2/hey_jarvis.json
  on_wake_word_detected:
    - micro_wake_word.stop:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
    - light.turn_on:
        id: light1
        red: 0%
        green: 0%
        blue: 100%
        brightness: 30%

voice_assistant:
  id: va
  microphone: va_mic
  media_player: va_mediaplayer
  noise_suppression_level: 2
  volume_multiplier: 3.0
  auto_gain: 31dBFS
  micro_wake_word: my_micro_wake_word
  use_wake_word: true
  on_start:
    - logger.log: "[VA] Starting"
    - micro_wake_word.stop:
    - light.turn_on:
        id: light1
        red: 0%
        green: 100%
        blue: 0%
        brightness: 50%
  on_listening:
    - logger.log: "[VA] Listening"
    - light.turn_on:
        id: light1
        red: 0%
        green: 0%
        blue: 100%
        brightness: 100%
        effect: "Soft Breath"
  on_stt_end:
    then:
      - logger.log: "[VA] STT END"
      - light.turn_off: light1
      - script.execute: audio_enter_speaker_mode
  on_error:
    - logger.log: "[VA] ERROR"
    - delay: 1s
    - script.execute: audio_leave_speaker_mode
  on_end:
    then:
      - logger.log: "[VA] END"
      - light.turn_off: light1
      - wait_until:
          not:
            voice_assistant.is_running:
      - delay: 500ms
      - script.execute: audio_leave_speaker_mode