Voice Assistant doesn't listen to wakeword while playing media

Hi there, i’ve set up a DIY smart speaker with a ESP32-S3 and a pair of audio ADC/DAC which are working quite well for listening to wakewords, playing responses and, recently, playing media after following this example.

Now I can play media such as local files or radio stations on this device as a media device, but while playing, it doesn’t listen to the wake word at all.

I couldn’t find any information about this situation here in the forum so I’m posting this as a new topic, but I’d really appreciate if someone could point me to an existing content in this subject.

Below is my yaml config for the esphome device.

esphome:
  name: voice-assistant-s3
  friendly_name: Voice Assistant S3
  platformio_options:
    board_build.flash_mode: dio
  on_boot:
    - light.turn_on:
        id: led_ww
        blue: 100%
        brightness: 60%
        effect: fast pulse

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"
   
psram:
  mode: octal  # quad for N8R2 and octal for N16R8
  speed: 80MHz

# Enable logging
logger:
  hardware_uart: UART0

# Enable Home Assistant API
api:
  encryption:
    key: !secret secret_key
 
  on_client_connected:
        then:
          - delay: 50ms
          - light.turn_off: led_ww
          - micro_wake_word.start:
  on_client_disconnected:
        then:
          - voice_assistant.stop: 


ota:
  - platform: esphome

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password


captive_portal:
    
button:
  - platform: restart
    name: "Restart"
    id: but_rest

switch:
  - platform: template
    id: mute
    name: mute
    optimistic: true
    on_turn_on: 
      - micro_wake_word.stop:
      - voice_assistant.stop:
      - light.turn_on:
          id: led_ww           
          red: 100%
          green: 0%
          blue: 0%
          brightness: 60%
          effect: fast pulse 
      - delay: 2s
      - light.turn_off:
          id: led_ww
      - light.turn_on:
          id: led_ww           
          red: 100%
          green: 0%
          blue: 0%
          brightness: 30%
    on_turn_off:
      - micro_wake_word.start:
      - light.turn_on:
          id: led_ww           
          red: 0%
          green: 100%
          blue: 0%
          brightness: 60%
          effect: fast pulse 
      - delay: 2s
      - light.turn_off:
          id: led_ww 
   
light:
  - platform: esp32_rmt_led_strip
    id: led_ww
    rgb_order: GRB
    pin: GPIO48
    num_leds: 1
    #rmt_channel: 0
    chipset: ws2812
    name: "on board light"
    effects:
      - pulse:
      - pulse:
          name: "Fast Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
          min_brightness: 0%
          max_brightness: 100%
          
          
 # Audio and Voice Assistant Config          
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO3  #WS 
    i2s_bclk_pin: GPIO2 #SCK
  - id: i2s_speaker
    i2s_lrclk_pin: GPIO6  #LRC 
    i2s_bclk_pin: GPIO7 #BLCK

microphone:
  - platform: i2s_audio
    id: va_mic
    adc_type: external
    i2s_din_pin: GPIO4 #SD pin on the INMP441
    channel: left
    pdm: false
    i2s_audio_id: i2s_in
    bits_per_sample: 32 bit
    
speaker:
  - platform: i2s_audio
    id: va_speaker
    i2s_audio_id: i2s_speaker
    dac_type: external
    i2s_dout_pin: GPIO8   #  DIN Pin of the MAX98357A Audio Amplifier
    sample_rate: 16000
  - platform: mixer
    id: mixer_speaker_id
    output_speaker: va_speaker
    source_speakers:
      - id: announcement_spk_mixer_input
      - id: media_spk_mixer_input
  - platform: resampler
    id: media_spk_resampling_input
    output_speaker: media_spk_mixer_input
  - platform: resampler
    id: announcement_spk_resampling_input
    output_speaker: announcement_spk_mixer_input
media_player:
  - platform: speaker
    name: "Nabu Media Player"
    id: speaker_media_player_id
    media_pipeline:
        speaker: media_spk_resampling_input
        num_channels: 2
    announcement_pipeline:
        speaker: announcement_spk_resampling_input
        num_channels: 1

micro_wake_word:
  on_wake_word_detected:
     # then:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
    - light.turn_on:
        id: led_ww           
        red: 30%
        green: 30%
        blue: 70%
        brightness: 60%
        effect: fast pulse 
  models:
    - model: okay_nabu
    
voice_assistant:
  id: va
  microphone: va_mic
  noise_suppression_level: 2.0
  volume_multiplier: 4.0
  speaker: announcement_spk_resampling_input

  on_error:
          - micro_wake_word.start:  
  on_end:
        then:
          - light.turn_off: led_ww
          - wait_until:
              not:
                voice_assistant.is_running:
          - micro_wake_word.start: 

Try using the configuration from this thread

Your code looks correct, but apparently there are some nuances in the work of components.