ESPHome / Voice Assistant on S3-devkitc-1

Is it possible to get a voice assistant working on an ESP32-S3-devkitc-1?
I also tried looking at the code for the s3-box on github, but it doesn’t say what pins to use for speaker/mic/i2s.
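For context: on a bare devkit there are no fixed audio pins; you wire an external I2S mic (and amp) to free GPIOs and declare them in the config yourself. A minimal sketch, where the GPIO numbers are arbitrary examples (not pins taken from the S3-BOX):

```yaml
# Hypothetical wiring for an external I2S microphone (e.g. INMP441) on a
# bare devkit; the GPIO numbers are examples only, any free pins work.
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO3  # WS
    i2s_bclk_pin: GPIO2   # SCK

microphone:
  - platform: i2s_audio
    id: external_mic
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO4    # SD (mic data out)
    adc_type: external
    pdm: false
```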

I tried to compile but I’m getting:

Compiling .pioenvs/kitchen-va/src/main.o
Compiling .pioenvs/kitchen-va/components/audio_board/lyrat_v4_3/board_pins_config.o
components/audio_board/lyrat_v4_3/board_pins_config.c: In function 'get_i2c_pins':
components/audio_board/lyrat_v4_3/board_pins_config.c:39:34: error: 'GPIO_NUM_23' undeclared (first use in this function); did you mean 'GPIO_NUM_43'?
         i2c_config->scl_io_num = GPIO_NUM_23;
                                  ^~~~~~~~~~~
                                  GPIO_NUM_43
components/audio_board/lyrat_v4_3/board_pins_config.c:39:34: note: each undeclared identifier is reported only once for each function it appears in
components/audio_board/lyrat_v4_3/board_pins_config.c: In function 'get_i2s_pins':
components/audio_board/lyrat_v4_3/board_pins_config.c:54:33: error: 'GPIO_NUM_25' undeclared (first use in this function); did you mean 'GPIO_NUM_45'?
         i2s_config->ws_io_num = GPIO_NUM_25;
                                 ^~~~~~~~~~~
                                 GPIO_NUM_45
In file included from /data/cache/platformio/packages/framework-espidf/components/esp_rom/include/esp32s3/rom/ets_sys.h:19,
                 from /data/cache/platformio/packages/framework-espidf/components/log/include/esp_log.h:19,
                 from components/audio_board/lyrat_v4_3/board_pins_config.c:25:
components/audio_board/lyrat_v4_3/board_pins_config.c: In function 'i2s_mclk_gpio_select':
components/audio_board/lyrat_v4_3/board_pins_config.c:95:53: error: 'FUNC_GPIO0_CLK_OUT1' undeclared (first use in this function); did you mean 'FUNC_GPIO20_CLK_OUT1'?
             PIN_FUNC_SELECT(PERIPHS_IO_MUX_GPIO0_U, FUNC_GPIO0_CLK_OUT1);
                                                     ^~~~~~~~~~~~~~~~~~~
/data/cache/platformio/packages/framework-espidf/components/soc/esp32s3/include/soc/soc.h:136:45: note: in definition of macro 'REG_WRITE'
             (*(volatile uint32_t *)(_r)) = (_v);                                                                       \
                                             ^~
/data/cache/platformio/packages/framework-espidf/components/soc/esp32s3/include/soc/io_mux_reg.h:93:46: note: in expansion of macro 'REG_SET_FIELD'
 #define PIN_FUNC_SELECT(PIN_NAME, FUNC)      REG_SET_FIELD(PIN_NAME, MCU_SEL, FUNC)
                                              ^~~~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:95:13: note: in expansion of macro 'PIN_FUNC_SELECT'
             PIN_FUNC_SELECT(PERIPHS_IO_MUX_GPIO0_U, FUNC_GPIO0_CLK_OUT1);
             ^~~~~~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:98:53: error: 'FUNC_U0TXD_CLK_OUT3' undeclared (first use in this function); did you mean 'FUNC_U0TXD_CLK_OUT1'?
             PIN_FUNC_SELECT(PERIPHS_IO_MUX_U0TXD_U, FUNC_U0TXD_CLK_OUT3);
                                                     ^~~~~~~~~~~~~~~~~~~
/data/cache/platformio/packages/framework-espidf/components/soc/esp32s3/include/soc/soc.h:136:45: note: in definition of macro 'REG_WRITE'
             (*(volatile uint32_t *)(_r)) = (_v);                                                                       \
                                             ^~
/data/cache/platformio/packages/framework-espidf/components/soc/esp32s3/include/soc/io_mux_reg.h:93:46: note: in expansion of macro 'REG_SET_FIELD'
 #define PIN_FUNC_SELECT(PIN_NAME, FUNC)      REG_SET_FIELD(PIN_NAME, MCU_SEL, FUNC)
                                              ^~~~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:98:13: note: in expansion of macro 'PIN_FUNC_SELECT'
             PIN_FUNC_SELECT(PERIPHS_IO_MUX_U0TXD_U, FUNC_U0TXD_CLK_OUT3);
             ^~~~~~~~~~~~~~~
In file included from components/audio_board/lyrat_v4_3/board.h:29,
                 from components/audio_board/lyrat_v4_3/board_pins_config.c:28:
components/audio_board/lyrat_v4_3/board_pins_config.c: In function 'get_green_led_gpio':
components/audio_board/lyrat_v4_3/board_def.h:42:35: error: 'GPIO_NUM_22' undeclared (first use in this function); did you mean 'GPIO_NUM_42'?
 #define GREEN_LED_GPIO            GPIO_NUM_22
                                   ^~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:186:12: note: in expansion of macro 'GREEN_LED_GPIO'
     return GREEN_LED_GPIO;
            ^~~~~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:187:1: error: control reaches end of non-void function [-Werror=return-type]
 }
 ^
cc1: some warnings being treated as errors
*** [.pioenvs/kitchen-va/components/audio_board/lyrat_v4_3/board_pins_config.o] Error 1

That code is specifically for the ESP32-S3-BOX

How come code for the LyraT is being compiled if you’re using a “plain” ESP32 devkit? Please show your YAML.

I’m using code from here with the S3 compile option.

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32_S3_KORVO2_V3_BOARD: y

Adding the sdkconfig_option was the only way to get it to compile. It doesn't actually connect to Wi-Fi, though.

I was eventually able to get this to work with the following options. I'm currently testing with micro_wake_word, and so far on-device wake word seems to be as good as streaming to Open Wake Word.

esphome:
  name: s3test
  friendly_name: S3Test
  platformio_options:
    board_build.flash_mode: dio
              
esp32:
  board:   esp32-s3-devkitc-1
  variant: esp32s3
  framework:
    type:  esp-idf
    components:
      - name:    esphome_board
        source:  github://jesserockz/esphome-esp-adf-board@main
        refresh: 0s
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB:      "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B:  "y"
      CONFIG_AUDIO_BOARD_CUSTOM:           "y"

psram:
  mode: octal
  speed: 80MHz
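With the board compiling, on-device wake word is then just the micro_wake_word component plus a working microphone. A minimal sketch (okay_nabu is one of the stock model names; this assumes `microphone:` and `voice_assistant:` are configured elsewhere in the file):

```yaml
# Minimal on-device wake word; relies on an existing microphone and
# voice_assistant configuration.
micro_wake_word:
  model: okay_nabu
  on_wake_word_detected:
    - voice_assistant.start:
```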

Can you share the full YAML configuration? I can't get it to work. Thank you.

This is for an esp32-s3-zero; replace the values as needed.

Config has been derived from here

substitutions:
  # Phases of the Voice Assistant
  # IDLE: The voice assistant is ready to be triggered by a wake-word
  voice_assist_idle_phase_id: '1'
  # LISTENING: The voice assistant is ready to listen to a voice command (after being triggered by the wake word)
  voice_assist_listening_phase_id: '2'
  # THINKING: The voice assistant is currently processing the command
  voice_assist_thinking_phase_id: '3'
  # REPLYING: The voice assistant is replying to the command
  voice_assist_replying_phase_id: '4'
  # NOT_READY: The voice assistant is not ready 
  voice_assist_not_ready_phase_id: '10'
  # ERROR: The voice assistant encountered an error
  voice_assist_error_phase_id: '11'  
  # MUTED: The voice assistant is muted and will not reply to a wake-word
  voice_assist_muted_phase_id: '12'

  #pins
  i2s_out_lrclk_pin: GPIO11 # LRC on Max98357
  i2s_out_bclk_pin: GPIO9 # BCLK on Max98357
  i2s_in_lrclk_pin: GPIO3 # WS on INMP441
  i2s_in_bclk_pin: GPIO2 # SCK on INMP441
  light_pin: GPIO21 # on-board LED
  speaker_pin: GPIO8 # DIN on Max98357
  mic_pin: GPIO4 # SD on INMP441

  ip: <redacted>
  dns: <redacted>

esphome:
  name: vatest
  friendly_name: VATest
  platformio_options:
    board_build.flash_mode: dio
  on_boot:
      priority: 600
      then: 
        # Run the script to refresh the LED status
        - script.execute: control_led
        # - output.turn_off: set_low_speaker
        # If after 30 seconds, the device is still initializing (It did not yet connect to Home Assistant), turn off the init_in_progress variable and run the script to refresh the LED status
        - delay: 30s
        - if:
            condition:
              lambda: return id(init_in_progress);
            then:
              - lambda: id(init_in_progress) = false;
              - script.execute: control_led

esp32:
  board: esp32-s3-devkitc-1
  variant: ESP32S3
  framework:
    type: esp-idf
    version: recommended
    sdkconfig_options:
      # need to set a s3 compatible board for the adf-sdk to compile
      # board specific code is not used though
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_TCPIP_RECVMBOX_SIZE: "512"

      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "512000"
      CONFIG_TCP_RECVMBOX_SIZE: "512"

      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB:      "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B:  "y"

psram:
  mode: quad
  speed: 80MHz

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: main
    components: 
      - adf_pipeline
      - i2s_audio
    refresh: 0s

adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_out
    i2s_dout_pin: ${speaker_pin}

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_in
    i2s_din_pin: ${mic_pin}
    pdm: false
    channel: left
    sample_rate: 16000
    bits_per_sample: 32bit
    

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    gain_log2: 3
    keep_pipeline_alive: false
    pipeline:
      - adf_i2s_in
      - self

media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: s3-dev_media_player
    keep_pipeline_alive: false
    internal: false
    pipeline:
      - self
      - adf_i2s_out

# These are our two I2S buses with the correct pins.
# You can refer to the wiring diagram of our voice assistant for more details.
i2s_audio:
  - id: i2s_out
    i2s_lrclk_pin: ${i2s_out_lrclk_pin}
    i2s_bclk_pin: ${i2s_out_bclk_pin}
  - id: i2s_in
    i2s_lrclk_pin: ${i2s_in_lrclk_pin}
    i2s_bclk_pin: ${i2s_in_bclk_pin}

# This is the declaration of our voice assistant
# It references the microphone and speaker declared above.
voice_assistant:
  id: va
  microphone: adf_microphone
  media_player: adf_media_player
  
  # use_wake_word: true
  
  # This is how I personally tune my voice assistant; you may have to test a few values for the four parameters below
  noise_suppression_level: 4 #4
  auto_gain: 31dBFS # 31dBFS
  volume_multiplier: 8 # 8.0
  # vad_threshold: 3

  # When the voice assistant connects to HA:
  # Set init_in_progress to false (Initialization is over).
  # If the switch is on, start the voice assistant
  # In any case: Set the correct phase and run the script to refresh the LED status
  on_client_connected:
    - lambda: id(init_in_progress) = false; 
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - micro_wake_word.start
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
    - script.execute: control_led

  # When the voice assistant disconnects from HA: 
  # Stop the voice assistant
  # Set the correct phase and run the script to refresh the LED status
  on_client_disconnected:
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};  
    - micro_wake_word.stop
    - script.execute: control_led

  # When the voice assistant starts to listen: Set the correct phase and run the script to refresh the LED status
  on_listening:
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: control_led

  # When the voice assistant starts to think: Set the correct phase and run the script to refresh the LED status
  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: control_led

  # When the voice assistant starts to reply: Set the correct phase and run the script to refresh the LED status
  # on_tts_stream_start:
  on_tts_start: 
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: control_led
  
  on_end:
    - if:
        condition:
          - switch.is_on: use_wake_word
        then:
          - wait_until:
              not:
                voice_assistant.is_running:
          - micro_wake_word.start
  # When the voice assistant finished to reply: Set the correct phase and run the script to refresh the LED status
  # on_tts_stream_end:
  # on_stt_end: 
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: control_led

  # When the voice assistant encounters an error: 
  # Set the error phase and run the script to refresh the LED status
  # Wait 1 second and set the correct phase (idle or muted depending on the state of the switch) and run the script to refresh the LED status 
  on_error:
    - if:
        condition:
          lambda: return !id(init_in_progress);
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};  
          - script.execute: control_led
          - delay: 1s
          - if:
              condition:
                switch.is_on: use_wake_word
              then:
                - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
              else:
                - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
          - script.execute: control_led

# Enable logging
logger:
  # level: VERBOSE
  # logs:
  #   micro_wake_word: DEBUG

ota:
  password: "<redacted>"

# Enable Home Assistant API
# When the device connects to, or disconnects from, Home Assistant: run the script to refresh the LED status
api:
  encryption:
    key: "<redacted>"
  on_client_connected:
    - script.execute: control_led
  on_client_disconnected:
    - script.execute: control_led

wifi:
  ssid: !secret tp_wifi_ssid
  password: !secret tp_wifi_password
  power_save_mode: none

  manual_ip:
    static_ip: ${ip}
    gateway: <redacted>
    subnet: 255.255.255.0
    dns1: ${dns}
  # When the device connects to, or disconnects from, the Wi-Fi: run the script to refresh the LED status
  on_connect:
    - script.execute: control_led
  on_disconnect:
    - script.execute: control_led

globals:
  # Global initialisation variable. Initialized to true and set to false once everything is connected. Only used for a smooth power-on experience.
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  # Global variable tracking the phase of the voice assistant (defined above). Initialized to not_ready
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}


sensor:
  - platform: wifi_signal
    name: "WiFi Signal Sensor"
    update_interval: 120s

light:
  - platform: esp32_rmt_led_strip
    rgb_order: GRB
    pin: ${light_pin}
    num_leds: 1
    rmt_channel: 0
    chipset: WS2812
    name: "Status LED"
    id: led
    disabled_by_default: True
    # entity_category: diagnostic
    icon: mdi:led-on
    default_transition_length: 0s
    effects:
      - pulse:
          name: "Slow Pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "Fast Pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%

script:
  # Master script controlling the LED, based on several conditions: initialization in progress, Wi-Fi and API connected, and the current voice assistant phase.
  # For the sake of simplicity and re-usability, the script calls the child scripts defined below.
  # This script is called every time one of these conditions changes.
  - id: control_led
    then:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - if:
                condition:
                  wifi.connected:
                then:
                  - if:
                      condition:
                        api.connected:
                      then:
                        - lambda: |
                            switch(id(voice_assistant_phase)) {
                              case ${voice_assist_listening_phase_id}:
                                id(control_led_voice_assist_listening_phase).execute();
                                break;
                              case ${voice_assist_thinking_phase_id}:
                                id(control_led_voice_assist_thinking_phase).execute();
                                break;
                              case ${voice_assist_replying_phase_id}:
                                id(control_led_voice_assist_replying_phase).execute();
                                break;
                              case ${voice_assist_error_phase_id}:
                                id(control_led_voice_assist_error_phase).execute();
                                break;
                              case ${voice_assist_muted_phase_id}:
                                id(control_led_voice_assist_muted_phase).execute();
                                break;
                              case ${voice_assist_not_ready_phase_id}:
                                id(control_led_voice_assist_not_ready_phase).execute();
                                break;
                              default:
                                id(control_led_voice_assist_idle_phase).execute();
                                break;
                            }
                      else:
                        - script.execute: control_led_no_ha_connection_state
                else:
                  - script.execute: control_led_no_ha_connection_state
          else:
            - script.execute: control_led_init_state


  # Script executed during initialisation: in this example, turn the LED green with a fast pulse 🟢
  - id: control_led_init_state
    then:
      - light.turn_on:
          id: led
          blue: 0%
          red: 0%
          green: 100%
          effect: "Fast Pulse"
  

  # Script executed when the device has no connection to Home Assistant: In this example: Turn off the LED 
  - id: control_led_no_ha_connection_state
    then:
      - light.turn_off:
          id: led  


  # Script executed when the voice assistant is idle (waiting for a wake word): in this example, turn the LED white at 20% brightness ⚪
  - id: control_led_voice_assist_idle_phase
    then:
      - light.turn_on:
          id: led
          blue: 100%
          red: 100%
          green: 100%
          brightness: 20%
          effect: "none"


  # Script executed when the voice assistant is listening to a command: in this example, turn the LED blue with a slow pulse 🔵
  - id: control_led_voice_assist_listening_phase
    then:
      - light.turn_on:
          id: led
          blue: 100%
          red: 0%
          green: 0%
          effect: "Slow Pulse"


  # Script executed when the voice assistant is processing the command: in this example, turn the LED blue with a fast pulse 🔵
  - id: control_led_voice_assist_thinking_phase
    then:
      - light.turn_on:
          id: led
          blue: 100%
          red: 0%
          green: 0%
          effect: "Fast Pulse"


  # Script executed when the voice assistant is replying to a command: in this example, turn the LED solid blue (no pulse) 🔵
  - id: control_led_voice_assist_replying_phase
    then:
      - light.turn_on:
          id: led
          blue: 100%
          red: 0%
          green: 0%
          brightness: 100%
          effect: "none"


  # Script executed when the voice assistant encounters an error: in this example, turn the LED solid red (no pulse) 🔴
  - id: control_led_voice_assist_error_phase
    then:
      - light.turn_on:
          id: led
          blue: 0%
          red: 100%
          green: 0%
          brightness: 100%
          effect: "none"


  # Script executed when the voice assistant is muted: In this example: Turn off the LED 
  - id: control_led_voice_assist_muted_phase
    then:
      - light.turn_off:
          id: led


  # Script executed when the voice assistant is not ready: In this example: Turn off the LED 
  - id: control_led_voice_assist_not_ready_phase
    then:
      - light.turn_off:
          id: led

# Declaration of the switch used to turn on or off (mute) our voice assistant
button:
  # System
  - platform: restart
    name: Restart
    id: restart_switch
switch:
  - platform: template
    name: Enable Voice Assistant
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    icon: mdi:assistant
    # When the switch is turned on (on Home Assistant):
    # Start the voice assistant component
    # Set the correct phase and run the script to refresh the LED status
    on_turn_on:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:      
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
            - if:
                condition:
                  not:
                    - voice_assistant.is_running
                then:
                  - micro_wake_word.start
            - script.execute: control_led
    # When the switch is turned off (on Home Assistant):
    # Stop the voice assistant component
    # Set the correct phase and run the script to refresh the LED status
    on_turn_off:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:      
            - voice_assistant.stop
            - micro_wake_word.stop
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
            - script.execute: control_led


micro_wake_word:
  model: okay_nabu
  on_wake_word_detected:
    then:
      - media_player.stop:
      # - media_player.play_media: https://steriku.duckdns.org:8123/local/va/din-ding.mp3
      - voice_assistant.start:

I can only use it once; it will not wake up again afterwards. The log shows:
[D][esp_adf_pipeline:302]: State changed from STOPPING to UNINITIALIZED
Detailed logs:

[15:04:26][D][media_player:061]: 's3-dev_media_player' - Setting
[15:04:26][D][media_player:068]:   Media URL: http://192.168.3.242:8123/api/tts_proxy/2bea3af15933f57a7d0e53bf47780474ebef10f5_zh-cn_6c2e43c6c1_edge_tts.mp3
[15:04:26][D][media_player:074]:  Announcement: yes
[15:04:26][D][adf_media_player:030]: Got control call in state 1
[15:04:26][D][esp_adf_pipeline:050]: Starting request, current state UNINITIALIZED
[15:04:26][V][esp-idf:000]: I (25310) MP3_DECODER: MP3 init

[15:04:26][V][esp-idf:000]: I (25322) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=4

[15:04:26][D][i2s_audio:072]: Installing driver : yes
[15:04:26][D][esp_adf_pipeline:358]: pipeline tag 0, http
[15:04:26][D][esp_adf_pipeline:358]: pipeline tag 1, decoder
[15:04:26][D][esp_adf_pipeline:358]: pipeline tag 2, i2s_out
[15:04:26][V][esp-idf:000]: I (25343) AUDIO_PIPELINE: link el->rb, el:0x3d81c0f8, tag:http, rb:0x3d81c64c

[15:04:26][V][esp-idf:000]: I (25351) AUDIO_PIPELINE: link el->rb, el:0x3d81c2e8, tag:decoder, rb:0x3d81d68c

[15:04:26][D][esp_adf_pipeline:370]: Setting up event listener.
[15:04:26][D][esp_adf_pipeline:302]: State changed from UNINITIALIZED to PREPARING
[15:04:26][I][adf_media_player:135]: got new pipeline state: 1
[15:04:26][D][adf_i2s_out:127]: Set final i2s settings: 16000
[15:04:26][W][component:237]: Component voice_assistant took a long time for an operation (124 ms).
[15:04:26][W][component:238]: Components should block for at most 30 ms.
[15:04:26][D][voice_assistant:625]: Event Type: 2
[15:04:26][D][voice_assistant:715]: Assist Pipeline ended
[15:04:26][V][esp-idf:000]: I (25423) AUDIO_THREAD: The http task allocate stack on external memory

[15:04:26][V][esp-idf:000]: I (25426) AUDIO_ELEMENT: [http-0x3d81c0f8] Element task created

[15:04:26][V][esp-idf:000]: I (25433) AUDIO_THREAD: The decoder task allocate stack on external memory

[15:04:26][V][esp-idf:000]: I (25443) AUDIO_ELEMENT: [decoder-0x3d81c2e8] Element task created

[15:04:26][V][esp-idf:000][http]: I (25453) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[15:04:26][V][esp-idf:000][decoder]: I (25464) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[15:04:26][D][esp_aud:000][decoStreamer status: 2
[15:04:26][D][esp_audio_sources:098]: decoder status: 2
[15:04:26][W][component:237]: Component adf_pipeline.media_player took a long time for an operation (61 ms).
[15:04:26][W][component:237]: Component adf_pipeline.media_player took a long time for an operation (61 ms).
[15:04:26][W][component:238]: Components should block for at most 30 ms.
[15:04:26][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=24000, bits=16, ch=1
[15:04:26][D][adf_i2s_out:114]: update i2s clk settings: rate:24000 bits:16 ch:1
[15:04:26][V][esp-idf:000]: I (25519) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4

[15:04:26][D][adf_i2s_out:127]: Set final i2s settings: 24000
[15:04:26][V][esp-idf:000][decoder]: W (25536) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[15:04:26][V][esp-idf:000][decoder]: W (25546) MP3_DECODER: output aborted -3

[15:04:26][V][esp-idf:000][decoder]: I (25557) MP3_DECODER: Closed

[15:04:26][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[15:04:26][I][adf_media_player:135]: got new pipeline state: 2
[15:04:26][D][adf_i2s_out:127]: Set final i2s settings: 24000
[15:04:26][V][esp-idf:000]: I (25588) AUDIO_ELEMENT: [i2s_out-0x3d81c4b4] Element task created

[15:04:26][V][esp-idf:000]: I (25597) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8424887 Bytes, Inter:174452 Bytes, Dram:174452 Bytes


[15:04:26][V][esp-idf:000][http]: I (25607) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[15:04:26][V][esp-idf:000][decoder]: I (25618) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[15:04:26][V][esp-idf:000][i2s_out]: I (25628) AUDIO_ELEMENT: [i2s_out] AEL_MSG_CMD_RESUME,state:1

[15:04:26][V][esp-idf:000][i2s_out]: I (25639) I2S_STREAM: AUDIO_STREAM_WRITER

[15:04:26][I][esp_adf_pipeline:214]: [ i2s_out ] status: 12
[15:04:26][D][esp_adf_pipeline:131]: Check element [http] status, 3
[15:04:26][D][esp_adf_pipeline:131]: Check element [decoder] status, 2
[15:04:27][I][esp_adf_pipeline:214]: [ decoder ] status: 12
[15:04:27][D][esp_adf_pipeline:131]: Check element [http] status, 3
[15:04:27][D][esp_adf_pipeline:131]: Check element [decoder] status, 3
[15:04:27][D][esp_adf_pipeline:131]: Check element [i2s_out] status, 3
[15:04:27][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[15:04:27][I][adf_media_player:135]: got new pipeline state: 3
[15:04:27][D][adf_i2s_out:127]: Set final i2s settings: 24000
[15:04:27][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=24000, bits=16, ch=1
[15:04:27][D][adf_i2s_out:127]: Set final i2s settings: 24000
[15:04:27][V][esp-idf:000][http]: W (26660) HTTP_STREAM: No more data,errno:0, total_bytes:14221, rlen = 0

[15:04:27][V][esp-idf:000][http]: I (26663) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0

[15:04:27][I][esp_adf_pipeline:214]: [ http ] status: 15
[15:04:27][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[15:04:27][I][adf_media_player:135]: got new pipeline state: 4
[15:04:28][V][esp-idf:000][decoder]: I (27363) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2

[15:04:29][V][esp-idf:000][decoder]: I (27723) MP3_DECODER: Closed

[15:04:29][V][esp-idf:000][i2s_out]: I (27829) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_DONE,-2

[15:04:29][D][esp_adf_pipeline:400]: Called deinit_all
[15:04:29][V][esp-idf:000]: I (28003) AUDIO_PIPELINE: audio_pipeline_unlinked

[15:04:29][V][esp-idf:000]: W (28005) AUDIO_ELEMENT: [http] Element has not create when AUDIO_ELEMENT_TERMINATE

[15:04:29][V][esp-idf:000]: W (28008) AUDIO_ELEMENT: [decoder] Element has not create when AUDIO_ELEMENT_TERMINATE

[15:04:29][V][esp-idf:000]: W (28010) AUDIO_ELEMENT: [i2s_out] Element has not create when AUDIO_ELEMENT_TERMINATE

[15:04:29][V][esp-idf:000]: I (28025) I2S: DMA queue destroyed

[15:04:29][D][esp_adf_pipeline:302]: State changed from STOPPING to UNINITIALIZED
[15:04:29][I][adf_media_player:135]: got new pipeline state: 0

Hi, I’ve also been trying to get this working for a few days now.

I’ve recently had a similar issue to yours: it wakes once, responds, but then never picks up another wake word.

I’ve taken the source that ESPHome generates and pored over the voice_assistant.cpp implementation, and noticed that voice_assistant does not move itself into the IDLE state (ready to accept subsequent commands) unless the media_player has completed its announcement.

Now, the media_player object that this config builds does not seem to fully support announcements, so it never informs voice_assistant that the announcement has completed.

In voice_assistant.cpp I changed .set_announce(true) to false, and changed the MEDIA_PLAYER_STATE check from ANNOUNCING to PLAYING.

This way, the voice_assistant pipeline can correctly determine when a response has finished playing, and then set itself back to IDLE, expecting another command.

I’m still working on this; I’ll update here (and probably make a separate post) once I get an adapted solution working well.

Mmh, interesting…
I’ve also encountered this issue, and I noticed that when the condition is changed to not media_player.is_playing it does go back to idle, which suggests the on_end event had fired and the voice assistant is idle?

 on_end:
    - if:
        condition:
          - switch.is_on: use_wake_word
        then:
          - wait_until:
              not:
                media_player.is_playing:
          - micro_wake_word.start

After this another wake word can be detected, and it is, but unfortunately the voice assistant does not (re)start listening and the device basically halts after detecting the wake word.

Ah yes, I also had to make this change. Using media_player.is_playing in combination with my changes above, I have a working solution

(but still unstable with frequent crashes, we’ll get there eventually!)

Ah, that clears things up. I must say I’ve reverted to the plain microphone and speaker config, as I don’t need the media_player component for now:

microphone:
  - platform: i2s_audio
    id: onboard_microphone
    channel: left
    i2s_din_pin: ${mic_pin}
    i2s_audio_id: i2s_in
    adc_type: external
    pdm: false

speaker:
  - platform: i2s_audio
    id: onboard_speaker
    i2s_dout_pin: ${speaker_pin}
    i2s_audio_id: i2s_out
    dac_type: external
    mode: mono

And this made it work reliably. Unfortunately it plays the response at max volume, which is quite loud; I’m looking for a solution.

I have this exact issue. I’m trying to build a satellite-style device with a media player, for general use and also for the voice assistant. It works on the first try: it detects the wake word, runs the pipeline, and executes the command. But the second time it unfortunately does not. In the logs I see that the wake word is detected, but the pipeline is not running.

Has anyone ever found a workaround for this except using speaker instead of media_player?