Getting voice assistant to work with whisper/piper/openwakeword

Hi,
I have an ESP32-S3-DevKitC-1 with a MAX98357A amplifier, a speaker, and an INMP441 mic. The pipeline is working, as the ESP32 logs show:

[23:34:49][D][api.connection:1446]: Home Assistant 2025.1.2 (192.168.0.2): Connected successfully
[23:34:50][D][voice_assistant:511]: State changed from IDLE to START_MICROPHONE
[23:34:50][D][voice_assistant:518]: Desired state set to START_PIPELINE
[23:34:50][D][voice_assistant:222]: Starting Microphone
[23:34:51][D][ring_buffer:034]: Created ring buffer with size 16384
[23:34:51][D][voice_assistant:511]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[23:34:51][D][voice_assistant:511]: State changed from STARTING_MICROPHONE to START_PIPELINE
[23:34:51][D][voice_assistant:276]: Requesting start...
[23:34:51][D][voice_assistant:511]: State changed from START_PIPELINE to STARTING_PIPELINE
[23:34:51][D][voice_assistant:533]: Client started, streaming microphone
[23:34:51][D][voice_assistant:511]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[23:34:51][D][voice_assistant:518]: Desired state set to STREAMING_MICROPHONE
[23:34:51][D][voice_assistant:635]: Event Type: 1
[23:34:51][D][voice_assistant:638]: Assist Pipeline running
[23:34:51][D][voice_assistant:635]: Event Type: 9
[23:35:35][I][safe_mode:041]: Boot seems successful; resetting boot loop counter
[23:35:35][D][esp32.preferences:114]: Saving 1 preferences to flash...
[23:35:35][D][esp32.preferences:142]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed

In the Home Assistant logs, however, I see this error about the authenticated connection:

Logger: homeassistant
Source: components/esphome/assist_satellite.py:333
First occurred: 23:00:04 (1 occurrence)
Last logged: 23:00:04

Error doing job: Task exception was never retrieved (None)
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/pipeline.py", line 1417, in execute
    detect_result = await self.run.wake_word_detection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        stt_processed_stream, stt_audio_buffer
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/pipeline.py", line 804, in wake_word_detection
    self.process_event(
    ~~~~~~~~~~~~~~~~~~^
        PipelineEvent(
        ^^^^^^^^^^^^^^
    ...<2 lines>...
        )
        ^
    )
    ^
  File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/pipeline.py", line 617, in process_event
    self.event_callback(event)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/assist_satellite/entity.py", line 384, in _internal_on_pipeline_event
    self.on_pipeline_event(event)
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/esphome/assist_satellite.py", line 333, in on_pipeline_event
    self.cli.send_voice_assistant_event(event_type, data_to_send)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/aioesphomeapi/client.py", line 1444, in send_voice_assistant_event
    self._get_connection().send_message(req)
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/site-packages/aioesphomeapi/client.py", line 388, in _get_connection
    raise APIConnectionError(
    ...<2 lines>...
    )
aioesphomeapi.core.APIConnectionError: Authenticated connection not ready yet for esp32-mic-speaker @ 192.168.0.165; current state is ConnectionState.INITIALIZED!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/assist_satellite/entity.py", line 345, in async_accept_pipeline_from_satellite
    await self._pipeline_task
  File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/__init__.py", line 135, in async_pipeline_from_audio_stream
    await pipeline_input.execute()
  File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/pipeline.py", line 1509, in execute
    await self.run.end()
  File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/pipeline.py", line 647, in end
    self.process_event(
    ~~~~~~~~~~~~~~~~~~^
        PipelineEvent(
        ^^^^^^^^^^^^^^
            PipelineEventType.RUN_END,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
        )
        ^
    )
    ^
  File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/pipeline.py", line 617, in process_event
    self.event_callback(event)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/assist_satellite/entity.py", line 384, in _internal_on_pipeline_event
    self.on_pipeline_event(event)
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/esphome/assist_satellite.py", line 333, in on_pipeline_event
    self.cli.send_voice_assistant_event(event_type, data_to_send)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/aioesphomeapi/client.py", line 1444, in send_voice_assistant_event
    self._get_connection().send_message(req)
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/site-packages/aioesphomeapi/client.py", line 388, in _get_connection
    raise APIConnectionError(
    ...<2 lines>...
    )
aioesphomeapi.core.APIConnectionError: Authenticated connection not ready yet for esp32-mic-speaker @ 192.168.0.165; current state is ConnectionState.INITIALIZED!

Why is the connection state INITIALIZED while the authenticated connection is not ready, and why does the error appear twice? Has anyone seen this error before? Google was no help, sadly.
For clarification, Home Assistant runs at 192.168.0.2 and the ESP at 192.168.0.165.

Additionally, here is my yml:

esphome:
  name: esp32-mic-speaker
  friendly_name: esp32-mic-speaker
  on_boot:
     - priority: -100
       then:
         - wait_until: api.connected
         - delay: 1s
         - if:
             condition:
               switch.is_on: use_wake_word
             then:
               - voice_assistant.start_continuous:

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
  variant: esp32s3

wifi:
  ssid: "x"
  password: "xx"
  ap:
    ssid: "Fallback mic speaker"
    password: "xx"
# Enable logging
logger:

debug:

# Enable Home Assistant API
api:

ota:
  - platform: esphome
    id: ota_esphome

i2s_audio:
  i2s_lrclk_pin: GPIO4 
  i2s_bclk_pin: GPIO5 

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO7
    pdm: false
    channel: right
    bits_per_sample: 32bit

speaker:
  - platform: i2s_audio
    id: big_speaker
    dac_type: external
    i2s_dout_pin: GPIO17

voice_assistant:
  microphone: mic
  use_wake_word: false
  noise_suppression_level: 2
  #auto_gain: 31dBFS
  volume_multiplier: 2.0
  speaker: big_speaker
  id: assist

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);
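
One way to narrow this down from the device side is to log every API client connect and disconnect, so the timestamps can be lined up against the Home Assistant traceback. Note that the Home Assistant error above is stamped 23:00:04 while the ESP log excerpt starts at 23:34:49, so the two may not describe the same run. The following is only a sketch; it assumes the `on_client_connected` and `on_client_disconnected` triggers of the ESPHome `api:` component and their `client_info` and `client_address` variables behave as documented in the installed ESPHome version.

```yaml
# Sketch only: extends the bare "api:" block from the config above.
api:
  on_client_connected:
    - logger.log:
        format: "API client %s connected from %s"
        # Assumption: both variables are std::string in this ESPHome version.
        args: ["client_info.c_str()", "client_address.c_str()"]
  on_client_disconnected:
    - logger.log: "API client disconnected"
```

If a disconnect or reconnect shows up in the ESP log at the time of the Home Assistant error, the pipeline event was probably sent while the connection was still being re-established.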

I have no idea why your YAML is not working. But before I moved to ESP32 N16R8 boards, I had this running on a devkit with the YAML below. It is a lot more involved, but it did work, so it might help you.

esphome:
  name: robot-winston
  friendly_name: robot-winston

  on_boot:
     - priority: -100
       then:
         - wait_until: api.connected
         - delay: 2s
         - if:
             condition:
               switch.is_on: use_wake_word
             then:
               - voice_assistant.start_continuous:

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "stuff"

ota:
    - platform: esphome
      password: "stuff"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Robot-Winston Fallback Hotspot"
    password: "2O8MlitRJYmG"

improv_serial:

i2s_audio:
  i2s_lrclk_pin: GPIO27
  i2s_bclk_pin: GPIO26

microphone:
  - platform: i2s_audio
    id: echo_microphone
    i2s_din_pin: GPIO13
    adc_type: external
    pdm: false

speaker:
  - platform: i2s_audio
    id: echo_speaker
    i2s_dout_pin: GPIO25
    dac_type: external
    mode: mono

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 10%
        effect: pulse
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        blue: 0%
        red: 100%
        green: 0%
        brightness: 100%
        effect: none
    - delay: 3s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO39
      inverted: true
    name: Button
    disabled_by_default: true
    entity_category: diagnostic
    id: echo_button
    on_click:
      - if:
          condition:
            switch.is_off: use_wake_word
          then:
            - if:
                condition: voice_assistant.is_running
                then:
                  - voice_assistant.stop:
                  - script.execute: reset_led
                else:
                  - voice_assistant.start:
          else:
            - voice_assistant.stop
            - delay: 2s
            - script.execute: reset_led
            - script.wait: reset_led
            - voice_assistant.start_continuous:

light:
  - platform: esp32_rmt_led_strip
    id: led
    name: Robot light
    disabled_by_default: false
    entity_category: config
    pin: GPIO14
    default_transition_length: 0s
    chipset: ws2812
    num_leds: 2
    rgb_order: rgb
    rmt_channel: 0
    effects:
      - pulse:
          transition_length: 250ms
          update_interval: 250ms

script:
  - id: reset_led
    then:
      - if:
          condition:
            - switch.is_on: use_wake_word
            - switch.is_on: use_listen_light
          then:
            - light.turn_on:
                id: led
                blue: 0%
                red: 0%
                green: 100%
                brightness: 30%
                effect: none
          else:
            - light.turn_off: led

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: reset_led
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - script.execute: reset_led
  - platform: template
    name: Use Listen Light
    id: use_listen_light
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - script.execute: reset_led
    on_turn_off:
      - script.execute: reset_led

external_components:
  - source: github://pr#5230
    components:
      - esp_adf
    refresh: 0s

esp_adf:

Hi Arh,
I tried your YAML but can't get it to compile. I changed the board to

board: esp32-s3-devkitc-1

It is looking for GPIO pins that aren't even used:

components/audio_board/lyrat_v4_3/board_pins_config.c: In function 'get_i2c_pins':
components/audio_board/lyrat_v4_3/board_pins_config.c:39:34: error: 'GPIO_NUM_23' undeclared (first use in this function); did you mean 'GPIO_NUM_43'?
         i2c_config->scl_io_num = GPIO_NUM_23;
                                  ^~~~~~~~~~~
                                  GPIO_NUM_43
components/audio_board/lyrat_v4_3/board_pins_config.c:39:34: note: each undeclared identifier is reported only once for each function it appears in
components/audio_board/lyrat_v4_3/board_pins_config.c: In function 'get_i2s_pins':
components/audio_board/lyrat_v4_3/board_pins_config.c:54:33: error: 'GPIO_NUM_25' undeclared (first use in this function); did you mean 'GPIO_NUM_45'?
         i2s_config->ws_io_num = GPIO_NUM_25;
                                 ^~~~~~~~~~~
                                 GPIO_NUM_45
In file included from C:/Users/dabozz/.platformio/packages/framework-espidf/components/esp_rom/include/esp32s3/rom/ets_sys.h:19,
                 from C:/Users/dabozz/.platformio/packages/framework-espidf/components/log/include/esp_log.h:19,
                 from components/audio_board/lyrat_v4_3/board_pins_config.c:25:
components/audio_board/lyrat_v4_3/board_pins_config.c: In function 'i2s_mclk_gpio_select':
components/audio_board/lyrat_v4_3/board_pins_config.c:95:53: error: 'FUNC_GPIO0_CLK_OUT1' undeclared (first use in this function); did you mean 'FUNC_GPIO20_CLK_OUT1'?
             PIN_FUNC_SELECT(PERIPHS_IO_MUX_GPIO0_U, FUNC_GPIO0_CLK_OUT1);
                                                     ^~~~~~~~~~~~~~~~~~~
C:/Users/dabozz/.platformio/packages/framework-espidf/components/soc/esp32s3/include/soc/soc.h:137:45: note: in definition of macro 'REG_WRITE'
             (*(volatile uint32_t *)(_r)) = (_v);                                                                       \
                                             ^~
C:/Users/dabozz/.platformio/packages/framework-espidf/components/soc/esp32s3/include/soc/io_mux_reg.h:93:46: note: in expansion of macro 'REG_SET_FIELD'
 #define PIN_FUNC_SELECT(PIN_NAME, FUNC)      REG_SET_FIELD(PIN_NAME, MCU_SEL, FUNC)
                                              ^~~~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:95:13: note: in expansion of macro 'PIN_FUNC_SELECT'
             PIN_FUNC_SELECT(PERIPHS_IO_MUX_GPIO0_U, FUNC_GPIO0_CLK_OUT1);
             ^~~~~~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:98:53: error: 'FUNC_U0TXD_CLK_OUT3' undeclared (first use in this function); did you mean 'FUNC_U0TXD_CLK_OUT1'?
             PIN_FUNC_SELECT(PERIPHS_IO_MUX_U0TXD_U, FUNC_U0TXD_CLK_OUT3);
                                                     ^~~~~~~~~~~~~~~~~~~
C:/Users/dabozz/.platformio/packages/framework-espidf/components/soc/esp32s3/include/soc/soc.h:137:45: note: in definition of macro 'REG_WRITE'
             (*(volatile uint32_t *)(_r)) = (_v);                                                                       \
                                             ^~
C:/Users/dabozz/.platformio/packages/framework-espidf/components/soc/esp32s3/include/soc/io_mux_reg.h:93:46: note: in expansion of macro 'REG_SET_FIELD'
 #define PIN_FUNC_SELECT(PIN_NAME, FUNC)      REG_SET_FIELD(PIN_NAME, MCU_SEL, FUNC)
                                              ^~~~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:98:13: note: in expansion of macro 'PIN_FUNC_SELECT'
             PIN_FUNC_SELECT(PERIPHS_IO_MUX_U0TXD_U, FUNC_U0TXD_CLK_OUT3);
             ^~~~~~~~~~~~~~~
In file included from components/audio_board/lyrat_v4_3/board.h:29,
                 from components/audio_board/lyrat_v4_3/board_pins_config.c:28:
components/audio_board/lyrat_v4_3/board_pins_config.c: In function 'get_green_led_gpio':
components/audio_board/lyrat_v4_3/board_def.h:42:35: error: 'GPIO_NUM_22' undeclared (first use in this function); did you mean 'GPIO_NUM_42'?
 #define GREEN_LED_GPIO            GPIO_NUM_22
                                   ^~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:186:12: note: in expansion of macro 'GREEN_LED_GPIO'
     return GREEN_LED_GPIO;
            ^~~~~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:187:1: error: control reaches end of non-void function [-Werror=return-type]
 }
 ^
cc1.exe: some warnings being treated as errors
*** [.pioenvs\robot-winston\components\audio_board\lyrat_v4_3\board_pins_config.o] Error 1

Even when I comment out all the LED stuff, the errors remain the same.

You won’t be able to copy and paste my code, as you are almost certainly using a slightly different board and a very different pinout. Change the pin config to suit what you wired up.

I did that too, of course; I just forgot to mention it. The errors are still the same. I do not use GPIO43 or GPIO25 at all, and neither do you, so it is weird that they show up. I do not know where the function 'get_i2s_pins' comes from…
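
The paths in the compiler output (components/audio_board/lyrat_v4_3/board_pins_config.c) suggest that the failing code does not come from the pin settings in the YAML. It appears to come from the ESP-ADF board support pulled in by the `esp_adf` external component at the end of the shared config, which seems to hard-code pins for a LyraT v4.3 board on the original ESP32. That would explain why GPIO numbers that are not in the YAML show up on an ESP32-S3. A sketch of a possible workaround, assuming the plain `i2s_audio` microphone and speaker do not need `esp_adf` on a devkit:

```yaml
# Assumption: esp_adf is only needed for ESP-ADF based boards, not for a
# devkit wired to an INMP441 and a MAX98357A through i2s_audio.
# Comment out (or delete) both blocks, then recompile.
# external_components:
#   - source: github://pr#5230
#     components:
#       - esp_adf
#     refresh: 0s
#
# esp_adf:
```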

I have it working for now. I used my own approach and added your LED and its behaviour. It comes down to this:

esphome:
  name: esp32-mic-speaker
  friendly_name: esp32-mic-speaker
  on_boot:
     - priority: -100
       then:
         - wait_until: api.connected
         - delay: 1s
         - if:
             condition:
               switch.is_on: use_wake_word
             then:
               - voice_assistant.start_continuous:

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
  variant: esp32s3

wifi:
  ssid: ""
  password: ""
  ap:
    ssid: "Fallback mic speaker"
    password: ""
# Enable logging
logger:

debug:

# Enable Home Assistant API
api:
  encryption:
    key: ""

ota:
  - platform: esphome
    id: ota_esphome

i2s_audio:
  i2s_lrclk_pin: GPIO4
  i2s_bclk_pin: GPIO5

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO7
    pdm: false
    channel: left
    bits_per_sample: 32bit

speaker:
  - platform: i2s_audio
    id: big_speaker
    dac_type: external
    i2s_dout_pin: GPIO17

light:
  - platform: esp32_rmt_led_strip
    id: led
    name: Status-LED
    disabled_by_default: false
    entity_category: config
    pin: GPIO48
    default_transition_length: 0s
    chipset: ws2812
    num_leds: 1
    rgb_order: grb
    rmt_channel: 0
    effects:
      - pulse:
          transition_length: 250ms
          update_interval: 250ms

voice_assistant:
  microphone: mic
  use_wake_word: false
  noise_suppression_level: 2
  #auto_gain: 31dBFS
  volume_multiplier: 2.0
  speaker: big_speaker
  id: assist
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 10%
        effect: pulse
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        blue: 0%
        red: 100%
        green: 0%
        brightness: 100%
        effect: none
    - delay: 3s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }

script:
  - id: reset_led
    then:
      - if:
          condition:
            - switch.is_on: use_wake_word
            - switch.is_on: use_listen_light
          then:
            - light.turn_on:
                id: led
                blue: 0%
                red: 0%
                green: 100%
                brightness: 30%
                effect: none
          else:
            - light.turn_off: led

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);
      - script.execute: reset_led
  - platform: template
    name: Use Listen Light
    id: use_listen_light
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - script.execute: reset_led
    on_turn_off:
      - script.execute: reset_led
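
Compared with the first config in this thread, the working one differs in three places: the `api:` block now has an encryption key, the microphone channel changed from right to left, and the LED, script, and second switch were added. The fragments below isolate the first two changes. The thread does not establish which of them resolved the APIConnectionError, so treat this as a checklist rather than a confirmed fix.

```yaml
api:
  encryption:
    key: ""          # redacted in the post; a real base64 key goes here

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO7
    pdm: false
    channel: left    # was "right" in the first config
    bits_per_sample: 32bit
```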

Glad you got it working. I am no expert on this programming stuff, and it was a long time ago that I was playing with this. As I mentioned earlier, I changed over to ESP32 N16R8 boards so I can use media player instead of speaker. This allows me to have the assistants announce things as well.

My board is an ESP32 N16R8, and I also want to use media_player for announcements. Can you share your YAML?

Try this; it's what I based mine on.

Just remembered: you will need to change the framework version to this for it to compile in the latest version of ESPHome.

  framework:
    type: esp-idf
    version: 4.4.8
    platform_version: 5.4.0
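
For placement, the fragment above sits under the `esp32:` block, as in the configs earlier in the thread. A minimal sketch; the board ID is an assumption borrowed from the earlier posts and may need to be changed for a particular N16R8 module:

```yaml
esp32:
  board: esp32-s3-devkitc-1   # assumption: board ID taken from earlier in the thread
  framework:
    type: esp-idf
    version: 4.4.8
    platform_version: 5.4.0
```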