A presentable voice assistant satellite

It’s a USB-C cable, connected to a USB 3.0 hub I use daily. The small end of the USB-C cable goes into the Onn speaker connector, and the 5VDC is tapped directly from those two points on the board, where the cable is soldered. Both the Vcc and GND connections feed a pair of Wago lever-nut connectors, where ALL power and ALL grounds are connected. So while it seems like it could be a ground loop, it’s not very likely.

Let me get a few things managed here around the house, and I’ll post it shortly. Be aware, I have turned down the brightness settings in the control_led routine at the bottom of the code.

substitutions:
#
# ESPHome Device Name
#
  device_name:          'vasst-living-room'
  device_friendly_name: 'Voice Assistant Living Room'
#
# Voice Assist phase IDs
#
  voice_assist_idle_phase_id:       '1'
  voice_assist_listening_phase_id:  '2'
  voice_assist_thinking_phase_id:   '3'
  voice_assist_replying_phase_id:   '4'
  voice_assist_not_ready_phase_id: '10'
  voice_assist_error_phase_id:     '11'
  voice_assist_muted_phase_id:     '12'
#
#  P A R T S   L I S T
# ---------------------
# Small Rugged Speaker with Bluetooth Wireless Technology, Blue
# https://www.walmart.com/ip/onn-Small-Rugged-Speaker-with-Bluetooth-Wireless-Technology-Blue/883044562
# 
# AITRIP ESP32-S3-DevKitC-1-N16R8 Development Board
# https://www.amazon.com/dp/B0CGYXJB6Y
#
# KWMSTPLT WS2812B RGB LED Strip
# https://www.amazon.com/dp/B09P8MH56K
#
# Max98357 I2S 3W Class D Audio Amplifier
# https://www.amazon.com/dp/B0B4GK5R1R
#
# INMP441 I2S Microphone Module MEMS 
# https://www.amazon.com/dp/B09G4RNT3G
# 

#
# GPIO pin assignments
#
#
# Validated according to 
# https://www.studiopieters.nl/esp32-s3-wroom-pinout/
#
# I2S audio thanks to
# https://dronebotworkshop.com/esp32-i2s/
#
  i2s_in_lrclk_gpio:  'GPIO4'    # INP Left/Right Clock  4 YELLOW
  i2s_in_bclk_gpio:   'GPIO5'    # INP Bit Clock         5 ORANGE
  i2s_in_mic_gpio:    'GPIO6'    # INP Digital Data      6 BLUE
#
  ws2812_led_gpio:    'GPIO18'   # LED strip            18 GRAY
#
  i2s_out_lrclk_gpio: 'GPIO15'   # OUT Left/Right Clock 15 YELLOW
  i2s_out_bclk_gpio:  'GPIO16'   # OUT Bit Clock        16 ORANGE
  i2s_out_spkr_gpio:  'GPIO10'   # OUT Digital Data     10 PURPLE
#

esphome:
  name:          $device_name
  friendly_name: $device_friendly_name
  platformio_options:
    board_build.flash_mode: dio

  #
  # Startup - init the LED strip and phase ID
  #
  on_boot:
      priority: 600
      then:
        - script.execute: control_led
        - delay: 30s
        - if:
            condition:
              lambda: return id(init_in_progress);
            then:
              - lambda: id(init_in_progress) = false;
              - script.execute: control_led

psram:
  mode: octal
  speed: 80MHz

esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  framework:
    type: esp-idf
    components:
      - name: esphome_board
        source: github://jesserockz/esphome-esp-adf-board@main
        refresh: 0s
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"

#
# Logging
#
logger:
  level: VERBOSE

#
# Enable Home Assistant API
#
api:
  encryption:
    key: !secret encryption_key
  #
  # toggle the LED on HA connect and HA disconnect
  #
  on_client_connected:
    - script.execute: control_led
  on_client_disconnected:
    - script.execute: control_led

#
# Allow Over-the-air firmware updates
#
ota:
  password: !secret ota_password

#
# WiFi ssid and password
#
wifi:
  ssid:     !secret wifi_ssid
  password: !secret wifi_password
  #
  # toggle the LED on WiFi connect and WiFi disconnect
  #
  on_connect:
    - script.execute: control_led
  on_disconnect:
    - script.execute: control_led

#
# ESP-ADF audio component
#
esp_adf:
external_components:
  - source: github://pr#5230
    components:
    - esp_adf 
    refresh: 0s

#
# Establish a device web page
# for recovery via WiFi
#
captive_portal:

#
# WS2812 LED Strip
#
light:
  - platform:  esp32_rmt_led_strip
    chipset:   WS2812
    rgb_order: GRB
    pin:       $ws2812_led_gpio
    num_leds:    3
    rmt_channel: 0
    name:      "Status LED"
    id:        led
    default_transition_length: 0s
    effects:
      - pulse:
          name: "extra_slow_pulse"
          transition_length: 800ms
          update_interval: 800ms
          min_brightness: 0%
          max_brightness: 30%
      - pulse:
          name: "slow_pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "fast_pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%
  
#
# I2S Audio IN and OUT
#
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: $i2s_in_lrclk_gpio
    i2s_bclk_pin:  $i2s_in_bclk_gpio

  - id: i2s_out
    i2s_lrclk_pin: $i2s_out_lrclk_gpio
    i2s_bclk_pin:  $i2s_out_bclk_gpio

#
# INMP441 I2S INPUT Microphone
#
microphone:
  platform: i2s_audio 
  id: INMP441_I2S_MICROPHONE
  adc_type: external 
  i2s_audio_id: i2s_in
  i2s_din_pin: $i2s_in_mic_gpio
  pdm: false

#
# MAX98357A I2S OUTPUT Amplifier
#
speaker:
  platform: i2s_audio 
  id: MAX98357_I2S_AMP
  dac_type: external
  i2s_audio_id: i2s_out
  i2s_dout_pin: $i2s_out_spkr_gpio

#
# ESPHome Voice Assistant
#
voice_assistant:
  id: va
  microphone: INMP441_I2S_MICROPHONE 
  speaker:    MAX98357_I2S_AMP
  use_wake_word: true
  noise_suppression_level: 4
  auto_gain: 31dBFS
  volume_multiplier: 8.0

  on_listening:
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: control_led

  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: control_led

  on_tts_stream_start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: control_led

  on_tts_stream_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: control_led

  on_error: 
    - if:
        condition:
          lambda: return !id(init_in_progress);
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
          - script.execute: control_led
          - delay: 1s
          - if:
              condition:
                switch.is_on: use_wake_word
              then:
                - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
              else:
                - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
          - script.execute: control_led

  on_client_connected: 
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
    - script.execute: control_led          

  on_client_disconnected: 
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - script.execute: control_led

switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
            - if:
                condition:
                    not:
                      - voice_assistant.is_running
                then:
                  - voice_assistant.start_continuous
            - script.execute: control_led          
 
    on_turn_off:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - voice_assistant.stop
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
            - script.execute: control_led          

globals:
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}
  
script:
  - id: control_led
    then:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - if:
                condition:
                    wifi.connected:
                then:
                  - if:
                      condition:
                          api.connected:
                      then:
                        - lambda: |
                            switch(id(voice_assistant_phase)) {
                              case ${voice_assist_listening_phase_id}:
                                id(led).turn_on().set_rgb(1, 0, 0).set_brightness(0.3).set_effect("none").perform();
                                break;
                              case ${voice_assist_thinking_phase_id}:
                                id(led).turn_on().set_rgb(1, 0, 0).set_effect("slow_pulse").perform();
                                break;
                              case ${voice_assist_replying_phase_id}:
                                id(led).turn_on().set_rgb(1, 0, 0).set_brightness(0.3).set_effect("fast_pulse").perform();
                                break;
                              case ${voice_assist_error_phase_id}:
                                id(led).turn_on().set_rgb(1, 1, 1).set_brightness(0.3).set_effect("none").perform();
                                break;
                              case ${voice_assist_muted_phase_id}:
                                id(led).turn_off().perform();
                                break;
                              case ${voice_assist_not_ready_phase_id}:
                                id(led).turn_on().perform();
                                break;
                              default:
                                id(led).turn_on().set_rgb(1, 0, 0).set_brightness(0.2).set_effect("none").perform();
                                break;
                            }
                      else:
                        - light.turn_off:
                            id: led
                else:
                  - light.turn_off:
                      id: led
          else:
            - light.turn_on:
                id: led
                blue: 30%
                red: 30%
                green: 30%
                effect: "fast_pulse"

You’ve inspired me to put my Wyoming Satellite based system in a couple of the Onn Surf Speakers along with an additional Bluetooth adapter to make portable powered speakers out of them. I may do a video and take pictures of the project and share them.

I also have yet to see a bt speaker or soundbar set in bt mode which doesn’t turn itself off after some amount of idle time, for that matter. This has been my experience when trying to use them as media player targets for tts notifications over the past 5 years. Maybe you will have better luck.

In case you didn’t know, I’m hoping you find the magic incantation of GPIO pin assignments that resolves the amplifier noise with this ESP32-S3 board. I spent today shortening wires, soldering, and removing the Wago connectors, but haven’t finished getting it fully assembled yet. Won’t know if it helps until I do.

Try moving the i2s_bclk_pin for audio out to GPIO20.
I have no audio noise, but I can’t get the microphone to work…

I’ve wondered if it was working. The noise from the amp was so bad, I’d keep pulling the power. Sometimes, it seems like it takes forever to connect to Home Assistant. Hopefully, shortening the wires and your suggestion provide results. Thanks for that! :slight_smile:

EDIT: Reviewing the INMP441 microphone connection: the L/R signal at the MIC is tied to GND, and not to the I2S input channel. What is available is the WS (Word Select) signal, which is connected to the I2S input. The ESPHome I2S Audio page describes WS as the LRCLK signal. I’m testing the new wiring this morning, and will update with any details as I get them.

I can confirm that your suggestion eliminated the amp noise. It’s as quiet as can be now. :smiley: Even with the MIC configuration updated, though, the microphone does not appear to be working correctly. :frowning: More tracing, more troubleshooting to go . . .

MIC wiring confirmed okay with a multimeter: all pins show continuity from the MIC to the ESP32. Hrmmm, so why isn’t it detecting any voice input? I set the logger level to DEBUG and don’t see anything, no matter what wake word is spoken. :frowning:

These are my current pin assignments. Note that I renamed i2s_in_lrclk_gpio to i2s_in_ws_gpio to follow the MIC pin name.

#
  i2s_in_ws_gpio:     'GPIO4'    # INP WordSelect/LRCLK  4 YELLOW
  i2s_in_bclk_gpio:   'GPIO21'   # INP Bit Clock        21 ORANGE
  i2s_in_mic_gpio:    'GPIO6'    # INP Digital Data      6 BLUE
#
  ws2812_led_gpio:    'GPIO18'   # LED strip            18 BLUE
#
  i2s_out_lrclk_gpio: 'GPIO15'   # OUT Left/Right Clock 15 YELLOW
  i2s_out_bclk_gpio:  'GPIO20'   # OUT Bit Clock        20 ORANGE
  i2s_out_spkr_gpio:  'GPIO10'   # OUT Digital Data     10 BLUE
#
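
Renaming a substitution only works if every reference to it is renamed as well. As a minimal sketch, assuming the rest of the configuration is unchanged from the original post, the input entry of the i2s_audio block would then read:

```yaml
# Sketch only: the I2S input entry after renaming the substitution.
# Assumes the i2s_out entry and all other blocks stay as originally posted.
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: $i2s_in_ws_gpio     # WS on the INMP441 carries the LRCLK signal
    i2s_bclk_pin:  $i2s_in_bclk_gpio
```

If the old name i2s_in_lrclk_gpio is still referenced anywhere, the substitution can no longer be resolved, so it is worth searching the file for it before flashing.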

So, this Espressif ESP32-S3 I2S page shows code samples using the following pin assignments:

    .gpio_cfg = {
        .mclk = I2S_GPIO_UNUSED,
        .bclk = GPIO_NUM_4,
        .ws =   GPIO_NUM_5,
        .dout = I2S_GPIO_UNUSED,
        .din =  GPIO_NUM_19,
        .invert_flags = {
            .mclk_inv = false,
            .bclk_inv = false,
            .ws_inv   = false,
        },
    },

Trying these pin assignments now… First attempt, no change. :frowning: Second attempt, still no change. I suspect something may be amiss in the ESP32-S3 I2S audio code?
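
For reference, this is roughly how the Espressif sample pins would map onto the substitutions used in this thread. It is a sketch under the assumption that only the three input pins change; the post does not show the exact YAML that was tested.

```yaml
# Sketch only: Espressif sample pins expressed as the thread's substitutions.
# The exact values that were flashed are not shown in the post.
substitutions:
  i2s_in_bclk_gpio: 'GPIO4'    # .bclk = GPIO_NUM_4
  i2s_in_ws_gpio:   'GPIO5'    # .ws   = GPIO_NUM_5
  i2s_in_mic_gpio:  'GPIO19'   # .din  = GPIO_NUM_19
```

Note that this swaps the roles of GPIO4 and GPIO5 relative to the original assignments, so the microphone wires would have to be moved to match.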

The table of pins you posted was a little off, based on your code.

GPIO 32 is connected to LRCLK.  
GPIO 12 is connected to DIN. 

The audio output is not always clear. It stutters. Any thoughts on why that might happen? Also, at some points when trying to play the response, the system never gets back to a state where it is waiting for the wake word. When this happens, the LEDs just keep flashing blue. The log below shows what happens.

[12:29:57][D][voice_assistant:519]: Event Type: 9
[12:30:02][D][voice_assistant:519]: Event Type: 0
[12:30:02][D][voice_assistant:519]: Event Type: 2
[12:30:02][D][voice_assistant:609]: Assist Pipeline ended
[12:30:02][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[12:30:02][D][voice_assistant:418]: Desired state set to WAITING_FOR_VAD
[12:30:02][D][voice_assistant:170]: Waiting for speech...
[12:30:02][D][voice_assistant:412]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[12:30:02][D][voice_assistant:183]: VAD detected speech
[12:30:02][D][voice_assistant:412]: State changed from WAITING_FOR_VAD to START_PIPELINE
[12:30:02][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[12:30:02][D][voice_assistant:200]: Requesting start...
[12:30:02][D][voice_assistant:412]: State changed from START_PIPELINE to STARTING_PIPELINE
[12:30:02][D][voice_assistant:433]: Client started, streaming microphone
[12:30:02][D][voice_assistant:412]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[12:30:02][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[12:30:02][D][voice_assistant:519]: Event Type: 1
[12:30:02][D][voice_assistant:522]: Assist Pipeline running
[12:30:02][D][voice_assistant:519]: Event Type: 9
[12:30:05][D][voice_assistant:519]: Event Type: 10
[12:30:05][D][voice_assistant:528]: Wake word detected
[12:30:05][D][voice_assistant:519]: Event Type: 3
[12:30:05][D][voice_assistant:533]: STT started
[12:30:05][D][light:036]: 'Status LED' Setting:
[12:30:05][D][light:051]:   Brightness: 100%
[12:30:05][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[12:30:07][D][voice_assistant:519]: Event Type: 11
[12:30:07][D][voice_assistant:670]: Starting STT by VAD
[12:30:08][D][voice_assistant:519]: Event Type: 12
[12:30:08][D][voice_assistant:674]: STT by VAD end
[12:30:08][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[12:30:08][D][voice_assistant:418]: Desired state set to AWAITING_RESPONSE
[12:30:08][D][voice_assistant:412]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[12:30:08][D][light:036]: 'Status LED' Setting:
[12:30:08][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[12:30:08][D][light:109]:   Effect: 'slow_pulse'
[12:30:08][D][esp-idf:000]: I (102738) I2S: DMA queue destroyed

[12:30:08][D][voice_assistant:412]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[12:30:10][D][voice_assistant:519]: Event Type: 4
[12:30:10][D][voice_assistant:547]: Speech recognised as: " Turn on Porch lamp."
[12:30:10][D][voice_assistant:519]: Event Type: 5
[12:30:10][D][voice_assistant:552]: Intent started
[12:30:10][D][voice_assistant:519]: Event Type: 6
[12:30:10][D][voice_assistant:519]: Event Type: 7
[12:30:10][D][voice_assistant:575]: Response: "Turned on the light"
[12:30:10][D][voice_assistant:519]: Event Type: 8
[12:30:10][D][voice_assistant:595]: Response URL: "https://ha.connectionblocks.com/api/tts_proxy/104c89b5f9053e4751d03002aab527c96124bd77_en-us_d6114071a7_tts.piper.wav"
[12:30:10][D][voice_assistant:412]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[12:30:10][D][voice_assistant:418]: Desired state set to STREAMING_RESPONSE
[12:30:10][D][esp-idf:000]: I (105191) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[12:30:10][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[12:30:11][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[... the same "Speaker buffer full, trying again next loop" line repeats 65 more times at 12:30:11 ...]
[12:30:11][D][voice_assistant:519]: Event Type: 99
[12:30:11][D][voice_assistant:665]: TTS stream end
[12:30:11][D][voice_assistant:283]: End of audio stream received
[12:30:11][D][voice_assistant:412]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[12:30:11][D][voice_assistant:418]: Desired state set to RESPONSE_FINISHED

There were no more debug messages printed out after this point until I power cycled the device. I could turn off the wake word and have the LED stop flashing blue, then turn it back on to solid red, but this didn’t start audio processing. Do you experience this issue? Any thoughts on how to address it?

On a couple of the ones I made, I had to change a couple of pins.
The “Speaker buffer full” condition is what is causing that. I don’t know why it happens, but it happens much less when using OpenAI as the conversation agent.
I’m able to query some pretty long responses from OpenAI and not have that issue.
I did notice, before I switched to OpenAI, that it was doing it less often. Hopefully it’s an issue that will be fixed soon.

I’ve been chatting with @robgough1970 in the ESPHome Discord, and had to update a few things today. He told me the trick to getting the MIC input working is to add channel: left to the I2S microphone config.
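
Applied to the microphone block from the original configuration, that change would look roughly like this (a sketch; only the channel line is new, everything else is copied from the earlier post):

```yaml
# Sketch only: the original microphone block with the channel option added.
microphone:
  platform: i2s_audio
  id: INMP441_I2S_MICROPHONE
  adc_type: external
  i2s_audio_id: i2s_in
  i2s_din_pin: $i2s_in_mic_gpio
  channel: left    # the INMP441 L/R pin is tied to GND, which selects the left channel
  pdm: false
```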

Another error popping up (apparently due to ESPHome 2024.2.0) requires changes to the esp32: section:

esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  framework:
    type: esp-idf
    version: recommended
    sdkconfig_options:
      CONFIG_ESP32_S3_BOX_BOARD: "y"

Here is a demo of a long response without that error.

Hi Keith,
just so you have another pinout to confuse matters even more! This is my go-to for the ESP32-S3-N16R8:

i2s_audio:
  - id: va_mic
    i2s_lrclk_pin: GPIO3
    i2s_bclk_pin: GPIO2
  - id: va_spk
    i2s_lrclk_pin: GPIO6
    i2s_bclk_pin: GPIO7

microphone:
  platform: i2s_audio
  id: mic
  adc_type: external
  i2s_audio_id: va_mic
  i2s_din_pin: GPIO4
  channel: left
  pdm: false

speaker:
  platform: i2s_audio
  id: spk
  dac_type: external
  i2s_audio_id: va_spk
  i2s_dout_pin: GPIO8

That is with an INMP441 mic and the MAX98357 DAC.
No issue with noise, and I get clear audio input from the mic :+1:

Cheers
Rob

I found a thread where someone indicated that they got better audio out by disabling wake word detection during playback. They controlled this on the HA side. I couldn’t figure out exactly what they were doing, so I figured I’d try to adjust things on the ESP side. I made a couple of small changes to the code, as follows. First, I added a step that disables wake word detection in the voice_assistant section.

  on_tts_start:
    - switch.turn_off: use_wake_word
    - delay: .25s

This turns off wake word detection; however, the delay didn’t have the desired effect. Ideally, the delay would give wake word detection a little time to turn off before playback begins. I added a web interface and could see the wake word switch change, but it happened at the same time the audio started playing. As a result, the initial part of the playback stutters, but after that, playback was normally clear. If anyone has any insight into how to get this delay in place, that would be helpful.

Next, I added two lines to the on_tts_stream_end action so it looked like this:

  on_tts_stream_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: control_led
    - delay: 1s
    - switch.turn_on: use_wake_word

This turns wake word detection back on after playback.

This did not fix the issue of the system periodically failing to complete playback and return to the wake word detection state.

I was hoping to be able to try things on an M5StampS3. Here is the yaml I attempted to use:

substitutions:
  voice_assist_idle_phase_id: '1'
  voice_assist_listening_phase_id: '2'
  voice_assist_thinking_phase_id: '3'
  voice_assist_replying_phase_id: '4'
  voice_assist_not_ready_phase_id: '10'
  voice_assist_error_phase_id: '11'
  voice_assist_muted_phase_id: '12'
esphome:
  name: voice-satellite-eps32-s3-test
  friendly_name: voice satellite eps32 S3 test
  
  on_boot:
    priority: 600
    then:
      - script.execute: control_led
      - delay: 30s
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
            - script.execute: control_led

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
    version: recommended

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: !secret api_key

ota:
  password: !secret ota_password

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "voicething"
    password: !secret wifi_password

# Enable Web server.
web_server:
  port: 80
  
captive_portal:

# restart-button
button:
  - platform: restart
    name: "Reboot"

esp_adf:
external_components:
  - source: github://pr#5230
    components:
    - esp_adf 
    refresh: 0s

light:
  - platform: esp32_rmt_led_strip
    rgb_order: GRB
    pin: GPIO3
    num_leds: 3
    rmt_channel: 0
    chipset: WS2812
    name: "Status LED"
    id: led
    default_transition_length: 0s
    effects:
      - pulse:
          name: "extra_slow_pulse"
          transition_length: 800ms
          update_interval: 800ms
          min_brightness: 0%
          max_brightness: 30%
      - pulse:
          name: "slow_pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "fast_pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO15
    i2s_bclk_pin: GPIO13
  - id: i2s_out
    i2s_lrclk_pin: GPIO11
    i2s_bclk_pin: GPIO09

microphone:
  platform: i2s_audio 
  id: external_microphone 
  adc_type: external 
  i2s_audio_id: i2s_in
  i2s_din_pin: GPIO7
  pdm: false
  bits_per_sample: 32bit


speaker:
  platform: i2s_audio 
  id: external_speaker 
  dac_type: external
  i2s_audio_id: i2s_out
  i2s_dout_pin: GPIO5
  mode: mono 

voice_assistant:
  id: va
  microphone: external_microphone 
  speaker: external_speaker
  use_wake_word: true
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.5

  on_tts_start:
    - switch.turn_off: use_wake_word
    - delay: 1s

  on_listening:
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: control_led

  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: control_led

  on_tts_stream_start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: control_led

  on_tts_stream_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: control_led
    - delay: 1s
    - switch.turn_on: use_wake_word

  on_error: 
    - if:
        condition:
          lambda: return !id(init_in_progress);
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
          - script.execute: control_led
          - delay: 1s
          - if:
              condition:
                switch.is_on: use_wake_word
              then:
                - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
              else:
                - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
          - script.execute: control_led

  on_client_connected: 
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
    - script.execute: control_led          

  on_client_disconnected: 
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - script.execute: control_led

switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
            - if:
                condition:
                    not:
                      - voice_assistant.is_running
                then:
                  - voice_assistant.start_continuous
            - script.execute: control_led          
 
    on_turn_off:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - voice_assistant.stop
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
            - script.execute: control_led          

globals:
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}
  
script:
  - id: control_led
    then:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - if:
                condition:
                    wifi.connected:
                then:
                  - if:
                      condition:
                          api.connected:
                      then:
                        - lambda: |
                            switch(id(voice_assistant_phase)) {
                              case ${voice_assist_listening_phase_id}:
                                id(led).turn_on().set_rgb(0, 0, 1).set_brightness(1.0).set_effect("none").perform();
                                break;
                              case ${voice_assist_thinking_phase_id}:
                                id(led).turn_on().set_rgb(0, 1, 0).set_effect("slow_pulse").perform();
                                break;
                              case ${voice_assist_replying_phase_id}:
                                id(led).turn_on().set_rgb(0, 0, 1).set_brightness(1.0).set_effect("fast_pulse").perform();
                                break;
                              case ${voice_assist_error_phase_id}:
                                id(led).turn_on().set_rgb(1, 1, 1).set_brightness(.5).set_effect("none").perform();
                                break;
                              case ${voice_assist_muted_phase_id}:
                                id(led).turn_off().perform();
                                break;
                              case ${voice_assist_not_ready_phase_id}:
                                id(led).turn_on().perform();
                                break;
                              default:
                                id(led).turn_on().set_rgb(1, 0, 0).set_brightness(0.2).set_effect("none").perform();
                                break;
                            }
                      else:
                        - light.turn_off:
                            id: led
                else:
                  - light.turn_off:
                      id: led
          else:
            - light.turn_on:
                id: led
                blue: 50%
                red: 50%
                green: 50%
                effect: "fast_pulse"
    

This gives the following errors:

INFO ESPHome 2023.12.8
INFO Reading configuration /config/esphome/voice-satellite-eps32-s3-test.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
WARNING GPIO3 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
INFO Generating C++ source...
INFO Updating https://github.com/espressif/[email protected]
INFO Updating submodules (components/esp-sr, components/esp-adf-libs) for https://github.com/espressif/[email protected]
INFO Updating https://github.com/espressif/[email protected]
INFO Compiling app...
Processing voice-satellite-eps32-s3-test (board: esp32-s3-devkitc-1; framework: espidf; platform: platformio/[email protected])
--------------------------------------------------------------------------------
HARDWARE: ESP32S3 240MHz, 320KB RAM, 8MB Flash
 - framework-espidf @ 3.40405.230623 (4.4.5) 
 - tool-cmake @ 3.16.9 
 - tool-ninja @ 1.10.2 
 - toolchain-esp32ulp @ 2.35.0-20220830 
 - toolchain-riscv32-esp @ 8.4.0+2021r2-patch5 
 - toolchain-xtensa-esp32s3 @ 8.4.0+2021r2-patch5
Reading CMake configuration...
Dependency Graph
|-- noise-c @ 0.1.4
|-- ArduinoJson @ 6.18.5
Compiling .pioenvs/voice-satellite-eps32-s3-test/components/audio_board/lyrat_v4_3/board_pins_config.o
components/audio_board/lyrat_v4_3/board_pins_config.c: In function 'get_i2c_pins':
components/audio_board/lyrat_v4_3/board_pins_config.c:39:34: error: 'GPIO_NUM_23' undeclared (first use in this function); did you mean 'GPIO_NUM_43'?
         i2c_config->scl_io_num = GPIO_NUM_23;
                                  ^~~~~~~~~~~
                                  GPIO_NUM_43
components/audio_board/lyrat_v4_3/board_pins_config.c:39:34: note: each undeclared identifier is reported only once for each function it appears in
components/audio_board/lyrat_v4_3/board_pins_config.c: In function 'get_i2s_pins':
components/audio_board/lyrat_v4_3/board_pins_config.c:54:33: error: 'GPIO_NUM_25' undeclared (first use in this function); did you mean 'GPIO_NUM_45'?
         i2s_config->ws_io_num = GPIO_NUM_25;
                                 ^~~~~~~~~~~
                                 GPIO_NUM_45
In file included from /data/cache/platformio/packages/framework-espidf/components/esp_rom/include/esp32s3/rom/ets_sys.h:19,
                 from /data/cache/platformio/packages/framework-espidf/components/log/include/esp_log.h:19,
                 from components/audio_board/lyrat_v4_3/board_pins_config.c:25:
components/audio_board/lyrat_v4_3/board_pins_config.c: In function 'i2s_mclk_gpio_select':
components/audio_board/lyrat_v4_3/board_pins_config.c:95:53: error: 'FUNC_GPIO0_CLK_OUT1' undeclared (first use in this function); did you mean 'FUNC_GPIO20_CLK_OUT1'?
             PIN_FUNC_SELECT(PERIPHS_IO_MUX_GPIO0_U, FUNC_GPIO0_CLK_OUT1);
                                                     ^~~~~~~~~~~~~~~~~~~
/data/cache/platformio/packages/framework-espidf/components/soc/esp32s3/include/soc/soc.h:136:45: note: in definition of macro 'REG_WRITE'
             (*(volatile uint32_t *)(_r)) = (_v);                                                                       \
                                             ^~
/data/cache/platformio/packages/framework-espidf/components/soc/esp32s3/include/soc/io_mux_reg.h:93:46: note: in expansion of macro 'REG_SET_FIELD'
 #define PIN_FUNC_SELECT(PIN_NAME, FUNC)      REG_SET_FIELD(PIN_NAME, MCU_SEL, FUNC)
                                              ^~~~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:95:13: note: in expansion of macro 'PIN_FUNC_SELECT'
             PIN_FUNC_SELECT(PERIPHS_IO_MUX_GPIO0_U, FUNC_GPIO0_CLK_OUT1);
             ^~~~~~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:98:53: error: 'FUNC_U0TXD_CLK_OUT3' undeclared (first use in this function); did you mean 'FUNC_U0TXD_CLK_OUT1'?
             PIN_FUNC_SELECT(PERIPHS_IO_MUX_U0TXD_U, FUNC_U0TXD_CLK_OUT3);
                                                     ^~~~~~~~~~~~~~~~~~~
/data/cache/platformio/packages/framework-espidf/components/soc/esp32s3/include/soc/soc.h:136:45: note: in definition of macro 'REG_WRITE'
             (*(volatile uint32_t *)(_r)) = (_v);                                                                       \
                                             ^~
/data/cache/platformio/packages/framework-espidf/components/soc/esp32s3/include/soc/io_mux_reg.h:93:46: note: in expansion of macro 'REG_SET_FIELD'
 #define PIN_FUNC_SELECT(PIN_NAME, FUNC)      REG_SET_FIELD(PIN_NAME, MCU_SEL, FUNC)
                                              ^~~~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:98:13: note: in expansion of macro 'PIN_FUNC_SELECT'
             PIN_FUNC_SELECT(PERIPHS_IO_MUX_U0TXD_U, FUNC_U0TXD_CLK_OUT3);
             ^~~~~~~~~~~~~~~
In file included from components/audio_board/lyrat_v4_3/board.h:29,
                 from components/audio_board/lyrat_v4_3/board_pins_config.c:28:
components/audio_board/lyrat_v4_3/board_pins_config.c: In function 'get_green_led_gpio':
components/audio_board/lyrat_v4_3/board_def.h:42:35: error: 'GPIO_NUM_22' undeclared (first use in this function); did you mean 'GPIO_NUM_42'?
 #define GREEN_LED_GPIO            GPIO_NUM_22
                                   ^~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:186:12: note: in expansion of macro 'GREEN_LED_GPIO'
     return GREEN_LED_GPIO;
            ^~~~~~~~~~~~~~
components/audio_board/lyrat_v4_3/board_pins_config.c:187:1: error: control reaches end of non-void function [-Werror=return-type]
 }
 ^
cc1: some warnings being treated as errors
Archiving .pioenvs/voice-satellite-eps32-s3-test/esp-idf/audio_hal/libaudio_hal.a
Compiling .pioenvs/voice-satellite-eps32-s3-test/components/audio_pipeline/audio_event_iface.o
Compiling .pioenvs/voice-satellite-eps32-s3-test/components/audio_pipeline/audio_pipeline.o
Compiling .pioenvs/voice-satellite-eps32-s3-test/components/audio_pipeline/ringbuf.o
*** [.pioenvs/voice-satellite-eps32-s3-test/components/audio_board/lyrat_v4_3/board_pins_config.o] Error 1
Compiling .pioenvs/voice-satellite-eps32-s3-test/components/audio_recorder/recorder_encoder.o
========================= [FAILED] Took 12.74 seconds =========================

I didn’t configure the pins it’s complaining about, so they must be hard-coded in the voice assistant ESP code. Any thoughts on how to get past this error?

Can you post your full config? I’m getting nowhere with this board…

That config doesn’t work with the board you are using.

This is one with micro wake word. I was trying to dig one out that hasn’t got a load of other waffle in it, and this is the most basic one I can lay my hands on at the moment.

esphome:
  name: obww2
  platformio_options:
    board_build.flash_mode: dio
  on_boot:
    - script.execute: idle_pg   


esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  framework:
    type: esp-idf

    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"
   
psram:
  mode: octal
  speed: 80MHz

logger:

ota:

wifi:
    ssid: !secret wifi_ssid
    password: !secret wifi_password

api:
  on_client_connected:
        then:
          - delay: 50ms
          - micro_wake_word.start:
  on_client_disconnected:
        then:
          - voice_assistant.stop: 

button:
  - platform: restart
    name: "Restart"
    id: but_rest

switch:
  - platform: template
    id: mute
    name: mute
    optimistic: true
    on_turn_on: 
      - micro_wake_word.stop:
      - voice_assistant.stop:
      - homeassistant.service:
          service: select.select_option
          data:
              option: Heartbeat
              entity_id: select.tree_lamp_preset  
    on_turn_off:
      - micro_wake_word.start:  
      - homeassistant.service:
          service: select.select_option
          data:
              option: ok
              entity_id: select.tree_lamp_preset  
              
i2s_audio:
  - id: i2s_mic
    i2s_lrclk_pin: GPIO3
    i2s_bclk_pin: GPIO2
  - id: i2s_spk
    i2s_lrclk_pin: GPIO6
    i2s_bclk_pin: GPIO7

microphone:
  platform: i2s_audio
  id: va_mic
  adc_type: external
  i2s_audio_id: i2s_mic
  i2s_din_pin: GPIO4
  channel: left
  pdm: false

speaker:
  platform: i2s_audio
  id: va_spk
  dac_type: external
  i2s_audio_id: i2s_spk
  i2s_dout_pin: GPIO8


micro_wake_word:
  model: hey_jarvis
  on_wake_word_detected:
    - voice_assistant.start:
    - script.stop: idle_pg
    - display.page.show: va_pg_wk
    - homeassistant.service:
        service: select.select_option
        data:
            option: wake
            entity_id: select.tree_lamp_preset
    - homeassistant.service:
        service: media_player.play_media
        data:
          media_content_id: media-source://media_source/local/vad.mp3
          media_content_type: music
          entity_id: media_player.tree_media
    
voice_assistant:
  id: va
  microphone: va_mic
  # speaker: va_spk
  noise_suppression_level: 2.0
  volume_multiplier: 10
  on_error:
    - display.page.show: va_pg_err
    - micro_wake_word.start:  
  on_stt_end: 
    - display.page.show: va_pg_ok
    - homeassistant.service:
        service: select.select_option
        data:
            option: ok
            entity_id: select.tree_lamp_preset
  on_end:
        then:
          - wait_until:
              not:
                voice_assistant.is_running:
          - voice_assistant.stop
          - micro_wake_word.start:
          - script.execute: idle_pg        
          - homeassistant.service:
              service: select.select_option
              data:
                  option: flame
                  entity_id: select.tree_lamp_preset  
  on_tts_end:
  - homeassistant.service:
      service: media_player.play_media
      data:
        entity_id: media_player.tree_media
        media_content_id: !lambda 'return x;'
        media_content_type: music
        announce: "true"
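
Note that this paste is trimmed: it calls a script `idle_pg` and shows display pages (`va_pg_wk`, `va_pg_err`, `va_pg_ok`) that are not defined in it, so it will not validate as-is. A minimal sketch of a stand-in, assuming no display is attached; the script body is a hypothetical placeholder, not the original poster's code:

```yaml
# Hypothetical stub for the script the paste references but does not include.
# It only writes a log line, so script.execute / script.stop have a valid target.
script:
  - id: idle_pg
    then:
      - logger.log: "idle_pg placeholder (no display attached)"
```

Even with this stub, the `display.page.show` lines would still have to be removed, and the `homeassistant.service` calls point at the poster's own entities (`select.tree_lamp_preset`, `media_player.tree_media`), so they need adapting or deleting as well.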

Forgot there’s this one here too: kahrendt_micro_wake_word/obww_esp32_s3_mic_and_speaker.yaml

Try adding this under framework: in your esp32: component

    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"
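
For placement, a minimal sketch of the whole `esp32:` block, assuming the same `esp32-s3-devkitc-1` board and ESP-IDF framework as in the config posted earlier. The comment on `CONFIG_AUDIO_BOARD_CUSTOM` is my reading of the build log (the file that failed is the `lyrat_v4_3` board definition, which uses GPIO numbers the S3 does not have), so treat it as an assumption:

```yaml
esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      # Assumption: selecting a custom audio board keeps the build from
      # compiling the lyrat_v4_3 pin definitions that produced the errors.
      CONFIG_AUDIO_BOARD_CUSTOM: "y"
```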

Three steps forward…

  1. MIC is working with your pin assignments (slightly shifted, see below). :slight_smile:
#
  i2s_in_ws_gpio:     'GPIO2'    # INP WordSelect/LRCLK  2 YELLOW
  i2s_in_bclk_gpio:   'GPIO3'    # INP Bit Clock         3 ORANGE
  i2s_in_mic_gpio:    'GPIO4'    # INP Digital Data      4 BLUE
#
  ws2812_led_gpio:    'GPIO20'   # LED strip            20 BLUE
#
  i2s_out_lrclk_gpio: 'GPIO6'    # OUT Left/Right Clock  6 YELLOW
  i2s_out_bclk_gpio:  'GPIO7'    # OUT Bit Clock         7 ORANGE
  i2s_out_spkr_gpio:  'GPIO8'    # OUT Digital Data      8 BLUE
#
  2. ESPHome logs are displaying on the USB/Serial port used to load the firmware (see YAML below).
logger:
  level: DEBUG
  hardware_uart: uart0
  3. This is the most progress with this board in days, and I’ll gladly take that. :smiley:
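
For anyone following along, here is a rough sketch of how the pin substitutions above could feed the I2S blocks. The option names are the ones used in the config posted earlier in the thread. The component IDs (`i2s_in`, `i2s_out`, `va_mic`, `va_spk`) are placeholders I picked for illustration, and `channel: left` is copied from that config rather than verified against this wiring:

```yaml
i2s_audio:
  - id: i2s_in                              # placeholder ID, microphone bus
    i2s_lrclk_pin: ${i2s_in_ws_gpio}
    i2s_bclk_pin: ${i2s_in_bclk_gpio}
  - id: i2s_out                             # placeholder ID, amplifier bus
    i2s_lrclk_pin: ${i2s_out_lrclk_gpio}
    i2s_bclk_pin: ${i2s_out_bclk_gpio}

microphone:
  - platform: i2s_audio
    id: va_mic                              # placeholder ID
    adc_type: external
    i2s_audio_id: i2s_in
    i2s_din_pin: ${i2s_in_mic_gpio}
    channel: left                           # assumption, taken from the earlier config
    pdm: false

speaker:
  - platform: i2s_audio
    id: va_spk                              # placeholder ID
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: ${i2s_out_spkr_gpio}
```

Keeping the microphone and the amplifier on separate I2S buses mirrors the earlier config, where each device has its own clock pins.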

Positive progress :revolving_hearts: is a wonderful thing for motivation.

Later observation: When powering the ESP32 board from the USB port of my computer, the MIC is working, and the wake word is detected. When powering the ESP32 board using the 5Vin and GND pins, it’s not. Very wonky. :thinking: This sounds entirely like the LDO voltage regulator is delivering insufficient voltage or current, a problem previously seen on many D1 mini clone boards. It’s marked 1117c, which is supposed to be the good LDO VR component. Except it’s limited to 800 mA, which is honestly too low for the MAX98357 audio amplifier.