Voice Assistant - No Sound

Hi,

Can I please have some help with my ESPHome voice assistant?

It detects my wake word, but I can’t hear any response through the speaker, even though the ESPHome logs suggest that it’s trying to output something.

I’ve tried swapping out the hardware to rule out faulty boards. I’ve also tried different configurations, including a shared I2S bus and separate buses.
Text to speech works if I pick a different speaker and don’t use speech to text to trigger it.
My logs show that speech to text also works when talking to this satellite that won’t make sound.
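
One way to narrow this down further might be to drive the speaker directly, without the Assist pipeline involved at all. The fragment below is only a sketch to merge into the existing configuration: it adds a template button that sends a short square-wave burst to `va_speaker` through the `speaker.play` action. The button name is hypothetical, and the sample format (16 kHz, 16-bit little-endian mono) is an assumption about what the speaker component expects, so the pitch and length may differ in practice.

```yaml
button:
  - platform: template
    name: "Speaker test tone"   # hypothetical name, pick anything
    on_press:
      - speaker.play:
          id: va_speaker
          data: !lambda |-
            // Assumption: 16 kHz, 16-bit little-endian mono PCM.
            // 2000 samples is roughly 0.125 s; 16 samples per half
            // period gives a square wave of about 500 Hz.
            std::vector<uint8_t> samples;
            samples.reserve(4000);
            for (int i = 0; i < 2000; i++) {
              const int16_t v = ((i / 16) % 2) ? 8000 : -8000;
              const uint16_t u = static_cast<uint16_t>(v);
              samples.push_back(u & 0xFF);   // low byte first
              samples.push_back(u >> 8);     // then high byte
            }
            return samples;
```

If pressing the button produces no sound either, the problem is more likely in the wiring, the amplifier supply, or the I2S pin assignment than in the voice pipeline.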


Assist Configuration


ESPHome Configuration

substitutions:
  name: esphome-voice-satellite-dev
  friendly_name: ESPHome Voice Satellite Dev

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  platformio_options:
    board_build.flash_mode: dio
  project:
    name: "dan.voice_assistant"
    version: '1.0'
  min_version: 2023.11.5

esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3      # This shouldn't be needed.
  flash_size: 16MB
  framework:
    type: esp-idf             #arduino
    version: recommended #4.4.6
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"

psram:
  mode: octal
  speed: 80MHz

# Enable logging
logger:

# Enable Home Assistant API
api:
  on_client_connected:
    then:
      - delay: 50ms
#     - light.turn_off: led_ww
      - micro_wake_word.start:
  on_client_disconnected:
    then:
      - voice_assistant.stop: 

# Allow Over-The-Air updates
ota:
  - platform: esphome
    password: !secret ota_password


# Allow provisioning Wi-Fi via serial
improv_serial:

wifi:
  ssid: !secret wifi_iot_ssid
  password: !secret wifi_iot_password
  # Set up a wifi access point
  ap: {}

# In combination with the `ap` this allows the user
# to provision wifi credentials to the device via WiFi AP.
captive_portal:

dashboard_import:
  package_import_url: github://esphome/firmware/esphome-web/esp32s3.yaml@v2
  import_full_config: true

# Sets up Bluetooth LE (Only on ESP32) to allow the user
# to provision wifi credentials to the device.
esp32_improv:
  authorizer: none

# To have a "next url" for improv serial
web_server:


i2s_audio:
  - id: i2s_mic
    i2s_lrclk_pin: GPIO3    #WS 
    i2s_bclk_pin: GPIO5     #SCK
  - id: i2s_speaker
    i2s_lrclk_pin: GPIO6    #LRC 
    i2s_bclk_pin: GPIO7     #BLCK
  #id: i2s_main
  #i2s_lrclk_pin: GPIO7
  #i2s_bclk_pin: GPIO6
  #access_mode: duplex

microphone:
  - platform: i2s_audio
    id: va_mic
    i2s_audio_id: i2s_mic
    adc_type: external
    i2s_din_pin: GPIO4        # SD Pin of INMP441 Microphone
    channel: left             # worked without this?
    pdm: false
    bits_per_sample: 32 bit

speaker:
  - platform: i2s_audio
    id: va_speaker
    i2s_audio_id: i2s_speaker
    dac_type: external
    i2s_dout_pin: GPIO8       # DIN Pin of the MAX98357A Audio Amplifier
    mode: mono

micro_wake_word:
  on_wake_word_detected:
    # then:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
        silence_detection: true    # defaults to true.
#    - light.turn_on:
#        id: led_ww           
#        red: 30%
#        green: 30%
#        blue: 70%
#        brightness: 60%
#        effect: fast pulse 
  model: hey_jarvis

voice_assistant:
#  use_wake_word: false
  id: va
  microphone: va_mic
  auto_gain: 31dBFS
  noise_suppression_level: 2
  volume_multiplier: 2.0            #2.0
  speaker: va_speaker
  on_stt_end:
       then: 
#         - light.turn_off: led_ww
  on_error:
          - micro_wake_word.start:  
  on_end:
        then:
#          - light.turn_off: led_ww
          - wait_until:
              not:
                voice_assistant.is_running:
          - micro_wake_word.start:  

ESPHome logs
“Hey Jarvis, what’s the time?”

[18:56:56][D][micro_wake_word:363]: Wake word sliding average probability is 0.574 and most recent probability is 0.957
[18:56:56][D][micro_wake_word:129]: Wake Word Detected
[18:56:56][D][micro_wake_word:178]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[18:56:56][D][micro_wake_word:135]: Stopping Microphone
[18:56:56][D][micro_wake_word:178]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[18:56:56][D][esp-idf:000]: I (4556305) I2S: DMA queue destroyed
[18:56:56]
[18:56:56][D][micro_wake_word:178]: State changed from STOPPING_MICROPHONE to IDLE
[18:56:56][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE
[18:56:56][D][voice_assistant:510]: Desired state set to START_PIPELINE
[18:56:56][D][voice_assistant:221]: Starting Microphone
[18:56:56][D][ring_buffer:024]: Created ring buffer with size 16384
[18:56:56][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[18:56:56][D][esp-idf:000]: I (4556311) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4
[18:56:56]
[18:56:56][D][voice_assistant:504]: State changed from STARTING_MICROPHONE to START_PIPELINE
[18:56:56][D][voice_assistant:275]: Requesting start...
[18:56:56][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:56:56][D][voice_assistant:525]: Client started, streaming microphone
[18:56:56][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:56:56][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE
[18:56:56][D][voice_assistant:627]: Event Type: 1
[18:56:56][D][voice_assistant:630]: Assist Pipeline running
[18:56:56][D][voice_assistant:627]: Event Type: 3
[18:56:56][D][voice_assistant:641]: STT started
[18:56:57][D][voice_assistant:627]: Event Type: 11
[18:56:57][D][voice_assistant:781]: Starting STT by VAD
[18:56:58][D][voice_assistant:627]: Event Type: 12
[18:56:58][D][voice_assistant:785]: STT by VAD end
[18:56:58][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[18:56:58][D][voice_assistant:510]: Desired state set to AWAITING_RESPONSE
[18:56:58][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[18:56:58][D][esp-idf:000]: I (4558783) I2S: DMA queue destroyed
[18:56:58]
[18:56:58][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[18:57:04][D][voice_assistant:627]: Event Type: 4
[18:57:04][D][voice_assistant:655]: Speech recognised as: " What's the time?"
[18:57:04][D][voice_assistant:627]: Event Type: 5
[18:57:04][D][voice_assistant:660]: Intent started
[18:57:06][D][voice_assistant:627]: Event Type: 6
[18:57:06][D][voice_assistant:627]: Event Type: 7
[18:57:06][D][voice_assistant:683]: Response: "Sorry, I am not aware of any device called time?"
[18:57:06][D][voice_assistant:627]: Event Type: 98
[18:57:06][D][voice_assistant:768]: TTS stream start
[18:57:06][D][esp-idf:000][speaker_task]: I (4567203) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[18:57:06]
[18:57:06][D][voice_assistant:627]: Event Type: 2
[18:57:06][D][voice_assistant:717]: Assist Pipeline ended
[18:57:06][D][i2s_audio.speaker:206]: Started I2S Audio Speaker
[18:57:09][D][voice_assistant:627]: Event Type: 99
[18:57:09][D][voice_assistant:776]: TTS stream end
[18:57:09][D][voice_assistant:375]: End of audio stream received
[18:57:09][D][voice_assistant:504]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[18:57:09][D][voice_assistant:510]: Desired state set to RESPONSE_FINISHED
[18:57:10][D][i2s_audio.speaker:210]: Stopping I2S Audio Speaker
[18:57:10][D][i2s_audio.speaker:222]: Stopped I2S Audio Speaker
[18:57:10][D][voice_assistant:407]: Speaker has finished outputting all audio
[18:57:10][D][voice_assistant:504]: State changed from RESPONSE_FINISHED to IDLE
[18:57:10][D][voice_assistant:510]: Desired state set to IDLE
[18:57:10][D][micro_wake_word:178]: State changed from IDLE to START_MICROPHONE
[18:57:10][D][micro_wake_word:116]: Starting Microphone
[18:57:10][D][micro_wake_word:178]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[18:57:10][D][esp-idf:000]: I (4570425) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4
[18:57:10]
[18:57:10][D][micro_wake_word:178]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD

Hardware


Guides and Resources I used

I thought this bug might be relevant, but others seem to have resolved the issue, while I have not.

I have the same issue :confused:

Hi

Do you use the 5V pin on the ESP32-S3 to power the MAX98357 Vin?
If so, you have to solder the pad on the board for that.

Thanks. I now hear a loud crackling sound between powering on the voice assistant and the first time it speaks, but that’s progress.

However, for the voice assistant to hear me, I have to muffle the speaker with my hands.
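
If the microphone is being swamped by noise from the speaker, the input-side settings in the configuration at the top of the thread may be worth revisiting, since it uses the maximum `auto_gain` (31dBFS) together with `volume_multiplier: 2.0`. The fragment below is a sketch of more conservative values; the numbers are untested guesses, and it does nothing about the crackling itself.

```yaml
voice_assistant:
  id: va
  microphone: va_mic
  speaker: va_speaker
  # Assumed starting points, not measured values.
  # The original config used auto_gain: 31dBFS and volume_multiplier: 2.0.
  auto_gain: 15dBFS
  noise_suppression_level: 2
  volume_multiplier: 1.0
```

Changing one value at a time and comparing wake word detection after each flash should make it clearer which setting, if any, has an effect.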

I’m facing the same issue, and nothing seems to help.

I think there is an issue with reading the TTS WAV file. My log:

[13:50:41][D][voice_assistant:715]: Response URL: "http://192.168.0.160:8123/api/tts_proxy/bb4c44570a13d4d6785b9dd975a41a337846fa48_pl_2c82848529_tts.google_en_com.wav"
[13:50:41][D][voice_assistant:514]: State changed from IDLE to STREAMING_RESPONSE
[13:50:41][D][voice_assistant:520]: Desired state set to STREAMING_RESPONSE
[13:50:41][D][voice_assistant:381]: End of audio stream received
[13:50:41][D][voice_assistant:514]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[13:50:41][D][voice_assistant:520]: Desired state set to RESPONSE_FINISHED
[13:50:42][D][i2s_audio.speaker:228]: Started I2S Audio Speaker
[13:50:42][D][voice_assistant:637]: Event Type: 2
[13:50:42][D][voice_assistant:729]: Assist Pipeline ended
[13:50:42][D][i2s_audio.speaker:233]: Stopping I2S Audio Speaker
[13:50:42][D][i2s_audio.speaker:242]: Stopped I2S Audio Speaker
[13:50:42][D][voice_assistant:417]: Speaker has finished outputting all audio
[13:50:42][D][voice_assistant:514]: State changed from RESPONSE_FINISHED to IDLE
[13:50:42][D][voice_assistant:520]: Desired state set to IDLE

I doubt it can download the WAV file that quickly: the response URL and the end of the audio stream are logged within the same second.
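
To get more detail on what happens between the response URL arriving and the stream ending, it may help to raise the log level and log the TTS result and any pipeline error explicitly. This is a sketch only, meant to be merged into an existing `voice_assistant:` block; it assumes the `on_tts_end` trigger exposes the URL as `x` and that `on_error` exposes `code` and `message`, as described in the ESPHome voice assistant documentation.

```yaml
logger:
  level: VERBOSE   # more detail than the default DEBUG level

voice_assistant:
  # Merge these triggers into the existing block (microphone, speaker, etc.).
  on_tts_end:
    - logger.log:
        format: "TTS finished, URL: %s"
        args: ['x.c_str()']
  on_error:
    - logger.log:
        format: "Voice assistant error %s: %s"
        args: ['code.c_str()', 'message.c_str()']
```

If no error is logged and the stream still ends immediately, opening the logged URL from another machine on the same network would at least show whether Home Assistant serves a playable file.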