Hi,
Can I please have some help with my ESPHome voice assistant?
It detects my wake word, but I can’t hear any response through the speaker, even though the ESPHome logs suggest that it’s trying to output something.
I’ve tried swapping out the hardware in case I have faulty boards. I’ve also tried different configurations, including a shared I2S bus and separate buses.
Text to speech works if I pick a different speaker and don’t use speech to text to trigger it.
My logs show that speech to text also works when talking to this satellite that won’t make sound.
Assist Configuration
ESPHome Configuration
substitutions:
  name: esphome-voice-satellite-dev
  friendly_name: ESPHome Voice Satellite Dev
esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  platformio_options:
    board_build.flash_mode: dio
  project:
    name: "dan.voice_assistant"
    version: '1.0'
  min_version: 2023.11.5
esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3 # This shouldn't be needed.
  flash_size: 16MB
  framework:
    type: esp-idf #arduino
    version: recommended #4.4.6
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"
psram:
  mode: octal
  speed: 80MHz
# Enable logging
logger:
# Enable Home Assistant API
api:
  on_client_connected:
    then:
      - delay: 50ms
      # - light.turn_off: led_ww
      - micro_wake_word.start:
  on_client_disconnected:
    then:
      - voice_assistant.stop:
# Allow Over-The-Air updates
ota:
  - platform: esphome
    password: !secret ota_password
# Allow provisioning Wi-Fi via serial
improv_serial:
wifi:
  ssid: !secret wifi_iot_ssid
  password: !secret wifi_iot_password
  # Set up a wifi access point
  ap: {}
# In combination with the `ap` this allows the user
# to provision wifi credentials to the device via WiFi AP.
captive_portal:
dashboard_import:
  package_import_url: github://esphome/firmware/esphome-web/esp32s3.yaml@v2
  import_full_config: true
# Sets up Bluetooth LE (Only on ESP32) to allow the user
# to provision wifi credentials to the device.
esp32_improv:
  authorizer: none
# To have a "next url" for improv serial
web_server:
i2s_audio:
  - id: i2s_mic
    i2s_lrclk_pin: GPIO3 #WS
    i2s_bclk_pin: GPIO5 #SCK
  - id: i2s_speaker
    i2s_lrclk_pin: GPIO6 #LRC
    i2s_bclk_pin: GPIO7 #BCLK
  #id: i2s_main
  #i2s_lrclk_pin: GPIO7
  #i2s_bclk_pin: GPIO6
  #access_mode: duplex
microphone:
  - platform: i2s_audio
    id: va_mic
    i2s_audio_id: i2s_mic
    adc_type: external
    i2s_din_pin: GPIO4 # SD Pin of INMP441 Microphone
    channel: left # worked without this?
    pdm: false
    bits_per_sample: 32 bit
speaker:
  - platform: i2s_audio
    id: va_speaker
    i2s_audio_id: i2s_speaker
    dac_type: external
    i2s_dout_pin: GPIO8 # DIN Pin of the MAX98357A Audio Amplifier
    mode: mono
micro_wake_word:
  on_wake_word_detected:
    # then:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
        silence_detection: true # defaults to true.
    # - light.turn_on:
    #     id: led_ww
    #     red: 30%
    #     green: 30%
    #     blue: 70%
    #     brightness: 60%
    #     effect: fast pulse
  model: hey_jarvis
voice_assistant:
  # use_wake_word: false
  id: va
  microphone: va_mic
  auto_gain: 31dBFS
  noise_suppression_level: 2
  volume_multiplier: 2.0 #2.0
  speaker: va_speaker
  on_stt_end:
    then:
      # - light.turn_off: led_ww
  on_error:
    - micro_wake_word.start:
  on_end:
    then:
      # - light.turn_off: led_ww
      - wait_until:
          not:
            voice_assistant.is_running:
      - micro_wake_word.start:
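In case it helps with diagnosis, here is a minimal sketch of extra logging that could be added to the `voice_assistant` block. It assumes the `on_tts_start` and `on_tts_end` triggers expose the response text and the audio URL as the string variable `x`, which is my reading of the ESPHome documentation. It has not been tested on this device, and the log message wording is only illustrative.

```yaml
voice_assistant:
  # ... keep the existing options from the configuration above ...
  on_tts_start:
    # Assumption: x holds the response text that is about to be spoken
    - logger.log:
        format: "TTS text: %s"
        args: ['x.c_str()']
  on_tts_end:
    # Assumption: x holds the URL of the generated audio response
    - logger.log:
        format: "TTS audio URL: %s"
        args: ['x.c_str()']
```

If the URL shows up in the log, opening it in a browser would show whether the generated audio itself is valid, separately from the I2S output path.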
ESPHome logs
“Hey Jarvis, what’s the time?”
[18:56:56][D][micro_wake_word:363]: Wake word sliding average probability is 0.574 and most recent probability is 0.957
[18:56:56][D][micro_wake_word:129]: Wake Word Detected
[18:56:56][D][micro_wake_word:178]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[18:56:56][D][micro_wake_word:135]: Stopping Microphone
[18:56:56][D][micro_wake_word:178]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[18:56:56][D][esp-idf:000]: I (4556305) I2S: DMA queue destroyed
[18:56:56]
[18:56:56][D][micro_wake_word:178]: State changed from STOPPING_MICROPHONE to IDLE
[18:56:56][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE
[18:56:56][D][voice_assistant:510]: Desired state set to START_PIPELINE
[18:56:56][D][voice_assistant:221]: Starting Microphone
[18:56:56][D][ring_buffer:024]: Created ring buffer with size 16384
[18:56:56][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[18:56:56][D][esp-idf:000]: I (4556311) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4
[18:56:56]
[18:56:56][D][voice_assistant:504]: State changed from STARTING_MICROPHONE to START_PIPELINE
[18:56:56][D][voice_assistant:275]: Requesting start...
[18:56:56][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:56:56][D][voice_assistant:525]: Client started, streaming microphone
[18:56:56][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:56:56][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE
[18:56:56][D][voice_assistant:627]: Event Type: 1
[18:56:56][D][voice_assistant:630]: Assist Pipeline running
[18:56:56][D][voice_assistant:627]: Event Type: 3
[18:56:56][D][voice_assistant:641]: STT started
[18:56:57][D][voice_assistant:627]: Event Type: 11
[18:56:57][D][voice_assistant:781]: Starting STT by VAD
[18:56:58][D][voice_assistant:627]: Event Type: 12
[18:56:58][D][voice_assistant:785]: STT by VAD end
[18:56:58][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[18:56:58][D][voice_assistant:510]: Desired state set to AWAITING_RESPONSE
[18:56:58][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[18:56:58][D][esp-idf:000]: I (4558783) I2S: DMA queue destroyed
[18:56:58]
[18:56:58][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[18:57:04][D][voice_assistant:627]: Event Type: 4
[18:57:04][D][voice_assistant:655]: Speech recognised as: " What's the time?"
[18:57:04][D][voice_assistant:627]: Event Type: 5
[18:57:04][D][voice_assistant:660]: Intent started
[18:57:06][D][voice_assistant:627]: Event Type: 6
[18:57:06][D][voice_assistant:627]: Event Type: 7
[18:57:06][D][voice_assistant:683]: Response: "Sorry, I am not aware of any device called time?"
[18:57:06][D][voice_assistant:627]: Event Type: 98
[18:57:06][D][voice_assistant:768]: TTS stream start
[18:57:06][D][esp-idf:000][speaker_task]: I (4567203) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[18:57:06]
[18:57:06][D][voice_assistant:627]: Event Type: 2
[18:57:06][D][voice_assistant:717]: Assist Pipeline ended
[18:57:06][D][i2s_audio.speaker:206]: Started I2S Audio Speaker
[18:57:09][D][voice_assistant:627]: Event Type: 99
[18:57:09][D][voice_assistant:776]: TTS stream end
[18:57:09][D][voice_assistant:375]: End of audio stream received
[18:57:09][D][voice_assistant:504]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[18:57:09][D][voice_assistant:510]: Desired state set to RESPONSE_FINISHED
[18:57:10][D][i2s_audio.speaker:210]: Stopping I2S Audio Speaker
[18:57:10][D][i2s_audio.speaker:222]: Stopped I2S Audio Speaker
[18:57:10][D][voice_assistant:407]: Speaker has finished outputting all audio
[18:57:10][D][voice_assistant:504]: State changed from RESPONSE_FINISHED to IDLE
[18:57:10][D][voice_assistant:510]: Desired state set to IDLE
[18:57:10][D][micro_wake_word:178]: State changed from IDLE to START_MICROPHONE
[18:57:10][D][micro_wake_word:116]: Starting Microphone
[18:57:10][D][micro_wake_word:178]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[18:57:10][D][esp-idf:000]: I (4570425) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4
[18:57:10]
[18:57:10][D][micro_wake_word:178]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
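The log above was captured at the default DEBUG level. A sketch of a logger configuration that might surface more detail around the speaker is below. It assumes the per-component tags match the names shown in the log lines (`i2s_audio.speaker`, `voice_assistant`, `micro_wake_word`); whether the speaker component emits anything extra at VERBOSE is not something I can confirm.

```yaml
logger:
  # Raise the global level so components can emit verbose messages
  level: VERBOSE
  logs:
    # Assumption: tags are taken from the log output above
    i2s_audio.speaker: VERBOSE
    voice_assistant: DEBUG
    # Keep wake word probability messages out of the way
    micro_wake_word: INFO
```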
Hardware
- ESP32-S3 N16R8
https://www.aliexpress.com/item/1005006266375800.html?spm=a2g0o.order_list.order_list_main.5.394318022EFfZ4
- MAX98357
https://www.aliexpress.com/item/1005006382608935.html?spm=a2g0o.order_list.order_list_main.10.394318022EFfZ4
- INMP441
https://www.aliexpress.com/item/1005006109471759.html?spm=a2g0o.order_list.order_list_main.15.394318022EFfZ4
- 3W 4R speaker
https://www.aliexpress.com/item/32860336112.html?spm=a2g0o.order_list.order_list_main.20.394318022EFfZ4
Guides and Resources I used
- ESP32 & ESPHome Voice Assistant · GitHub
- How To Setup On-Device Wake Word Detection For Voice Assistant using Micro Wake Word | Smart Home Circle
- How I Created My Voice Assistant With On-Device Wake Word Detection On ESP32 Using Micro Wake Word | Smart Home Circle
I thought this bug might be relevant, but others seem to have resolved the issue, while I have not.