(This is about configuring ESPHome as a voice assistant; I’m not sure whether this belongs in the ESPHome or the Voice Assistant category.)
I’m trying to get the Voice PE working with a server-side wake word.
I have removed micro_wake_word from the config and set use_wake_word to true:
voice_assistant:
  id: va
  microphone: comm_mic
  media_player: nabu_media_player
  use_wake_word: true
  noise_suppression_level: 0
  auto_gain: 0 dbfs
  volume_multiplier: 1
I have assigned voice_assistant.start to the double click, and when I double-click, it successfully starts waiting for the wake word.
My issue now is that when I say the wake word, it doesn’t wait for a command; it immediately ends the recording:
- type: wake_word-end
  data:
    wake_word_output:
      wake_word_id: americano
      wake_word_phrase: americano
      timestamp: 56570
  timestamp: "2025-01-06T06:00:05.785091+00:00"
- type: stt-start
  data:
    engine: stt.rhasspy_speech
    metadata:
      language: en_US
      format: wav
      codec: pcm
      bit_rate: 16
      sample_rate: 16000
      channel: 1
  timestamp: "2025-01-06T06:00:05.785806+00:00"
- type: stt-vad-start
  data:
    timestamp: 56710
  timestamp: "2025-01-06T06:00:05.885309+00:00"
- type: stt-vad-end
  data:
    timestamp: 57610
  timestamp: "2025-01-06T06:00:06.784492+00:00"
As you can see, stt-vad-end happens less than a second after stt-vad-start (57610 vs. 56710, so roughly 900 ms), which is far too short for me to speak a command.
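For reference, the gap can be worked out directly from the timestamps in the events. This is only a quick Python sketch: it assumes the data.timestamp values are milliseconds into the audio stream, and the function names are made up for illustration.

```python
from datetime import datetime

# Values copied from the stt-vad-start and stt-vad-end events in the log.
VAD_START_MS = 56710
VAD_END_MS = 57610
VAD_START_WALL = "2025-01-06T06:00:05.885309+00:00"
VAD_END_WALL = "2025-01-06T06:00:06.784492+00:00"


def vad_gap_ms(start_ms: int, end_ms: int) -> int:
    """Gap between two stream-relative timestamps (assumed to be in ms)."""
    return end_ms - start_ms


def wall_clock_gap_ms(start_iso: str, end_iso: str) -> float:
    """Gap between two ISO 8601 event timestamps, in milliseconds."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds() * 1000.0


if __name__ == "__main__":
    print(vad_gap_ms(VAD_START_MS, VAD_END_MS))
    print(round(wall_clock_gap_ms(VAD_START_WALL, VAD_END_WALL), 3))
```

Both clocks agree on a gap of about 0.9 seconds between the start and the end of detected speech.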
The first question I need to answer in order to troubleshoot this is: where is the silence detection supposed to happen? In ESPHome, in Home Assistant, or in the speech-to-text engine (wyoming-rhasspy-speech in my case)?