Hi everyone,
I picked up an Echo S3R over the holidays.
It differs from the other Echo model in two major ways:
- No RGB LED (there is a green one behind the reset button, but as far as I can tell it isn’t wired to any ESP32 pin)
- Much larger PSRAM
With no LED, a typical voice pipeline needs to play a ding or some other sound to let the user know the wake word was heard. ESPHome has some great components for this. Overall the device is really good, with some minor quirks that I’m documenting here so we can keep improving voice.
The Echo S3R has only one I2S bus (not two like the Voice PE). This means the device works like a walkie-talkie, with the speaker and microphone taking turns on the bus. The public yaml for this device (common/atom-echos3r-satellite-base.yaml in m5stack/esphome-yaml on GitHub, at commit 2fd326380b3ee362ddeaa0f101b1a77c195bd393) works, but has some quirks.
It’s easy enough to play a sound to the speaker over the I2S bus in the “on_wake_word_detected” section, but the bus has to be released before the next call to voice_assistant.start, which needs it for the microphone in the STT step. If the bus is still busy after the sound plays, the voice_assistant component retries claiming it once per second, which is far too slow for interactive voice commands. The public yaml plays a ding and then waits a fixed 300ms, which may or may not be long enough for the bus to clear.
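For comparison, the fixed-delay approach described above looks roughly like the sketch below. This is an approximation, not a copy of the linked file: the script and sound IDs (play_sound, wake_word_triggered_sound) are assumed to be the same ones used in the config later in this post.

```yaml
# Sketch of the fixed-delay pattern (approximation, not the public file verbatim).
on_wake_word_detected:
  # Play the ding through the speaker (IDs assumed, see note above).
  - script.execute:
      id: play_sound
      priority: true
      sound_file: !lambda return id(wake_word_triggered_sound);
  # Hope the speaker has let go of the I2S bus by the time this elapses.
  - delay: 300ms
  # The microphone now needs the same bus for STT.
  - voice_assistant.start:
      wake_word: !lambda return wake_word;
```

Nothing in this sequence checks that the speaker has actually released the bus, so whether it works depends on how long the ding runs.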
A more reliable option is to forcibly stop the media_player and the speaker in rapid succession, then wait for the bus to clear, all inside the micro_wake_word component’s “on_wake_word_detected” automation. This lets the voice_assistant component start capturing audio for STT very quickly after the wake word ding. It also gives us the chance to cut off a sound that’s too long. The ding from the Voice PE project is 1 second long by default. The config below trims it to 250ms, which feels much more natural in my testing.
on_wake_word_detected:
  - script.execute:
      id: play_sound
      priority: true
      sound_file: !lambda return id(wake_word_triggered_sound);
  - delay: 250ms
  - media_player.stop:
  - speaker.stop:
  - wait_until:
      condition:
        speaker.is_stopped: i2s_speaker
  - voice_assistant.start:
      wake_word: !lambda return wake_word;
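One possible hardening of the sequence above, as a sketch only: ESPHome’s wait_until action accepts an optional timeout, so the wait does not have to be open-ended if the speaker never reports stopped. The 1s value is an assumption for illustration, not something I have tuned.

```yaml
# Drop-in variant of the wait_until step from the sequence above.
  - wait_until:
      condition:
        speaker.is_stopped: i2s_speaker
      # Assumed value: give up waiting after 1s and carry on to the next action.
      timeout: 1s
```

As I understand it, the automation continues with the next action once the timeout expires, so voice_assistant.start would still run and fall back to its own retry behavior.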
My testing also showed that this sequence won’t work in the “on_wake_word_detected” option of the voice_assistant component itself. It seemed that stopping either the microphone or the speaker while inside the voice_assistant pipeline caused the pipeline to stop itself. This means the wake word ding config only works with on-device wake word detection (micro_wake_word).
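To make the placement concrete, here is a layout sketch of where the automation lives. It is a fragment, not a complete config: the comments stand in for the settings your existing yaml already has.

```yaml
# Layout sketch only; fill in from your existing config.
micro_wake_word:
  # (models and other micro_wake_word settings go here)
  on_wake_word_detected:
    # The ding / delay / stop / wait_until / voice_assistant.start
    # action list from earlier in this post goes here.

voice_assistant:
  # (microphone, media player and other settings go here)
  # No on_wake_word_detected block here: stopping the speaker from inside
  # the pipeline appeared to stop the pipeline itself in the testing above.
```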
Overall, really fun project, with a nice new device.