Managed to get the file down to 7kb, and it seems to play okay — unfortunately, still having an issue actually getting the playback to happen. Sometimes it starts okay, but then after finishing the response it flicks straight back into “assist mode” and flashes the lights, causing a loop.
Normally though, it hears the wake word, plays the sound, and then flashes red after my voice command has finished and struggles to play back the response.
[17:34:45][D][voice_assistant:636]: Wake word detected
[17:34:45][D][switch:012]: 'Play Wakeword sound' Turning ON.
[17:34:45][D][switch:055]: 'Play Wakeword sound': Sending state ON
[17:34:45][D][voice_assistant:620]: Signaling stop...
[17:34:45][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[17:34:45][D][voice_assistant:510]: Desired state set to IDLE
[17:34:45][D][voice_assistant:627]: Event Type: 3
[17:34:45][D][voice_assistant:641]: STT started
[17:34:45][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[17:34:45][D][light:036]: 'Atom Echo' Setting:
[17:34:45][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[17:34:45][D][light:109]: Effect: 'Slow Pulse'
[17:34:45][D][esp-idf:000]: I (60569) I2S: DMA queue destroyed
[17:34:45]
[17:34:45][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to IDLE
[17:34:45][D][esp-idf:000][speaker_task]: I (60581) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[17:34:45]
[17:34:45][D][i2s_audio.speaker:203]: Starting I2S Audio Speaker
[17:34:45][D][i2s_audio.speaker:206]: Started I2S Audio Speaker
[17:34:46][D][esp-idf:000][speaker_task]: I (60908) I2S: DMA queue destroyed
[17:34:46]
[17:34:46][D][i2s_audio.speaker:210]: Stopping I2S Audio Speaker
[17:34:46][D][i2s_audio.speaker:222]: Stopped I2S Audio Speaker
[17:34:46][I][safe_mode:041]: Boot seems successful; resetting boot loop counter
[17:34:46][D][esp32.preferences:114]: Saving 2 preferences to flash...
[17:34:46][D][esp32.preferences:143]: Saving 2 preferences to flash: 0 cached, 2 written, 0 failed
[17:34:46][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE
[17:34:46][D][voice_assistant:510]: Desired state set to START_PIPELINE
[17:34:46][D][voice_assistant:221]: Starting Microphone
[17:34:46][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[17:34:46][D][esp-idf:000]: I (61586) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4
...
[17:37:51][W][i2s_audio.speaker:042]: Called start while task has been already created.
[17:37:51][W][i2s_audio.speaker:042]: Called start while task has been already created.
[17:37:51][E][voice_assistant:804]: Cannot receive audio, buffer is full
[17:37:51][W][i2s_audio.speaker:042]: Called start while task has been already created.
[17:37:51][W][i2s_audio.speaker:042]: Called start while task has been already created.
[17:37:51][W][i2s_audio.speaker:042]: Called start while task has been already created.
Tried with no delay, a 0.5s delay, and a 1s delay; all of them currently produce the same “assist start” > “flash red” > “fail to respond” loop.
And here is a full log from restarting the device, speaking the wake word, and asking it to turn on the bedroom light (which it misheard as “battery light”):
[17:47:04][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[17:47:04][D][voice_assistant:510]: Desired state set to WAITING_FOR_VAD
[17:47:04][D][voice_assistant:245]: Waiting for speech...
[17:47:04][D][voice_assistant:504]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[17:47:04][D][voice_assistant:258]: VAD detected speech
[17:47:04][D][voice_assistant:504]: State changed from WAITING_FOR_VAD to START_PIPELINE
[17:47:04][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE
[17:47:04][D][voice_assistant:275]: Requesting start...
[17:47:04][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE
[17:47:04][D][voice_assistant:525]: Client started, streaming microphone
[17:47:04][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[17:47:04][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE
[17:47:04][D][light:036]: 'Atom Echo' Setting:
[17:47:04][D][light:051]: Brightness: 60%
[17:47:04][D][light:059]: Red: 100%, Green: 89%, Blue: 71%
[17:47:04][D][voice_assistant:627]: Event Type: 1
[17:47:04][D][voice_assistant:630]: Assist Pipeline running
[17:47:04][D][voice_assistant:627]: Event Type: 9
[17:47:09][D][voice_assistant:627]: Event Type: 10
[17:47:09][D][voice_assistant:636]: Wake word detected
[17:47:09][D][switch:012]: 'Play Wakeword sound' Turning ON.
[17:47:09][D][switch:055]: 'Play Wakeword sound': Sending state ON
[17:47:09][D][voice_assistant:620]: Signaling stop...
[17:47:09][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[17:47:09][D][voice_assistant:510]: Desired state set to IDLE
[17:47:09][D][voice_assistant:627]: Event Type: 3
[17:47:09][D][voice_assistant:641]: STT started
[17:47:09][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[17:47:09][D][light:036]: 'Atom Echo' Setting:
[17:47:09][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[17:47:09][D][light:109]: Effect: 'Slow Pulse'
[17:47:09][D][esp-idf:000]: I (55511) I2S: DMA queue destroyed
[17:47:09]
[17:47:09][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to IDLE
[17:47:09][D][esp-idf:000][speaker_task]: I (55523) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[17:47:09]
[17:47:09][D][i2s_audio.speaker:203]: Starting I2S Audio Speaker
[17:47:09][D][i2s_audio.speaker:206]: Started I2S Audio Speaker
[17:47:09][D][esp-idf:000][speaker_task]: I (55850) I2S: DMA queue destroyed
[17:47:09]
[17:47:09][D][i2s_audio.speaker:210]: Stopping I2S Audio Speaker
[17:47:09][D][i2s_audio.speaker:222]: Stopped I2S Audio Speaker
[17:47:09][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE
[17:47:09][D][voice_assistant:510]: Desired state set to START_PIPELINE
[17:47:09][D][voice_assistant:221]: Starting Microphone
[17:47:09][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[17:47:09][D][esp-idf:000]: I (56026) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4
[17:47:09]
[17:47:09][D][voice_assistant:504]: State changed from STARTING_MICROPHONE to START_PIPELINE
[17:47:09][D][voice_assistant:275]: Requesting start...
[17:47:09][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE
[17:47:09][D][voice_assistant:525]: Client started, streaming microphone
[17:47:09][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[17:47:09][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE
[17:47:09][D][voice_assistant:627]: Event Type: 1
[17:47:09][D][voice_assistant:630]: Assist Pipeline running
[17:47:09][D][voice_assistant:627]: Event Type: 3
[17:47:09][D][voice_assistant:641]: STT started
[17:47:09][D][light:036]: 'Atom Echo' Setting:
[17:47:09][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[17:47:11][D][voice_assistant:627]: Event Type: 11
[17:47:11][D][voice_assistant:781]: Starting STT by VAD
[17:47:12][D][voice_assistant:627]: Event Type: 12
[17:47:12][D][voice_assistant:785]: STT by VAD end
[17:47:12][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[17:47:12][D][voice_assistant:510]: Desired state set to AWAITING_RESPONSE
[17:47:12][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[17:47:12][D][light:036]: 'Atom Echo' Setting:
[17:47:12][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[17:47:12][D][light:109]: Effect: 'Fast Pulse'
[17:47:12][D][esp-idf:000]: I (58467) I2S: DMA queue destroyed
[17:47:12]
[17:47:12][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[17:47:15][I][safe_mode:041]: Boot seems successful; resetting boot loop counter
[17:47:15][D][esp32.preferences:114]: Saving 2 preferences to flash...
[17:47:15][D][esp32.preferences:143]: Saving 2 preferences to flash: 0 cached, 2 written, 0 failed
[17:47:16][D][voice_assistant:627]: Event Type: 4
[17:47:16][D][voice_assistant:655]: Speech recognised as: " ."
[17:47:16][D][switch:016]: 'Play Wakeword sound' Turning OFF.
[17:47:16][D][switch:055]: 'Play Wakeword sound': Sending state OFF
[17:47:16][D][voice_assistant:627]: Event Type: 5
[17:47:16][D][voice_assistant:660]: Intent started
[17:47:16][D][voice_assistant:627]: Event Type: 6
[17:47:16][D][voice_assistant:627]: Event Type: 7
[17:47:16][D][voice_assistant:683]: Response: "Sorry, I couldn't understand that"
[17:47:16][D][light:036]: 'Atom Echo' Setting:
[17:47:16][D][light:051]: Brightness: 100%
[17:47:16][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[17:47:16][D][light:109]: Effect: 'None'
[17:47:16][D][voice_assistant:627]: Event Type: 98
[17:47:16][D][voice_assistant:768]: TTS stream start
[17:47:16][D][esp-idf:000][speaker_task]: I (63323) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[17:47:16]
[17:47:16][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE
[17:47:16][D][voice_assistant:510]: Desired state set to START_PIPELINE
[17:47:16][D][voice_assistant:221]: Starting Microphone
[17:47:16][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[17:47:16][D][esp-idf:000][speaker_task]: I (63430) I2S: DMA queue destroyed
[17:47:16]
[17:47:19][D][voice_assistant:627]: Event Type: 4
[17:47:19][D][voice_assistant:655]: Speech recognised as: " Turn on the battery light"
[17:47:19][D][voice_assistant:627]: Event Type: 5
[17:47:19][D][voice_assistant:660]: Intent started
[17:47:20][D][voice_assistant:627]: Event Type: 6
[17:47:20][D][voice_assistant:627]: Event Type: 7
[17:47:20][D][voice_assistant:683]: Response: "Sorry, I am not aware of any device called battery"
[17:47:20][D][light:036]: 'Atom Echo' Setting:
[17:47:20][D][light:051]: Brightness: 100%
[17:47:20][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[17:47:20][W][i2s_audio.speaker:042]: Called start while task has been already created.
[17:47:20][D][voice_assistant:627]: Event Type: 98
[17:47:20][D][voice_assistant:768]: TTS stream start
[17:47:20][D][voice_assistant:627]: Event Type: 8
[17:47:20][D][voice_assistant:703]: Response URL: "http://<IP_ADDR>:8123/api/tts_proxy/c53d8f663e718f89133e7251914040faa6866ede_en-gb_799a32846e_tts.piper.wav"
[17:47:20][D][voice_assistant:504]: State changed from STARTING_MICROPHONE to STREAMING_RESPONSE
[17:47:20][D][voice_assistant:510]: Desired state set to STREAMING_RESPONSE
[17:47:20][D][voice_assistant:627]: Event Type: 2
[17:47:20][D][voice_assistant:717]: Assist Pipeline ended
[17:47:20][D][light:036]: 'Atom Echo' Setting:
[17:47:20][D][light:051]: Brightness: 60%
[17:47:20][D][light:059]: Red: 100%, Green: 89%, Blue: 71%
[17:47:21][W][i2s_audio.speaker:042]: Called start while task has been already created.
[17:47:21][W][i2s_audio.speaker:042]: Called start while task has been already created.
[17:47:21][W][i2s_audio.speaker:042]: Called start while task has been already created.
Not sure why it won’t work for you; I suspect there is a race condition happening. I tried to simplify the setup a bit and got it working with a reduced config that does not need on_stt_end or the play_wakeword_sound switch.
You can also try even longer delays before re-enabling the wake word. This whole setup is quite hacky; it would have been nice if a wake word sound had been added properly to the voice assistant component in ESPHome.
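One way to check the race-condition theory against a log like the ones above is to list the voice_assistant state transitions next to the speaker warnings. The sketch below is only an illustration: the function name `summarize` is made up, and the sample lines are copied from the full log earlier in the thread. In that log, the microphone restart at 17:47:16 lands in the same second as "TTS stream start", which would fit the idea that the mic and speaker are competing for the I2S bus.

```python
import re

# Matches ESPHome debug lines such as:
# [17:47:09][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE
STATE_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\]\[\w\]\[voice_assistant:\d+\]: "
    r"State changed from (\w+) to (\w+)"
)
SPEAKER_WARN = "Called start while task has been already created"


def summarize(log_text):
    """Return (list of (time, old_state, new_state), speaker warning count)."""
    transitions = []
    warnings = 0
    for line in log_text.splitlines():
        match = STATE_RE.match(line.strip())
        if match:
            transitions.append(match.groups())
        elif SPEAKER_WARN in line:
            warnings += 1
    return transitions, warnings


# Sample lines taken from the log posted above.
sample = """\
[17:47:16][D][voice_assistant:768]: TTS stream start
[17:47:16][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE
[17:47:16][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[17:47:20][W][i2s_audio.speaker:042]: Called start while task has been already created.
[17:47:20][D][voice_assistant:504]: State changed from STARTING_MICROPHONE to STREAMING_RESPONSE
"""

transitions, warnings = summarize(sample)
for timestamp, old, new in transitions:
    print(f"{timestamp}  {old} -> {new}")
print(f"speaker start warnings: {warnings}")
```

Running it over a whole saved log (instead of `sample`) makes it easier to see whether the warnings cluster right after the microphone is restarted.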
This one’s worked an absolute treat! Seems to be holding up alright after a few test runs.
Thank you so much for your assistance on getting this to work, it’s really appreciated! It’s such a shame this isn’t built-in functionality, I agree — a basic feature of pretty much every voice assistant is some kind of noise to say they’re listening, so to see it missing from ESPHome is quite frustrating.
Your code is working great for me as a stop-gap though, so thank you again for helping to diagnose and solve this issue!
Hi @umglurf,
awesome. Thanks a lot for the quick reply and for sharing the code.
Took a little while until the sound file I am using had the correct format but now it works. So helpful to get some response from the Echo without having to look for the pulsating light.
One thing I am never sure about with the ESP devices: what happens when the next ESP update is released? Will this overwrite all the settings in the config file and remove the wakeword sound again?
Hi @Merc, glad to hear it worked for you.
If you update through Home Assistant, you will lose the changes. What I did when the new timer feature was released was look through the published config file, apply the new changes to my own config, and then build and upload the firmware.
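Comparing the published config against a customised one can be done with any diff tool. As a rough sketch, Python's standard `difflib` shows which lines are new upstream and which are local additions. The two YAML excerpts below are hypothetical stand-ins and not the real Atom Echo config; in practice you would read both files from disk.

```python
import difflib

# Hypothetical excerpt of the customised local config.
local = """\
voice_assistant:
  id: va
  on_wake_word_detected:
    - logger.log: "wake word"
""".splitlines(keepends=True)

# Hypothetical excerpt of the newly published config.
published = """\
voice_assistant:
  id: va
  on_timer_finished:
    - logger.log: "timer done"
""".splitlines(keepends=True)

# Lines starting with "-" exist only locally, "+" only in the published file.
diff = list(
    difflib.unified_diff(
        local, published, fromfile="local.yaml", tofile="published.yaml"
    )
)
print("".join(diff))
```

Lines marked `+` are candidates to copy into the local config, while lines marked `-` are local customisations to keep.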
Thanks umglurf,
I thought something like that might be the case.
Hope that sooner or later the sound will be included as an option in the official release.
Sorry if this is a stupid question, but I wanted to check if it would be possible to not receive the answer via the M5Echo Speaker.
My wife uses Alexa for everything (turning on lights, AC, etc.) and used to use it for the Shopping List as well. However, Amazon shut down the Shopping List API, so I cannot “transfer the shopping list to HA”.
My solution was to use the HA Assistant to add things to the Shopping List directly (and allow my wife to still use Alexa for everything else).
Right now, items are being added to the shopping list and the answer is played on both the M5 Echo and Alexa. Is there a way to keep the listening beep (googleok-sound-beep.raw) but suppress the response (i.e. “added butter”)?
I don’t think there are any options for this out of the box; maybe someone else knows? A couple of ideas that might work: one is setting speaker in the voice_assistant section to an invalid id, for example no_speaker. Another option could be to create a new voice assistant config in Home Assistant, set text-to-speech in it to none, and assign that to the M5 Echo in the device settings. A third idea, if you are using AI, is to modify the prompt and tell it not to respond when adding to the shopping list.
I’ve found that commenting out all the speaker stuff really helps to stabilize things. I didn’t like using the esp speaker for feedback, so I switched to using browser_mod if I’m at my computer, a sonos speaker, or another network endpoint.
I’m likely to delete my fork some time soon, but here’s a link for a config that is working for me at the moment.
Note: you’ll need to upload a file to your Home Assistant’s local media and grab the content ID for it. Mine is ack.mp3 in this example. I also have the TTS entity specified in the config, but that isn’t currently doing anything. I was just experimenting with having TTS say “yes?” instead of playing a sound, as was described at the top of this post. The sound seems a bit neater/cleaner and faster.
I do notice that on the odd occasion, the sound being played gets picked up on the Echo, and I get an error regarding “no speech detected.” But that happens far less often than the myriad other strange crashes and issues (audio buffer full, reboots, etc) I had when the internal speaker was enabled.
Since updating to ESPHome 2024.10.0b1 I can’t update any of my Atom Echoes anymore.
INFO ESPHome 2024.10.0b1
INFO Reading configuration /config/esphome/m5stack-atom-echo-803494.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
INFO Updating https://github.com/jesserockz/esphome-components.git@None
INFO Unable to import component file: No module named 'magic'
Failed config
file: [source /data/packages/********/voice-assistant/m5stack-atom-echo.yaml:305]
Component not found: file.
  - id: timer_finished_wave_file
    file: |-
      https://github.com/esphome/firmware/raw/main/voice-assistant/sounds/timer_finished.wav
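The "Unable to import component file: No module named 'magic'" line suggests that the external `file` component could not be loaded because the Python module `magic` is not importable where ESPHome runs, and "Component not found: file" then follows from that. A small sketch for confirming this, assuming you can run Python in the same environment as ESPHome (for example inside the add-on container, which is an assumption about your setup); the helper name `module_available` is made up for illustration.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# "magic" is the module named in the error message above.
for name in ("magic",):
    status = "available" if module_available(name) else "missing"
    print(f"{name}: {status}")
```

If it reports `missing`, the external component's dependency is not installed in that environment, so the error is unlikely to come from the YAML itself.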
I’m trying this on my build with a Wemos D1 Mini, MAX98357, ICS43434, and Adafruit 1314 speaker, and I find my wake sound is heavily distorted. Does anyone else get this, and were you able to resolve it?
Modifying auto gain and volume multiplier doesn’t seem to help.