Year of the Voice - Chapter 4: Wake words

I got the original code from Discord. I can’t locate the original message, but if I find it I will link it. A quick look on GitHub and I don’t see it there either. Things may have changed, but I still use the code I linked you to and it is working for me. Good luck!

And just after I posted, I found the GitHub link that I originally tried the code from.


Is anyone getting climate requests working? I have a Z2M TRV. Asking “set lounge radiator to 19” gets the response “Sorry, I didn’t understand that.”
I’m using an M5Stack Atom Echo.

Not a fix, but I was having the same issue with “ok nabu”. I switched to “hey rhasspy” and it works better.

Hi all,
I wanted to try the porcupine1 wake word engine and also generated a custom wake word on picovoice.
It did a good job and I downloaded the .ppn file.

Is there a way to use this in the porcupine add-on?

Thanks,

Merc

Hello,

There is no way to use a custom wake word with porcupine1.
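Outside of the add-on, though, the Picovoice Python SDK can load a custom .ppn directly if you just want to test the model. A minimal sketch, assuming pvporcupine is installed, you have a Picovoice AccessKey, and 16 kHz, 16-bit mono PCM is piped in on stdin (the key and file names below are placeholders):

import struct
import sys

import pvporcupine  # pip install pvporcupine

porcupine = pvporcupine.create(
    access_key="YOUR_ACCESS_KEY",          # from the Picovoice console
    keyword_paths=["my_custom_word.ppn"],  # the file generated on picovoice.ai
)

frame_bytes = porcupine.frame_length * 2   # 16-bit samples
try:
    while True:
        chunk = sys.stdin.buffer.read(frame_bytes)
        if len(chunk) < frame_bytes:
            break
        pcm = struct.unpack_from("h" * porcupine.frame_length, chunk)
        if porcupine.process(pcm) >= 0:    # returns keyword index, -1 if none
            print("Wake word detected")
finally:
    porcupine.delete()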


Thanks will35.

I had not found that one despite googling quite a bit.

Cheers,
Merc


I have a bunch of unused Android phones and tablets. They all have HA Companion App installed.

It would be fantastic if the HA app supported wake word detection. Is that on the roadmap?


It’s in the works per Mike’s earlier comment in this thread:


Not sure if I should start a new thread, but I have an existing ESP32 that was acting as a Bluetooth proxy and I wanted to add voice detection to it, so I added the code below. It doesn’t have any errors (it compiles fine and doesn’t give serial log errors while running), but I don’t get any wake word action from it, so I’m not sure where to go next…

Code ‘borrowed’ from this forum.

ESPHome code
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO19
    i2s_bclk_pin: GPIO18

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO23
    
voice_assistant:
  id: va
  microphone: mic_i2s
  noise_suppression_level: 2

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);

Voice Assist settings:


ESPHome logs

[10:15:40][D][voice_assistant:529]: Event Type: 0
[10:15:40][D][voice_assistant:529]: Event Type: 2
[10:15:40][D][voice_assistant:619]: Assist Pipeline ended
[10:15:40][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[10:15:40][D][voice_assistant:428]: Desired state set to IDLE
[10:15:40][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[10:15:40][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[10:15:40][D][voice_assistant:206]: Requesting start…
[10:15:40][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[10:15:40][D][voice_assistant:443]: Client started, streaming microphone
[10:15:40][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[10:15:40][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[10:15:40][D][voice_assistant:529]: Event Type: 1
[10:15:41][D][voice_assistant:532]: Assist Pipeline running
[10:15:41][D][voice_assistant:529]: Event Type: 9
[… the same cycle (Event Type 0, 2, pipeline ended, restart, Event Type 1, 9) repeats every ~5 seconds through 10:16:01 …]

Any tips on what to test?

Currently the code on GitHub uses just a single part of the Espressif ADF and has the VAD turned up to max, which is the most restrictive setting, likely to avoid false detections.
There also seems to be little information on initial input volume or AGC. An option to trigger a capture of the PCM, so you can have a quick look at it in something like Audacity, would tell you a lot, and a setting for the VAD threshold would likely be a good idea.
I guess a search for Event Type: 2 would help, as it must be documented somewhere, but you will likely have to trawl the code.
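If you do capture the PCM, you can also experiment with what different VAD thresholds would do to your recording. A rough sketch using the webrtcvad package (my assumption: it is not what the Espressif firmware uses, but it exposes the same 0–3 aggressiveness idea), expecting 16 kHz, 16-bit mono raw PCM in a file called capture.raw:

import webrtcvad  # pip install webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 30                                      # webrtcvad accepts 10/20/30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # 16-bit samples

with open("capture.raw", "rb") as f:
    audio = f.read()

frames = [audio[i:i + FRAME_BYTES]
          for i in range(0, len(audio) - FRAME_BYTES + 1, FRAME_BYTES)]

for aggressiveness in range(4):  # 0 = least restrictive, 3 = most restrictive
    vad = webrtcvad.Vad(aggressiveness)
    speech = sum(vad.is_speech(frame, SAMPLE_RATE) for frame in frames)
    print(f"VAD level {aggressiveness}: {speech}/{len(frames)} frames flagged as speech")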

I know of this software but how do you suggest using it in this instance?

I don’t know, as it’s often an omission, but an option to record to a file is a godsend for debugging and for setting up optimal volumes and testing.

As in record to file using the ESP? No idea how I would do that…

Nope, you’re getting events, so what audio are you actually receiving?
Also, a WebSocket-to-ALSA source to actually test the equipment during setup, or an Android app, would likely be wise.
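For illustration only, a rough sketch of what the receiving end of such a WebSocket source could look like, assuming the device pushes 16 kHz, 16-bit mono PCM chunks as binary messages (the websockets package, port and filename below are my own assumptions, not anything the current firmware provides):

import asyncio
import wave

import websockets  # pip install websockets

async def handler(websocket, path=None):
    # Treat each binary message as a chunk of raw 16-bit mono PCM and append it to a WAV file.
    with wave.open("ws_capture.wav", "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        async for message in websocket:
            if isinstance(message, (bytes, bytearray)):
                wav.writeframes(message)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # run until cancelled

asyncio.run(main())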

The events are coming through with just ambient background sound. Where do I see/hear what the ESP is actually picking up?

Well, Googling ALSA didn’t help me (just a heap of Australian companies using that acronym). Sorry, can you please explain?

https://wiki.archlinux.org/title/Advanced_Linux_Sound_Architecture
Or PortAudio, which is a cross-platform audio library.
Currently you can hack it: the input to the KWS is a chunked raw audio stream that you could pipe into aplay on Linux, so at that point in the code pass it to stdout or save it as a file.
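A rough sketch of that hack, assuming the stream is 16 kHz, 16-bit mono raw PCM written to the script’s stdin: it forwards the bytes to stdout so you can pipe them on into aplay -r 16000 -f S16_LE -t raw -, and keeps a WAV copy you can open in Audacity (the filenames are placeholders):

import sys
import wave

CHUNK = 2048  # bytes per read

with wave.open("kws_input.wav", "wb") as wav:
    wav.setnchannels(1)      # mono
    wav.setsampwidth(2)      # 16-bit samples
    wav.setframerate(16000)  # 16 kHz
    while True:
        data = sys.stdin.buffer.read(CHUNK)
        if not data:
            break
        sys.stdout.buffer.write(data)   # pass through, e.g. into aplay
        sys.stdout.buffer.flush()
        wav.writeframes(data)           # keep a copy for inspection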

It really needs development, I guess, as it doesn’t have a setup app or debugging currently, as far as I know (I’m not a user).

Seems extremely complicated…

Until it’s added, it will likely need a smattering of Python, apologies.

Do that as is there is a debug