How to add a delay between wakeword and speech recognition

Hello, I am currently converting my voice control system from Rhasspy to Assist. I have a Sonos system that I use for all my output. So far it’s all working well, I’m excited about the wildcards. But what is giving me a bit of a headache is the following:

Instead of a wake sound, I have a spoken response signaling that the system is now listening. Currently it is still a simple “Yes?”

It seems that Home Assistant listens as soon as the wake word is activated. This is certainly good in most cases, but my “human” reaction to the call reduces the time I have left to express my wish.

So a delay would be very good here, which I could ideally even link to the length of the initial text transmitted.
I could also see in the logs that the VAD delay is 0.5s. It would also be great to know where I can set this.

Thanks in advance :slight_smile:
Marcus
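
The idea of tying the delay to the length of the spoken response could be sketched roughly as follows. This is only an illustration: the function name, the speaking rate of 150 words per minute, and the 0.2 s padding are all assumptions, not values taken from Assist, Piper, or any add-on, and they would need tuning against the actual TTS voice.

```python
def estimate_prompt_seconds(text: str,
                            words_per_minute: float = 150.0,
                            padding: float = 0.2) -> float:
    """Roughly estimate how long a TTS response takes to speak.

    Hypothetical helper: the default speaking rate and padding are
    assumptions and should be tuned to the voice actually in use.
    """
    words = len(text.split())
    # Convert the word count to seconds at the assumed rate, then add padding.
    return words * 60.0 / words_per_minute + padding


if __name__ == "__main__":
    for prompt in ("Yes?", "How can I help"):
        print(f"{prompt!r}: wait about {estimate_prompt_seconds(prompt):.2f} s")
```

The result could then be used as the time to mute or ignore the microphone after the wakeword, wherever the setup allows such a value to be set.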

Pretty sure both settings would be in the firmware you put on your “assist” device.
But I’m wondering if you really need a delay at all.
I use my sonos as the speaker for Assist and I have it play an mp3 and it works very well. The file is exactly one second long which I would think is longer than “Yes?”.
What device are you using and what’s the firmware look like?

I use a Raspberry Pi as a satellite in each room, running openWakeWord and the Assist Microphone. The centerpiece is a PC, which is a bit more powerful and handles all the scripts and automations, including ZHA. The PC also runs Piper and Whisper. Each Raspberry Pi has its own Assist pipeline; STT and TTS are bound to the central computer to get better TTS results and to be able to run larger Whisper models.

When the wakeword is activated, a series of automations is triggered on the central computer, which then, among other things, activates Piper for all possible outputs. My Sonos soundbar is connected to the TV and plays voice information via media_player → announce when the TV is on. When I listen to music, the behavior changes so that I play a normal sound file instead of using announce. The reason is that the other Sonos speakers are in a different subnet, so I simulate the announce function on them by waiting for the sound file to finish and then setting the speakers back to the volume they had before the output.

The whole thing works well so far.
What bothers me is that, on the one hand, the “Yes?” is recorded (which can be removed), but it is also interpreted as speech input. If I then wait a few hundredths of a second too long, the pipeline finishes because no further speech is recognized.
A delay would be good to avoid this.
Unfortunately, I have not found anything in the openwakeword or the Assist Microphone Addon that could influence this behavior.

Oh, all devices and instances are up to date :slight_smile:

Can’t help you there but I might look into your setup. I’m using an Atom Echo now and the only problem I have is the mic sucks! No wonder every video made about Assist has no other sounds going in the room. I need to be within a foot or so from the speaker if I have the TV on or any music playing, but if it’s silent I can be on the other side of the room and it picks me up.
My setup actually seems to wait for the “announce” sound to play and then I speak. I actually just switched it to my Google Home Mini and even with the wake-up sound the Mini plays, and then my mp3, it still lets me say what I want with plenty of time.

I also noticed the “laboratory conditions” :slight_smile:
I guess I somehow managed to make all the dependencies that Assist needs (Whisper, Piper, Wakeword) work together while also keeping them somehow decoupled^^

If I play the wake sound and then speak immediately, everything is fine. The wake sound and the output from Piper start at about the same time, which then leads to the wake voice being recorded.

I would be happy to send you the parts of my implementations. A few new ideas are always good :slight_smile:

Add this to wyoming-satellite.service:
--mic-seconds-to-mute-after-awake-wav 0.2 \

For myself, I have reduced the value; you can increase it.
The default time to wait for a request after the wakeword is triggered is 15 seconds. This parameter is also configurable.
In real use, the STT engine recognizes the end of the request and does not wait for this period to expire, unless you stay quiet.

Thank you for the tip. :slight_smile: I use the addons in Home Assistant OS, unfortunately these parameters are not available here. I have also not yet found out where the configuration files are located. Somehow everything is different from Rhasspy^^

I have found a solution:
It seems that sound output must be enabled in the microphone addon for there to be a delay between the wakeword and the subsequent recognition.

I have downloaded a Windows sound file that is about 1.5 seconds long. In the microphone addon I then set the sound volume multiplier to 0.01, which makes the sound almost inaudible.

It may not be the best solution, but it works :slight_smile:

I now see why you needed a delay. I moved my Atom Echo and now it’s close enough to the Mini to hear the wake response. It always identifies the response as my “speaking” and doesn’t know what to do.

Does anyone have a way to play a wake response on the Atom speaker? I’m not sure how to do it. I figure I could then wait until the speaker is no longer playing.

Hi, what works very well for me so far is a sound file that contains nothing but silence and is 600 ms long. I created the file in Audacity. Since I use the addons, I simply put the file into the /share folder, adjusted the path in the addon configuration, and activated the sound output. This gives me a delay that covers my wake response well. Theoretically, it should not be a problem to realize different responses of approximately the same length this way.

What can I do
How can I help
What do you want

should have approximately the same length. This would be a list of these options, with | random used to pass one of the entries via script to the TTS engine (Piper, in my case), which then outputs it via the speaker.
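
For anyone who would rather not use Audacity, the silent 600 ms file described above could also be generated with a few lines of Python. This is a minimal sketch using only the standard library; the function names and the output file name are made up for illustration, and the 16 kHz, 16-bit mono format is an assumption, so check which format your addon or satellite actually expects.

```python
import wave


def write_silence(path, duration_ms=600, sample_rate=16000):
    """Write a mono 16-bit PCM WAV containing only silence.

    Returns the number of frames written. The sample rate is an
    assumption; adjust it to whatever the playback side expects.
    """
    n_frames = sample_rate * duration_ms // 1000
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # zero samples = silence
    return n_frames


def wav_duration_seconds(path):
    """Return the playing time of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    out = "silence_600ms.wav"  # hypothetical name; copy the result to /share
    write_silence(out)
    print(f"{out}: {wav_duration_seconds(out):.3f} s")
```

Changing duration_ms gives a longer or shorter pause, so the file length can be matched to however long the spoken response takes.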

I actually already do that to “mute” the onboard speaker (Atom Echo).
But the problem is the mic is picking up the sound from the Google Mini I use.
Still playing around but I might just have to move the Echo back to the old location I had it. Didn’t pick up the Mini from there.

What do you play on the Mini? Music or the signal that your assistant is listening to? Do you also use the add-ons or do you use the Wyoming Satellite solution?

So IS there a way of delaying, as per the original ask? Is it some config change within any of the Wyoming pipeline apps? I’m sure there must be, because on both of my setups (a tablet with StreamAssist etc., and an M5Stack Atom) you pretty much have to say the wakeword and the question in the same breath, which is very frustrating given that most voice assistants have trained us to wait for the confirmation beep before asking.