Rhasspy offline voice assistant toolkit

Hi, thanks a lot for your work on Rhasspy! Using the official doc and the info here, I managed to get a mostly working setup. (I do everything as in the previously cited post, except that I don’t use Node-RED; instead I use AppDaemon for advanced automation.)

My only problem is that the system does not seem to hear the first word after the wake word. For example, I say “Maison, allume la lumière”, which is French for “Home, turn on the light” (“Maison” is my wake word), and according to the dialogue manager Rhasspy only understands “la lumière” (“the light”).

Initially I thought the wake word detection was erratic or slow, but no, it works very well, and pausing between the wake word and the command doesn’t change anything (“maison allume la lumière” produces the same behavior). On the other hand, adding some random speech before the command (“maison blah allume la lumière”) works: “blah” is discarded and Rhasspy gets the full command.

So I currently think the issue is with the VAD component: it is either too slow or too aggressive. But I had a look at the code and saw that, by default, webrtcvad is already set to its least aggressive mode (vad_mode = 0).

Any idea what could cause the problem or how to fix it? I’ve tried searching for people with similar issues, but those forums are quite confusing, and I don’t see anyone complaining about webrtcvad.

If my diagnosis is correct, a hack would be to start listening immediately after Snowboy detects the wake word, and only use webrtcvad to detect when to stop listening. Does that seem reasonable to you?
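
To make the idea concrete, here is a rough sketch of what I have in mind. It is not Rhasspy's actual code: the function name `record_after_wakeword` and its parameters are made up for illustration, and the speech detector is passed in as a plain callable so the logic can be tried without any audio hardware. In a real setup I assume that callable would wrap something like `webrtcvad.Vad(0).is_speech(frame, 16000)` on 10, 20 or 30 ms frames of 16-bit mono PCM.

```python
from typing import Callable, Iterable, List


def record_after_wakeword(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    trailing_silence_frames: int = 10,
    max_frames: int = 500,
) -> List[bytes]:
    """Keep every frame from the moment the wake word fires.

    The VAD (is_speech) is NOT used to decide when to start recording,
    only when to stop: recording ends once speech has been heard and is
    followed by `trailing_silence_frames` consecutive non-speech frames,
    or when `max_frames` is reached (safety timeout).
    All names here are hypothetical.
    """
    recorded: List[bytes] = []
    heard_speech = False
    silent_run = 0

    for frame in frames:
        # Always keep the frame, so the first word is never dropped.
        recorded.append(frame)

        if is_speech(frame):
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Count silence only after some speech has been heard.
            silent_run += 1
            if silent_run >= trailing_silence_frames:
                break

        if len(recorded) >= max_frames:
            break

    return recorded


if __name__ == "__main__":
    # Fake audio: b"s" stands for a speech frame, b"n" for a silent one.
    fake = [b"n", b"s", b"s", b"n", b"n", b"s"]
    out = record_after_wakeword(fake, lambda f: f == b"s", trailing_silence_frames=2)
    print(len(out))
```

The point of this design is that leading frames are kept even if the VAD has not yet classified them as speech, so a slow VAD can no longer eat the first word. The cost is that a bit of silence (or the tail of the wake word) may end up at the start of the recording.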