Rhasspy offline voice assistant toolkit

Hi, thanks a lot for your work on Rhasspy! Using the official docs and the info here, I managed to get a basically working setup. (I do everything as in the previously cited post, except that I don’t use Node-RED; instead I use AppDaemon for advanced automation.)

My only problem is that the system seems not to hear the first word after the wakeword. For example, I say “Maison, allume la lumière”, which is French for “Home, turn on the light” (“Maison” is my wake word), and according to the dialogue manager Rhasspy only understands “la lumière” (“the light”).

Initially I thought it was the wakeword detection that was erratic or slow, but no, it works excellently and waiting between the wakeword and the command doesn’t change anything (“maison allume la lumière” produces the same behavior). On the other hand, adding some random speech before the command (“maison blah allume la lumière”) works (“blah” is discarded and Rhasspy gets the full command).

So I currently think the issue is with the VAD component: it’s too slow or too aggressive. But I had a look at the code and saw that, by default, webrtcvad is already set to the least aggressive setting (vad_mode = 0).
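
For what it’s worth, here is a toy, energy-based VAD sketch (not webrtcvad itself; the frame size, threshold, and trigger count are made up for illustration) showing how a detector that needs several consecutive speech frames before it "opens" will drop the leading audio, i.e. exactly the first word after the wake word:

```python
# Toy energy-based VAD (illustrative only, NOT webrtcvad).
# A real VAD classifies 10-30 ms frames; here each number stands for
# the energy of one frame.

def first_kept_frame(frame_energies, threshold=0.5, trigger_frames=3):
    """Return the index of the first frame the VAD lets through.

    The detector only "opens" after `trigger_frames` consecutive
    frames above `threshold`, so everything before that point is
    discarded, including the start of the first word.
    """
    consecutive = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            consecutive += 1
            if consecutive >= trigger_frames:
                return i - trigger_frames + 1  # naive: no pre-roll buffer
        else:
            consecutive = 0
    return None  # never opened


# Speech actually starts at frame 2, but a noisy dip at frame 3 resets
# the counter, so the detector only keeps audio from frame 4 on:
# frames 2-3 (the start of the first word) are lost.
energies = [0.1, 0.1, 0.9, 0.2, 0.8, 0.9, 0.9, 0.9, 0.9]
print(first_kept_frame(energies))  # 4
```

Real implementations mitigate exactly this with a small pre-roll ring buffer, which is essentially what your proposed hack (start recording the moment Snowboy fires) achieves.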

Any idea what could cause the problem or how to fix it? I’ve tried searching for people with similar issues, but those forums are quite confusing, and I don’t see anyone complaining about webrtcvad.

If my diagnosis is correct, a hack would be to start listening immediately after Snowboy detects the wake word, and to only use webrtcvad to detect when to stop listening. Would that seem reasonable to you?

Can you post your message directly on Rhasspy’s community at https://community.rhasspy.org

I too had strange issues with the « allume la lumière » intent. The ASR seems to transcribe the utterance correctly, returning « allume la lumière » (OK), and the intent is recognized correctly too, but the raw_text in the payload is just « la lumière ».

Regarding webrtcvad… maybe using Kaldi’s decoder endpointing can help avoid using it at all…

We’ll sort it out over at the Rhasspy community :wink:

Hi, I tried everything I could find, but I don’t think I can make the mini USB microphone work.
I tried the above (.asoundrc). The test says it’s working (see attached), but maybe I’m confused about how to tell if it’s working: I tried the Speech tab with “Use Rhasspy microphone” and nothing happens.
If I use the browser microphone, the Sentence field gets it.
So how can I use the USB microphone? I don’t have any other mic.

It drives me nuts. I started fresh twice with Buster + Rhasspy. The first time it was the same: I was playing with the profile.json as found here, and the Microphone settings in the web GUI were broken (it only showed Default). As of now, the profile looks like this:

    "microphone": {
        "arecord": {
            "device": "null"
        },
        "pyaudio": {
            "device": "2"
        },
        "system": "arecord"
    }
}

Wanted to add that the browser microphone, which works BTW, is using the USB microphone plugged into the RPi 4 (from Chromium at localhost:12101), but there’s no success with “Use Rhasspy microphone” turned on.
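
Not a fix, but a way to narrow things down: check what ALSA actually sees and whether it can record at all, outside of Rhasspy (the card/device numbers below are illustrative; take the real ones from the `arecord -l` output):

```shell
# List the capture devices ALSA knows about
arecord -l

# Try a 5-second test recording straight from the USB mic
# (replace 1,0 with the card,device numbers shown by `arecord -l`)
arecord -D plughw:1,0 -f S16_LE -r 16000 -c 1 -d 5 /tmp/test.wav
aplay /tmp/test.wav
```

If this works on the host but Rhasspy still hears nothing, the problem is likely that the container cannot see /dev/snd, which points at a permissions/device-mapping issue rather than the microphone itself.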

Yes, my mistake, that’s clearly a Rhasspy question; it has nothing to do with the HA integration. Thanks.

Mmh, I was just giving that example, but the problem seems to occur with all sentences. “éteins la lumière” (“turn off the light”) has the same problem. “bonne nuit” (“good night”) gets shortened to just “nuit”. “quelle heure est-il” (“what time is it”) becomes “heure est-il”.

Good idea, will try, thanks!

EDIT: I tried, and Kaldi is really great! It doesn’t seem to completely solve the problem, but it mitigates it a lot, and the general recognition rate is much better. On the other hand, Kaldi is much slower: it takes 5-7 seconds to process a command (on a Raspberry Pi 4), whereas pocketsphinx is almost instant.

I had some similar issues initially, with the browser microphone working but neither pyaudio nor arecord. It turned out to be a permissions issue; the easiest solution for me was simply to run the Docker container with --privileged. (That’s probably a bad idea from a security point of view; there is certainly a better way to give just the right permissions to the container.)

Interesting, and how do you do that? It looks like Rhasspy is using systemd.
I have no idea where I have to add --privileged. On
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock ?
Thanks.
PS: Using the .asoundrc config disabled the volume control on the Buster desktop, but at least it’s still available in alsamixer.

Not there; that’s the command that starts the Docker daemon (dockerd), not your container. You need to add --privileged to the command line that starts the Rhasspy container, the one that begins with docker run.
Personally, I just used the command line from the official docs and added --privileged to the arguments.

I’m sorry, I cannot find this script. Actually, I cannot find how Rhasspy is autostarted at all; I grepped the whole card with grep -R rhasspy * | grep docker.
Nada.

Actually, why is Rhasspy autostarted? It should be an option whether to do so or not, and changing the start options like this would be the way.

Mmh, how did you start/install Rhasspy originally? I initially started my Rhasspy by running the Docker container (with docker run ...), and it is autostarted because I specified --restart unless-stopped, so Docker restarts it automatically at boot (unless I stopped it with docker stop). I don’t know how Docker does that exactly; I’m no Docker expert.

If you are in a similar situation, you should probably stop the current container with docker stop <container id>, delete it with docker rm, and start a new one with docker run ... (following the command line from the docs), specifying --privileged this time.
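
For reference, a sketch of what that can look like (the volume path, port, and profile name are illustrative, roughly what the 2.4-era docs use; adapt them to your own setup, and note that with --privileged the --device mapping is largely redundant):

```shell
# Stop and remove the existing container first
docker stop <container id>
docker rm <container id>

# Start a fresh one with --privileged added to the documented command
docker run -d -p 12101:12101 \
    --name rhasspy \
    --restart unless-stopped \
    --privileged \
    --device /dev/snd:/dev/snd \
    -v "$HOME/.rhasspy/profiles:/profiles" \
    synesthesiam/rhasspy-server:latest \
    --user-profiles /profiles \
    --profile fr
```

The important part is that --privileged (or at minimum the /dev/snd device mapping) is on the docker run line, not anywhere in the dockerd systemd unit.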

OK, I stopped all Docker containers (BTW, there were 3 of synesthesiam/rhasspy-server:latest)
and used the --privileged option.
Rhasspy behaved like a fresh install and asked me to download and train.
But the USB microphone: same thing.
NOT WORKING.
The browser mike with the same USB mike works.
I think I’ll pass… goodbye Rhasspy, at least for now; way too much time wasted on it :frowning:
Even with the browser mike working, the only feedback I get is the Sentence field, which displays what I said, and the raw intent JSON log. I want to hear the REPLY! There’s no reply whatsoever. Where is it? How do I enable it?
…Meanwhile, I installed Mycroft, and it works amazingly well with the USB mike, but it looks like it’s not as “open” to customization as Rhasspy. Anyway, I’ve only had a glimpse of it.

FYI, the first non-browser microphone I managed to make work is the GStreamer one. If you have some basic knowledge of GStreamer, it is quite straightforward. The only catch is to open the corresponding UDP port in the container.
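
As a sketch of that setup (the device name, host, and port here are illustrative; they have to match your hardware and whatever udpsrc pipeline is configured in your Rhasspy profile), the sending side captures, converts to 16 kHz mono 16-bit PCM, and streams over UDP:

```shell
# Capture from the USB mic, convert to the format Rhasspy expects,
# and stream it over UDP to the Rhasspy host.
# (plughw:1,0, the host address, and port 12333 are assumptions:
# adjust them to your own setup)
gst-launch-1.0 alsasrc device=plughw:1,0 \
    ! audioconvert \
    ! audioresample \
    ! 'audio/x-raw,rate=16000,channels=1,format=S16LE' \
    ! udpsink host=192.168.1.10 port=12333
```

And the matching UDP port has to be published on the container, e.g. by adding something like -p 12333:12333/udp to the docker run command.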

What replies? You need something to generate the replies in question; Rhasspy doesn’t do that. It’s the job of whatever system handles your intents, e.g. Home Assistant. Rhasspy looks for a “speech” key in the JSON returned by the intent handling backend and speaks it if there is something there.
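
A minimal sketch of such an intent handler, assuming Rhasspy is configured for remote HTTP intent handling and POSTs the recognized-intent JSON to your endpoint (the intent name GetTime, the port, and the exact payload shape here are illustrative):

```python
import json
from datetime import datetime
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_intent(intent: dict) -> dict:
    """Build the response Rhasspy will speak.

    Rhasspy looks for speech.text in the JSON we send back and feeds
    it to its text-to-speech system.
    """
    name = intent.get("intent", {}).get("name", "")
    if name == "GetTime":  # illustrative intent name
        reply = "It is " + datetime.now().strftime("%H:%M")
    else:
        reply = "Sorry, I cannot handle that yet"
    intent["speech"] = {"text": reply}
    return intent


class IntentHandler(BaseHTTPRequestHandler):
    """Tiny HTTP endpoint Rhasspy can POST recognized intents to."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        intent = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handle_intent(intent)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve (blocks forever), point Rhasspy's remote intent handling
# URL at http://<this host>:8000/ and run:
#   HTTPServer(("0.0.0.0", 8000), IntentHandler).serve_forever()
```

Home Assistant automations or AppDaemon can play the same role; the key is simply that whatever handles the intent puts the sentence to speak under the “speech” key of the JSON it returns.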

Regarding Kaldi’s speed, you can try the TDNN 250 model from Paul Guyot (for French).

For now, Rhasspy uses the full TDNN model, which weighs 60 MB. The 250 version performs pretty much the same and only weighs 12 MB. By tweaking the decode.sh script in the Rhasspy Kaldi profile folder, I’m able to get the ASR to return in less than a second.

@synesthesiam will probably switch to this version of the acoustic model in a future release.

See this for more info:

Hope this helps.

Excellent! I will definitely try this when I have time to tweak and rebuild the Docker image. Many thanks for your help.

No need to rebuild the docker image.

The Kaldi model folder is in your profile folder.

Simply swap the model files and change the options used inside the decode.sh script.

Note that downloading the profile files (in the settings tab of the web ui) will overwrite your changes.

:wink:

Hello,
I’m very new to this, and I’m trying to set it up, but I only have a 44 kHz USB microphone. By editing the audio_recorder.py file and forcing it to convert to 16 kHz, I managed to get speech-to-text to work fine, but I cannot get the wake word to work. My other alternative is sending audio input from my iPhone, but it looks like I’d have to design my own app. Thank you.
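
As a side note, the 44 kHz to 16 kHz conversion itself is simple enough to sketch. This is a naive linear-interpolation resampler in pure Python; real code should low-pass filter first (e.g. with sox or a proper polyphase filter), since naive resampling aliases, but it shows the rate math:

```python
def resample(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampling of a mono sample list.

    Good enough to illustrate the rate conversion; real code should
    low-pass filter before downsampling to avoid aliasing.
    """
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out


# One second at 44100 Hz becomes 16000 samples at 16000 Hz
one_second = [0.0] * 44100
print(len(resample(one_second, 44100, 16000)))  # 16000
```

Doing this inside audio_recorder.py, as described above, works for ASR but can still starve a wake-word engine that expects exact 16 kHz frame sizes, which may be why the wake word fails while transcription succeeds.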

It’s confusing, since in Sentences I see “what time is it”, “tell me the time”, and “what’s the temperature”.
What’s the point if no answers are provided? The sentences did appear when I used the browser mike.
I see I need a system that handles my intents. Besides HA, what else?
Is there an example anywhere?

  • So it’s not for me, at least for now. I may look into Home Assistant, but I don’t have any lights or switches in it; it’s mostly for sensors and multi-room music players. The lights are a totally separate Lutron installation (a pretty old HomeWorks or something like that, I don’t even remember), and I don’t really see how to use it in HA.
    As for GStreamer, I’d never heard of it. Thanks for the tips, I learned something! But the fact that it doesn’t work with a simple USB mike, when everything else I tried works, is a turn-off. Thanks again.

Has anyone gotten the ReSpeaker 4 Mic Array to work with Rhasspy and Hass.io? If so, can someone please help me out? Do I need to create an asound.conf with something written in the /rhassp/ folder? I’ve tried searching everywhere but cannot get it to work.
Hass.io has no underlying OS, BTW.

There’s a thread at the Rhasspy Community Forum for the ReSpeaker devices. Perhaps you might find something helpful there?