Rhasspy offline voice assistant toolkit

If I say “zet de woonkamerlamp aan”, Rhasspy transcribes it correctly here. This is running the Docker image synesthesiam/rhasspy-server:latest@sha256:6434cc16e726053d973b8af654d0a0f54c0e41cee725ad0cddea942adc28d25b.

I agree with @Romkabouter that the “aan” in the WAV file from Google’s TTS sounds a little odd. I would pronounce the “aan” longer, while Google’s TTS pronounces it very short. But even then, it doesn’t even remotely sound like “uit”…

And I just uploaded the WAV file from Google’s TTS to my local Rhasspy installation and clicked on Transcribe WAV, and it got transcribed correctly. So is this a regression introduced recently? What’s the easiest way to create a Docker container with the latest commits from your GitHub repository to test this?

So the solution turned out to be much simpler than I thought. Hermes Audio Server now just waits for a configurable amount of silence (2 seconds) after every speech fragment and only then stops streaming the audio frames. This gives the WebRTCVAD command listener enough time to do its work. If this value is less than min_sec in the command listener’s configuration, the listener fails to pick up the command in time. I’ll continue testing, but with 2 seconds this seems to work.

There are quite a few false positives, with the VAD in my audio server detecting speech when there isn’t any, even with aggressiveness mode 3. Maybe I can still fine-tune it, but these false positives don’t seem to affect the wake word and command listener components. The selective audio streaming is just meant to save some network bandwidth, and even with the false positives it serves this purpose, so for now I’m OK with it.
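For anyone curious, the silence-timeout idea can be sketched roughly like this. The class name, frame size, and timeout constants below are illustrative, not the actual hermes-audio-server implementation:

```python
# Sketch of the silence-timeout logic described above: keep streaming audio
# frames until a configurable stretch of non-speech frames has passed.
# SilenceTimeout and the constants are hypothetical names for illustration.

FRAME_MS = 30          # webrtcvad accepts 10/20/30 ms frames
SILENCE_SEC = 2.0      # keep streaming until this much silence has passed


class SilenceTimeout:
    """Tracks per-frame VAD decisions and says when to stop streaming."""

    def __init__(self, silence_sec=SILENCE_SEC, frame_ms=FRAME_MS):
        self.needed_frames = int(silence_sec * 1000 / frame_ms)
        self.silent_frames = 0

    def update(self, is_speech):
        """Feed one VAD decision; returns True when streaming should stop."""
        if is_speech:
            self.silent_frames = 0   # any speech resets the countdown
        else:
            self.silent_frames += 1
        return self.silent_frames >= self.needed_frames
```

In the real server the per-frame decision would come from something like `webrtcvad.Vad(3).is_speech(frame, sample_rate)`; a single false positive then only resets the countdown, which is why the stray VAD triggers don’t break the timeout behaviour.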

I have published a new version with this feature on PyPI.


If you’re on a PC/laptop or a regular Raspberry Pi 3 (not the B+), you can pull the latest Rhasspy Docker image. I’m trying to fix some problems with the B+ (aarch64) before bumping the Hass.IO server.

Can I see your sentences.ini file? Maybe I just don’t know what I’m doing with my Dutch…

This is great! And this matches my experience with WebRTCVAD as well. It took me a long time to get all of the settings right for the command listener. The is_speech property flips constantly, though it seems somewhat related to human speech :slight_smile:

I’d like to link to your PyPI package in one of my Rhasspy examples, if you don’t mind.

Sure, you can add a link to hermes-audio-server on PyPI. Maybe it’s also interesting to add a page to the documentation with links to @Romkabouter’s Matrix-Voice-ESP32-MQTT-Audio-Streamer and Hermes Audio Server as examples of companion projects to run a standalone ‘satellite’ that lets you talk to a Rhasspy installation on another machine?


So I tried this with the latest Docker image you updated yesterday, and now I see the same behaviour as you: the WAV file from Google’s TTS is transcribed as “zet de woonkamerlamp uit”. But when I pronounce “zet de woonkamerlamp aan” myself, it’s transcribed correctly.

I think this is just a bad voice example. I’ll see if I can record a ‘real’ example that you can use in the tests.

OK, glad it’s not just happening on my machine. A WAV file from an actual human would be much better for testing :slight_smile:

:balloon: Version 2.14 has finally been released! :confetti_ball:

Highlights include:

Important: A profile name must now be given on the command line (--profile). The RHASSPY_PROFILES environment variable is also deprecated. If you need to change where Rhasspy writes profile settings, use the --user-profiles option. See the Docker example in the README for how to use these options with docker compose.

Also, if you run Rhasspy on a server and have “satellite” Raspberry Pis listening, check out @koan’s hermes-audio-server!

As always, let me know what you think or what I broke. Feel free to create a GitHub issue or pull request; I should be receiving notifications for those now :slight_smile:


Updated to 2.14. Congratulations on another release!

I noticed that the umdl files for snowboy are searched for in “/usr/share/rhasspy/profiles/en/” while they are actually in “/share/rhasspy/profiles/en/”. Is it planned that way?

EDIT: I tried to create the folders (with the same permissions) so that Rhasspy could find the umdl files, but I still get the same error:

AssertionError: Can't find snowboy model file (expected at /usr/share/rhasspy/profiles/en/alexa.umdl)

By the way, @synesthesiam, am I correct that when I set up Rhasspy to use MQTT for audio playback and recording, it only listens to the site ID I have entered in the MQTT settings?

I ask because I have a Pi running Hermes Audio Server in my living room, and now I have set up another one in my home office. So currently it’s not possible to use both with Rhasspy because it’s bound to one site ID?

Whoops, I think this was a mistake. I went ahead and added /profiles to the default user_dir in the Hass.IO add-on. Thanks for catching this!

I guess not, unless they share the same site ID. We could always have Rhasspy take a list of site IDs as well as a single site ID. Thoughts?

That would be very nice. Snips does this too: on the base station you can define the list of site IDs of the satellites used for audio in /etc/snips.toml. So when you talk to a satellite with site ID livingroom, the wake word component triggers on this site ID and listens there for the command, and the response is also sent back to that same site ID (and not to any other). This makes for a very nice setup where various satellites around your home all talk to the same central Rhasspy server.
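To make the multi-site idea concrete, here is a minimal sketch of how a server could accept audio frames from a configured list of site IDs. The topic layout hermes/audioServer/&lt;siteId&gt;/audioFrame follows the Hermes convention; the helper functions and the example site IDs are hypothetical:

```python
# Hypothetical helpers: decide whether an incoming Hermes audio topic
# belongs to one of the configured satellite site IDs.
# Topic layout (Hermes convention): hermes/audioServer/<siteId>/audioFrame

ALLOWED_SITE_IDS = {"livingroom", "office"}  # example configuration


def site_id_from_topic(topic):
    """Extract the site ID from a Hermes audioFrame topic, or None."""
    parts = topic.split("/")
    if (len(parts) == 4 and parts[0] == "hermes"
            and parts[1] == "audioServer" and parts[3] == "audioFrame"):
        return parts[2]
    return None


def accept_frame(topic, allowed=ALLOWED_SITE_IDS):
    """True if the frame comes from a satellite we are configured for."""
    site_id = site_id_from_topic(topic)
    return site_id is not None and site_id in allowed
```

With a check like this, taking a list of site IDs instead of a single one is mostly a matter of configuration: the server subscribes to the audioFrame topics of each allowed site and ignores everything else.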

But of course Rhasspy would then need the concept of a session: in Snips, the dialogue manager creates a session with a random session ID after the wake word is triggered. After the ASR has captured the command, this session ID is passed along, and when the right intent is activated, the intent’s JSON payload on MQTT also carries the session ID. The code that handles the intent can then end the session with that session ID to speak the response on the audio output of the right site.
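As a rough sketch of that last step, an intent handler could build an endSession message like this. This follows the Snips Hermes dialogue API as I understand it; the helper function itself is hypothetical:

```python
import json

# Sketch of ending a dialogue session from an intent handler: the incoming
# intent payload carries a sessionId, and publishing an endSession message
# with that same sessionId makes the reply come out on the originating site.
# end_session_message is an illustrative helper, not Rhasspy or Snips code.


def end_session_message(intent_payload, reply_text):
    """Build the MQTT topic and JSON payload to end the intent's session."""
    return (
        "hermes/dialogueManager/endSession",
        json.dumps({
            "sessionId": intent_payload["sessionId"],  # ties reply to the session
            "text": reply_text,                        # spoken on the session's site
        }),
    )
```

The dialogue manager can then look up which site ID the session was started on and play the TTS response there, without the intent handler ever knowing about site IDs.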

I could take a look at how to implement this, but I don’t understand Rhasspy’s code well enough to do this now and I’m not sure I’m up to it. For instance, I’m not familiar with your style of working with “actors” in your code. Where can I read more about this?

So, I’m setting up a development environment for Rhasspy on Ubuntu 18.04, and after the following commands:

./download-dependencies.sh
./create-venv.sh

a lot of packages are installed, but eventually I get the following error:

Successfully installed snowboy-1.2.0b1
tar: /home/koan/rhasspy/download/precise-engine_0.2.0_x86_64.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now

Did I miss something? I didn’t find any documentation about the development environment, so I just guessed those two commands above…

Got it; fixed in master now. You guessed right about running the download then the create script. Should be updated in the docs shortly :wink:

Thanks! Compiling… :slight_smile:

So Rhasspy has been installed and it is running, but when I try to run the tests with ./run-tests.sh, I get an error about fstminimize not being found:

(.venv) koan@pow:~/rhasspy$ ./run-tests.sh                                                                      
DEBUG:smart_open.smart_open_lib:{'transport_params': None, 'ignore_ext': False, 'opener': None, 'closefd': True, 'newline': None, 'errors': None, 'encoding': None, 'buffering': -1, 'mode': 'r', 'uri': '/home/koan/rhasspy/.venv/lib/python3.6/site-packages/smart_open/VERSION'}
INFO:gensim.summarization.textcleaner:'pattern' package not found; tag filters are not available for English
INFO:pytorch_pretrained_bert.modeling:Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
/usr/lib/python3.6/importlib/_bootstrap_external.py:426: ImportWarning: Not importing directory /home/koan/rhasspy/.venv/lib/python3.6/site-packages/google: missing __init__
  _warnings.warn(msg.format(portions[0]), ImportWarning)                     
/usr/lib/python3.6/importlib/_bootstrap_external.py:426: ImportWarning: Not importing directory /home/koan/rhasspy/.venv/lib/python3.6/site-packages/mpl_toolkits: missing __init__
  _warnings.warn(msg.format(portions[0]), ImportWarning)                                                         
/usr/lib/python3.6/importlib/_bootstrap_external.py:426: ImportWarning: Not importing directory /home/koan/rhasspy/.venv/lib/python3.6/site-packages/google/logging: missing __init__
  _warnings.warn(msg.format(portions[0]), ImportWarning)                            
/usr/lib/python3.6/importlib/_bootstrap_external.py:426: ImportWarning: Not importing directory /home/koan/rhasspy/.venv/lib/python3.6/site-packages/google/cloud: missing __init__
  _warnings.warn(msg.format(portions[0]), ImportWarning)   
ERROR:JsgfSentenceGenerator:generate                                  
Traceback (most recent call last):                                    
  File "/home/koan/rhasspy/rhasspy/train.py", line 54, in in_started         
    intent_fst = self.generate_sentences()                                   
  File "/home/koan/rhasspy/rhasspy/train.py", line 119, in generate_sentences                                   
    intent_fst = make_intent_fst(grammar_fsts)
  File "/home/koan/rhasspy/.venv/lib/python3.6/site-packages/jsgf2fst/jsgf2fst.py", line 303, in make_intent_fst
    subprocess.check_output(minimize_cmd, input=intent_fst.WriteToString())
  File "/usr/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 403, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'fstminimize': 'fstminimize'

OK, just do an apt-get install libfst-tools to fix that. I’ll add it to the create-venv.sh script here shortly. Thanks!

Ok, now I have:

(.venv) koan@pow:~/rhasspy$ ./run-tests.sh                                                                                      
DEBUG:smart_open.smart_open_lib:{'transport_params': None, 'ignore_ext': False, 'opener': None, 'closefd': True, 'newline': None, 'errors': None, 'encoding': None, 'buffering': -1, 'mode': 'r', 'uri': '/home/koan/rhasspy/.venv/lib/python3.6/site-packages/smart_open/VERSION'}
INFO:gensim.summarization.textcleaner:'pattern' package not found; tag filters are not available for English
INFO:pytorch_pretrained_bert.modeling:Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
/usr/lib/python3.6/importlib/_bootstrap_external.py:426: ImportWarning: Not importing directory /home/koan/rhasspy/.venv/lib/python3.6/site-packages/google: missing __init__
  _warnings.warn(msg.format(portions[0]), ImportWarning)
/usr/lib/python3.6/importlib/_bootstrap_external.py:426: ImportWarning: Not importing directory /home/koan/rhasspy/.venv/lib/python3.6/site-packages/mpl_toolkits: missing __init__
  _warnings.warn(msg.format(portions[0]), ImportWarning)
/usr/lib/python3.6/importlib/_bootstrap_external.py:426: ImportWarning: Not importing directory /home/koan/rhasspy/.venv/lib/python3.6/site-packages/google/logging: missing __init__
  _warnings.warn(msg.format(portions[0]), ImportWarning)
/usr/lib/python3.6/importlib/_bootstrap_external.py:426: ImportWarning: Not importing directory /home/koan/rhasspy/.venv/lib/python3.6/site-packages/google/cloud: missing __init__
  _warnings.warn(msg.format(portions[0]), ImportWarning)
WARNING:PocketsphinxSpeechTrainer:There are 30 unknown word(s): ['green', 'room', 'light', 'cold', 'red', 'what', 'blue', 'to', 'the', 'lamp', 'off', 'garage', 'how', 'make', 'set', 'whats', 'tell', 'open', 'door', 'bedroom', 'time', 'temperature', 'living', 'it', 'is', 'hot', 'turn', 'closed', 'on', 'me']
WARNING:phonetisaurus-apply:2019-05-17 22:57:09:  No pronunciation for word: 'Failed to open --model file 'profiles/en/g2p.fst''
ERROR:PocketsphinxSpeechTrainer:Unexpected error                                  
Traceback (most recent call last):                 
  File "/home/koan/rhasspy/rhasspy/stt_train.py", line 182, in in_unknown_words
    self.write_unknown_words(self.unknown_words)                                                                                                                                                                                            
  File "/home/koan/rhasspy/rhasspy/stt_train.py", line 372, in write_unknown_words                                              
    ), f"No pronunciations for unknown word {word}"
AssertionError: No pronunciations for unknown word green

To run the tests, you’ll need to download and train all of the (working) profiles. If you’re just interested in Dutch for now, edit test.py and put only “nl” in the profile list.

Also, I forgot that the profile has to be downloaded into Rhasspy’s profiles directory instead of the default in ~/.config. One way to do this is just to run profiles/nl/download-profile.sh profiles/nl/ right there in the main directory.