Rhasspy offline voice assistant toolkit

@Hitesh_Singh, this may be beyond my skills, but it could be a case of wrong write access to .config. What is your Docker user, and what are the access permissions on .config? But again, I could be wrong.

@koan, here are my Rhasspy logs:

DEBUG:__main__:Namespace(host='0.0.0.0', port=12101, profile='fr', set=[], ssl=None, system_profiles='/usr/share/rhasspy/profiles', user_profiles='/profiles')
 DEBUG:RhasspyCore:Loaded profile from /profiles/fr/profile.json
 DEBUG:RhasspyCore:Profile files will be written to /profiles/fr
 DEBUG:root:Loading default profile settings from /usr/share/rhasspy/profiles/defaults.json
 DEBUG:WebSocketObserver: -> started
 DEBUG:DialogueManager: -> started
 DEBUG:DialogueManager:started -> loading_mqtt
 DEBUG:DialogueManager:Loading MQTT first
 DEBUG:DialogueManager:Loading...will time out after 30 second(s)
 DEBUG:HermesMqtt: -> started
 DEBUG:HermesMqtt:started -> connecting
 DEBUG:HermesMqtt:Logging in as athena
 DEBUG:HermesMqtt:Connecting to MQTT broker 192.168.0.105:1883
 DEBUG:DialogueManager:loading_mqtt -> loading
 DEBUG:DialogueManager:Loading actors
 DEBUG:HermesMqtt:Connection successful.
 INFO:HermesMqtt:Connected to 192.168.0.105:1883
 DEBUG:HermesMqtt:connecting -> connected
 DEBUG:DialogueManager:Actors created. Waiting for ['recorder', 'player', 'speech', 'wake', 'command', 'decoder', 'recognizer', 'handler', 'hass_handler', 'sentence_generator', 'speech_trainer', 'intent_trainer', 'word_pronouncer'] to start.
 DEBUG:HermesAudioRecorder: -> started
 DEBUG:HermesAudioPlayer: -> started
 DEBUG:EspeakSentenceSpeaker: -> started
 DEBUG:DummyWakeListener: -> started
 DEBUG:DummyCommandListener: -> started
 DEBUG:FuzzyWuzzyRecognizer: -> started
 DEBUG:PocketsphinxDecoder: -> started
 DEBUG:HomeAssistantIntentHandler: -> started
 DEBUG:PocketsphinxSpeechTrainer: -> started
 DEBUG:FuzzyWuzzyIntentTrainer: -> started
 DEBUG:PhonetisaurusPronounce: -> started
 DEBUG:JsgfSentenceGenerator: -> started
 DEBUG:HermesMqtt:Subscribed to hermes/audioServer/default/audioFrame
 DEBUG:DialogueManager:recorder started
 DEBUG:EspeakSentenceSpeaker:started -> ready
 DEBUG:FuzzyWuzzyRecognizer:Loaded examples from /profiles/fr/intent_examples.json
 DEBUG:DialogueManager:player started
 DEBUG:FuzzyWuzzyRecognizer:started -> loaded
 DEBUG:DialogueManager:wake started
 DEBUG:DialogueManager:command started
 DEBUG:DialogueManager:speech_trainer started
 DEBUG:DialogueManager:intent_trainer started
 DEBUG:DialogueManager:word_pronouncer started
 DEBUG:DialogueManager:sentence_generator started
 DEBUG:DialogueManager:speech started
 DEBUG:DialogueManager:recognizer started
 DEBUG:PocketsphinxDecoder:Loading decoder with hmm=/profiles/fr/acoustic_model, dict=/profiles/fr/dictionary.txt, lm=/profiles/fr/language_model.txt
 DEBUG:DialogueManager:handler started
 DEBUG:PocketsphinxDecoder:started -> loaded
 DEBUG:DialogueManager:decoder started
 WARNING:DialogueManager:Actor timeout! Still waiting on ['hass_handler'] Loading anyway...
 DEBUG:DialogueManager:loading -> ready
 INFO:DialogueManager:Automatically listening for wake word
 DEBUG:DialogueManager:ready -> asleep
 INFO:__main__:Started
 DEBUG:__main__:Starting web server at http://0.0.0.0:12101

Edit: Something that may be of interest: when I use a POST HTTP request to api/listen-for-command, I get these logs:

Edit 2: My bad, wrong training led to that and I have managed to fix the first edit. My issue is still here: Rhasspy doesn't listen to MQTT.

Thanks for the help @Ypresis :grinning:

I got it to work by placing the missing files from https://github.com/synesthesiam/rhasspy-profiles/tree/master/en into my /root/.config/rhasspy/profiles/en folder. Now that it is working, I can start learning how it works.


How big is your SD card?


From that error message, I would guess that the profile download got interrupted and Rhasspy’s download cache is corrupted. Unfortunately, Rhasspy doesn’t try to verify the files once they’re present, so you would have to either (1) download them manually (which looks like what you did :slight_smile:) or (2) delete the download folder in your profile and restart Rhasspy.

For anyone in the future experiencing this problem, I’d recommend grabbing the files from the GitHub release page for your language instead and manually downloading them to the download folder in your profile. For example, the English profile would need cmusphinx-en-us-5.2.tar.gz, en-70k-0.2-pruned.lm.gz, and en-g2p.tar.gz. The .pt files are only needed if you use the flair intent recognizer (which you probably don’t).
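If you’d rather script that step, something along these lines should do it. The release tag in the URL is just a placeholder (check the actual release page for your language), and the download folder path assumes the profile location mentioned earlier in the thread:

import os
import urllib.request

# NOTE: placeholder release tag; copy the real asset URLs from the
# rhasspy-profiles release page for your language.
RELEASE_URL = "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-en"
DOWNLOAD_DIR = "/root/.config/rhasspy/profiles/en/download"  # adjust to your profile's download folder
FILES = ["cmusphinx-en-us-5.2.tar.gz", "en-70k-0.2-pruned.lm.gz", "en-g2p.tar.gz"]

os.makedirs(DOWNLOAD_DIR, exist_ok=True)
for name in FILES:
    print("Downloading", name)
    urllib.request.urlretrieve(RELEASE_URL + "/" + name, os.path.join(DOWNLOAD_DIR, name))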


Thanks for sharing. I didn’t find an answer here https://rhasspy.readthedocs.io/en/latest/wake-word/ but what is the behavior of Rhasspy if you don’t set a wake word system? Does it listen to MQTT, or does it still need to be activated in some way?

I ask for debugging purposes: right now Rhasspy still doesn’t listen to my MQTT Hermes messages.

The default is the dummy wake word system. I agree, the documentation should be clearer about that. All of the defaults are present in the defaults.json file in the Rhasspy repo, if you’re curious :slight_smile:

From your logs above, it looks like Rhasspy is connecting to your MQTT broker correctly and subscribing to audio frames. You should be able to go to the web interface, hold down the “Hold to Record” button, and speak a command (then let go). Does this work for you?

I tried that when I was playing with wake word services.

However, I tried recording with my browser audio input via MQTT this morning, with the default (dummy) wake word configuration. I get these error messages in my logs:

WARNING:HomeAssistantIntentHandler:Empty intent. Not sending to Home Assistant
WARNING:HermesMqtt:Empty intent. Not forwarding to MQTT
DEBUG:__main__:Recorded 120364 byte(s) of audio data
DEBUG:PocketsphinxDecoder:rate=16000, width=2, channels=1.
DEBUG:PocketsphinxDecoder:Decoded WAV in 0.9733586311340332 second(s)
DEBUG:PocketsphinxDecoder:Transcription confidence: 0.017280606925557645
WARNING:PocketsphinxDecoder:Transcription did not meet confidence threshold: 0.017280606925557645 < 0.8
DEBUG:__main__:
DEBUG:__main__:{"text": "", "intent": {"name": "", "confidence": 0}, "entities": [], "speech_confidence": 0, "slots": {}}
WARNING:HomeAssistantIntentHandler:Empty intent. Not sending to Home Assistant
WARNING:HermesMqtt:Empty intent. Not forwarding to MQTT

From what I understand, PocketSphinx does not understand what I’m saying. I will try to tune the confidence threshold.

Edit: Okay, by lowering the Pocketsphinx Minimum Confidence to 0.01 I got it working by recording via my web browser, thanks a lot! I have a question though: my average confidence is between 0.02 and 0.03 in the logs, is this normal? Also, my intents are not recognized very well, as Rhasspy tends to mix up intents (e.g. giving me a temperature when I try to close a curtain).


Did you try to retrain Rhasspy? I also noticed that it sometimes does this (completely messed up intent understanding), but after retraining it works flawlessly again.


Hi! After some time, having my sentences implemented in English, I would like to migrate them to my native language (pt-br). When I train the sentences (Kaldi must be used), I get the following error:

Training failed: Exception("realpath: '${KALDI_PREFIX}/kaldi': No such file or directory\n",)

https://rhasspy.readthedocs.io/en/latest/speech-to-text/#kaldi

I understand that I have to install Kaldi in order for it to work (on a Pi 3 at the moment, which seems like it will be a bit challenging… :expressionless:).

The link above points to a pre-built copy of Kaldi (v1.0):

https://github.com/synesthesiam/kaldi-docker/releases

I downloaded the “kaldi_armhf.tar.gz” file, which I believe is the correct version, and unpacked it, but I don’t know how to proceed. The Kaldi URL mentions another way to install it, so I am afraid of doing it the wrong way and breaking something.

So now I have the version 1.0 that I mentioned above, unpacked in a /tmp/kaldi folder.
Can someone kindly point me to what I should do next in order to compile and install it? Thanks!

Pocketsphinx’s confidence seems to depend on the number of possible sentences, which means the “best” threshold will change if you add more intents or sentences. This is really unfortunate, and I’d love to hear from anyone who knows a better way of getting confidence values out of Pocketsphinx!

I agree with @koan that you likely need to just re-train Rhasspy. You may also try using fsticuffs as your intent recognizer instead of fuzzywuzzy (which I saw being loaded in your logs). Fuzzywuzzy will start making mistakes if your sentences are very similar, since it just uses fuzzy string matching. Fsticuffs is more strict, but will have no trouble with similar sentences.
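Here is a tiny illustration of that last point, using the fuzzywuzzy library directly (the example sentences are made up):

from fuzzywuzzy import fuzz

# A slightly garbled transcription scores high against the intended sentence,
# but nearly as high against a similar sentence, so fuzzy string matching
# can easily pick the wrong intent.
transcription = "ferme le volet du salon"
print(fuzz.ratio(transcription, "ferme les volets du salon"))  # intended sentence: high score
print(fuzz.ratio(transcription, "ferme les stores du salon"))  # similar sentence: nearly as high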

If you’re using the Docker image, you don’t! I messed up and forgot to set the KALDI_PREFIX environment variable (fixed in the latest version now).

Kaldi is installed in the Docker image under /opt/kaldi, so you just need to replace KALDI_PREFIX with /opt in your settings for it to work :slight_smile:
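Concretely, the Kaldi path from the error message just needs to end up pointing at the copy bundled in the image:

# before (fails because KALDI_PREFIX is never set):
${KALDI_PREFIX}/kaldi
# after (the Kaldi copy shipped inside the Docker image):
/opt/kaldi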

Got it. A lot easier than I thought :smiley: . Thanks.


I need help with something you have probably already faced with languages other than English.

For example, in order to turn on or off a light, the following template is used:

service_template: 'light.turn_{{ trigger.event.data["state"] }}'

But in my case, on and off should be changed to “acender” and “apagar”. So, I believe a new template can be used to switch “acender” to “on” and “apagar” to “off”. That way, HA can understand the task correctly. Can anyone post how you circumvented this? Thanks for your help.

Yes! This is probably a good use case for tag synonyms. This lets you change what Rhasspy puts in the JSON event sent to Home Assistant.

In your sentences.ini file, you might have something like this:

light_state = (acender | apagar){state}
...

To make Rhasspy put “on” in the state slot for “acender” and “off” for “apagar”, just do this:

light_state = (acender){state:on} | (apagar){state:off}
...

Now your Home Assistant template should work as is, because state will always be set to “on” or “off”.
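So, assuming the intent reaches Home Assistant as an event in the usual way, the event data for “acender” would contain something like {"state": "on"}, and your service_template from before renders to light.turn_on (and to light.turn_off for “apagar”).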

Thanks for your feedback. Well, as Open FST does not seem to work for me (I get empty intents every time, even after a retrain), I think I’m going to stick with Pocketsphinx.

Here are my logs when I try to get an intent:

ERROR: State ID -1 not valid
ERROR:root:['ferme', 'les', 'du', 'salon', 'baie', 'vitree', 'est']
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/jsgf2fst/fstaccept.py", line 66, in fstaccept
    out_sentences = fstprintall(out_fst, exclude_meta=False)
  File "/usr/local/lib/python3.6/dist-packages/jsgf2fst/fstaccept.py", line 206, in fstprintall
    for arc in in_fst.arcs(state):
  File "pywrapfst.pyx", line 1406, in pywrapfst._Fst.arcs (pywrapfst.cc:17426)
  File "pywrapfst.pyx", line 1420, in pywrapfst._Fst.arcs (pywrapfst.cc:17365)
  File "pywrapfst.pyx", line 2878, in pywrapfst.ArcIterator.__init__ (pywrapfst.cc:31111)
pywrapfst.FstIndexError: State index out of range

Edit: Never mind, by switching some intents around it works again with Open FST.

I am having an issue in “pt” when the sentences have “alternatives”, like below:

[GetHumidity]
humid_local = (externa | interna) {name}
qual [é] a umidade <humid_local>

All of them work in English, but none of them work in pt. When the sentence is simple, with no alternatives, then it works.

I also tried to debug by watching for the intent arriving over MQTT using “mosquitto_sub”, but nothing arrives (the subscription I used is equivalent to the small script below).
How can I debug it a little further? Could something be broken when using the pt profile?
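For reference, a rough paho-mqtt equivalent of that check; hermes/intent/# is just my guess at the topic where recognized intents should appear, and the broker address needs to be replaced with your own:

import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # Print every message so we can see whether any intent arrives at all.
    print(msg.topic, msg.payload.decode("utf-8"))

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)    # replace with your MQTT broker address
client.subscribe("hermes/intent/#")  # assumed topic for recognized intents
client.loop_forever()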

EDIT: Using v2.22

EDIT 2:

Experimenting in the “Speech” tab, these sentences with alternatives also don’t work, returning:

"intent":
"entities":
"intent":
"confidence": 0
"name": ""
"slots":
"speech_confidence": 1
"text": ""

If I write the sentence and call the intent, then it works.
I have no idea why it is behaving like this.

I was able to get it to work by switching to fuzzywuzzy for intent recognition. The Portuguese speech model does not seem to do very well, and the OpenFST recognizer is too sensitive.

I generated a WAV file with the sentence “qual a umidade externa” using Google WaveNet, and Rhasspy transcribed it as “qual o ar ligue ar de externa” (you can see this in the log). Fuzzywuzzy matches it correctly, but only because there are very few other sentences.
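In case you want to reproduce that test yourself, this is roughly how I push a WAV file through Rhasspy’s speech-to-text HTTP endpoint (a quick sketch: the file name is made up, and the host/port should match your own instance):

import requests

# Send raw WAV data to Rhasspy and print the transcription it returns.
with open("qual_a_umidade_externa.wav", "rb") as wav_file:
    response = requests.post(
        "http://localhost:12101/api/speech-to-text",
        data=wav_file.read(),
        headers={"Content-Type": "audio/wav"},
    )
print(response.text)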

The underlying issue here is that all of the Kaldi models I have (Portuguese, Vietnamese, Swedish) are not really intended for speech-to-text, and are trained on (expensive) data I don’t have access to. However, I did manage to find a Portuguese speech dataset that I will try to train a model on; it’s about 10.5K WAV files, which should be plenty!

I hope Fuzzywuzzy will work for you in the meantime :slight_smile:

I tried fuzzywuzzy and it only worked after a new “training”; otherwise it didn’t work at all. Even then it worked in a VERY unreliable way, returning a completely different intent whose sentence doesn’t even sound close to what it understood.
I also tried to play with the “minimum confidence” for fuzzywuzzy, with no success.
I hope that the new Portuguese speech dataset will be successful.
It seems that I will have to switch back to the EN profile. Anyway, thanks for your help and effort developing for other languages. Maybe you have another trick that I can test.
In case you find it useful: for the PT profile I have 11 sentences, and for the EN profile I have about 25.

After some false starts, I managed to train a custom acoustic model for Portuguese. I’ve updated Rhasspy to use this new model instead of the old Kaldi one, and I’d be interested to hear your feedback.

In my testing, the model does just a little better than the Kaldi one. I’m hoping it’s enough of an improvement that you find it useful. The major roadblock is the small amount of data in the dataset I mentioned. It turns out there are only 8 hours of data, where good models apparently need 100+ hours. Mozilla claims to have 11 hours of Portuguese on their Common Voice website, but it’s not available for download.

Another idea I had was to use public domain audiobooks for training. This has apparently been done for English, but the process looks daunting…

It still makes some mistakes, but in my preliminary testing it is MUCH better than Kaldi. I have to experiment more, though. It also seems to respond a lot faster. This should be the way to go. I hope the training hours the model needs can be obtained somehow.
I will need a few more days of experimenting to give you more solid feedback; anyway, it seems promising.
Thanks one more time for your excellent work.

EDIT: It was a gift for my anniversary in the community. THANKS!