Rhasspy offline voice assistant toolkit

The PT and VI profiles use a completely different speech recognition system from the others. I typically run a Rhasspy instance on a server, which trains very quickly. I’ll try one of my Raspberry Pis and see what it’s doing.

What’s the spelling error?

The word “acenda” had an additional character that I can’t remember anymore, since I corrected it, sorry. What I posted above is the already corrected version.

It is probably the same misspelling as in the “Words” tab: “acdenda A K D E~+ D AX”, where the correct word is “acenda”.

I am a bit confused about the steps needed to switch profiles. It seems that I should change it in the settings and also in the add-on config: “default_profile”: “en”. I’m struggling to get back to the “en” profile.

Should the Custom words show the words from all languages together?

When I change the profile to “en”, the sentences from “en” are not loaded.

There is no profile shown to the left of the green “Train” button.

In Hass.io, the add-on config profile setting should override what’s in the Rhasspy settings. I might just remove the add-on config, though, since it can be confusing.

The custom words should only show words for the profile’s language.

It sounds like you might have gotten into a weird state with the profile JSON files (a bug on my end). You can always delete the defaults.json file in the profiles directory and any profile.json files to start over. Maybe I should add some kind of “Revert to Default Settings” button?

I tried out the latest Rhasspy in Docker on my Rpi3 and the Portuguese profile trained in about 6 seconds. Recognizing a WAV file took 2-3 seconds.

I tried deleting defaults.json, but I noticed that it was not created again after a restart. So I put it back (I had just renamed it, because you never know…).
Then I deleted all files in the pt profile. Rhasspy downloaded them again and then “pt” appeared in blue to the left of the Train button. But the sentences remained the “en” ones. I saved new sentences, which were saved in the right place. But training still takes forever. Also, in Words, there are no Custom Words for pt, which I saw before. Just “rhasspy R AE S P IY”, which I believe is a custom word from the “en” profile.
So I still haven’t been able to test the Portuguese profile.

I’m not sure about your question on “revert to default settings”. It depends a lot on the programmer’s perspective, which I am not familiar with. I thought about a load button for sentences and words, but I think it might be unnecessary.

I also noticed that there are a lot less files in the pt profile folder.

Hi there, I think there is an issue in the Snowboy addon in combination with Snips.

When I use the Snowboy Addon, a message gets published to “hermes/hotword/default/detected”

{"siteId": "matrixvoice", "modelId": ["snowboy"], "modelVersion": "", "modelType": "personal", "currentSensitivity": [0.5]}

This gives an error in the Snips Addon:

WARN:hermes_mqtt : Error while decoding object on topic "hermes/hotword/default/detected": invalid type: sequence, expected a string at line 1 column 37

When I use the Snips Addon as hotword with “Hey Snips”, this gets published:

{"siteId":"matrixvoice","modelId":"hey_snips","modelVersion":"workflow-hey_snips_subww_feedback_10seeds-2018_12_04T12_13_05_evaluated_model_0002","modelType":"universal","currentSensitivity":0.5,"detectionSignalMs":-60,"endSignalMs":-60}

I think the cause is "modelId": "hey_snips" vs "modelId": ["snowboy"]. Snips expects a string there, but the Snowboy add-on publishes a list.

Got it fixed as long as you don’t use multiple hotwords…
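To illustrate the mismatch described above, here is a minimal Python sketch that turns the list-valued fields of the Snowboy message into the scalar form seen in the “Hey Snips” message. The helper name `normalize_hotword_payload` is hypothetical and this is not necessarily how the add-on was actually fixed. Keeping only the first list entry is an assumption that matches the “no multiple hotwords” limitation.

```python
import json


def normalize_hotword_payload(payload: dict) -> dict:
    """Return a copy in which modelId and currentSensitivity are scalars.

    Hypothetical helper for illustration only. If a field is a list,
    only its first entry is kept, so multiple hotwords are not supported.
    """
    fixed = dict(payload)
    for key in ("modelId", "currentSensitivity"):
        value = fixed.get(key)
        if isinstance(value, list):
            fixed[key] = value[0] if value else None
    return fixed


# The message published by the Snowboy add-on, as quoted in the post above.
snowboy_message = json.loads(
    '{"siteId": "matrixvoice", "modelId": ["snowboy"], "modelVersion": "", '
    '"modelType": "personal", "currentSensitivity": [0.5]}'
)

normalized = normalize_hotword_payload(snowboy_message)
print(json.dumps(normalized))
```

After normalization, "modelId" is the string "snowboy" and "currentSensitivity" is the number 0.5, which is the shape the Snips message uses.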

There are fewer files in the pt profile since it uses a different speech recognition system. I’ve tried removing the profile setting from Hass.IO in hopes that it won’t conflict any more with Rhasspy’s internal setting.

Would you be willing to delete your pt profile in /share/rhasspy and try again?

Sure, we are here to test it.
Updated to 2.12.
Deleted pt profile.
Changed in Addon the profile to pt (saved)
Restarted by addon page.
Included in settings my HA, MQTT, Snowboy configurations
Restarted by Rhasspy page.

What I got:
Sentences and Words still correspond to the EN profile.
Now, in Addon, config has no profile anymore:

{
  "run_dir": "/share/rhasspy"
}

And I can’t remember the syntax to include it again or switch back to EN.

I tried to switch back to the default but it didn’t work.

I got the right syntax here in the thread:

{
  "run_dir": "/share/rhasspy",
  "default_profile": "en"
}

But it isn’t kept after saving; the config goes back to having no profile, just as before.
What can be done to get back to at least the English profile?

You can select the default profile from the Rhasspy web UI, right? That is what I use.

That doesn’t work either (the add-on config no longer shows the profile, “default_profile”: “en”).
synesthesiam told us a few posts above that the add-on config overrides the Rhasspy settings.

BUT after some reboots, the en profile is working again. The profile in the add-on config (“default_profile”: “en”) still doesn’t show up, though…

I think what’s happening here is the different ways of setting Rhasspy’s profile are in conflict. After some thought, I’ve decided to simplify the process and just require a command-line argument on start-up. This can be automatically set for Hass.IO from the add-on setting (which I’ll put back in).

Setting the profile from the web interface has caused a lot of headaches because of other choices I made with how profiles are stored. Many of these problems go away if I just require the user to always specify the profile during start-up.

I have been a heavy Snips user for the last 7 months, but now I’m exploring other options that are completely open source, and I discovered Rhasspy, which is looking impressive!

I have some questions, though. I have installed the Docker image and I want to use it without audio input and output on the Rhasspy machine. I want to use other devices such as the Raspberry Pi that’s now running Snips for audio input and output, and eventually I want to use @Romkabouter’s awesome Matrix-Voice-ESP32-MQTT-Audio-Streamer. I consider these as ‘satellites’ that talk to Rhasspy.

I have it working now except for the TTS part. In my Home Assistant automations that react to the events for the intents I use the snips.say service now to give voice feedback to my Snips machine. But I want to get rid of Snips TTS and use Rhasspy’s TTS instead. Is there a way to let the Home Assistant automation send a text to Rhasspy and let Rhasspy translate it to a wav and send the audio over the Hermes/MQTT protocol to a Snips audio server?

Glad to hear you’re interested in Rhasspy! Yes, this is possible through Rhasspy’s HTTP API. You should be able to use Home Assistant’s RESTful Command to POST the text to /api/text-to-speech. Specifically, the parts of the RESTful command should be:

  • url: http://your-rhasspy-server:12101/api/text-to-speech
  • method: POST
  • payload: the text to say
  • content_type: text/plain
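For testing the endpoint outside Home Assistant, the same request can be sketched in Python with only the standard library. The helper name `build_tts_request` is hypothetical, and the host is the placeholder from the list above. The sketch only builds the request; the lines that actually send it are commented out because they need a reachable Rhasspy server.

```python
import urllib.request


def build_tts_request(
    text: str, host: str = "your-rhasspy-server", port: int = 12101
) -> urllib.request.Request:
    """Build a plain-text POST to Rhasspy's /api/text-to-speech endpoint.

    Hypothetical helper: host and port are placeholders to replace
    with the address of your own Rhasspy instance.
    """
    url = f"http://{host}:{port}/api/text-to-speech"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # the payload is the text to say
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


request = build_tts_request("Hello from Home Assistant")
print(request.get_method(), request.full_url)

# To actually send it (requires a running Rhasspy server):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```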

I’d recommend trying out MaryTTS for text to speech. I have a Docker image available for convenience, though it doesn’t have all of the voices installed.

That is a good start for me! I was using the Snips add-on for TTS but I want to move away from Snips.
I tested the TTS, which works well. Now I need to make it speak via Google Wavenet.
Other TTS engines are just not good enough for Dutch.

Here is a small demo with my Matrix Voice as microphone, all in Dutch, sorry guys.
I created a rest_command in the configuration.yaml:

rest_command:
  rhasspy_speak:
    url: 'http://<HA-IP>:12101/api/text-to-speech'
    method: 'POST'
    payload: '{{ payload }}'
    content_type: text/plain

Then added an automation:

id: '1556640424575'
alias: Rolluiken
trigger:
- event_data: {}
  event_type: rhasspy_Covers
  platform: event
condition:
action:
- data_template:
    # Dutch for: "All right, I will <actionType> the <whichCover> roller shutter"
    payload: 'Dat is goed, ik {{ trigger.event.data.actionType }} het {{ trigger.event.data.whichCover }} rolluik'
  service: rest_command.rhasspy_speak

My sentences look like this:

[Covers]
acties = (sluit | open) {actionType}
plaatsen = (voorste | achterste) {whichCover}
rolluik

When I give the command, Rhasspy sends a rhasspy_Covers event to HA.
That triggers the automation, handling the TTS.
In my Hassio, I have the Rhasspy Addon and Snowboy Addon. All is set to hermes as protocol.
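As a rough illustration of what the automation’s template does with the event data, here is a small Python sketch. It assumes the event data is a flat dictionary with the keys actionType and whichCover, as the template above suggests. The function name `render_response` is hypothetical, and Python’s `str.format` stands in for Home Assistant’s template engine.

```python
# Same Dutch sentence as the automation payload, with Python placeholders.
RESPONSE_TEMPLATE = "Dat is goed, ik {actionType} het {whichCover} rolluik"


def render_response(event_data: dict) -> str:
    """Fill the spoken response with the slot values from the intent event."""
    return RESPONSE_TEMPLATE.format(
        actionType=event_data["actionType"],
        whichCover=event_data["whichCover"],
    )


# Example slot values taken from the sentences above.
example_event = {"actionType": "sluit", "whichCover": "voorste"}
spoken_text = render_response(example_event)
print(spoken_text)  # prints "Dat is goed, ik sluit het voorste rolluik"
```

The resulting text is what the rest_command would POST to Rhasspy for speaking.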

Here is a small video. I have noticed that you need to be quite precise in the way “snowboy” is pronounced, but it works for now. I am mainly focusing on getting audio out working correctly on the Voice.
The main issues are that only 44100 Hz 16-bit stereo plays well (other formats play too fast, as you can see in the video) and that the audio data is distorted. This is an issue with my software trying to process the WAV data coming in over MQTT and has nothing to do with Rhasspy.

On a personal note, I would really like Google Wavenet to be implemented, with caching.
This guy has done a lot of great work, I have used it as well :slight_smile:
https://forum.snips.ai/t/snipssupertts-one-script-to-rule-them-all/747

This script is for Snips, but if you check out how he has integrated Google Wavenet, amongst others, it is really a great script.

Thanks, @synesthesiam and @Romkabouter! That RESTful command did the trick. The only remaining problem is that the espeak voice sounds really robotic in Dutch. Unfortunately, MaryTTS doesn’t seem to be an option for me because Dutch isn’t in the list of supported languages on the homepage.

This is awesome, @Romkabouter! Would you mind if I linked to this video from the Rhasspy Github page?

I will take a look at the Snips TTS script and see what I can do. Rhasspy can call an external program for TTS right now, but I forgot to add it to the documentation…

I’ve seen a number of very promising offline TTS solutions like deepvoice3, but most are English only due to the need for a massive amount of training data. I wonder how much training data is required for a new voice in MaryTTS. Maybe we could collaborate and create one? You guys supply the WAV files, and I’ll build the voice.

I have forked your repo and will try to add Google Wavenet as well.
I don’t know the ins and outs of the project, so I will most likely ask some questions.

You can link that video, but I think it’s better if I make an English version first so people might actually understand it :smiley:

Sure, I certainly want to help with that. Just let me know what conditions these WAV files have to meet. But @Romkabouter is Dutch and I am Flemish, so I’m curious what a model trained on both language variants would sound like :slight_smile: