Rhasspy offline voice assistant toolkit

Checking the source code, it looks like there is an issue.
The profile JSON says: "voice": "Wavenet-B", but when the code is called, it checks for the setting "text_to_speech.wavenet.wavenet_voice", which defaults to "Wavenet-C" if not found.

That should be:
“text_to_speech.wavenet.voice”

Ah thanks, I could have figured this out as well :slight_smile:
Manually changing the profile JSON fixes this, and I get the English voice.
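If anyone wants to script the edit instead of doing it by hand, something like this minimal sketch should work (the profile path and the voice name are just examples, adjust them to your setup):

```python
# Minimal sketch: set the Wavenet voice under the key the code actually reads,
# text_to_speech.wavenet.voice. Profile path and voice name are examples only.
import json

profile_path = "profile.json"  # adjust to the profile your Rhasspy instance uses

with open(profile_path) as f:
    profile = json.load(f)

# Make sure the nested sections exist, then set the correctly named key.
tts = profile.setdefault("text_to_speech", {})
tts.setdefault("wavenet", {})["voice"] = "Wavenet-B"

with open(profile_path, "w") as f:
    json.dump(profile, f, indent=2)
```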

I played around with the Node-RED timer example and have a few questions.
I am a total beginner with Node-RED, I just installed it a few minutes ago.
The example works great if I don't send the intents to HA, but it seems the internal intent API (which the Node-RED flow relies on) won't fire with normal wake word detection if Home Assistant is configured.
Is this by design, or am I missing something?

I am currently trying to convert the example to use the HA event instead.

Hi

I'm trying to get Rhasspy up and running on a Raspberry Pi with Docker. I've added a USB sound card and a web camera, but I can't see that anything is recorded.

What am I missing in getting this right?

I am using HA events, because I used the event to trigger automations which send TTS messages back to Rhasspy.

Yeah, I got it working now as well. The biggest problem with HA events was that the full message is not stored in the event, only the event name and rules.
Because of this it was a bit of a hassle to get the timer to repeat what time was set, and I ended up with a lot of if and else for it to sound natural :slight_smile:
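To give an idea of the kind of branching I mean, here is a rough sketch in plain Python (not my actual Node-RED function, and the exact wording is just an example):

```python
# Rough sketch: turn a duration in seconds into a natural-sounding TTS sentence.
def _unit(n: int, word: str) -> str:
    """Return '1 minute' or '5 minutes', handling the singular/plural form."""
    return f"{n} {word}" if n == 1 else f"{n} {word}s"

def describe_timer(total_seconds: int) -> str:
    minutes, seconds = divmod(total_seconds, 60)
    if minutes and seconds:
        return f"Timer set for {_unit(minutes, 'minute')} and {_unit(seconds, 'second')}."
    if minutes:
        return f"Timer set for {_unit(minutes, 'minute')}."
    return f"Timer set for {_unit(seconds, 'second')}."

print(describe_timer(90))  # "Timer set for 1 minute and 30 seconds."
```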

If someone wants my modified function, just tell me.

Update:
OK, I got my hands dirty with Node-RED and created a beefed-up timer.
This one also supports pausing and resuming timers as well as stopping them completely, and it will tell you the time left, or whether there is a paused timer, if you ask for it.

@synesthesiam:
Any plans for sound playback with Rhasspy (stored files or URLs)?
I know I could install a media player on my Hermes satellite and use it through Home Assistant, but using Rhasspy for this would be a bit simpler IMO.
I thought about having a sound file played when my timer is ready, instead of or in combination with TTS.

Do you mean the /api/play-wav endpoint? You can use this REST endpoint to play a WAV file. Have a look at http://IPOFYOURRHASSPY:12101/api/ for a list of the available REST API URLs.

Oh, OK, I have to read the docs more carefully.

I tried this just now but it won't work. Does the WAV file have to be in a specific format? I just converted an MP3 to WAV using Audacity (WAV Microsoft signed 16-bit PCM), put it in HA's www directory and sent the URL to the API (http://192.168.1.8:8123/local/kitchen-timer.wav).

The Rhasspy log shows the following:

[INFO:161907989] geventwebsocket.handler: 127.0.0.1 - - [2019-05-30 12:34:08] "POST /api/play-wav HTTP/1.1" 200 149 0.003127
[DEBUG:161907986] __main__: Playing 47 byte(s)

But no sound is played on my speaker (snips-audio-server at the moment). The file is also much bigger than 47 bytes…

That sounds like an error in how the service is called.

Can you try this on the commandline to see whether Rhasspy can play the WAV file?

curl -X POST "http://IPOFYOURRHASSPY:12101/api/play-wav" -H "Content-Type: audio/wav" --data-binary @"/local/path/of/kitchen-timer.wav"

When I call the above service with curl with the right URL and file path on my system (note the @ before the filename!), the WAV file is played on my Hermes audio player connected with Rhasspy via MQTT.

The play-wav service doesn't seem to accept a URL, though… At least that doesn't work here, and if I read the source code correctly, it just plays the body of the POST request directly as WAV data. That would also explain the "Playing 47 byte(s)" in your log: 47 bytes is exactly the length of the URL string you sent, so Rhasspy tried to play the URL text itself as audio.

OK, your command works, but only with local files.
So the audio file is sent as data instead of a file path or URL.

So to use this with Node-RED I have to read the file and then send its content to the API. Let's see if I can get this working :slight_smile:

UPDATE:
OK, I got it working: put the file in a directory my Node-RED Docker container can read, then use the "file read" node and send its content as a single Buffer object to the API.
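For reference, what the flow does boils down to something like this Python sketch (host, port and file path are placeholders for my setup):

```python
# Sketch: read a local WAV file and POST its raw bytes to Rhasspy's /api/play-wav.
# Mirrors the Node-RED "file read" (single Buffer) -> HTTP request flow.
import requests

RHASSPY_URL = "http://IPOFYOURRHASSPY:12101/api/play-wav"  # adjust host/port
WAV_PATH = "kitchen-timer.wav"                             # adjust file path

with open(WAV_PATH, "rb") as f:
    wav_bytes = f.read()

response = requests.post(
    RHASSPY_URL,
    data=wav_bytes,                        # the raw WAV data is the request body
    headers={"Content-Type": "audio/wav"},
)
response.raise_for_status()
```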


Great! By the way, did you get a chance to run my Hermes Audio Server? I just published a new version with some bugs fixed. While trying to send some WAV files with Rhasspy's API to answer your question, I discovered that the audio player crashed on invalid WAV files :smiley: That doesn't happen anymore in the newest version.

If you go to the Settings page, there should be a Test button under the Microphone section. Does it say your microphone is working?

Thanks for catching this. It is fixed in master and will be rolled into the next version. I changed @Romkabouter's original code to make the voice property uniform with the other TTS systems, but failed to make the change everywhere.

This was a major oversight on my part! Thanks for finding it; fixed in master, up to Docker Hub soon.

If this is a problem, I can add an option to have Rhasspy play WAV files from URLs too. Let me know.
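In the meantime, the same effect can be had on the caller's side by downloading the file first and forwarding the bytes, roughly like this sketch (both URLs are placeholders for your own setup):

```python
# Sketch: fetch a WAV file from a URL and forward its bytes to /api/play-wav.
# This is a client-side workaround, not a built-in Rhasspy feature.
import requests

WAV_URL = "http://192.168.1.8:8123/local/kitchen-timer.wav"  # where the WAV lives
PLAY_WAV_URL = "http://IPOFYOURRHASSPY:12101/api/play-wav"   # Rhasspy endpoint

wav_bytes = requests.get(WAV_URL).content
requests.post(PLAY_WAV_URL, data=wav_bytes,
              headers={"Content-Type": "audio/wav"})
```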

Yes, it says it is working, but how do I test it? I can't seem to get playback working :frowning:

Hi @synesthesiam, I have been playing with the new RASA 1.0 since last week and I have a first RASA NLU + Core assistant running. Does Rhasspy have a way to link its ASR and TTS to RASA?

There should be a “Hold to record” button on the main page that will let you record a voice command. You can also upload a WAV file from there.

Rhasspy worked with RASA NLU about a year ago, and they’ve made some pretty significant changes. I haven’t been able to run their Docker image, so I can’t even debug it.

The way it's supposed to work is that Rhasspy produces the YAML file needed for training a RASA agent and POSTs it to the training endpoint. Then, during intent recognition, the text transcription is POSTed to the RASA server and the JSON intent is used.
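For illustration, the intent recognition step against a RASA 1.0 server could look roughly like this (server address and example text are assumptions; Rhasspy's actual code may differ):

```python
# Sketch: send a transcription to a RASA 1.0 server and read back the intent JSON.
# Assumes the default RASA HTTP server on localhost:5005; adjust to your setup.
import requests

RASA_PARSE_URL = "http://localhost:5005/model/parse"

transcription = "turn on the living room lamp"  # example text from ASR
response = requests.post(RASA_PARSE_URL, json={"text": transcription})
response.raise_for_status()

result = response.json()
print(result["intent"])    # e.g. {"name": "...", "confidence": ...}
print(result["entities"])  # recognized entities, if any
```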

If you’re interested in RASA, you may also want to check out the flair recognizer that I added. It does something pretty similar to RASA – it trains an intent classifier and a set of named entity recognizers using magical machine learning!

Great, that makes sense!

Flair looks nice too, but one of the things that I like about RASA is that it also has a dialogue manager (RASA Core) that uses a machine learning model trained on example conversations to decide what to do next, so you don't have to code a complex mess of if/then/else statements. And it also has an action server to do something with the recognized intents. For instance, I now have RASA NLU + Core + actions running on a toy example that lets me ask about the state of a specific Home Assistant entity (using Home Assistant's REST API) and answers me.

This is all using a text interface for now. But I have taken a look at the RASA integration in Rhasspy's code and I'll see if I can link my RASA setup to Rhasspy. I noticed that Rhasspy only uses RASA NLU, so I have to check whether the RASA Core part keeps working then, because when I only train the NLU manually, my RASA actions don't work (they need a trained Core model too).

I'm not sure yet whether I will take the full RASA stack route. It's impressive, but I'm probably not going to need it all. Currently I'm running no dialogue manager, and I'm running actions in Home Assistant's AppDaemon, which works fine for simple stuff. So just linking RASA NLU to Rhasspy is already very useful.

I also wasn’t able to run their latest Docker image, but after some debugging the reason became clear: their TensorFlow package is compiled with AVX extensions and my CPU doesn’t have these. So I just set up RASA in a Python virtual environment with an alternative TensorFlow wheel and this works.

If it's not too much work, it could be helpful in the future. Such an option would also benefit from supporting more audio types like MP3 or OGG, since you are most likely trying to play back a file from an external source.

What is the current status on some kind of dialogue? I am still working on my timer and thought it could ask the user, if a timer is already set, whether they want to set a new one. So a simple TTS message and then listen for yes or no.

I could probably get this working with intents and events, but for simple yes or no answers this is probably not the smartest solution.
Is something like that planned?

@koan:
I just tried your hermes-audio-server but it crashes with the following error:
Configuration file [Errno 2] No such file or directory.

I have created the config at /etc/hermes-audio-server.json, and even if I start the player with -c /PATHTOCONFIG.json it won't work.