Tried this just now but it won't work. Does the WAV file have to be in a specific format? I just converted an MP3 to WAV using Audacity (WAV Microsoft signed 16-bit PCM), put it in HA's www directory and sent the URL to the API (http://192.168.1.8:8123/local/kitchen-timer.wav)
That sounds like an error in how the service is called.
Can you try this on the commandline to see whether Rhasspy can play the WAV file?
curl -X POST "http://IPOFYOURRHASSPY:12101/api/play-wav" -H "Content-Type: audio/wav" --data-binary @"/local/path/of/kitchen-timer.wav"
When I call the above service with curl with the right URL and file path on my system (note the @ before the filename!), the WAV file is played on my Hermes audio player connected with Rhasspy via MQTT.
The play-wav service doesn't seem to accept a URL, though… At least that doesn't work here, and if I read the source code correctly it just plays the body of the POST request directly as a WAV file.
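If it helps: judging from the playBytes subscription the player logs, the transport between Rhasspy and the audio player is plain MQTT, so you could also publish a WAV to the Hermes playBytes topic yourself. A rough sketch with paho-mqtt; broker host, site ID, and file path are placeholders:

# Sketch: publish a WAV file straight to the Hermes playBytes topic.
import uuid
import paho.mqtt.publish as publish

site_id = "default"  # your Rhasspy site ID
topic = "hermes/audioServer/{}/playBytes/{}".format(site_id, uuid.uuid4().hex)

with open("kitchen-timer.wav", "rb") as wav:
    publish.single(topic, payload=wav.read(), hostname="localhost", port=1883)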
OK, your command works, but only with local files.
So the audio file is sent as data instead of a file path or URL.
To use this with Node-RED I have to read the file and then send its content to the API. Let's see if I can get this working.
UPDATE:
OK, got it working: put the file in a directory my Node-RED Docker container can read, then use the "file read" node and send its content as a single buffered object to the API.
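For anyone wanting the same outside Node-RED: in Python the whole thing is just reading the bytes and POSTing them, roughly like this (host and file path are placeholders):

# Sketch: send raw WAV bytes to Rhasspy's play-wav endpoint.
import requests

with open("kitchen-timer.wav", "rb") as wav:
    requests.post(
        "http://IPOFYOURRHASSPY:12101/api/play-wav",
        data=wav.read(),
        headers={"Content-Type": "audio/wav"},
    )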
Great! By the way, did you get a chance to run my Hermes Audio Server? I just published a new version with some bugs fixed. While trying to send some WAV files with Rhasspy's API to answer your question, I discovered that the audio player crashed on invalid WAV files. That doesn't happen anymore in the newest version.
Thanks for catching this. It is fixed in master and will be rolled into the next version. I changed @Romkabouter's original code to make the voice property uniform with the other TTS systems, but failed to make the change everywhere.
Hi @synesthesiam, I have been playing with the new RASA 1.0 since last week and I have a first RASA NLU + Core assistant running. Does Rhasspy have a way to link its ASR and TTS to RASA?
Rhasspy worked with RASA NLU about a year ago, and they've made some pretty significant changes. I haven't been able to run their Docker image, so I can't even debug it.
The way it's supposed to work is that Rhasspy produces the YAML file needed for training a RASA agent and POSTs it to the training endpoint. Then, during intent recognition, the text transcription is POSTed to the RASA server and the JSON intent is used.
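For example, with a RASA 1.0 server running on its default port 5005, the recognition step would boil down to something like this (host and example text are placeholders):

# Sketch: ask a running RASA 1.0 server to parse a transcription.
import requests

response = requests.post(
    "http://localhost:5005/model/parse",
    json={"text": "turn on the kitchen light"},
)
print(response.json())  # intent name, confidence, and entities as JSON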
If you're interested in RASA, you may also want to check out the flair recognizer that I added. It does something pretty similar to RASA: it trains an intent classifier and a set of named entity recognizers using magical machine learning!
Flair looks nice too, but one of the things I like about RASA is that it also has a dialogue manager (RASA Core) that uses a machine learning model trained on example conversations to decide what to do next, so you don't have to code a complex mess of if/then/else statements. It also has an action server to do something with the recognized intents. For instance, I now have RASA NLU + Core + actions running on a toy example that lets me ask about the state of a specific Home Assistant entity (using Home Assistant's REST API) and answers me.
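The Home Assistant part of such an action is a single REST call; a minimal sketch, where the base URL, long-lived access token, and entity ID are placeholders:

# Sketch: read one entity's state from Home Assistant's REST API.
import requests

response = requests.get(
    "http://hass.local:8123/api/states/light.kitchen",
    headers={"Authorization": "Bearer YOUR_LONG_LIVED_TOKEN"},
)
print(response.json()["state"])  # e.g. "on" or "off"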
This is all using a text interface for now, but I have taken a look at the RASA integration in Rhasspy's code and I'll see if I can link my RASA setup to Rhasspy. I noticed that Rhasspy only uses RASA NLU, so I have to check whether the RASA Core part keeps working then, because when I only train the NLU manually my RASA actions don't work (they need a trained Core model too).
I'm not sure yet whether I'll take the full RASA stack route. It's impressive, but I'm probably not going to need it all. Currently I'm running no dialogue manager, I run my actions in Home Assistant's AppDaemon, and that works fine for simple stuff. So just linking RASA NLU to Rhasspy is already very useful.
I also wasn't able to run their latest Docker image, but after some debugging the reason became clear: their TensorFlow package is compiled with AVX extensions and my CPU doesn't have them. So I just set up RASA in a Python virtual environment with an alternative TensorFlow wheel, and this works.
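In case anyone hits the same wall: on Linux you can check the CPU flags for AVX before fighting with TensorFlow, e.g. with a quick Python check:

# Sketch: check the CPU flags for AVX support (Linux only).
with open("/proc/cpuinfo") as cpuinfo:
    has_avx = "avx" in cpuinfo.read()
print("AVX available" if has_avx else "no AVX; stock TensorFlow wheels may not run here")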
If it's not too much work, it could be helpful in the future? Such an option would also benefit from supporting more audio types like MP3 or OGG, since you are most likely trying to play back a file from an external source.
What is the current status on some kind of dialogue? I'm still working on my timer and thought it could ask the user, if a timer is already set, whether he wants to set a new one. So a simple TTS message and then listen for yes or no.
I could probably get this working with intents and events, but for simple yes or no answers this is probably not the smartest solution.
Is something like that planned?
@koan:
I just tried your hermes-audio-server but it crashes with the following error: Configuration file [Errno 2] No such file or directory.
I have created the config in /etc/hermes-audio-server.json, and even if I start the player with -c /PATHTOCONFIG.json it won't work.
Are the permissions right for the configuration file? It should be readable by the user running the audio server commands. In my setup it's:
$ ls -l /etc/hermes-audio-server.json
-rw-r--r-- 1 root root 74 May 17 16:37 /etc/hermes-audio-server.json
About the dialogue: see above about my experiments with RASA. The RASA Core component is a powerful dialogue manager that can even learn your dialogues from example conversations. But it's not yet integrated into Rhasspy.
As I'm running everything as root at the moment, permissions shouldn't be a problem, and my output looks the same as yours.
By the way, running the player or recorder lists all ALSA devices and ends with "Connected to audio input/output default."
But I guessed this is normal behavior…
My JSON config looks exactly like the example on https://pypi.org/project/hermes-audio-server/ and I installed the server through pip. Of course I changed the config to connect to my own MQTT server and use the correct site ID.
Are you sure you have double-checked the filename? I can't reproduce this error here. Are you running the latest version (0.1.1)? I also don't get the list of all ALSA devices here. This is what hermes-audio-player outputs in my setup:
$ hermes-audio-player
hermes-audio-player 0.1.1
Connected to audio output default.
Connected to MQTT broker mqtt:1883 with result code 0.
Subscribed to hermes/audioServer/livingroom/playBytes/+.
And this is what hermes-audio-recorder outputs:
$ hermes-audio-recorder
hermes-audio-recorder 0.1.1
Connected to audio input default.
Connected to MQTT broker mqtt:1883 with result code 0.
Started broadcasting audio from device default on site livingroom
Or did you maybe install an older version before, but not as root? Maybe Python is still picking up the older modules from .local/lib/python. Because when I enter an invalid filename, I get the following error: