Rhasspy offline voice assistant toolkit

Reduced training time for large voice command spaces. The timer example has about 8 million possible sentences. It takes a few minutes just to generate all the sentences on my laptop, but only 2 seconds to train with fsticuffs.
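(For a sense of where those 8 million sentences come from: every alternative, optional word, and number range in a sentences.ini template multiplies the count. An illustrative sketch in that style, not the actual timer template:)

[SetTimer]
set [a] timer for (1..59){minutes} (minute | minutes) [and (1..59){seconds} (second | seconds)]

Enumerating all of those expansions is what takes minutes; fsticuffs can train in seconds because it works with the grammar as a finite-state graph instead of generating every sentence.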

Thanks @koan for the detailed info.
If even the Hermes stuff is not fully documented, I won't spend any further minute on Snips compatibility. And you are right: if Sonos shuts down the Snips Console, no skill developer will maintain their scripts anymore. So that is another reason why it is useless to spend time on further Snips compatibility.

However, Snips was a good example of how to make such a voice assistant available to makers, so obviously they got a few things right.

I think there's a good opportunity here to work with the HA people to create some kind of open "standard" that works with Rhasspy and Ada. I don't know what that should look like yet; maybe it's just a set of HTTP endpoints and some JSON schemas?
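(For illustration, the intent JSON Rhasspy already produces could be a starting point for those schemas. A sketch based on my understanding of Rhasspy 2.4's output format, not a settled standard:)

{
  "text": "turn on the living room lamp",
  "intent": {
    "name": "ChangeLightState",
    "confidence": 1.0
  },
  "entities": [
    { "entity": "name", "value": "living room lamp" },
    { "entity": "state", "value": "on" }
  ],
  "slots": {
    "name": "living room lamp",
    "state": "on"
  }
}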

I forgot to mention that in the latest Rhasspy for English, German, and Dutch, you can select Kaldi for speech recognition in the settings to try out the new speech models. These probably won't run well on a Raspberry Pi, but should give you better recognition results.

I have just tried Kaldi (I am using Dutch), and I got this:
"KaldiDecoder Missing HCLG.fst Graph not found at /share/rhasspy/profiles/nl/kaldi/model/graph/HCLG.fst. Did you train your profile?"

When I train the profile, it gives me this error:
Training failed: <Task: vocab_dict>: TaskError PythonAction Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/doit/action.py", line 424, in execute
    returned_value = self.py_callable(*self.args, **kwargs)
  File "/usr/share/rhasspy/rhasspy/train/__init__.py", line 404, in do_dict
    with open(custom_words, "a") as words_file:
FileNotFoundError: [Errno 2] No such file or directory: 'profiles/nl/kaldi/custom_words.txt'

I did retrain Dutch as well, with no errors there, but I guess I already had custom words.
Going to test the recognition.

Did Rhasspy download extra files when you switched? It sounds like it didn't successfully extract them to the kaldi folder in your nl profile.

Yes, it did download the files.
How can I check whether they were successfully extracted?

You should see a kaldi directory inside profiles/nl with a few files and a model directory if all went well. You might try creating an empty text file called custom_words.txt inside kaldi just to see if training will proceed…
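(If you want to do that from a Python shell instead of a file manager, a one-liner like this should work, assuming the default profiles layout from the error above:)

from pathlib import Path

# Create an empty custom_words.txt so Kaldi training can proceed.
Path("profiles/nl/kaldi/custom_words.txt").touch()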

The files were downloaded and extracted successfully.

Adding the custom_words.txt to the kaldi dir solved the problem.

Thanks

I think you may have uncovered a bug, actually! Thanks 🙂

Glad to help…
And thank you for creating this nice project… it is exactly what I needed: an offline voice assistant, and no cloud.
I have followed the discussion about where Rhasspy fits within the development of Almond and Ada, and I must say that I prefer option 2: "Rhasspy does everything but handling intents. Basically what happens now, except you use intent_script rather than events."
Keep up the good work…

Thanks! There actually was an issue related to custom words. It was using the same file as if you'd selected pocketsphinx instead of kaldi, which was not good.

Hoping to get these kinks worked out as more people are trying Rhasspy.

Great news!
After initial testing with a very cheap microphone, I got my hands on a used USB speakerphone. Just got it working today and I am SUPER excited.

So far I have one automation set up - but it works (using the websocket).
Great work on this! Very excited to see what I can do with it.

BTW, I made one mistake when changing to the USB speakerphone; in case anyone makes the same mistake…

I thought I would have to change the device for the container from /dev/snd to /dev/bus/usb/00#… still learning Linux 🙂. Don't… keep it at /dev/snd or you'll be scratching your head for an hour wondering where all your devices went, lol.
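(For anyone setting this up with docker-compose, the relevant part looks something like the sketch below. The image name, profile mount, and command are assumptions based on the Rhasspy docs; the point is just the devices line:)

services:
  rhasspy:
    image: synesthesiam/rhasspy-server:latest
    command: --user-profiles /profiles --profile en
    ports:
      - "12101:12101"
    volumes:
      - ~/.config/rhasspy/profiles:/profiles
    devices:
      - /dev/snd:/dev/snd   # keep this as-is, no /dev/bus/usb mapping needed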

Cheers!
DeadEnd

As a quick follow-up… I see the TTS components, but not any directions on how to use them. Are these capable of taking a text string and speaking it? In other words, can I send JSON to the websocket (or does it have to be HTTP? I'm inexperienced and not sure of the differences…) to have it spoken? I didn't see anything in the docs explaining this… maybe I should stop being lazy and just take a look at the API…

/api/text-to-speech
POST text and have Rhasspy speak it

So you would use http://localhost:12101/api/text-to-speech… but how should I format the message?
Trying to connect to /api/text-to-speech as a websocket connection gives errors:

[INFO:6110451] quart.serving: 172.17.0.1:44238 GET /api/text-to-speech 1.1 500 21 1321
[ERROR:6110450] __main__: MethodNotAllowed(405)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/quart/app.py", line 1594, in full_dispatch_websocket
    result = await self.dispatch_websocket(websocket_context)
  File "/usr/local/lib/python3.6/dist-packages/quart/app.py", line 1636, in dispatch_websocket
    raise websocket_.routing_exception
  File "/usr/local/lib/python3.6/dist-packages/quart/ctx.py", line 45, in match_request
    self.request_websocket.url_rule, self.request_websocket.view_args = self.url_adapter.match()  # noqa
  File "/usr/local/lib/python3.6/dist-packages/quart/routing.py", line 271, in match
    raise MethodNotAllowed(allowed_methods=allowed_methods)
quart.exceptions.MethodNotAllowed: MethodNotAllowed(405)

So I must be doing something wrong.

I tried using Kaldi - had the same issue with custom_words.txt - manually added it to resolve this… but I am still getting this error:

KaldiDecoder Missing HCLG.fst Graph not found at /profiles/en/kaldi/model/graph/HCLG.fst. Did you train your profile?

My training takes 0.16 seconds… and afterwards the HCLG.fst is still missing.
I tested it anyway, and it appears it can't recognize any words, which I expect is due to the missing graph file.

Cheers!
DeadEnd

I have a rest command in HA for that:

rest_command:
  rhasspy_speak:
    url: 'http://192.168.43.54:12101/api/text-to-speech'
    method: 'POST'
    payload: '{{ payload }}' 
    content_type: text/plain

where payload is the actual text you want Rhasspy to speak.
In HA you can use this with an automation like this:

- id: '1570370359402'
  alias: Lampen
  trigger:
  - event_data: {}
    event_type: rhasspy_Lights
    platform: event
  condition: []
  action:
  - data_template:
      payload: Dat is goed, ik zet de lamp in de {{ trigger.event.data.location }} {{ trigger.event.data.actiontype }}
    service: rest_command.rhasspy_speak

but essentially you can post a sentence to your Rhasspy API.
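The MethodNotAllowed(405) in your log is that endpoint rejecting the websocket handshake: /api/text-to-speech only accepts a plain HTTP POST with the text as the body. A minimal sketch of the same call outside HA (default host and port from your earlier post):

import requests

# POST plain text; Rhasspy speaks the request body aloud.
response = requests.post(
    "http://localhost:12101/api/text-to-speech",
    data="This is a test",
)
response.raise_for_status()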

Hi,
I haven't played with Rhasspy for a while because we were (and still are) renovating our flat, but I got a bit of time today to update everything to the latest version.

I was wondering if it is possible to set Rhasspy in a mode where it waits for a response without using the wakeword.

For example:
Have an automation in HA that checks if the lights were turned off after using the bathroom, and if not I will use Rhasspy (if someone's home) to deal with the problem:

  1. Normal TTS: "light in bathroom is still on, should I turn it off?"
  2. Set Rhasspy in a listening mode with a specific event id (from HA) so it waits for a response; it should be configurable how long Rhasspy stays in the "response mode", after which it will resume normal mode.
  3. If Rhasspy gets a response it can understand, it should fire an event with the same id it got from Home Assistant. Then in HA you could deal with that event however you like.

I think in the beginning boolean answers would be enough (yes/no, on/off, etc.) to keep it simple.
In the future you could create special intents categorised by the "event id".

For example, if the event "remind-later" is fired, it should listen for time input configured like the timer example, but only for the possible intents that are assigned to that event id.

Is that possible and a good solution?

I am trying to install the Rhasspy add-on on a Raspberry Pi Zero. I am getting the following error.

19-11-29 13:18:17 ERROR (SyncWorker_2) [hassio.docker.addon] Can't build 75f2ff60/armhf-addon-rhasspy:2.4.3: The command '/bin/sh -c chmod a+x /run.sh' returned a non-zero code: 139

Can someone help me?

You have several options:
https://rhasspy.readthedocs.io/en/latest/command-listener/

I have read those methods, but as far as I understand this would need a lot of scripting to get the basic functionality I described working. And those response intents would also work in normal operation, which could be quite annoying.

Maybe I should describe my idea in a bit more detail:

Add a special response mode that can be activated with MQTT, REST, or anything else. If you want to use this response mode, just send a payload like the following to Rhasspy:

{
  "rhasspy_mode": "response",
  "response": {
    "response_id": "myautomationcallback",
    "tts": "Lights in the bathroom were left on, should I turn them off?",
    "duration": "20",
    "default_callback": "Yes"
  }
}

Rhasspy then enters the response mode; if a tts string is in the payload, it will play it back and then enable a special response listening mode where it won't recognize normal speech intents but only basic responses, unless the response_id matches a specific response intent (for more complex "conversations") configured beforehand.
default_callback can be set if you want an automatic response without interacting with Rhasspy.

In my example, Rhasspy asks if it should turn off the lights and then waits 20 seconds for a response. There is no special response intent configured for the used response_id, so it will only recognize basic boolean responses like yes or no. If Rhasspy detects the spoken user speech/intent (or there is no response at all), it will fire a new event with the configured response_id and the recognized intent (or the default callback):

rhasspy_response_myautomationcallback

The event payload contains the detected response intents, which can then be used in specific automations.
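(On the HA side, an automation for such an event might look similar to the rest_command example earlier in the thread. Hypothetical, since the event and its payload are only proposed here:)

- id: bathroom_lights_answer
  alias: Bathroom lights answer
  trigger:
  - platform: event
    event_type: rhasspy_response_myautomationcallback
  condition:
  - condition: template
    value_template: "{{ trigger.event.data.intent == 'Yes' }}"
  action:
  - service: light.turn_off
    entity_id: light.bathroom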

If you want more complex responses, you can set up specific response intents that only work if Rhasspy was activated in response mode with the correct response_id. Those intents will be ignored during normal wakeword operation.

I my opinion a system likes this should not be to complicated to implement but allows the enduser to easily create simple HA automations or nodered scripts to setup a system that can ā€œtalkā€ with the enduser.