Rhasspy offline voice assistant toolkit

Does anyone have any pics of an actual setup? One problem I have seen (which I know is only a problem for some people) is that DIY smart speakers don’t look that great. I’ve looked for something that would house both a ReSpeaker and a Pi, but haven’t come up with anything.

Well, that might be the price you pay for not being dependent on Google/Apple/Amazon or some other cloud service.
DIY builds are almost never consumer-grade products.

I am currently trying to build a case for my hardware, but I can safely say 3D design is not my expertise :smiley:

I was unaware of HACS and AppDaemon. These look like a great way of implementing “skills”!

So, a Rhasspy skill might contain:

  • Intents
  • Sentences
  • Slots
  • Custom words
  • AppDaemon apps

How would this work with Hass.io? And what could be done to allow people to provide localization/translations for skills?

In my Snips apps I did something like the following to localize utterances:

i18n = importlib.import_module('translations.' + SnipsAppMixin().assistant['language'])

The app would get the user’s language from the Snips configuration and then import the utterances from the right language.

And then the app would have code like:

 self.publish(*end_session(payload["sessionId"], i18n.RESULT_INTENT_SORRY))

People could provide a localization as a file with their translated utterances such as this RESULT_INTENT_SORRY in a GitHub pull request. I know, it’s just a hacky way to implement i18n, but I found tools like gettext a bit overkill for this purpose.
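A minimal sketch of that pattern (the module layout, the fallback behavior, and the function name are illustrative additions, not code from the original app):

```python
import importlib


def load_translations(lang, fallback="en"):
    """Import translations.<lang>; fall back to translations.<fallback>
    when no module exists for the user's language."""
    try:
        return importlib.import_module("translations." + lang)
    except ModuleNotFoundError:
        return importlib.import_module("translations." + fallback)
```

Each `translations/<lang>.py` file would then only contain constants such as `RESULT_INTENT_SORRY`, so contributors can add a language by translating a single file.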

I wanted to do intents, example sentences, slots and custom words in a similar way in my Snips apps, but the way Snips works is that you create these in the Snips Console, a web-based interface. For Rhasspy skills, this could be implemented with just a few files that can be translated. On installation, the skill just has to know which language profile you are using in Rhasspy and install the intents and so on for the right language.

This way in most cases people can translate a skill to their own language just by a pull request with a couple of translated files.

It would be very cool if Rhasspy continued to run as a standalone solution in the future, i.e. not depending on anything special like HA AppDaemon, for those people who do not run HA but other solutions and who would now like to move over to your solution as a result of the recent Snips news.
(In fact, I was not aware of Rhasspy until yesterday.)

There is also something very similar to AppDaemon: https://habapp.readthedocs.io/en/latest/
It can be used with MQTT and/or OpenHAB.

Would it be possible to create something like the Snips skill server, so that existing skills only need minor adjustments?

Do you mean something compatible with Snips skills, or just a similar service?

I agree. I’d prefer to keep Rhasspy out of the business of actually performing the actions. But I can see where it would be useful for people to want to share skills, which may contain actions.

Maybe like @koan described localization, other users could add actions to skills for various providers, like HABApp. But then there would again be the problem of how to get those actions into the appropriate server…

Hi,
I have been trying for days to get brightness to work with Rhasspy, but I couldn’t make it work.
I saw your message and I wondered how you got it done.
Maybe you could post the code you used in your configuration.yaml; that would be very helpful.
Thanks in advance.

You are welcome.
Take a look at this. It should solve your issue:

Automations:

    action:
      service: light.turn_on
      entity_id: light.w1
      data_template:
        brightness: >
          {{{ "ten":254, "nine":230, "eight":200, "seven":170, "six":140, "five":110, "four":80, "three":50, "two":30, "one":10, "zero":1 }[trigger.event.data["brightness"]]}}

I also use “zero” in order to get a minimum value for the brightness.

Hi, everyone. In preparation for the upcoming update (version 2.4), I’ve tagged version 2.3 on DockerHub. Version 2.4 hopefully won’t break anything, but just in case…

To save space in the Docker image, I’m not including the flair intent recognizer and Mycroft Precise. Leaving those two out reduces the image size by 3GB. If anyone is using them, I can prepare a larger Docker image; let me know!

much appreciated,
It worked like a charm, thanks

I do not use it, I think smaller images are great :slight_smile:

For skills in Rhasspy, quite a few parts are already available.
First, you can define your own sentences, which basically are skills already.

Something more advanced would be Python scripts called via the command intent handler, each skill being one script or a set of scripts.
If there were a way to install those scripts, you would have a skill server.
For instance, you could write a script to search Wikipedia or Wolfram Alpha for facts and that sort of thing.

I think this fits the spirit of Rhasspy well. The only problem is that the scripts would be limited to the Python environment that Rhasspy executes them with. HA must store extra Python dependencies somewhere in your config folder to allow custom components to add libraries.

An easy system might be to just have a special folder in your profile. If the folder contains a file named the same as an intent, Rhasspy executes it and feeds the JSON into its stdin. It could be a Bash script, Python script, whatever. Then, a “skill” could just be an intent with sentences, slots, and a script.
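A rough sketch of that idea, assuming a hypothetical skills folder in the profile where each executable is named after an intent (the folder layout, the intent JSON field names, and the function names are my own, not an existing Rhasspy feature):

```python
#!/usr/bin/env python3
import json
import os
import subprocess
import sys


def find_handler(intent_name, skills_dir):
    """Return the executable in skills_dir whose basename (minus any
    extension) matches the intent name, or None if there is none."""
    for fname in os.listdir(skills_dir):
        path = os.path.join(skills_dir, fname)
        base, _ = os.path.splitext(fname)
        if base == intent_name and os.access(path, os.X_OK):
            return path
    return None


def dispatch(intent_json, skills_dir):
    """Run the matching script, feeding the intent JSON into its stdin."""
    name = json.loads(intent_json).get("intent", {}).get("name", "")
    script = find_handler(name, skills_dir)
    if script is not None:
        subprocess.run([script], input=intent_json.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical profile location; Rhasspy would pass the intent on stdin.
    dispatch(sys.stdin.read(), os.path.expanduser("~/.config/rhasspy/skills"))
```

With this scheme, installing a "skill" really is just dropping an executable plus its sentences and slots into the profile.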

I’ve finally pushed up version 2.4 :partying_face:
I apologize in advance for any bugs! If necessary, you can stick with synesthesiam/rhasspy-server:2.3 until bugs have been fixed.

There were some major backend changes, including:

  • Moved to the quart web framework, which will hopefully finally fix the websocket issues
  • The speech/intent training system has been mostly re-written
    • It uses doit for incremental training, so it only re-builds what’s necessary during training.
    • You can force a full re-training from scratch with the “Clear Cache” drop-down on the Train button.

On the frontend, I’ve added a “Slots” tab so you can edit slots from the web interface. When testing intents on the main page, a summary of the intent/slots is now color coded for easier debugging.


One useful quirk of the new training system is being able to add slot values to a voice command without attaching them to words:

[TestIntent]
this is a test (:){name:value}

This will generate a TestIntent with a “name” slot whose value is “value”. This works because the colon (:) is used to substitute a spoken word (left) with something else (right) that goes into the intent. An “empty” substitution means nothing is spoken and no word is substituted, but you can still attach a tag to it :sunglasses:

Yes, I see.

I also do not think intent handling should be part of Rhasspy; that belongs in another piece of software.
So if you take that as a starting point, and intents are already sent as events or MQTT messages, it might be a good idea to create a skill handler separate from Rhasspy.
Hass.io is a good skill handler: you can create your own automations by reacting to events.
But every person has to do that by themselves, and a lot of users do not have those skills.

If another standalone application can be created which can handle and respond to events, it could be set up in such a way that programmers can create skills and users can simply use them.
This is what the Snips skill server does: it is part of the platform, but runs as a separate service.

That way the event handling component can be interchanged, just like everything else in Rhasspy:
multiple options for every subsystem (STT, TTS, wake word, etc.).
Rhasspy already supports this, because you can use HA as the handler, but also a command. It could be extended with a “SkillServer” option, posting events to a URL.
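As a sketch of that idea, here is a minimal standalone “skill server” using only the Python standard library. The port, the GetTime intent, and the intent JSON field names are all hypothetical; it only illustrates the shape of the thing:

```python
#!/usr/bin/env python3
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def dispatch(intent):
    """Route an intent dict to a handler; returns the sentence to speak."""
    name = intent.get("intent", {}).get("name", "")
    if name == "GetTime":  # hypothetical skill
        return "It is twelve o'clock"
    return "No skill installed for " + name


class SkillHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the intent JSON that Rhasspy would POST to this URL.
        length = int(self.headers.get("Content-Length", 0))
        intent = json.loads(self.rfile.read(length))
        reply = dispatch(intent).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(reply)


# To actually run it (port is arbitrary):
# HTTPServer(("", 12102), SkillHandler).serve_forever()
```

Skill authors would then only register handlers with `dispatch`, while users just point Rhasspy at the server’s URL.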

Indeed, ideally we should have a choice of skill servers to link to Rhasspy, like we have choices for other components such as intent recognition, ASR and so on. Rhasspy’s architecture already makes this possible.

Another interesting project to write skills is Pytlas. I haven’t tried it yet, but it looks doable to integrate it with Rhasspy:

  • Implement an interpreter that offloads the intent recognition, slot extraction and training to Rhasspy.
  • Implement a client to use Rhasspy to communicate with the user.

It has a nice API, but I hesitate to take this route because Rhasspy is already a wrapper around a lot of tools and Pytlas is yet another wrapper with its own syntax for training data, so I fear that this will make it too complex and error-prone.

@synesthesiam
What I meant is really a piece of code that offers Snips skill compatibility, to let existing skills work with Rhasspy.

I studied this page again: Snips apps server | Latest Platform Version | Snips Dev Center

This is the most relevant part of it:

The snips-skill-server service is used to run each action-* executable present in the /var/lib/snips/skills/ subfolders.

These action files must have their permissions set as executable. An action typically contains code connecting to the MQTT server and “looping forever” on it. A typical example of such an executable is a Python script using the hermes-python library to listen for intents being detected and react to them.

The first line of this Python file must be #!/usr/bin/env python2 to run properly.

Each of these executables will be started by the app server, and restarted in case the process exits/crashes.

If a subfolder named venv exists, the snips-skill-server will try to activate the virtualenv before starting the executables (i.e. sourcing venv/bin/activate before starting the executable). This can be used to easily install dependencies for a Python script.

Any other file or folder is ignored by the snips-skill-server.

So the snips-skill-server only manages (starts/stops) processes and nothing more.
Each skill basically uses the Hermes protocol via MQTT to receive intents and start the corresponding handlers. As Rhasspy already supports sending Hermes messages via MQTT, it should be possible to reuse Snips skills by just executing the action file. Those “action” Python files already contain the complete logic for a skill.
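For illustration, the intent-handling core of such an action file might look like this. The intent name and reply are made up; a real action would additionally subscribe to hermes/intent/# through an MQTT client such as paho-mqtt and loop forever, which is omitted here to keep the sketch self-contained:

```python
#!/usr/bin/env python3
import json


def parse_intent(payload):
    """Extract the intent name and raw slot values from a Hermes
    intent message (published on hermes/intent/<intentName>)."""
    msg = json.loads(payload)
    name = msg.get("intent", {}).get("intentName", "")
    slots = {s["slotName"]: s["rawValue"] for s in msg.get("slots", [])}
    return name, slots


def handle(payload):
    """Turn one intent message into the sentence to speak back."""
    name, slots = parse_intent(payload)
    if name.endswith("GetTemperature"):  # hypothetical intent
        return "It is 21 degrees in the " + slots.get("room", "house")
    return "Sorry, I cannot handle that yet"
```

The same `handle` function would be called from the MQTT client’s on-message callback, and its return value sent back as the session’s end message.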

However, other aspects seem to be missing in Rhasspy via Hermes/MQTT: dialogue management and entity injection.
@synesthesiam told me that the entity injection is already possible via HTTP slot updates.
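A sketch of what injecting entities over HTTP could look like. The /api/slots path, the default port, and the payload shape are assumptions based on how Rhasspy’s slot API is described above, so check the docs for your version:

```python
import json
import urllib.request


def build_slot_payload(slot_name, values):
    """JSON body for a slot update, assumed shape:
    {"<slot_name>": ["value1", "value2", ...]} with duplicates removed."""
    return json.dumps({slot_name: sorted(set(values))}).encode("utf-8")


def inject_slot(slot_name, values, url="http://localhost:12101/api/slots"):
    """POST new slot values to Rhasspy. A re-train is still needed
    afterwards so the ASR and NLU pick up the injected entities."""
    req = urllib.request.Request(
        url,
        data=build_slot_payload(slot_name, values),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as response:
        return response.status
```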

Do you think it would be a good idea to implement more Snips compatibility features?
Personally, I like the way the processing pipeline works via Hermes/MQTT, so components can be easily exchanged.

As there seems to be a common consensus that Rhasspy should not do intent handling, the only thing required to be more Snips compatible would be adding entity injection via Hermes/MQTT, as this requires ASR and NLU re-training.

A general remark:
HA and other automation/control solutions (ioBroker, openHAB, FHEM, etc.) are just one use case for a skill or a group of skills.
Other use cases are not related to automation/control at all, IMHO.
Example:
“ok, rhasspy. please calculate the sum of 5 and 5.”
“ok, rhasspy. Tell me a joke.”
etc.
This is something for a voice assistant skill, but not for HA or similar.

I like this way too, and coming from Snips as an app developer I have even been toying with the idea to implement these missing parts in Rhasspy, but it’s really much more complex than it looks. Not only is there much more that needs to be implemented than just entity injection (for instance, session management hasn’t been implemented yet in Rhasspy last time I checked, and this is non-trivial), the Hermes protocol isn’t even published completely and parts of it are only implemented in closed-source components of Snips (for example see this comment when I was implementing hermes-audio-server). I don’t like the idea of reverse-engineering a semi-published protocol, because it’s impossible then to reach 100% compatibility.

Yes it would be nice if the Rhasspy ecosystem would get a lot of Snips apps “for free” when we re-implement enough of the Hermes protocol and the Snips skill server. But I’m not sure anymore if it’s worth the hassle to try this. Most of these apps aren’t even maintained anymore, are not compatible with the newest Snips release, or aren’t that usable. Moreover, app developers will depend on libraries such as hermes-python with an unclear future.

Meanwhile, Rhasspy has a public API and is completely open source, so it should be possible to create apps by integrating Rhasspy with solutions such as:

I’d love to hear other options, as I’m currently exploring how I would implement apps in Rhasspy. The beauty of Rhasspy’s architecture is that you’re free to link it to any system you want :slight_smile:

Since chatl can output Rasa NLU’s format, it might be pretty straightforward to import this data into Rhasspy. One of the hidden benefits of Rhasspy’s syntax and fsticuffs, though, is that it never actually generates all the possible sentences (not true with fuzzywuzzy). Everything stays compacted all the way to becoming an ARPA language model.

What’s the benefit of this? Performance?