Beginner’s guide to using Rhasspy with Home Assistant

I posted this guide on the Rhasspy support forum, but was asked to post it here as well. :slight_smile: So here it is!

Part 1 - the basics and a first example

I remember my own struggles to get Rhasspy going in combination with Home Assistant (HA) very well, so I wanted to share a few lines for those of you who have just installed Rhasspy and want to use your first voice command. :slight_smile:

This is not a guide about installing Rhasspy; there are a lot of good guides for that out there on the net. This is just a starter to get you going and give you a working example to build upon. I hope you find it useful.

Our starting point is right after you have installed Rhasspy. It doesn’t matter how: with Docker, as an HA-AddOn or whichever way you chose. I’m also assuming you have a running HA instance and know where to set up your automations in HA.

If you now open Rhasspy’s web admin, you’ll find this menu in the upper left.

[Screenshot: rhasspy_beginners_guide_01]

The menu contains the following items from top to bottom:

  • Home: a status page with some testing possibilities
  • Sentences: here you set up your sentences, meaning the words you speak to Rhasspy after the wake word
  • Slots: lists of things or devices that you can load into your sentences, so you don’t need to write nearly identical sentences
  • Words: this page is about the words and their pronunciation in Rhasspy
  • Settings: here you change the setup for all the different parts of Rhasspy
  • Documentation: opens the Rhasspy documentation (note: this is the offline documentation that was installed together with Rhasspy)

Now, choose the settings page in the menu, and you should be presented with this screen:

Let’s check a few things first, to get the setup right.

  • siteId
    Now is a good time to name your Rhasspy instance. Give it a descriptive name you can remember later. If you ever change to a setup with Rhasspy satellites, you’ll need it, and as we use this name later on in automations, it makes sense to set it now.

  • MQTT
    These are the settings for your MQTT broker. If you use HA-OS, a supervised install or a standalone HA-core installation, you will normally have an MQTT broker already configured. In my case I’m running HA-OS, so I already use the Mosquitto broker from the HA-AddOn store. So if you have a broker running, change the setting to “external” and fill in the data for your broker.

    • Host: the IP address or domain name of your MQTT broker; with HA-OS it is the same as your HA address. Example: 192.168.178.100 or homeassistant.local
    • Port: the default port is 1883
    • User: I recommend setting up a new user for your broker in the settings of HA. If you use the AddOn, you find these under settings > people > users.
    • Password: the password of that user

    If you don’t have an MQTT broker running, leave the setting at “internal”.

  • Audio Recording, Wake Word, Speech to Text, Intent Recognition, Text to Speech, Audio Playing and Dialogue Management are out of the scope of this guide. If you need help with these, please refer to the documentation of Rhasspy, which you can find here. As you can see I went with the recommended options.

  • Intent Handling
    This is the important part: here we set up HA as our intent handler. This means that what you speak to Rhasspy gets “translated” and then sent to HA to actually do something, like switching a light.
    So choose “HomeAssistant” and restart Rhasspy to reflect your change.

    There are two ways for Rhasspy to talk to HA. One is with intents, the other one is with events. As I couldn’t get intents to work correctly, and after reading some tutorials, I chose the event way. In the end it doesn’t make a huge difference in function, but events are definitely easier to handle.

    • Hass URL: fill in the URL of your HA instance
    • Access Token: set up a long-lived access token in HA under your user profile and fill it in here
    • Set the intent handling to Send events to Home Assistant (/api/events)
    • Save your settings and let Rhasspy restart

Now that our setup is complete, we can start writing our first sentence. Open the sentences page (via the menu) and you’ll see the default sentences.ini file in the editor window. Delete all the entries; we don’t need them for now, and later on we can write our own sentences that really fit our needs.

Now add the following to the editor window:

[GetDate]
what date is today

This is very small, but it shows the principles involved in training Rhasspy and sending something to HA. So what are we looking at?

  • The first line [GetDate] is the name of our intent.
  • The second line is the sentence we need to speak, to tell Rhasspy what we want.

Just think of it the following way:

  • You speak your wake word, Rhasspy wakes up and plays a short sound to signal that it is now listening.
  • Rhasspy tries to match your spoken words to one of the sentences set here and “translates” it to a command (the first line).
  • Summed up: you speak, Rhasspy translates that to a command, and this command is sent to HA to do something. This is what we call an “intent”.

As you might guess, having to get one sentence exactly right is not always easy or welcome, so it is possible to set more than one sentence. But in the end, Rhasspy always “translates” them to one command.

Change the text in the editor by adding a third line

[GetDate]
what date is today
give me the date

Now we can speak one of the two sentences, and Rhasspy “translates” this always to just one command, namely [GetDate]. Just to make it clearer: You need to speak one of the sentences, and Rhasspy will “answer” with that one command.

We will come back to our sentences file later, but for now, save it and let Rhasspy re-train, so it knows the sentences we just added.

Now we have to do something in HA, as Rhasspy has already done its part of the job. Move over to HA and set up an automation. I’ll show the YAML version of the automation here, just because explaining what’s going on behind the scenes is easier that way. You can always build this automation in the UI editor of HA; it’s entirely your choice.

Let’s see what an automation could look like with the sentences we added before:

automation:
  - id: Rhasspy GetDate
    alias: Rhasspy GetDate
    mode: single
    trigger: 
      - platform: event
        event_data: {}
        event_type: rhasspy_GetDate
    action:
      - service: mqtt.publish
        data:
          topic: hermes/dialogueManager/endSession
          payload_template: '{"sessionId": "{{ trigger.event.data._intent.sessionId }}", "text": "Today is {{ states.sensor.date.state }}"}'

We’ll go through each line now, to explain what’s happening here (if there is more to explain, we’ll come to that later):

  • id Give your automation a “speaking” id. As you move on, you’ll likely end up with a lot of automations for Rhasspy, and it is easy to lose the big picture. So choose a good name; in my case I start all automations regarding Rhasspy with “Rhasspy”. That makes it easier in the end: for example, if you search for an automation in HA’s automation window, you’ll have all the Rhasspy entries “grouped” together, as they all start with, you might guess it, “Rhasspy”.
  • alias I just copy the id to the alias. This is an optional step, but it makes things clearer down the road.
  • mode This is the mode in which your automation is run. In our case single is the right choice, as you likely won’t want the date told more than once. A different mode comes in handy if you have a command that should be repeated. For example, if you later want to set your TV volume, you might want to run the automation a few times to increase the volume. Then this setting will change (don’t worry, we will come to an example later).
  • trigger This is the part where we use our command from before
    • platform: event As you might remember, we configured Rhasspy to send an event instead of an intent to HA, so we need to use the event platform in HA to recognize it
    • event_data For now we don’t need this, but it will come in handy later on, when your automations get more complicated. Just leave the two braces empty.
    • event_type This identifies what Rhasspy sends to HA. As you can see, it is the command we configured before, [GetDate]. It is always prefixed with “rhasspy_” and followed by the actual command “GetDate”. In combination that makes rhasspy_GetDate. Easy, isn’t it?
  • action This is where we configure what HA should do if this automation gets triggered (aka you spoke something that Rhasspy identified and sent to HA)
    • service: mqtt.publish We want HA to publish something (the answer) on the MQTT topic, so it is sent back to Rhasspy
    • data
      • topic This is the topic Rhasspy listens to; in our case we want to close the session with an answer to our question. I added a few lines about sessions in Rhasspy at the end of this guide, if you’re interested in what’s happening with sessionIds and so on.
      • payload_template Here we tell Rhasspy which session we are in (yes, there could be more than one), and what we want Rhasspy to tell us back (aka the answer).
        As you can see, we just set up a “text”, and it will be sent back to Rhasspy
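
One thing the automation quietly relies on: the answer text reads sensor.date, and that sensor only exists if the Time & Date integration provides it. Here is a minimal sketch of how it might be defined in configuration.yaml, assuming your HA version still accepts YAML for this integration (newer versions may want you to add it through the UI instead):

```yaml
# configuration.yaml (sketch, assuming YAML setup of the time_date platform)
sensor:
  - platform: time_date
    display_options:
      - 'date'   # provides sensor.date with a state like 2021-09-03
```

If you’d rather not add a sensor, building the date in the template with now() is probably another option.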

This is, in essence, what we need for Rhasspy and HA to work together. It is a very simple example, but the way things go should be clear:

  • Rhasspy wakes up
  • You speak your sentence
  • Rhasspy tries to find out what you want from it, and “translates” your sentence into a command
  • This command is sent to HA as an event, using the Hass URL and access token from the settings
  • HA picks up the event and looks for an automation that fits (actually it’s the other way around, but let’s not get too techy here) => the one triggered by the command you sent
  • HA runs the automation and publishes an “answer” over MQTT
  • Rhasspy identifies the session and speaks the text from the MQTT topic back to you

Now save your automation, reload the automations in HA and move back to Rhasspy.

For testing purposes, the “Home” page comes in handy. Call it by pressing the “Home” button. If you take a look under the status bar, you’ll see the line that starts with the “Recognize” button. This is where we’ll test our command and the connection with HA.

Type in one of the sentences exactly how you configured it. In our example type “give me the date” and push “Recognize”. If everything works, you should be presented with the command you configured for this sentence in a red box; here it will be “GetDate”. This means your sentence is recognized and “translated” correctly to a command. Yeah! Right now, we haven’t sent anything out; this happens just “inside” Rhasspy, to check if a sentence works.

If you want to take a look, push the button “Show JSON”, and you’ll see exactly what Rhasspy recognized and will send out.

For our guide we are happy right now: our first intent was recognized by Rhasspy. So let’s move a step further and check the box on the right that says “Handle”. If you now push “Recognize” again, Rhasspy not only recognizes your intent, it additionally sends the command (the JSON you can take a look at) to HA. Move over to the automation list in HA and you should see that the automation “Rhasspy GetDate” was executed. It should show a timestamp for the last execution (shouldn’t be too long ago, depending on how long you needed to switch over to HA).
Note: you won’t hear a spoken answer from Rhasspy, this is purely to check the connection to HA!
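
If the automation does not show a new timestamp, it can help to look at what actually arrives in HA. Here is a small sketch of a second, temporary automation that listens for the same event and dumps the event data into a persistent notification. The id, alias and title are made up for this example:

```yaml
  # temporary helper, remove it again once everything works
  - id: Rhasspy Debug GetDate
    alias: Rhasspy Debug GetDate
    mode: single
    trigger:
      - platform: event
        event_type: rhasspy_GetDate
    action:
      - service: persistent_notification.create
        data:
          title: Rhasspy event received
          # shows everything Rhasspy sent along with the event
          message: "{{ trigger.event.data }}"
```

If no notification shows up after pushing “Recognize” with “Handle” checked, the Hass URL and the access token in the Rhasspy settings are the first things I’d check.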

If this works correctly, now is the time to check if your voice command and the answer are running as well. Leave the “Home” page open and speak your wakeword followed by one of the sentences. You should now see your spoken sentence in the “Recognize” field, followed by the command in the red box. And while you’re reading, you should hear your answer from HA spoken through Rhasspy.

Congratulations, your first voice command works! Rhasspy is doing its job and HA is ready to answer your questions or to do something for you. Pat yourself on the back, you did great!

You think we’re done here? Nope, that’s only half the way. But don’t worry, from here on it’s more about expanding than doing something totally new. The next steps are to refine the sentences and send something to HA that actually does something, like switching a light.

Part 2 - refining the sentences and automations

For the really interesting stuff, we now move back to our sentences in Rhasspy (open it via menu) and add some useful things. Let’s start with a light.

Add these lines in the editor window under your [GetDate] command, so it looks like this and retrain Rhasspy:

[GetDate]
what date is today
give me the date

[LightsTurnOn]
turn the lights on

If you now speak your sentence (“turn the lights on”), Rhasspy will (hopefully) recognize it and send the command [LightsTurnOn] to HA. As great as this is, HA can’t do much with this. Why, you ask? Well, so far we haven’t specified anything, neither the light we want to turn on, nor what “turn on” means. So let’s move over to HA and get something useful out of this.

Let’s add to the example from above and copy this automation (add it to the first one, you’ll see the complete example a little downwards):

  - id: Rhasspy LightsTurnOn
    alias: Rhasspy LightsTurnOn
    mode: single
    trigger: 
      - platform: event
        event_data: {}
        event_type: rhasspy_LightsTurnOn
    action:
      - service: mqtt.publish
        data:
          topic: hermes/dialogueManager/endSession
          payload_template: '{"sessionId": "{{ trigger.event.data._intent.sessionId }}", "text": "OK, the lights are turned on"}'

For now, this just gives us back a confirmation, but we didn’t actually do anything with lights. That comes now, as we need to provide HA with some entity_ids and what we want to do with them (turn_on).

Change the automation by adding the following under action, but before the mqtt.publish:

      - service: light.turn_on
        entity_id: group.all_lights

If you don’t have a group.all_lights, don’t worry, it’s just for the example and you don’t need to save this; we will provide a different, but complete automation later. So what are we doing here? It’s kind of a standard automation in HA: we specify the service (light.turn_on) and give it an entity_id to act on.
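
If you’d like to try the snippet anyway, a group with that name could be defined roughly like this. It is only a sketch, and light.kitchen and light.livingroom are placeholder entity_ids that you would replace with your own lights:

```yaml
# configuration.yaml (sketch, the entity_ids are placeholders)
group:
  all_lights:
    name: All Lights
    entities:
      - light.kitchen
      - light.livingroom
```

As far as I know, HA expands the group into its member lights when the service is called, so light.turn_on should reach every light listed here.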

You know the saying “all roads lead to Rome”? As with most things in life, there are a few different approaches to work with Rhasspy and HA, like in this case.

With the above example, we do not need anything specific sent from Rhasspy to HA, as the complete logic lives inside the automation in HA (which light to use is described in the automation, as well as which service to call). This is nice, but what if we want to do more specific things, like turning on one particular light, e.g. the kitchen light? For now, we would need to set up a different sentence for each light, so Rhasspy (and down the line HA) would know which automation to call to turn on the light.

This is not the best idea, as this approach would fill the sentences file as well as the automations in HA with a lot of duplicated entries. So what we need to do now is make our sentence, and therefore our command, a little more specific.

Go back to Rhasspy and your sentences file and change the command [LightsTurnOn] to the following:

[LightTurnOnKitchen]
turn the lights in the kitchen on

[LightTurnOnLivingroom]
turn the lights in the livingroom on

As you can see, we now provide Rhasspy with two different possibilities of lights to turn on. This will get crowded very fast, and this is only the Rhasspy side of things. In HA you would still need two different automations to handle this, instead of one.

Let’s first change the sentence to combine these two lights:

[LightsTurnOn]
turn the lights in the ( kitchen | livingroom ) on

What we do here is tell Rhasspy that we speak a sentence, but that there are two possibilities to speak it: one with kitchen and one with livingroom. Just think of the | as an “or”. This makes our sentences file a lot smaller in the future, but there is one catch: right now, we can’t differentiate which light is meant, as Rhasspy (correctly) translates both to one command, [LightsTurnOn]. And automations in HA can’t tell them apart either, as nothing is sent along with the command to choose by.
Either way, we need to give the command some attribute (=tag) that tells HA which light we wanted to turn on. So we add a tag to the command:

[LightsTurnOn]
turn the lights in the ( kitchen | livingroom ){entity} on

We now told Rhasspy to add the tag “entity” to our command, so we can decide in HA, which entity to call. Rhasspy will now not only send the command, but also a tag entity with the name we chose while we were speaking our command.

Before, Rhasspy sent this: “[LightsTurnOn]” regardless of what we actually spoke (kitchen or livingroom).
Now Rhasspy sends this: “[LightsTurnOn] entity = kitchen”

This is good, as we can now tell HA which light to choose. But before we can go back to our automation, there is one more thing to do. We need to tell HA the correct entity_id of the entity we provided, because the entity_id does not necessarily match the spoken word. E.g. “kitchen” will most likely not be the entity_id you configured in HA; more likely it is light.kitchen. To solve this, we add another thing to our sentence, so-called substitutions. They let us speak something, but translate that to something different. Let’s see an example:

[LightsTurnOn]
turn the lights in the ( kitchen:light.kitchen | livingroom:light.livingroom ){entity} on

We tell Rhasspy to listen for the word “kitchen”, but in the JSON Rhasspy sends to HA, we want it to set “light.kitchen”. To make that a little clearer:

You speak: “Turn the lights in the kitchen on”
Before “tagging”, Rhasspy sends this: “[LightsTurnOn]”
After tagging, Rhasspy sends this: “[LightsTurnOn] entity = kitchen”
After tagging and substitutions, Rhasspy sends this: “[LightsTurnOn] entity = light.kitchen”

And now we’re finally getting somewhere! We can now speak a command, and send different entities and meanings to HA. Great!

Switch over to your HA automations and replace the complete Rhasspy LightsTurnOn automation with this:

  - id: Rhasspy LightsTurnOn
    alias: Rhasspy LightsTurnOn
    mode: single
    trigger: 
      - platform: event
        event_data: {}
        event_type: rhasspy_LightsTurnOn
    action:
      - service: light.turn_on
        data:
          # the templated value goes under data; it holds the substituted tag, e.g. light.kitchen
          entity_id: "{{ trigger.event.data.entity }}"
      - service: mqtt.publish
        data:
          topic: hermes/dialogueManager/endSession
          payload_template: '{"sessionId": "{{ trigger.event.data._intent.sessionId }}", "text": "OK, the lights in the {{ trigger.event.data.entity_raw_value }} are turned on"}'

In the newly added part, you can see we configured to turn on the light (service: light.turn_on) and set the entity_id with the value from the tag in our sentence, using the substitution (light.kitchen).

We used something else here, in the “text” we send back to Rhasspy, there is entity_raw_value. This is a nice addition, as Rhasspy sends the “spoken word” together with some other data, and we can use this here perfectly: in our example we speak “kitchen” to Rhasspy. HA needs the entity_id, but using it in the spoken text, it would sound a little crazy:

Rhasspy would answer: “OK, the lights in the light.kitchen are turned on”

That’s why we use the raw value instead, that would be “kitchen”. So Rhasspy can answer correctly “OK, the lights in the kitchen are turned on”.

That should be it for our second part, you should now be able to write your own commands in Rhasspy to listen for and send them to HA to let it do something meaningful. You can now change the service to call or the entities and still need only a few sentences and automations to let HA react.
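
Remember the empty event_data in the trigger? Now that Rhasspy sends tags, this is where it can become useful. As a sketch, assuming you want a separate automation that only reacts when the kitchen was meant, the trigger could be narrowed down like this (the id and alias are made up for the example):

```yaml
  - id: Rhasspy LightsTurnOn KitchenOnly
    alias: Rhasspy LightsTurnOn KitchenOnly
    mode: single
    trigger:
      - platform: event
        event_type: rhasspy_LightsTurnOn
        # only fire if the event data contains entity = light.kitchen
        event_data:
          entity: light.kitchen
    action:
      - service: light.turn_on
        entity_id: light.kitchen
```

For plain on/off switching the templated automation above is the shorter way, but a filter like this can be handy when one room needs special treatment.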

Part 3 - we are diving even deeper

As your sentences file grows, there is still something in our example we can use to keep the number of sentences small. What if you want to decide in the sentence what to do? For example, turn a light on or off.

Let’s get directly into it and make use of this.

Change your Rhasspy sentence from

[LightsTurnOn]
turn the lights in the ( kitchen:light.kitchen | livingroom:light.livingroom ){entity} on

to this

[LightsTurnOnOff]
turn the lights in the ( kitchen:light.kitchen | livingroom:light.livingroom ){entity} ( on | off ){state}

Now you can say “turn the lights…on” or “turn the lights…off”, and we can react to this in the automation we just wrote. Just note, the state in HA is already on or off, so we don’t need substitutions here, but we need the tag to tell us, which state is sent, “on” or “off”.

Now change the automation in HA to this:

  - id: Rhasspy LightsTurnOnOff
    alias: Rhasspy LightsTurnOnOff
    mode: single
    trigger: 
      - platform: event
        event_data: {}
        event_type: rhasspy_LightsTurnOnOff
    action:
      - service: "light.turn_{{ trigger.event.data.state }}"
        data:
          # the templated value goes under data; it holds the substituted tag, e.g. light.kitchen
          entity_id: "{{ trigger.event.data.entity }}"
      - service: mqtt.publish
        data:
          topic: hermes/dialogueManager/endSession
          payload_template: '{"sessionId": "{{ trigger.event.data._intent.sessionId }}", "text": "OK, the lights in the {{ trigger.event.data.entity_raw_value }} are turned {{ trigger.event.data.state }}"}'

What we did here are minor changes, but they’re very powerful. We change the service depending on the state (on/off) we get, to service: light.turn_on or service: light.turn_off. Isn’t that great? With such minimal changes, we configured a completely different thing. And to round it off, we add the sent state to the answer as well. It now tells us what we were doing, namely turning the light “on” or “off”.

That’s all. We made some very powerful changes here, and if you think this further, you can do nearly everything with just a few sentences. I’d recommend that you use the testing possibilities on the “Home” page while you’re working on this. It makes it very easy to check whether the command and the tags are correctly assigned.

For example, just type the following in the “Recognize” field and click “Recognize” without activating “Handle”:
“Turn the lights in the kitchen on” - you should see the recognition and, as command, [LightsTurnOnOff] entity = light.kitchen, state = on
Try a different sentence and see how the recognition changes:
“Turn the lights in the livingroom off” - you should now see [LightsTurnOnOff] entity = light.livingroom, state = off
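
Because the service name is now built from whatever arrives in state, a small safety net might be worth considering. The following is just a sketch of a condition block that could be placed between trigger and action of the Rhasspy LightsTurnOnOff automation, so it only continues when both tags look as expected:

```yaml
    condition:
      # continue only if entity is a light and state is on or off
      - condition: template
        value_template: >-
          {{ trigger.event.data.entity is defined
             and trigger.event.data.entity.startswith('light.')
             and trigger.event.data.state in ['on', 'off'] }}
```

With the sentence from this part, Rhasspy should never send anything else, so treat this as optional. It mostly pays off later, when more sentences feed the same automation.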

Part 4 - sessions, and why they are important

As you might have noticed in the above examples, there is always a sessionId in the answers back to Rhasspy from HA. And it is there for a reason. Let’s first see, what a session is, and how it is used by Rhasspy.

If you talk to Rhasspy, it opens a session, to store information and to know, what parts of the communication belong together.

  • You say the wakeWord, Rhasspy opens a session, and reacts to it. Normally that would be the short sound you hear, after that, Rhasspy is listening. Before the short sound, the wakeWord session is terminated, because it is no longer of use. The wakeWord was spoken, and Rhasspy moves on to the next step.

  • Rhasspy opens a new session, as your spoken text is different from a wakeWord. Rhasspy again gathers information about what you’re saying. Taking our example from above, that would be the command, an entity, maybe a state and so on. The sessionId is sent together with all the data.

  • Rhasspy leaves the session open and waits for an answer to the sent command (and its additions).

    • If you don’t close this session with your answer from HA, it will remain active for some time, and if you speak a lot of commands right after one another, your Rhasspy could get crowded. Closing a session is also simply good practice. So you end the session with your final answer, and that’s why you send the sessionId back together with it.
    • If you need to combine different commands and tags into one big automation, you might find session handling useful, as you can add more information with different intents on the fly.

    One example would be to setup a reminder for a specific date and time.

    • Start with your wakeWord, Rhasspy listens
    • The new session is opened, and you say something like “set a reminder”
    • Rhasspy sends a command to HA and awaits the answer
    • The automation in HA needs to know when you want to set the reminder and asks “Sure, on what date?”
    • Now you need session handling, as you need to know what you’re talking about. You need to tie “set a reminder” to the newly spoken “date”. So you answer with “on September 3rd”
    • Rhasspy still uses the same session (as we provide it with a sessionId) to store the date together with the already spoken reminder command
    • HA will ask again “At what time?” and you answer with “at 3pm”
    • The session now contains all the info we gathered by asking different questions, in one place. It knows you want “to set a reminder”, and it knows the “date” and the “time”
    • You can now start your final automation to set up a reminder in the calendar

    You see where we are going with this, don’t you? With a session it is possible to play “ping-pong” between you, Rhasspy as translator and HA to do something. This can be used in so many different places, e.g. if you want to tell your TV to show something specific from your media folder. The possibilities here are nearly endless, and that’s why it is important to keep track of your sessions. Open and close them to your liking, but do it.
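
To make the “ping-pong” a bit more concrete: instead of ending the session, HA can ask a follow-up question by publishing to the continueSession topic of the dialogue manager. Here is a rough sketch of the first step of the reminder example. The intent names SetReminder and ReminderDate are made up and would need matching entries in your sentences file:

```yaml
  - id: Rhasspy SetReminder
    alias: Rhasspy SetReminder
    mode: single
    trigger:
      - platform: event
        event_data: {}
        event_type: rhasspy_SetReminder
    action:
      - service: mqtt.publish
        data:
          # continueSession keeps the session open and lets Rhasspy listen again
          topic: hermes/dialogueManager/continueSession
          # intentFilter limits the next recognition to the (hypothetical) ReminderDate intent
          payload_template: '{"sessionId": "{{ trigger.event.data._intent.sessionId }}", "text": "Sure, on what date?", "intentFilter": ["ReminderDate"]}'
```

The automation for the last question would then publish to hermes/dialogueManager/endSession again, exactly as in the examples above, to close the session.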

    And to add to this, sessions matter the other way around too: if you want Rhasspy to speak to you, in case of an alert or something like that, you didn’t initiate the conversation via wakeWord/intent. Only if you open a session do you know which alert a later answer was meant for. And while we’re at it, in a later part 5 of this guide, we’ll see how it works when you want Rhasspy to notify you of something and then wait for an answer from you.

reserved for part 5 :slight_smile:

I hope you find this guide useful and can learn some things for yourself and your use of Rhasspy and Home Assistant.

As always, all input is welcome! :slight_smile: Criticize, find faults, or just tell me how you liked it!

If you want something added, please leave a note!

And if you have questions, please use this thread to ask, I don’t know all the answers to life, but maybe someone else does! :rofl:

Happy voice commanding! :slight_smile: