Rhasspy - Offline voice control step by step (Server/Client) - Docker

Hi Everyone,
So I had everything working in my house for a few weeks so figured I needed to break things and create mayhem again lol.
I started looking at this, a local offline voice assistant.

General objective was to replicate the Alexa ‘mesh’ in my house, but offline. I wanted a simple set up with 2 components:

  1. Server element running on my NUC (I use Docker) that also runs HA, MQTT etc.
  2. Client element that offers wake word, voice detection and speech-to-text.

Note: I do NOT use TTS as I have Sonos in the rooms I need a reply in, for now (I may add a speaker later and update).

Rhasspy did NOT disappoint! It was easy and I was up and running in 10 minutes (YMMV based on environment, Docker REALLY helped though!). Don’t be put off by the extensive docs, getting up and running was very straightforward. Once set up you can then play with the config to suit your needs.
HTH…

Hardware set up involved components lying around:
Seeed 4-Mic Array for Raspberry Pi
8GB microSD card
Raspberry Pi 3B

Server Side Steps:

  1. Assuming you already have Docker running, create a directory for rhasspy, and a sub-folder called profiles.
  2. Pull and Run docker image:
docker run -p 12101:12101 \
      --restart unless-stopped \
      --name rhasspy \
      -v "/<PATH_TO>/rhasspy/profiles:/profiles" \
      synesthesiam/rhasspy-server:latest \
      --user-profiles /profiles \
      --profile en
  3. Go to the server URL http://<Server_IP>:12101 (you may be asked to download files)
  4. Go to settings and check the config (and save along the way):

[Rhasspy]
Listen for wake word on Startup = UNchecked

[Home Assistant]
Do not use Home Assistant (note you obviously can use it instead of Node-Red)

[Wake Word]
No Wake word on this device

[Voice Detection]
No voice command detection on this device

[Speech Recognition]
Do Speech recognition with pocketsphinx

[Intent Recognition]
Do intent recognition with fuzzywuzzy

[Text to Speech]
No Text to speech on this device

[Audio Recording]
No recording on this device

[Audio Playing]
No Playback on this device

  5. Check the Slots and Sentences tabs, and make sure to hit Train and then Restart
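If you haven’t touched the Sentences tab before, it uses Rhasspy’s ini-style template format. A minimal sketch for a light intent (the intent and entity names here are illustrative, not from the example flow):

```ini
[ChangeLightState]
light_name = (bedroom light | kitchen light) {name}
light_state = (on | off) {state}
turn <light_state> [the] <light_name>
```

Saying “turn on the bedroom light” would then produce a ChangeLightState intent with slots name and state.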

Client Side Steps:

  1. Flash the 8GB microSD card with Raspbian Buster using Etcher.
  2. Remove and re-insert the microSD card and add files to the root directory (for headless setup - meaning no screen needed). You only need (b) below if you plan to use WiFi.
    a) a file simply called ‘ssh’
    b) wpa_supplicant.conf (example here)
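For reference, a minimal wpa_supplicant.conf looks something like this (the country code, SSID and passphrase are placeholders you need to replace with your own):

```conf
country=US
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="YOUR_SSID"
    psk="YOUR_WIFI_PASSWORD"
}
```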
  3. Insert the microSD card in the Pi, use a proper power supply and check your router for the IP address it gets.
  4. SSH into the Pi using that IP address (I use PuTTY) with the default user/pass = pi/raspberry.
    You are going to want to change that in the future!
  5. Install git:
    sudo apt install git
  6. Install the Seeed mic array drivers based on the info here
git clone https://github.com/respeaker/seeed-voicecard
cd seeed-voicecard
sudo ./install.sh 
sudo reboot
  7. Plug in the Seeed mic array and check the install was successful against the expected result here:
arecord -L
  8. Install Docker:
curl -sSL https://get.docker.com | sh
  9. Modify user permissions to access Docker without using ‘sudo’ all the time :wink:
sudo usermod -a -G docker pi
  10. Close SSH, and relaunch the SSH connection to use the new permissions.
  11. Create directories for the Rhasspy Docker image to use:
cd ~
mkdir rhasspy
cd rhasspy
mkdir profiles
  12. Pull and run the Docker image:
docker run -p 12101:12101 \
      --restart unless-stopped \
      --name rhasspy \
      -v "/home/pi/rhasspy/profiles:/profiles" \
      --device /dev/snd:/dev/snd \
      synesthesiam/rhasspy-server:latest \
      --user-profiles /profiles \
      --profile en
  13. Go to the client URL http://<Pi_IP_address>:12101 (you will be asked to download some files)
    (At time of writing I put wake word, voice detection and recognition on the client)
  14. Under settings ensure the following is selected, and Save along the way. You will need to Train once also.

[Rhasspy]
Listen for wake word on Startup = checked

[Home Assistant]
Do not use Home Assistant (note you obviously can use it instead of Node-Red)

[Wake Word]
Use snowboy (this should trigger a download of more files)

[Voice Detection]
Use webrtcvad and listen for silence

[Speech Recognition]
Use Remote Rhasspy server for speech recognition:
URL = http://<SERVER_IP>:12101/api/speech-to-text

[Intent Recognition]
Use Remote Rhasspy server for intent recognition:
URL = http://<SERVER_IP>:12101/api/text-to-intent

[Text to Speech]
No Text to speech on this device

[Audio Recording]
Use PyAudio (default)
Input Device = seeed-4mic-voicecard (you can test this if you want)

[Audio Playing]
No Playback on this device

Node-Red Config
1. Import this flow from the Rhasspy examples
2. Attach a debug node to the websocket in node and configure it to show the full msg object.
3. I edited the light text node to take this:

{
  "domain": "light",
  "service": "turn_{{slots.state}}",
  "entity_id": "{{slots.name}}"
}
4. Add a call service node after the light text node and leave it blank. Deploy and enjoy your offline voice assistant.
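If you prefer a function node over the mustache template, a minimal sketch of building the same payload (this assumes the intent’s slots arrive on msg.slots, as in the example flow; the helper name is illustrative):

```javascript
// Build the Home Assistant service call from the Rhasspy intent slots.
// Produces the same object as the mustache template in the light text node.
function buildServiceCall(slots) {
    return {
        domain: "light",
        service: "turn_" + slots.state,  // "turn_on" or "turn_off"
        entity_id: slots.name            // e.g. "bedroom_light"
    };
}

// Inside a function node you would do something like:
// msg.payload = buildServiceCall(msg.slots);
// return msg;
```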

Pick a light (one that is in the light domain, not a switch) and say “Snowboy, turn bedroom light off”
:smiley:


Correcting the spaces in the light name

Rhasspy passes bedroom Light but HA needs bedroom_light.
This was pretty easy with a change node placed before the node that splits out the intents:

Then I simply replaced the space with a _ (note I also kept the original name so I can use it in the TTS phase at the end):
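As a sketch, the function-node equivalent of that change node could look like this (the property names here are illustrative, not from the actual flow):

```javascript
// Convert the spoken name ("bedroom Light") into the entity id HA expects
// ("bedroom_light"), keeping the original around for a TTS reply later.
function normalizeSlotName(msg) {
    msg.original_name = msg.name;                          // keep "bedroom Light"
    msg.name = msg.name.toLowerCase().replace(/ /g, "_");  // -> "bedroom_light"
    return msg;
}
```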


Correcting switch to light domain
So for anyone (like me) wondering how to deal with lights that are actually switches (switch.light_example), you can use a function node in Node-Red to correct the domain. I have fewer lights that are actual lights, and a LOT more lights that are actually Z-wave switches, so I check if the light is an actual light; if so, use the light domain, else use the switch domain:

if (msg.name == "light_name_1" || msg.name == "light_name_2" || msg.name == "light_name_3") {
    msg.domain = "light";
} else {
    msg.domain = "switch";
}
return msg;
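If the list of real lights grows, an array lookup is easier to maintain. A sketch of the same function node (entity names are illustrative):

```javascript
// Same logic as the if/else above, but the real lights live in one array.
const realLights = ["light_name_1", "light_name_2", "light_name_3"];

function pickDomain(msg) {
    msg.domain = realLights.includes(msg.name) ? "light" : "switch";
    return msg;
}
```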

Thank you for the tutorial, @jaburges! Would you be OK with me putting these instructions in a (forthcoming) tutorial section of the Rhasspy docs?

Another way of doing this is with substitutions. In your sentences.ini file, you can have:

light_name = (bedroom light):bedroom_light {name}

When you get the intent from Rhasspy, the name slot will contain bedroom_light. You can still get access to the original text (for text to speech, like you mentioned) by looking in the entities list of the intent. You’ll find an entry like { "entity": "name", "value": "bedroom_light", "raw_value": "bedroom light" }. The raw_value will have the original text before substitution occurs.
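In Node-Red that lookup could be sketched like this (the intent object shape matches the example above; the helper name is illustrative):

```javascript
// Recover the original spoken text for an entity from a Rhasspy intent,
// e.g. "bedroom light" rather than the substituted "bedroom_light".
function rawValueFor(intent, entityName) {
    const e = (intent.entities || []).find(ent => ent.entity === entityName);
    return e ? e.raw_value : undefined;
}

const intent = {
    slots: { name: "bedroom_light" },
    entities: [
        { entity: "name", value: "bedroom_light", raw_value: "bedroom light" }
    ]
};
// rawValueFor(intent, "name") gives back "bedroom light" for the TTS reply.
```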


Quick question along this line. If you want a substitution to have multiple words, is this possible and what do you have to wrap them in?

For example:

light_name = (kitchen light):cabinet light {name}

I tried this, but slots.name only contained light, not both words.
I tried putting " " around it but that didn’t work.
Is it possible in Rhasspy’s current state and, if not, do you think this is something an amateur could figure out for a pull request :face_with_raised_eyebrow:.

Thanks!
DeadEnd

In this specific case, you could just do:

light_name = (kitchen:cabinet light){name}

but in general, you (currently) need to drop/add individual words. If you wanted to replace “kitchen cabinet” with “dark clown shoes”, for example, it would be:

light_name = (kitchen: cabinet: :dark :clown :shoes){name}

Something like word: means to listen for the word, but drop it during substitution, whereas :word means to add a word in without anything being spoken.

Hope this helps :slight_smile:


Absolutely amazing explanation (and fast too!).
I completely understand how it works now.

Thanks again!
DeadEnd

Of course! Thanks for even asking.
You put the effort into making the solution, it’s the least I can offer!

If I understood correctly: the server mainly runs on your Intel NUC and you use the Raspberry Pi as a client?

I will give it a shot as I am trying to achieve exactly this :slight_smile:

That depends on who you are asking :slight_smile:
I am lucky that my server is on the other side of the wall of my main living area.
I was able to pass a USB extension through the wall, and plug a speakerphone directly into the server.
Others I believe are using a PI client and setting the server IP for intent etc.

DeadEnd

So normally it’s possible to have more than one satellite, right… I’m thinking of adding two satellites

Having multiple independently working satellites is work in progress: https://github.com/synesthesiam/rhasspy/issues/49

sounds fantastic! I will keep an eye on it… for the moment I’ll try setting up one remote Pi to see if it works well :slight_smile:

So! I tried setting the client up but I don’t get any beep, nothing… seems like it is not running. I used the way described above; my server is running on a Hass.io server and the client on a Pi. I get plenty of error messages. Also, I have a ReSpeaker USB attached; it is recognized by Rhasspy but there is no sound after saying “hey snowboy” :slight_smile:

Here the logs

https://pastebin.com/02nyqFCD

https://pastebin.com/jtDvwLby

Note the above doesn’t include any sound - I didn’t configure or set up the confirmation sounds as I used Node-Red for the setup.
I purely used the mic array to capture audio

You mean it should work with Node-Red? Let me try this :slight_smile: thanks for your response

Would a Raspberry Pi Zero have enough horsepower to run this?

It should work hopefully, at least for the client.
But at the minute there is no ARMv6 support; hopefully soon.


So I’m new to Rhasspy and am not sure if I’m missing some tweak (I didn’t see anything in a cursory GitHub review), but I’m not finding blocks for Home Assistant or Node-Red in the settings page on the web UI.

This is with version 2.4.14 of the containers. I see the same config bits missing on both the client and server instances.

It looks like right now, you have to add the configuration bits to the profile.json. https://rhasspy.readthedocs.io/en/latest/intent-handling/#intent-handling

Not sure if this is a temporary tweak or what.