Rhasspy - Offline voice control step by step (Server/Client) - Docker

Hi Everyone,
So I had everything working in my house for a few weeks so figured I needed to break things and create mayhem again lol.
I started looking at this, a local offline voice assistant.

General objective was to replicate the Alexa ‘mesh’ in my house, but offline. I wanted a simple set up with 2 components:

  1. Server element running on my NUC (I use Docker) that also runs HA, MQTT etc.
  2. Client element that offers wake word, voice detection and speech-to-text.

Note: I do NOT use TTS as I have Sonos in the rooms I need a reply in, for now (I may add a speaker later and update).

Rhasspy did NOT disappoint! It was easy and I was up and running in 10 minutes (YMMV based on environment, Docker REALLY helped though!). Don’t be put off by the extensive docs, getting up and running was very straightforward. Once set up you can then play with the config to suit your needs.
HTH…

Hardware set up involved components lying around:
Seeed 4-Mic Array for Raspberry Pi
8GB microSD card
Raspberry Pi 3B

Server Side Steps:

  1. Assuming you already have Docker running, create a directory for rhasspy, and a sub-folder called profiles.
  2. Pull and Run docker image:
docker run -p 12101:12101 \
      --restart unless-stopped \
      --name rhasspy \
      -v "/<PATH_TO>/rhasspy/profiles:/profiles" \
      synesthesiam/rhasspy-server:latest \
      --user-profiles /profiles \
      --profile en
  3. Go to the server URL http://<Server_IP>:12101 (you may be asked to download files)
  4. Go to settings and check the config (and save along the way):

[Rhasspy]
Listen for wake word on Startup = UNchecked

[Home Assistant]
Do not use Home Assistant (note you obviously can use it instead of Node-Red)

[Wake Word]
No Wake word on this device

[Voice Detection]
No voice command detection on this device

[Speech Recognition]
Do Speech recognition with pocketsphinx

[Intent Recognition]
Do intent recognition with fuzzywuzzy

[Text to Speech]
No Text to speech on this device

[Audio Recording]
No recording on this device

[Audio Playing]
No Playback on this device

  5. Check the Slots and Sentences tabs, and make sure to hit Train and then Restart
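If you haven’t touched the Sentences tab before, it uses Rhasspy’s ini-style template format. A minimal sketch for a light intent (the intent and entity names here are illustrative, not from the example flow):

```ini
[ChangeLightState]
light_name = (bedroom light | kitchen light) {name}
light_state = (on | off) {state}
turn <light_state> [the] <light_name>
```

Saying “turn on the bedroom light” would then produce a ChangeLightState intent with slots name and state.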

Client Side Steps:

  1. Flash the 8GB microSD card with Raspbian Buster using Etcher.
  2. Remove and re-insert the microSD card and add files to the root directory (for headless setup - meaning no screen needed). You only need (b) below if you plan to use WiFi.
    a) a file simply called ‘ssh’
    b) wpa_supplicant.conf (example here)
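For reference, a minimal wpa_supplicant.conf looks something like this (the country code, SSID and passphrase are placeholders you need to replace with your own):

```conf
country=US
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="YOUR_SSID"
    psk="YOUR_WIFI_PASSWORD"
}
```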
  3. Insert the microSD card in the Pi, use a proper power supply and check your router for the IP address it gets.
  4. SSH into the Pi using that IP address (I use PuTTY) with the default user/pass = pi/raspberry.
    You are going to want to change that in the future!
  5. Install git:
    sudo apt install git
  6. Install the Seeed mic array drivers based on the info here
git clone https://github.com/respeaker/seeed-voicecard
cd seeed-voicecard
sudo ./install.sh 
sudo reboot
  7. Plug in the Seeed mic array and check the install was successful against the expected result here:
arecord -L
  8. Install Docker:
curl -sSL https://get.docker.com | sh
  9. Modify user permissions to access Docker without using ‘sudo’ all the time :wink:
sudo usermod -a -G docker pi
  10. Close SSH, and relaunch the SSH connection to use the new permissions.
  11. Create directories for the Rhasspy Docker image to use:
cd ~
mkdir rhasspy
cd rhasspy
mkdir profiles
  12. Pull and run the Docker image:
docker run -p 12101:12101 \
      --restart unless-stopped \
      --name rhasspy \
      -v "/home/pi/rhasspy/profiles:/profiles" \
      --device /dev/snd:/dev/snd \
      synesthesiam/rhasspy-server:latest \
      --user-profiles /profiles \
      --profile en
  13. Go to the client URL http://<Pi_IP_address>:12101 (you will be asked to download some files)
    (At time of writing I put wake word, voice detection and recognition on the client)
  14. Under settings ensure the following is selected, and Save along the way. You will need to Train once also.

[Rhasspy]
Listen for wake word on Startup = checked

[Home Assistant]
Do not use Home Assistant (note you obviously can use it instead of Node-Red)

[Wake Word]
Use snowboy (this should trigger a download of more files)

[Voice Detection]
Use webrtcvad and listen for silence

[Speech Recognition]
Use Remote Rhasspy server for speech recognition:
URL = http://<SERVER_IP>:12101/api/speech-to-text

[Intent Recognition]
Use Remote Rhasspy server for intent recognition:
URL = http://<SERVER_IP>:12101/api/text-to-intent

[Text to Speech]
No Text to speech on this device

[Audio Recording]
Use PyAudio (default)
Input Device = seeed-4mic-voicecard (you can test this if you want)

[Audio Playing]
No Playback on this device

Node-Red Config
1. Import this flow from the Rhasspy examples
2. Attach a debug node to the websocket in node and configure it to show the full msg object.
3. I edited the light text node to take this:

{
  "domain": "light",
  "service": "turn_{{slots.state}}",
  "entity_id": "{{slots.name}}"
}
4. Add a call service node after the light text node and leave it blank. Deploy and enjoy your offline voice assistant.
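If you prefer a function node over the mustache template, a minimal sketch of building the same payload (this assumes the intent’s slots arrive on msg.slots, as in the example flow; the helper name is illustrative):

```javascript
// Build the Home Assistant service call from the Rhasspy intent slots.
// Produces the same object as the mustache template in the light text node.
function buildServiceCall(slots) {
    return {
        domain: "light",
        service: "turn_" + slots.state,  // "turn_on" or "turn_off"
        entity_id: slots.name            // e.g. "bedroom_light"
    };
}

// Inside a function node you would do something like:
// msg.payload = buildServiceCall(msg.slots);
// return msg;
```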

Pick a light (one that is in the light domain, not a switch) and say “Snowboy, turn bedroom light off”
:smiley:


Correcting the spaces in the light name

Rhasspy passes bedroom Light but HA needs bedroom_light.
This was pretty easy with a change node placed before the node that splits out the intents:

Then I simply replaced the space with a _ (note I also kept the original name so I can use it in the TTS phase at the end):
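As a sketch, the function-node equivalent of that change node could look like this (the property names here are illustrative, not from the actual flow):

```javascript
// Convert the spoken name ("bedroom Light") into the entity id HA expects
// ("bedroom_light"), keeping the original around for a TTS reply later.
function normalizeSlotName(msg) {
    msg.original_name = msg.name;                          // keep "bedroom Light"
    msg.name = msg.name.toLowerCase().replace(/ /g, "_");  // -> "bedroom_light"
    return msg;
}
```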


Correcting switch to light domain
So for anyone (like me) wondering how to deal with lights that are actually switches (switch.light_example), you can use a function node in Node-Red to correct the domain. I have fewer lights that are actual lights, and a LOT more lights that are actually Z-wave switches, so I check if the light is an actual light; if so, use the light domain, else use the switch domain:

if (msg.name == "light_name_1" || msg.name == "light_name_2" || msg.name == "light_name_3") {
    msg.domain = "light";
} else {
    msg.domain = "switch";
}
return msg;
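If the list of real lights grows, an array lookup is easier to maintain. A sketch of the same function node (entity names are illustrative):

```javascript
// Same logic as the if/else above, but the real lights live in one array.
const realLights = ["light_name_1", "light_name_2", "light_name_3"];

function pickDomain(msg) {
    msg.domain = realLights.includes(msg.name) ? "light" : "switch";
    return msg;
}
```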

Thank you for the tutorial, @jaburges! Would you be OK with me putting these instructions in a (forthcoming) tutorial section of the Rhasspy docs?

Another way of doing this is with substitutions. In your sentences.ini file, you can have:

light_name = (bedroom light):bedroom_light {name}

When you get the intent from Rhasspy, the name slot will contain bedroom_light. You can still get access to the original text (for text to speech, like you mentioned) by looking in the entities list of the intent. You’ll find an entry like { "entity": "name", "value": "bedroom_light", "raw_value": "bedroom light" }. The raw_value will have the original text before substitution occurs.
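In Node-Red that lookup could be sketched like this (the intent object shape matches the example above; the helper name is illustrative):

```javascript
// Recover the original spoken text for an entity from a Rhasspy intent,
// e.g. "bedroom light" rather than the substituted "bedroom_light".
function rawValueFor(intent, entityName) {
    const e = (intent.entities || []).find(ent => ent.entity === entityName);
    return e ? e.raw_value : undefined;
}

const intent = {
    slots: { name: "bedroom_light" },
    entities: [
        { entity: "name", value: "bedroom_light", raw_value: "bedroom light" }
    ]
};
// rawValueFor(intent, "name") gives back "bedroom light" for the TTS reply.
```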


Quick question along this line. If you want a substitution to have multiple words, is this possible and what do you have to wrap them in?

For example:

light_name = (kitchen light):cabinet light {name}

I tried this, but slots.name only contained light, not both words.
I tried putting " " around it but that didn’t work.
Is it possible in Rhasspy’s current state and, if not, do you think this is something an amateur could figure out for a pull request :face_with_raised_eyebrow:.

Thanks!
DeadEnd

In this specific case, you could just do:

light_name = (kitchen:cabinet light){name}

but in general, you (currently) need to drop/add individual words. If you wanted to replace “kitchen cabinet” with “dark clown shoes”, for example, it would be:

light_name = (kitchen: cabinet: :dark :clown :shoes){name}

Something like word: means to listen for the word, but drop it during substitution, whereas :word means to add a word in without anything being spoken.

Hope this helps :slight_smile:


Absolutely amazing explanation (and fast too!).
I completely understand how it works now.

Thanks again!
DeadEnd

Of course! Thanks for even asking.
You put the effort into making the solution, it’s the least I can offer!

If I understood correctly: the server mainly runs on your Intel NUC and you use the Raspberry Pi as a client?

I will give it a shot as I am trying to achieve exactly this :slight_smile:

That depends on who you are asking :slight_smile:
I am lucky that my server is on the other side of the wall of my main living area.
I was able to pass a USB extension through the wall, and plug a speakerphone directly into the server.
Others I believe are using a PI client and setting the server IP for intent etc.

DeadEnd

So normally it’s possible to have more than one satellite, right… I’m thinking of adding two satellites

Having multiple independently working satellites is work in progress: https://github.com/synesthesiam/rhasspy/issues/49

sounds fantastic! I will keep an eye on it… for the moment I’ll try setting up one remote Pi to see if it works well :slight_smile:

So! I tried setting the client up but I don’t get any beep, nothing… seems like it is not running. I used the way described above; my server is running on a Hass.io server and the client on a Pi. I get plenty of error messages. Also, I have a ReSpeaker USB attached; it is recognized by Rhasspy but there is no sound after saying “hey snowboy” :slight_smile:

Here the logs

https://pastebin.com/02nyqFCD

https://pastebin.com/jtDvwLby

Note the above doesn’t include any sound - I didn’t configure or set up the confirmation sounds as I used Node-Red for the setup.
I purely used the mic array to capture audio

You mean it should work with Node-Red? Let me try this :slight_smile: thanks for your response

Would a Raspberry Pi Zero have enough horsepower to run this?

It should work hopefully, at least for the client.
But at the minute there is no ARMv6 support; hopefully soon.


So I’m new to Rhasspy and am not sure if I’m missing some tweak (I didn’t see anything in a cursory GitHub review), but I’m not finding blocks for Home Assistant or Node-Red in the settings page on the web UI.

This is with version 2.4.14 of the containers. I see the same config bits missing on both the client and server instances.

It looks like right now, you have to add the configuration bits to the profile.json. https://rhasspy.readthedocs.io/en/latest/intent-handling/#intent-handling

Not sure if this is a temporary tweak or what.