Rhasspy offline voice assistant toolkit

synesthesiam · December 12, 2018, 7:47pm

Is this a Raspberry Pi 3 with the latest Raspbian (based on Debian Stretch)?

chairstacker · December 12, 2018, 7:59pm

Yes, it’s an RPi3.

The command cat /etc/*{release,version} results in:

PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
9.6

infiniteloop · December 12, 2018, 8:47pm

Hi all,
I want to share my progress so far; maybe it will be useful to someone.

I’m using an RPi 3 with a fresh install of Raspbian Stretch Lite, and I’ve installed Rhasspy as described by @synesthesiam in a previous post:

synesthesiam:

Make sure you have Docker installed:
curl -sSL https://get.docker.com | sh
and that your user is part of the docker group:
sudo usermod -a -G docker $USER
Be sure to reboot after adding yourself to the docker group!

Next, start the Rhasspy Docker image in the background:
docker run -d -p 12101:12101 \
      --restart unless-stopped \
      -e RHASSPY_PROFILES=/profiles \
      -v "$HOME/.rhasspy:/profiles" \
      -v /dev/snd:/dev/snd \
      --privileged \
      synesthesiam/rhasspy-hassio-addon:armhf

It works flawlessly and even recognizes the PS3 Eye microphone out of the box.
Then I started adding sentences in my language (Italian), following the examples provided in English.
When you hit Re-Train, it complains about missing words, so you have to add those words (in the corresponding section) with their pronunciations. It’s a somewhat long process, but it’s easy to do. Using the desktop version with NLU, you can probably skip these steps. (Hint: what about snipsNLU? It’s open source and should run on the RPi.)

The interesting part is how you can send the intent to Home Assistant.
I’ve made some modifications to the English example so that I can create fewer, more efficient automations in Home Assistant. For example:

[ChangeLightState]
room_name = (soggiorno | cucina | camera | camerina | bagno) {room}
light_name = (lampadario | faretti) {name}
light_cmd = (accendi | spengi) {command}

<light_cmd> [il] [la] <light_name> [in] [ <room_name> ]

This way I can say “Accendi il lampadario in soggiorno” (translation: “Turn on the living room chandelier”) and I get the following intent:

"intent":
  "entities":
    0:
      "entity": "command"
      "value": "accendi"
    1:
      "entity": "name"
      "value": "lampadario"
    2:
      "entity": "room"
      "value": "soggiorno"
  "hass_event":
    "event_data":
      "command": "accendi"
      "name": "lampadario"
      "room": "soggiorno"
    "event_type": "rhasspy_ChangeLightState"
  "intent":
    "name": "ChangeLightState"
  "text": "accendi lampadario soggiorno"
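To get a feel for how many spoken variants the [ChangeLightState] template above covers, here is a minimal Python sketch that enumerates them. It assumes that parentheses with | mean alternatives and that square brackets mean an optional word; the names SLOTS and expand are made up for illustration and are not part of Rhasspy.

```python
from itertools import product

# Alternatives taken from the template; "" stands for an omitted optional word.
COMMANDS = ["accendi", "spengi"]
NAMES = ["lampadario", "faretti"]
ROOMS = ["soggiorno", "cucina", "camera", "camerina", "bagno"]

# One list per position in: <light_cmd> [il] [la] <light_name> [in] [ <room_name> ]
SLOTS = [
    COMMANDS,
    ["", "il"],
    ["", "la"],
    NAMES,
    ["", "in"],
    [""] + ROOMS,
]


def expand(slots):
    """Return every sentence the slot lists can produce, skipping omitted words."""
    return [" ".join(word for word in combo if word) for combo in product(*slots)]


sentences = expand(SLOTS)
print(len(sentences))  # 192
print("accendi il lampadario in soggiorno" in sentences)  # True
```

The count grows quickly with every optional word, which is one reason to keep the optional parts of a template short.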

Now, if your entities in Home Assistant follow a consistent naming scheme, for example

light.lamp_name_room_name

in your automation config file you can write something like this:

- alias: Light voice command (rhasspy)
  trigger:
    platform: event
    event_type: rhasspy_ChangeLightState
  action:
    service_template: >
      {% if trigger.event.data["command"] == "accendi" %}
        light.turn_on
      {% else %}
        light.turn_off
      {% endif %}
    data_template:
      entity_id: "light.{{ trigger.event.data['name'] }}_{{ trigger.event.data['room'] }}"

so with a single automation you can turn every light that Home Assistant controls on and off.
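For readers who find the Jinja templates hard to follow, the same decision can be sketched in plain Python. This only illustrates the logic of the automation above; resolve_light_call is a hypothetical helper, not a Home Assistant or Rhasspy function.

```python
def resolve_light_call(event_data):
    """Pick the service and build the entity id the way the automation's templates do."""
    # "accendi" turns the light on; any other command is treated as "off".
    service = "light.turn_on" if event_data["command"] == "accendi" else "light.turn_off"
    # Entity ids are assumed to follow the pattern light.<name>_<room>.
    entity_id = "light.{}_{}".format(event_data["name"], event_data["room"])
    return service, entity_id


event_data = {"command": "accendi", "name": "lampadario", "room": "soggiorno"}
print(resolve_light_call(event_data))
# ('light.turn_on', 'light.lampadario_soggiorno')
```

Note that the room is optional in the sentence template, so an event may arrive without a "room" entry; this sketch does not handle that case.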

Hope this helps.

synesthesiam · December 12, 2018, 9:21pm

Excellent write-up, @infiniteloop. Thank you. With your permission, I’d like to include your translations in the Italian profile and incorporate your automation template into the example HA config.

Thanks for the tip about snips-nlu. Looks like it supports most of the languages that Rhasspy does out of the box (Dutch and Russian are missing, unfortunately). They have a pre-built pip package for amd64, but it seems like I’d have to build from source for the Raspberry Pi.

Another NLU system I’ve been looking at is Mycroft Adapt, which seems pretty lightweight. One affordance of Adapt that some of the others don’t is that you can specify required and optional phrases explicitly for your intents. It’s not as fancy with the machine learning, but seems like it would be a good step up from my fuzzywuzzy based system.

infiniteloop · December 12, 2018, 9:42pm

You are welcome, I’m only making use of your awesome work.
Feel free to use whatever you need, however one thing to note is that the service_template part of the automation probably is not so useful for an English speaker.

Regarding the profile: is it updated locally whenever I create new sentences?

chairstacker · December 12, 2018, 9:45pm

Encouraged by @infiniteloop, I just tried it on yet another RPi3:

command: cat /etc/*{release,version}

Result:

PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
9.6

Same result: I get the same error message for recording and no sound on the speakers.

synesthesiam · December 13, 2018, 1:47am

For reference, here’s what my Raspberry Pi 3 reports (installed fresh Raspbian this week, PS3 eye camera is connected via USB). Microphone works great, haven’t tested speakers yet.

$> arecord -l

**** List of CAPTURE Hardware Devices ****
card 1: CameraB409241 [USB Camera-B4.09.24.1], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

$> aplay -l

**** List of PLAYBACK Hardware Devices ****
card 0: ALSA [bcm2835 ALSA], device 0: bcm2835 ALSA [bcm2835 ALSA]
  Subdevices: 7/7
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
card 0: ALSA [bcm2835 ALSA], device 1: bcm2835 ALSA [bcm2835 IEC958/HDMI]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

$> cat /etc/*{release,version}

PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
9.4
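When comparing listings like the ones above, it can help to turn each "card N ... device M" line into the plughw:N,M name that arecord -D expects. Here is a small Python sketch that does this for text in the arecord -l / aplay -l format; the function name capture_devices and the regular expression are illustrative assumptions, and the sample text is copied from the listing above.

```python
import re

# Matches lines such as:
# card 1: CameraB409241 [USB Camera-B4.09.24.1], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): \S+ \[(.+?)\], device (\d+): ")


def capture_devices(listing):
    """Return (alsa_device, card_description) pairs found in the listing text."""
    devices = []
    for line in listing.splitlines():
        match = CARD_LINE.match(line)
        if match:
            card, description, device = match.groups()
            devices.append(("plughw:{},{}".format(card, device), description))
    return devices


SAMPLE = """**** List of CAPTURE Hardware Devices ****
card 1: CameraB409241 [USB Camera-B4.09.24.1], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

print(capture_devices(SAMPLE))
# [('plughw:1,0', 'USB Camera-B4.09.24.1')]
```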

chairstacker · December 13, 2018, 2:01am

The only difference I can see - apart from the model of the USB Camera, of course - is that your result from arecord -l is:

Subdevices: 1/1

while mine says:

Subdevices: 0/1

As proposed earlier, I have created a file called .asoundrc in my /home/Pi directory with the following content:

pcm.!default {
    type asym
    playback.pcm "plughw:0"
    capture.pcm  "plughw:1"
}

ctl.!default {
    type hw
    card 1
}

Permissions are ‘0644’ and the owner is root

gpbenton · December 13, 2018, 9:14am

I found that mapping /dev/snd to the docker container with -v completely locked up the sound system on my host (Kubuntu 18.04). I had to reboot the host to free it up and play some sounds again, before attempting to run the container using the --device flag.

Cao_Hoa · December 13, 2018, 12:44pm

my new fault

synesthesiam · December 13, 2018, 1:56pm

If you’re not running this inside Hass.IO, make sure to change the URL for Home Assistant in the Settings. In your case, maybe it’s running at http://192.168.1.168:8123 ?

Cao_Hoa · December 13, 2018, 2:14pm

Thank you very much.
I love your project.
You are doing very well.

synesthesiam · December 13, 2018, 2:24pm

The short answer is yes, it will be written locally to $HOME/.rhasspy/profiles. I’ve added a longer answer to the documentation, which describes how the RHASSPY_PROFILES environment variable is used like PATH to decide where to read/write profile-related files.
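As a rough illustration of what "used like PATH" can mean, here is a Python sketch that looks up a profile file across a list of directories. It is not Rhasspy's code: find_profile_file and the it/sentences.ini file name are hypothetical, and the rule that the first matching directory wins is an assumption borrowed from how PATH usually behaves. The documentation mentioned above is the place to check the real read/write rules.

```python
import os
import tempfile


def find_profile_file(relative_path, profiles_env):
    """Return the first existing match for relative_path, or None if there is none."""
    for directory in profiles_env.split(os.pathsep):
        candidate = os.path.join(directory, relative_path)
        if directory and os.path.exists(candidate):
            return candidate
    return None


def demo():
    """Search two temporary directories where only the second one holds the file."""
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        os.makedirs(os.path.join(second, "it"))
        target = os.path.join(second, "it", "sentences.ini")
        with open(target, "w") as handle:
            handle.write("[ChangeLightState]\n")
        search_path = os.pathsep.join([first, second])
        found = find_profile_file(os.path.join("it", "sentences.ini"), search_path)
        missing = find_profile_file(os.path.join("it", "nope.txt"), search_path)
        return found == target, missing


print(demo())  # (True, None)
```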

synesthesiam · December 13, 2018, 2:26pm

Docker Image Changes

Everyone: I’ve consolidated the Rhasspy Docker images into the single name synesthesiam/rhasspy-server:latest in order to avoid confusion (and because it was the most heavily used).

Please use this Docker image when updating Rhasspy. NOTE: it does not include rasaNLU anymore; my plan is to move that out into a separate Docker container and just use its HTTP interface.

chairstacker · December 13, 2018, 5:17pm

Thanks for the hint, @gpbenton.

I started rebooting my Pi after most config/system changes a long time ago, so I did that here as well.

Might just get rid of all containers and start over.

Curious to see, though, if @synesthesiam could get their speakers to work.
As long as my speakers don’t work I think it’s a settings issue - and therefore don’t see a reason just yet to buy a different (3rd?!) USB microphone.

chairstacker · December 13, 2018, 9:32pm

Started over again, i.e.

1. Stopped and removed the container.
2. Made sure that the USB microphone and analog speakers worked (arecord -D plughw:1,0 --duration=3 test.wav && aplay test.wav).
3. Set up the new container (synesthesiam/rhasspy-server:latest), including --device /dev/snd:/dev/snd instead of -v /dev/snd:/dev/snd.
4. The container seems to start up fine; the output when pushing the ‘Pronounce’-button is now sent to the speakers.
5. Pushing the ‘Hold to record’-button still results in [Errno -9997] Invalid sample rate.

Note: point 2 now shows the following info on the screen:

Recording WAVE 'test.wav' : Unsigned 8 bit, Rate 8000 Hz, Mono
Playing WAVE 'test.wav' : Unsigned 8 bit, Rate 8000 Hz, Mono

Update:
I have since added a file named .bash_aliases to my home directory; the only content it holds is the line
alias arecord='arecord -D plughw:1,0 --format=S16_LE --rate=48000'

It changes the message that is displayed when using this command arecord -D plughw:1,0 --duration=3 test.wav && aplay test.wav to

Recording WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Mono
Playing WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Mono

Pushing the ‘Hold to record’-button still results in [Errno -9997] Invalid sample rate
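The two arecord messages above differ in two independent settings: the sample format (8 bit vs. 16 bit) and the sample rate (8000 Hz vs. 48000 Hz). A quick way to confirm what a recording actually contains is to read its header with Python's standard wave module. The sketch below builds small in-memory WAV files so it runs without a microphone; describe_wav and make_dummy_wav are illustrative names, not part of Rhasspy.

```python
import io
import wave


def describe_wav(source):
    """Return (bits_per_sample, sample_rate_hz, channels) for a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        return wav.getsampwidth() * 8, wav.getframerate(), wav.getnchannels()


def make_dummy_wav(sample_width_bytes, rate_hz, seconds=1):
    """Build an in-memory mono WAV filled with zero bytes, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * sample_width_bytes * rate_hz * seconds)
    buffer.seek(0)
    return buffer


print(describe_wav(make_dummy_wav(1, 8000)))   # (8, 8000, 1)
print(describe_wav(make_dummy_wav(2, 48000)))  # (16, 48000, 1)
```

On the Pi itself, describe_wav("test.wav") would report the format of the file that arecord just wrote.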

synesthesiam · December 13, 2018, 10:13pm

OK, at least you’re making some progress. What do you see in the list of devices on Rhasspy’s Speech tab? Is there anything besides “default”?

chairstacker · December 13, 2018, 10:19pm

Here is the dropdown list:

If I select Default Device or 2 I get the invalid sample rate error, for all the others I get
[Errno -9998] Invalid number of channels

chairstacker · December 14, 2018, 1:23am

Here are some further results of my testing & trying:

Opening a bash shell inside the container with
docker exec -it condescending_sanderson /bin/bash

and running

arecord -D plughw:1,0 --duration=3 test.wav && aplay test.wav

works both ways, i.e. it records and plays back properly, but it displays:

Recording WAVE 'test.wav' : Unsigned 8 bit, Rate 8000 Hz, Mono
Playing WAVE 'test.wav' : Unsigned 8 bit, Rate 8000 Hz, Mono

To me, this indicates that the default recording format is 8 bit at 8000 Hz, while Rhasspy seems to require 16 bit.
Not sure if or where I could add the file .bash_aliases containing the line
alias arecord='arecord -D plughw:1,0 --format=S16_LE --rate=48000'
as shown above.

Can that be added to the container at all, any hints?
nano doesn’t seem to work inside the container and I’m not familiar with vi.

Another update:
Within the container, I changed the following entries in the file /usr/share/alsa/alsa.conf

from

defaults.ctl.card 0
defaults.pcm.card 0

to

defaults.ctl.card 1
defaults.pcm.card 1

This now means that I can record through the USB microphone, but playback through the speakers does not work.

Leaving the pcm entry at 0 results in yet another error message, though.

synesthesiam · December 14, 2018, 3:27am

OK, I’ve got a couple of things in place, and hopefully one will help.
Make sure you pull the latest Docker image before trying. For some reason, it seems like you have to pull the architecture-specific image to get an update (I’m still learning how to operate Docker):

docker pull synesthesiam/rhasspy-server:armhf

In the Profiles tab, I’ve added an Audio System section where you can pick either PyAudio (the default) or have Rhasspy use arecord directly. Try picking the latter, saving, and then refreshing. If this works, I’d be very interested to know what PyAudio and arecord are doing differently!

In case that doesn’t work, I’ve added some instructions on running Rhasspy in a virtual environment. I’m really hoping that the first option works for you, though.

Good luck, and thanks for sticking with it!