Raspberry Pi as a Chapter 5 voice assistant

In setting up my RasPi 3B as an HA Voice Assistant per Chapter 5 of Year of the Voice, I followed the tutorial referenced in the documentation. It all worked as stated, and really didn’t leave me much to add.

… but I will anyway.

Note that this assumes Wyoming, openwakeword, and a Voice Assist pipeline have already been installed and set up on your HA server.

The documentation does include useful additional information, but assumes that the reader is also a developer who doesn’t need things spelled out in much detail. I strongly recommend you work through the tutorial, then scan the documentation for the advanced options.


Installation

You should use the Install OS and Install Software sections from the tutorial. “script/setup” might be enough of a hint to a seasoned developer, but newer users need more detail.

Determine Audio Devices

Testing the audio input and output is vital, and is something new users often have trouble with.

If you have a reSpeaker 2-mic or 4-mic HAT (or a clone thereof), follow the instructions in the tutorial. If you are using different audio devices you will have to amend the commands throughout the tutorial.

In my case I am using a USB microphone and a speaker connected to the 3.5mm headphone socket – so instead of installing the reSpeaker drivers, I used the commands “arecord -L” and “aplay -L” to check which audio devices are detected.

pi@HA-voice-2:~ $ arecord -L
null
    Discard all samples (playback) or generate zero samples (capture)
hw:CARD=ABTWPDQ0222M,DEV=0
    ABTWPDQ-0222-M, USB Audio
    Direct hardware device without any conversions
plughw:CARD=ABTWPDQ0222M,DEV=0
    ABTWPDQ-0222-M, USB Audio
    Hardware device with all software conversions
default:CARD=ABTWPDQ0222M
    ABTWPDQ-0222-M, USB Audio
    Default Audio Device
sysdefault:CARD=ABTWPDQ0222M
    ABTWPDQ-0222-M, USB Audio
    Default Audio Device
front:CARD=ABTWPDQ0222M,DEV=0
    ABTWPDQ-0222-M, USB Audio
    Front output / input
dsnoop:CARD=ABTWPDQ0222M,DEV=0
    ABTWPDQ-0222-M, USB Audio
    Direct sample snooping device
pi@HA-voice-2:~ $ 

The information I am looking for here is which devices are available. Here it’s only “ABTWPDQ0222M USB Audio”, but my aplay -L also lists HDMI, the headphone socket on my microphone, and the built-in 3.5mm socket. I want the “plughw” variation, since that includes any conversions built into the driver software. So my microphone device is “plughw:CARD=ABTWPDQ0222M,DEV=0”.
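If the listing is long, it can help to filter it down to just the plughw entries. A small sketch (the helper name is my own invention, and it is only a thin sed wrapper):

```shell
# pick_plughw: reduce an `arecord -L` or `aplay -L` listing to just the
# "plughw:" device names - the values you pass to -D.
# (Helper name is hypothetical; it is only a sed wrapper.)
pick_plughw() {
    sed -n '/^plughw:/p'
}

# On a live system you would run:
#   arecord -L | pick_plughw
#   aplay -L | pick_plughw
```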

The entry for my headphones under aplay -L is:

plughw:CARD=Headphones,DEV=0
    bcm2835 Headphones, bcm2835 Headphones
    Hardware device with all software conversions

To test, record a 5-second message to test.wav:

arecord -D plughw:CARD=ABTWPDQ0222M,DEV=0 -r 16000 -c 1 -f S16_LE -t wav -d 5 test.wav

to which I say some random test words. Then play it back to my headphones:

aplay -D plughw:CARD=Headphones,DEV=0 test.wav

which repeats my testing words back to me. So far, so good !

IMPORTANT: If you didn’t get to hear your test, you must get this working before continuing. If there are problems:

  • as a first step, try a different speaker device by changing -D <device>
  • some devices (like the reSpeaker HATs) require a driver to be installed first
  • if you are using Docker, you may need to allow Docker to access the audio device(s)
  • getting an audio device working is pretty basic Linux stuff - not a Home Assistant or Voice Assistant problem - so you are likely to get better answers if you ask for help in the support channel or forum for your specific device.
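For the Docker case, the usual approach is to pass the host’s ALSA devices through to the container. A hedged sketch (the image name is a placeholder, not a real image):

```shell
# Give a container access to the host's ALSA sound devices via the
# --device passthrough flag. "my-satellite-image" is a placeholder.
docker run -it --device /dev/snd my-satellite-image
```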

Running the Satellite

As in the previous section, change the device names to match your microphone and speaker. It may also be useful now to set the name you want this satellite to appear as. In my case,

script/run --debug --name 'HA-voice-2' \
    --uri 'tcp://0.0.0.0:10700' \
    --mic-command 'arecord -D plughw:CARD=ABTWPDQ0222M,DEV=0 -r 16000 -c 1 -f S16_LE -t raw' \
    --snd-command 'aplay -D plughw:CARD=Headphones,DEV=0 -r 22050 -c 1 -f S16_LE -t raw'

In my case I got several DEBUG: lines displayed, ending with:

DEBUG:root:Detected IP: 192.168.1.85 
DEBUG:root:Zeroconf discovery enabled (name=b827ebd38289, host=None) 
DEBUG:root:Connecting to mic service: ['arecord', '-D', 'plughw:CARD=ABTWPDQ0222M,DEV=0', '-r', '16000', '-c', '1', '-f', 'S16_LE', '-t', 'raw'] 
DEBUG:root:Connecting to snd service: ['aplay', '-D', 'plughw:CARD=Headphones,DEV=0', '-r', '22050', '-c', '1', '-f', 'S16_LE', '-t', 'raw'] 
INFO:root:Connected to services 
Recording raw data 'stdin' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono 
DEBUG:root:Connected to mic service

Going to the Home Assistant page in my web browser, I see that there is already a notification that a new device was detected … Click [Configure], select an area - and there it is in Integrations under “Wyoming Protocol”.

You should set up your Voice Assistant pipeline if you haven’t already, and restart Home Assistant.

And as predicted, back on the RasPi satellite, the display now shows

INFO:root:Streaming audio

THAT’S IT ! READY TO TEST.

My test was successful, though I am used to sounds indicating when the voice assistant has detected the wake word, and when it has completed its processing. Fortunately these are easy to add. Remember that you proved the installation was successful by running “script/run --help”, which spewed out a lot of text. That is the help text, and it includes:

  --awake-wav AWAKE_WAV
                        WAV file to play when wake word is detected
  --done-wav DONE_WAV   WAV file to play when voice command is done

Wyoming-satellite already includes a couple of sounds we can use, so it’s just a matter of adding the arguments to your command …
Press CTRL-C to interrupt the wyoming-satellite program currently running; press the up-arrow key to repeat the last command, add " --awake-wav sounds/awake.wav --done-wav sounds/done.wav", and press [Enter] to run the modified command.

Me: “OK Nabu”
VA: trill sound
Me: “Turn on study light”
VA: chirp
VA: “Turned on light”

In Home Assistant, on my HA-voice-2 device page (under the Wyoming Protocol integration), the “Assist in progress” indicator changed to “on” during the exchange - but of more interest, over on the satellite the console now displays:

INFO:root:Streaming audio
DEBUG:root:Wake word detected
DEBUG:root:Connected to snd service
Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono
DEBUG:root:Event(type='transcript', data={'text': 'Turn on study light.'}, payload=None)
INFO:root:Streaming audio
DEBUG:root:Connected to snd service
Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono
DEBUG:root:Event(type='synthesize', data={'text': 'Turned on light', 'voice': {'name': 'NatashaNeural'}}, payload=None)
DEBUG:root:Connected to snd service
Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono

which shows the transcript of what it thought you said.

Create Services

Currently we would have to go back to the satellite and run the command by hand whenever the satellite reboots - but by running it as a Linux service, it can start automatically whenever the RasPi is booted. Also, the service runs in the background, so we can use the console (or an SSH session) to do other things. AND the service is automatically restarted if it crashes ! Magic !

You may need to modify the ExecStart line to match the command you were running from the console - i.e. with the correct devices and the awake and done sounds. Note also that since the service is started by the Linux operating system before you log on, it needs to know exactly where to find the script and files - in /home/pi/wyoming-satellite
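As a minimal sketch (assuming the default pi user and install location; the ExecStart below mirrors the command I tested earlier, and you should substitute your own), creating and enabling the service looks something like:

```shell
# Sketch: write the unit file, then enable it to start at boot.
# Adjust ExecStart to the exact command you tested from the console.
sudo tee /etc/systemd/system/wyoming-satellite.service >/dev/null <<'EOF'
[Unit]
Description=Wyoming Satellite
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/home/pi/wyoming-satellite/script/run --name 'HA-voice-2' --uri 'tcp://0.0.0.0:10700' --mic-command 'arecord -D plughw:CARD=ABTWPDQ0222M,DEV=0 -r 16000 -c 1 -f S16_LE -t raw' --snd-command 'aplay -D plughw:CARD=Headphones,DEV=0 -r 22050 -c 1 -f S16_LE -t raw' --awake-wav sounds/awake.wav --done-wav sounds/done.wav
WorkingDirectory=/home/pi/wyoming-satellite
Restart=always
RestartSec=1

[Install]
WantedBy=default.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now wyoming-satellite.service
```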


That’s basically it ! You have a working RasPi Voice Assist satellite - GO YOU !!!


Wow, that was long for “nothing to add” :wink:

There are some more advanced topics covered in the documentation, which you may wish to look into …


Local Wake Word Detection

Our satellite is already detecting the wake word … by sending all sounds through our LAN (probably wi-fi) to the HA server, which is doing the job of deciding whether we spoke the wake word.

This is certainly a lot better than sending all sound out through the internet to a server owned and operated by a multinational company who doesn’t value our privacy the way we do … but maybe we can do better. The speed of our voice assistant depends on a number of factors, including how fast the HA server’s CPU is, how much other work it is doing, how many voice satellites we have, and how busy our wi-fi is.

If we are using a Raspberry Pi 3, 4, 5 or Zero 2 as a satellite, it seems a pity not to use more of its processing power. Running the wake word detection on the RasPi reduces the load on the HA server’s CPU, and on the wi-fi (since only the actual commands need to be sent to the HA server).

NOTE: Local wake word is an option. It may not be an improvement for you.

Let’s go back to the tutorial’s “Local Wake Word Detection” section. The instructions start “From your home directory…”, which means to do a “cd ~” first.

pi@HA-voice-2:~/wyoming-satellite $ cd ..
pi@HA-voice-2:~ $ 

The instructions are mostly a matter of copying from the tutorial and pasting into your console/terminal window, and I don’t think they need more explanation.

I should point out that the instructions in the tutorial use the “OK Nabu” wake word - but several others are supplied (hey_jarvis, alexa, hey_mycroft and hey_rhasspy), requiring you only to change the value in the --wake-word-name option. You can also use community-trained wake words, or even make your own.
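As a concrete sketch, swapping wake words really is just the last argument. This mirrors my earlier invocation, with the satellite now pointed at a local openwakeword server (which must have the chosen model available):

```shell
# Same satellite invocation as before, with local wake word detection
# added: --wake-uri points at the local openwakeword server, and only
# --wake-word-name changes between models.
script/run --debug --name 'HA-voice-2' \
    --uri 'tcp://0.0.0.0:10700' \
    --mic-command 'arecord -D plughw:CARD=ABTWPDQ0222M,DEV=0 -r 16000 -c 1 -f S16_LE -t raw' \
    --snd-command 'aplay -D plughw:CARD=Headphones,DEV=0 -r 22050 -c 1 -f S16_LE -t raw' \
    --awake-wav sounds/awake.wav --done-wav sounds/done.wav \
    --wake-uri 'tcp://127.0.0.1:10400' \
    --wake-word-name 'hey_jarvis'
```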


Audio Enhancements

There are notes in both the Tutorial and Documentation on ways you can tweak the mic input and speaker output to suit your individual environment.


LED Service

Having visual clues to the voice assistant’s status (like the 6 pictures displayed on the ESP32-S3 BOX 3’s screen) is a great addition … however this section is specific to the reSpeaker 2-mic HAT, which has 3 APA102 status LEDs as well as 2 microphones.
The supplied Python program would be a good starting point if you wish to develop your own … but let’s not dive down that rabbit-hole now.


Event Commands

Going beyond the LED Service, wyoming-satellite can run external commands in response to a dozen different events from the server. Another rabbit-hole :wink:


Awesome write-up which would have been very useful if I had found it an hour ago… Having just got this all going on a Pi4 with USB speakerphone (Poly-20) myself. :rofl:

In my case I also used openwakeword to do the wake-word detection on the Pi.

After running up openwakeword on the Pi as per the instructions (GitHub - rhasspy/wyoming-openwakeword: Wyoming protocol server for openWakeWord wake word detection system), I then started the satellite with:

script/run \
  --debug \
  --name 'FR satellite' \
  --uri 'tcp://0.0.0.0:10700' \
  --mic-command 'arecord -D plughw:CARD=P20,DEV=0 -r 16000 -c 1 -f S16_LE -t raw' \
  --snd-command 'aplay -D plughw:CARD=P20,DEV=0 -r 22050 -c 1 -f S16_LE -t raw' \
  --wake-uri 'tcp://127.0.0.1:10400' \
  --wake-word-name 'ok_nabu' \
  --awake-wav sounds/awake.wav \
  --done-wav sounds/done.wav \
  --mic-auto-gain 5 \
  --mic-noise-suppression 2

Cheers.


Thanks for this, I will definitely venture down this rabbit hole this weekend.

Thanks for putting this together - I have the satellite running on an RPi4 & a USB conference device, and it works like a charm. But is there any link or guide to set up the RPi4 so it reboots and starts running the script again… thanks

That is the “Create Services” bit :slight_smile:

Doh, skipped reading that part - will give that a go. I had it up and running via the initial Chapter 5 blog post - THANK YOU!!!

I have this working pretty well with a RPi0-2W & the reSpeaker 2-mic HAT. Even found a little speaker that allows me to clearly hear the response (my main gripe about the various ESP32 based devices is the difficulty in hearing the response).

One issue I’ve noticed is after a power failure, the RPi0-2W won’t work until I reload the device in the Wyoming Protocol integration. The only way I thought to fix this is to create an automation in HA to reload the integration when it comes back online. But I wonder, is there something I can do on the RPi0-2W to reestablish the HA connection when it boots up after a power failure (or being switched back on)?

I have to restart the service after every shutdown, even having

Restart=always
RestartSec=1

in the service file.

Who has both an M5 Atom Echo and Pis running? I am finding that the M5 Atom works way faster compared to the Pi Zero 2 (though the Pis are doing openwakeword detection locally).

Did you create the services on the RasPi ?


My own working wyoming-satellite.service file contains:

[Unit]
Description=Wyoming Satellite
Wants=network-online.target
After=network-online.target
Requires=wyoming-openwakeword.service

[Service]
Type=simple
ExecStart=/home/pi/wyoming-satellite/script/run --debug --name 'HA-voice-2' --uri 'tcp://0.0.0.0:10700' --mic-command 'arecord -D plughw:CARD=ABTWPDQ0222M,DEV=0 -r 16000 -c 1 -f S16_LE -t raw' --snd-command 'aplay -D plughw:CARD=Headphones,DEV=0 -r 22050 -c 1 -f S16_LE -t raw' --awake-wav sounds/awake.wav --done-wav sounds/done.wav --wake-uri 'tcp://127.0.0.1:10400' --wake-word-name 'hey_jarvis'
WorkingDirectory=/home/pi/wyoming-satellite
Restart=always
RestartSec=1

[Install]
WantedBy=default.target

Check that you changed ExecStart and WorkingDirectory to the full path (from root). The service starts as the machine is booting up, so it doesn’t know in which user’s home directory to find the files.

Unfortunately I am still fairly new to linux, so not sure what else to suggest.
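One generic suggestion that helped me: since the satellite runs as a systemd service, its console output ends up in the systemd journal, which you can inspect after a reboot to see why it isn’t responding:

```shell
# Follow the wyoming-satellite service's log output live (Ctrl-C to stop):
journalctl -u wyoming-satellite.service -f

# Or show everything it logged since the current boot:
journalctl -u wyoming-satellite.service -b
```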

Not surprised; you’re comparing apples with oranges.

  • M5 Atom is basically just copying packets of data between audio and LAN interfaces.
  • OpenWakeWord takes a fair bit of processor power (hence not recommended for anything less than a RasPi 3).

If you turn off the local openwakeword, my guess is that the M5 Atom would still be slightly faster, because the general purpose RasPi OS will have more overhead.
M5 Atom is definitely less powerful than a RasPi 3 or Zero 2 - but it’s super cheap … so a reasonable compromise.

Thanks for the reply. I’m not great with Linux myself. It took me 2 tries of the GitHub instructions to get mine to work. :slight_smile: Yes, I created the wyoming-satellite service and can see the service running after a restart, but it doesn’t respond until I reload the integration in HA.

My ExecStart is very similar to yours except I have some additional microphone parameters to account for the different audio hardware.

My setup is different in that I’m using a custom wake word I created (Ok Jarvis) and am running an additional service (2mic_leds.service) to control the LEDs on the reSpeaker 2-mic HAT. I could be wrong, but I don’t think any of that should matter. Thanks again.

I got wyoming-satellite working, but after I give it the first command it gets stuck listening.

After I stay in silence for a while, it starts saying “sorry, I couldn’t understand that”. Any guidance would be much appreciated.


So strange - I can’t find any relevant differences. Similar to @Gregkg, I’m using the 2mic_leds.service and the 2-mic HAT. Do you also use a Pi0 2? On the Pi4 the autostart works for me.

For me it’s just really surprising how such a cheap solution can perform better.

Did you check in Developer Tools → Assist whether the sentence works? Otherwise you can debug if you go to Config → Voice Assistants, click on your assistant, and click Debug in the 3-dots menu.

Does somebody know how to solve the issue with a sentence which targets hundreds of entities? I’m facing this with the HassShoppingListAddItem intent - “Add apple to my shopping list” - but instead of doing this, every available entity is matched as a target.

I did check developer tools, and that works fine. The problem, I’m guessing, is that specific satellite. Maybe the microphone that I’m using? I’m using an Anker PowerConf S330. Does anyone know what fine tuning, if any, I should be doing for it? This is how I’m running it:

script/run \
--name 'mini-server' \
--uri 'tcp://0.0.0.0:10700' \
--mic-command 'arecord -D plughw:CARD=S330,DEV=0 -r 16000 -c 1 -f S16_LE -t raw' \
--snd-command 'aplay -D plughw:CARD=S330,DEV=0 -r 22050 -c 1 -f S16_LE -t raw' \
--wake-uri 'tcp://localhost:10400' \
--wake-word-name 'computer' \
--done-wav /home/ignacio/wav/done.wav \
--awake-wav /home/ignacio/wav/awake.wav \
--mic-noise-suppression 2

For my openwakeword, should I set --threshold to some specific value or leave it empty?

I agree that the different wake word, 2mic_leds service and different microphone should not make a difference. By the way, did you follow Mike’s tutorial? It is for the same hardware as yours, so should be correct as-is.

So, you are having to go to your Home Assistant machine and reload something there ? Are you reloading the device in Wyoming Integration, as per this image ?

I found with the previous version (chapter 4’s homeassistant-satellite) that it would always take 15 seconds after I finished talking before the command would be actioned … but for me wyoming-satellite only pauses occasionally. I assume that you have waited longer.

Is this the 15 second delay I mentioned in last paragraph ? Apparently others are still getting this, and it is reported on github.

Not sure whether that means you are or are not a Trekker/Trekkie/Star Trek fan. Either way, a fairly commonly-used word as a wake word is likely to create more false activations. But that is a different issue :wink:

It is sounding to me as though we may need to tweak the audio sensitivity for each microphone, and even for how we use it. :frowning:

I’m still trying to decide on a wake word, right now I think Dumbledore is the one, but computer is fun for testing.

I think so. Maybe I just need to wait for an update.

I have been using it for 2 weeks, and for me the reSpeaker 2-mic HAT is not good enough; the recordings have a lot of noise, and even when I use the noise suppression there are still too many failures.
I tried the Hikvision DS-U02 webcam and it is way better.

By the way, I am looking for a solution for continuous conversation. On ESPHome we can use the “start_continuous” method to trigger listening. Do we have any simple way of doing the same here?

Yes, that is exactly what I have to do.

Yes I did, & that’s the reason I bought the reSpeaker 2-mic HAT. Silly me, thinking I’d get it to work perfectly if I had the same hardware as the instructions. :rofl:

One thing I noticed is that the LEDs on the HAT do not turn on when starting up after a power failure (or being turned off/on), but do as soon as I reload the device in the Wyoming integration.

Right now it’s not a show stopper & I will play with it as I get the time. Now I’m playing with custom sentences to control my thermostat, since there are no built-in intents for that yet. I know voice in HA is still in its infancy (but I am impressed by how much they have accomplished so far) and am excited to see it improve going forward.

I do hope that in the future Nabu Casa creates their own satellite hardware for voice assistants, similar to what they did with HA Blue, Yellow etc… The weak audio coming from various ESP devices doesn’t work for me. My ultimate goal is to ditch my various Google/Nest assistants and control my household with HA.

Thanks again for your help.