CHAPTER 5 - Raspberry Pi as a voice assistant

@ignacio82 I've had the homeassistant-satellite build back running since yesterday, and it works reasonably well. I notice that it too has a continuously-present arecord process, so I now assume that's just how it works and is not a symptom of something working incorrectly.

However, my wyoming-satellite build did have issues:

  • initially it responded well, but while replying to my request it would concurrently reissue the awake sound and wait for a second request; this behaviour was fairly consistent, until…
  • later, it would respond only occasionally to the wake word, and either never return from my request, or return after a minutes-long pause with the "sorry, I didn't understand" message.

These made the wyoming-satellite build unusable for me at the present time.

Hi everyone, a question: is it possible to have the Wyoming satellite pass Assist's response to another media_player? I know how to do it with ESPHome. I would like to use the Pi for input only.

I achieved this by using the --synthesize-command flag to forward the generated text to a custom event in Home Assistant, then picking that up with Node-RED and sending it to any media_player with tts.speak.
I can send you the config when I am home later, if that approach sounds interesting to you. It is somewhat roundabout though…


If it works, it works. Yes, I’d really like that. Thanks :+1:t2:

Sorry, totally spaced on getting back to you. So here it is:

Assuming /home/pi is where you cloned the repository to,
create the file
/home/pi/wyoming-satellite/examples/commands/synthesize_custom.sh
with the following content:

#!/usr/bin/env sh

# wyoming-satellite pipes the synthesized response text in on stdin
text="$(cat)"
echo "Text to speech text: ${text}"

# long-lived access token for Home Assistant (replace LLA_TOKEN, see below)
token='LLA_TOKEN'

# build the JSON payload with jq so any quotes in the text are escaped safely
curlData="$(jq -n \
  --arg satellite 'snapcast-livingroom' \
  --arg text "$text" \
  '{event: "synthesize", satellite: $satellite, text: $text}')"
echo "$curlData"

# fire the custom satellite_tts event in Home Assistant
curl \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $token" \
  -d "$curlData" \
  https://HASS_FQDN/api/events/satellite_tts

Create yourself a long-lived access token and substitute it for LLA_TOKEN, and fill in the FQDN of your Home Assistant instance for HASS_FQDN.
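
If you want to sanity-check the script before wiring it into the service, you can pipe some text into it by hand and watch Developer Tools > Events in Home Assistant (listening for satellite_tts) to see the event arrive. The chmod is only needed if the file is not already executable:

chmod +x /home/pi/wyoming-satellite/examples/commands/synthesize_custom.sh
echo 'Hello from the satellite' | /home/pi/wyoming-satellite/examples/commands/synthesize_custom.sh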

Then, assuming you installed on an OS using systemd, make your

/etc/systemd/system/wyoming-satellite.service

look like so:

[Unit]
Description=Wyoming Satellite
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/home/pi/wyoming-satellite/script/run \
  --debug \
  --name 'snapcast-livingroom' \
  --uri 'tcp://0.0.0.0:10700' \
  --mic-command 'arecord -D plughw:CARD=Device,DEV=0 -r 16000 -c 1 -f S16_LE -t raw' \
  --snd-command 'aplay -D null' \
  --synthesize-command 'examples/commands/synthesize_custom.sh'

WorkingDirectory=/home/pi/wyoming-satellite
Restart=always
RestartSec=1

[Install]
WantedBy=default.target

Note the --synthesize-command line, and that the snd output is sent to the null device so the Pi itself stays silent.
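
After creating or editing the unit file, the usual systemd steps apply (nothing wyoming-specific here) to pick up the change, start the service, and follow the --debug output:

sudo systemctl daemon-reload
sudo systemctl enable --now wyoming-satellite.service
journalctl -u wyoming-satellite.service -f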

Now, any time the satellite is sent synthesized text, it should create an event in Home Assistant with the name of the satellite and the text to be spoken. I use this information within Node-RED like so:

[{"id":"c57a4a8e82851f13","type":"server-events","z":"a533baf02563f05c","name":"","server":"2452f89c.1b7828","version":3,"exposeAsEntityConfig":"","eventType":"satellite_tts","eventData":"","waitForRunning":true,"outputProperties":[{"property":"payload","propertyType":"msg","value":"","valueType":"eventData"},{"property":"topic","propertyType":"msg","value":"$outputData(\"eventData\").event_type","valueType":"jsonata"}],"x":130,"y":400,"wires":[["8a0a12e62be88756"]]},{"id":"eb18cd0b8af1bf7d","type":"api-call-service","z":"a533baf02563f05c","name":"","server":"2452f89c.1b7828","version":5,"debugenabled":false,"domain":"tts","service":"speak","areaId":[],"deviceId":[],"entityId":["tts.piper"],"data":"{\"message\":\"{{payload.event.text}}\",\"media_player_entity_id\":\"media_player.snapcast_player\"}","dataType":"json","mergeContext":"","mustacheAltTags":false,"outputProperties":[],"queue":"none","x":540,"y":400,"wires":[["50f6c0501580df66"]]},{"id":"8a0a12e62be88756","type":"switch","z":"a533baf02563f05c","name":"satellite_name","property":"payload.event.satellite","propertyType":"msg","rules":[{"t":"eq","v":"snapcast-livingroom","vt":"str"},{"t":"eq","v":"snapcast-kitchen","vt":"str"}],"checkall":"true","repair":false,"outputs":2,"x":360,"y":400,"wires":[["eb18cd0b8af1bf7d"],["b142937ea983f692"]]},{"id":"2452f89c.1b7828","type":"server","name":"Home Assistant","addon":true}]

Basically listen for the event type “satellite_tts”, route by satellite and call the tts.speak service on whatever compatible media_player you like.
Hope this helps you along :slight_smile:


Thanks! I'm going to try that this weekend.

So I am using a reSpeaker and a Pi Zero 2 W. I set everything up according to the tutorial. What I am confused about is that the wake word specified in my systemd unit file is not being used. It seems to be overridden by the Assist pipeline:

[Unit]
Description=Wyoming Satellite
Wants=network-online.target
After=network-online.target
Requires=wyoming-openwakeword.service


[Service]
Type=simple
ExecStart=/home/pi/wyoming-satellite/script/run \
        --name 'kitchen-voice' \
        --uri 'tcp://0.0.0.0:10700' \
        --mic-command 'arecord -D plughw:CARD=seeed2micvoicec,DEV=0 -r 16000 -c 1 -f S16_LE -t raw' \
        --snd-command 'aplay -D plughw:CARD=seeed2micvoicec,DEV=0 -r 22050 -c 1 -f S16_LE -t raw'
        --mic-auto-gain 5 \
        --mic-noise-suppression 2 \
        --wake-uri 'tcp://127.0.0.1:10400' \
        --wake-word-name 'hey_jarvis'

WorkingDirectory=/home/pi/wyoming-satellite
Restart=always
RestartSec=1
[Install]
WantedBy=default.target

Instead of triggering on Jarvis it triggers on Alexa, which is defined in my pipeline.

What I thought was supposed to happen is that the wake word on the Pi triggers and then the audio is sent over to HA. What seems to be happening is that the audio is being sent directly to HA, and it is in charge of determining whether it should process the audio.

Any pointers?

I am not exactly sure what made the difference here. I ended up removing the satellite from HA. Then I adjusted the systemd file like this:

[Unit]
Description=Wyoming Satellite
Wants=network-online.target
After=network-online.target
Requires=wyoming-openwakeword.service
Requires=2mic_leds.service
[Service]
Type=simple
ExecStart=/home/pi/wyoming-satellite/script/run \
        --debug \
        --name 'kitchen-voice' \
        --uri 'tcp://0.0.0.0:10700' \
        --mic-command 'arecord -D plughw:CARD=seeed2micvoicec,DEV=0 -r 16000 -c 1 -f S16_LE -t raw' \
        --snd-command 'aplay -D plughw:CARD=seeed2micvoicec,DEV=0 -r 16000 -c 1 -f S16_LE -t raw' \
        --snd-command-rate 16000 \
        --snd-command-channels 1 \
        --wake-uri 'tcp://127.0.0.1:10400' \
        --wake-word-name 'hey_jarvis' \
        --event-uri 'tcp://127.0.0.1:10500'
WorkingDirectory=/home/pi/wyoming-satellite
Restart=always
RestartSec=1
[Install]
WantedBy=default.target

I did find that the event-uri argument has to be given exactly right, or systemd complains about unknown arguments.

The device is now working as I expect.

My guess is that (like me) you may have missed a system restart. Personally I am not confident with stopping and starting Linux services, so do a full reboot just to make sure :wink:

I am not using event-uri, and I guess that it is not working for you either - so try removing that argument, restart the service, and see if that removes the error messages or stops the service. The trick is to change only one thing at a time - which becomes very tedious.
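
If you prefer restarting just the service rather than doing a full reboot, this sequence (using the service name from earlier in this thread) reloads the unit file and restarts it; the last two commands show why a service failed to start:

sudo systemctl daemon-reload
sudo systemctl restart wyoming-satellite.service
systemctl status wyoming-satellite.service
journalctl -u wyoming-satellite.service -n 50 --no-pager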

FYI, to prove that my system is no longer using openWakeWord on the HA machine, I disabled it there. When I am happy with this HA Voice Assist I will re-enable it for use by my RasPi Zero satellite.

I have a Pi and an M5 Atom. On the Pi I use a local wake word and notice that I have to say the wake word, wait for it to activate, then say the command. With the M5 Atom Echo, I can almost flow the wake word straight into the request.
I assume the local wake word on the Pi is the reason, as the M5 is always streaming and HA/Wyoming is doing the processing.

I originally thought having local wake word would end up being better/faster, but now I am not totally convinced.

Apart from running it as a satellite, is there a way to get a reSpeaker mic (I have the 4-mic array USB version) working directly connected to a RPi running HA?

Mike, do what works best for you in your situation. Local wake word detection reduces data over the network (more important if you have multiple satellites over Wi-Fi) - but at the expense of requiring more CPU at the satellite. You didn't mention whether you are using one of the older, slower RasPi models or one of the faster, more expensive ones. There are always trade-offs.


You mean like what was discussed in the Chapter 4 blog back in October 2023, and in other forum topics?

Just doing another fresh install, and came across a problem … so I will update my original post to use the correct branch of HinTak's reSpeaker driver for the current kernel version.

The current (at time of writing) RasPi OS Bookworm (dated March 15th 2024) uses the v6.6 kernel.

Raspberry Pi OS with desktop

    Release date: March 15th 2024
    System: 32-bit
    Kernel version: 6.6
    Debian version: 12 (bookworm)

However, it seems that the reSpeaker driver install currently defaults to the v6.1 kernel - which is correct for the older RasPi OS, now referred to as "Legacy" or "Bullseye".

Fortunately this is easy to fix.

On your Raspberry Pi use the "uname -r" command to find out which kernel version is currently running. The first two numbers are the kernel version. If it starts with "6.1." then the default branch is for you; but my newly updated RasPi 4 replied with "6.6.31+rpt-rpi-v8", so I have kernel v6.6:

pi@raspi4:~ $ uname -r
6.6.31+rpt-rpi-v8

HinTak has provided branches in his GitHub repository named after the kernel versions, so it is simple to add the "--branch <version>" option to the git clone command:

git clone --branch v6.6 https://github.com/HinTak/seeed-voicecard.git

or to add a "git checkout <version>" command before running the install script:

git clone https://github.com/HinTak/seeed-voicecard
cd seeed-voicecard
git checkout v6.6
sudo ./install.sh
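
Once the install script has finished and the Pi has rebooted, a quick check with plain ALSA commands (nothing reSpeaker-specific; the card name below is the one the 2-mic HAT gets elsewhere in this thread, the 4-mic versions appear under a different name) confirms the driver actually loaded:

# the seeed card should appear in both device lists
arecord -l
aplay -l

# record 5 seconds from the HAT and play it back through it
arecord -D plughw:CARD=seeed2micvoicec,DEV=0 -r 16000 -c 1 -f S16_LE -d 5 test.wav
aplay -D plughw:CARD=seeed2micvoicec,DEV=0 test.wav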

Still happening for me on my Voice Assist test RasPi4. Thinking that there may have been an update to wyoming-satellite or openwakeword in the past 6 months, I have started again to re-build from scratch.

Installed the reSpeaker driver using the GitHub v6.6 branch; tested audio with arecord and aplay; installed and configured wyoming-satellite and wyoming-openwakeword. All good so far.

I speak the wakeword and hear the awake-wav sound. In HA > Wyoming > services the “Assist in progress” turns On and I speak my command. But “Assist in progress” does not turn off when I stop speaking. It seems to time out after about 15 seconds - and then I hear the done-wav sound, “Assist in progress” turns off, my command is recognised and actioned very quickly (so not because of HA on a slow CPU).

Possibly the microphone is picking up just enough noise to stop VAD (voice activity detection) from triggering?
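
A crude way to check that (plain ALSA, not a wyoming-satellite feature; the device name below matches the 2-mic HAT used earlier in this thread): record a few seconds of room "silence" from the same device the satellite uses and listen back to it, or watch a live level meter while staying quiet.

# record 10 seconds of "silence" and listen for hiss or hum
arecord -D plughw:CARD=seeed2micvoicec,DEV=0 -r 16000 -c 1 -f S16_LE -d 10 silence.wav
aplay silence.wav

# or watch a live VU meter while the room is quiet (Ctrl+C to stop)
arecord -D plughw:CARD=seeed2micvoicec,DEV=0 -r 16000 -c 1 -f S16_LE -V mono /dev/null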

I have been distracted with other projects for several months … but now it's time for me to find the best settings for my microphone … and that means also documenting, so other new users can short-cut that learning curve.

Alas, I am still using Rhasspy on 2 of my RasPi satellites because my HA Voice Assist wyoming-satellite is still taking 15 seconds before acting on commands. There has been no improvement to the HA code, and no response to the various GitHub issues reporting this, in the past 4 months.

Hi,
I am currently working on making the installation of the RasPi satellite software easy. I can set up a satellite with LED and wake word detection using docker-compose; only the driver installation for the 2mic_hat is manual.

My goal is to create an Alexa-like clone that works well. Currently I am trying to tweak the settings for better detection. Since the wake word detection is local, that step is working well and fast.

After reading that Rhasspy 2 is faster, I wonder if I should focus on the multiroom audio for now.

Anyone interested in working on a self-made Alexa-like device with a Raspberry Pi Zero 2 W as the base instead of the ESP32?

It's me. I have always wanted to make a satellite on a Pi Zero 2 W because I feel more comfortable working in Python instead of ESPHome or Arduino. But it seems that the wyoming-satellite project is no longer being updated and people are focusing on the ESP32-S3 voice kit.

Florian, I applaud your enthusiasm, and have gone through similar thinking myself … and decided it isn't worth the effort at this time.

My current view is:

  • I am not sure whether you mean Rhasspy (https://community.rhasspy.org) v2.5 or Rhasspy v3 (never completed), which became the core of HA Voice Assist (the topic of this thread). Mike stated some time ago that he intends to update Rhasspy when he gets time - which I believe is mostly documentation on how to use it for non-HA purposes.

  • The Rhasspy Raspberry Pi hardware options are overly expensive for limited functionality.

    • Raspberry Pi is a general purpose computer, and uses only a fraction of its CPU for the voice assistant and wakeword detection.
    • While Speech-to-text and Intent recognition can run on a satellite RasPi, it is not particularly suitable for the compute-intensive techniques used by Digital Signal Processing.
    • The driver for the seeed 2-mic HATs actually uses only one microphone and has none of the DSP (Digital Signal Processing) magic we have come to expect from the big-name brands. HinTak has updated it for new OS kernels, but no one is interested in improving the code.
    • Conferencing speakerphones reportedly give good audio quality - but at a high price.
    • If someone already has a Raspberry Pi sitting around doing nothing, and a decent quality microphone, then it makes sense - but don’t spend money to go this route.
  • Mike and Paulus have talked (briefly) about an ESP32-S3 voice kit hardware device being developed by Nabu Casa; as @vunhtun says, this is the focus currently.

    • The ESP32-S3 has a co-processor and additional hardware instructions that make it suitable for AI and the maths required for Digital Signal Processing … without the overheads required to run a full Linux OS.
    • They mentioned the hope for this hardware to be released before the end of this year. They also want enough stock ready to ship at release so potential customers aren’t disappointed.
    • The big question is price. Inevitably it will be compared (on both quality and price) directly with the current generation voice assistant devices from huge corporations who have been subsidising production. Totally unfair comparison, but there are a huge number of HA users who don’t seem concerned about privacy when it comes to their voice assistants.
  • I expect that this new ESP32-S3 voice kit will instantly become the recommended hardware for new satellites. I personally intend to replace my RasPis running Rhasspy with this new voice kit as soon as I can afford to do so.

  • then there will be only a few people looking at RasPi voice satellite instructions. I guess that:

    • most of those people will be more experienced, and so can handle the current instructions.
    • those people left wanting to use RasPi with Rhasspy (or Wyoming as the new version seems to be called now) will not be using it as a simple voice assistant - but wanting to integrate its modules into other systems (including developing their own voice assistants). This will require a different, much broader, focus for the documentation.
    • as part of updating the documentation for using Rhasspy v3 / Wyoming on RasPi, the installation will probably change anyway to incorporate techniques used by the ESP32-S3 voice kit.

One of my Rhasspy 2.11 satellites is running on a Raspberry Pi Zero (not even the version 2). With a nice 3D printed case it can look like an alexa device … but without the various Digital Signal Processing algorithms being placed into the public domain we can’t get the same quality.

Hi @donburch888, I mean this kind of software: GitHub - rhasspy/wyoming-satellite: Remote voice satellite using Wyoming protocol.

I know that the current focus is the ESP-based hardware. I tested the current M5 Stack software and for me it doesn't perform really well, and I am not sure a "bigger" ESP32 will. I am also missing multiroom audio. If that changes on the 19th I may revisit my decision, but I think with a Pi as the hardware base there is a lot more possible.

The Raspberry Pi Zero 2 W costs 17€ and the 2mic_hat costs 9€. For a good speaker and a small amp I paid around 15€. That's a total of 41€ - almost the same price as an Alexa.

The speaker is the same as the ones used in the Bose SoundLink Mini.

The Pi's CPU is ~50% idle, which is OK for me. Wake word detection is the only thing running locally on it; all the other stuff is on my server.

I know that the 2mic_hat uses only one mic. I tested both systems and the mic from the 2mic_hat works much better.

I am thinking about creating a fork of the software mentioned above. I have already created a working docker-compose file with Dockerfiles that work, and I am currently fine-tuning the settings.

Yep. That is the latest branch … and yes, Mike has previously commented that (a) he thinks the RasPi installation should be improved; and (b) that he intends to come back to Rhasspy when he gets some spare time. So I am sure he will appreciate your efforts.

Down here at the end of the world, in the land of Aus, a Raspberry Pi Zero 2 W + Adafruit Voice Bonnet for Raspberry Pi (reSpeaker HAT look-alike) already costs double the price of a basic google or alexa device (without adding power supply, case and speaker) for noticeably inferior performance. Here it makes sense only if someone already has the RasPi lying around unused. I am pleased to hear that is not the case in your part of the world.

I am curious that you seem to be dismissing the upcoming Voice Kit as just a "bigger" M5 Atom. I'm no expert on this, but I understand that the ESP32-S3 processor has additional instructions built into the CPU which make it better - better even than a Raspberry Pi - for the sort of computations required for digital signal processing and AI.

Nabu Casa are (presumably) addressing the BOX3 limitations … optimising hardware and firmware for voice assist, manufacturing at scale, and providing the ongoing support and development that real-world users need. I am eagerly awaiting the 19th to find out more, especially the price … which I guess will be somewhere between the Google/Alexa subsidised offerings and the cost of a Raspberry Pi solution … with, ironically, performance worse than the big guys' but better than a RasPi's.