Wanted to share a success story for Voice Assistant utilizing Wyoming Satellite and the Assist pipeline. I’ve got a pretty nice little satellite setup now around the house that’s improving daily as I add additional intents and sentences. Thought it would be useful to note some of the hardware and software choices and pitfalls I ran into throughout the process in the hope it saves somebody else some time.
First, the final product:
The speaker, mic array and pi are held together with 3M Scotch fasteners, so I can pick the whole unit up by the speaker, but also detach the three individual parts for easier setup/access.
Second, the pieces of the puzzle:
- Raspberry Pi 4B with power supply and case (~$40 USD total)
- SD card ($5 USD)
- Jeecoo Speaker A10 ($12)
- Ground Loop ($8)
- ReSpeaker 4 Mic Array with case (~$70)
Obviously this isn’t a “low cost” hardware option. I did set up the ATOM Echo and tried a few other lower-cost solutions, including a Pi Zero 2 and other mic options, but ran into a variety of challenges that eventually led me to this stack. I’ll also note that while I had a couple of Pis floating around the house from other projects, it was MUCH easier to start with a fresh SD card. The Pi 4 is overkill for this job, but it kept things running smoothly and I could actually get one in stock, unlike the slightly cheaper (<$10 difference) Pi 3, which has been harder to acquire. The speaker works great and has a volume knob for manual adjustment, which has come in handy.
The ReSpeaker mic array was the big spend, but for me it was absolutely worth it. The hardware has been great, and the audio quality/pickup is the closest I’ve seen to the commercial products. Honestly, I haven’t seen anything else even in the vicinity.
I utilized this tutorial, but with the hardware mentioned above.
To start, I used the rPi installer from here to load the 64-bit Raspberry Pi OS Lite onto the SD card. This is critical, because the instructions are ambiguous about whether 64-bit is required only for VAD or for anything to work properly. I couldn’t get Wyoming Satellite running cleanly at all without the 64-bit setup, though I could run the (soon to be deprecated?) HomeAssistant-Satellite on the 32-bit OS.
After loading the OS, I plugged in the Pi and booted up. Important note here: I did attempt the Pi Zero 2 W per the tutorial, but found the 2.4 GHz Wi-Fi band in my house to be noisy enough that I could barely sustain an SSH session. Given my STT options and how I was expecting to use the satellite, this was a non-starter. The better Wi-Fi experience from the beefier hardware eliminated so many potential issues that I felt the extra $20 or so was worth it.
Following the tutorial instructions, I installed Wyoming Satellite and openWakeWord and plugged in the ReSpeaker array and speaker. Important note: make sure the speaker plugs into the mic array, as that allows the onboard ReSpeaker hardware to deal with feedback/etc. I tested both with aplay/arecord on the hardware and they were entirely plug-n-play, no additional configuration required. I then set up the services per the instructions to ensure they would be available on boot.
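For anyone who wants to script the aplay/arecord check, here is a minimal Python sketch. It only builds and runs the two commands; the function names and the `test.wav` filename are my own placeholders, and the 16 kHz mono 16-bit format is an assumption based on what voice pipelines typically expect. In my case the default devices just worked, so `device` is optional; if yours don't, `arecord -L` and `aplay -L` list the names to pass in.

```python
import subprocess
from typing import List, Optional


def _with_device(program: str, device: Optional[str]) -> List[str]:
    """Start a command, adding -D <device> only when a device is given."""
    return [program] + (["-D", device] if device else [])


def build_arecord_cmd(out_path: str, seconds: int = 5,
                      device: Optional[str] = None) -> List[str]:
    """Record mono, 16 kHz, signed 16-bit little-endian WAV (assumed format)."""
    return _with_device("arecord", device) + [
        "-r", "16000",    # sample rate in Hz
        "-c", "1",        # one channel
        "-f", "S16_LE",   # sample format
        "-t", "wav",      # container type
        "-d", str(seconds),
        out_path,
    ]


def build_aplay_cmd(wav_path: str, device: Optional[str] = None) -> List[str]:
    """Play a WAV file back through the speaker."""
    return _with_device("aplay", device) + [wav_path]


def run_audio_check(wav_path: str = "test.wav", seconds: int = 5,
                    mic: Optional[str] = None,
                    speaker: Optional[str] = None) -> None:
    """Record a short clip, then play it back; raises if either command fails."""
    subprocess.run(build_arecord_cmd(wav_path, seconds, mic), check=True)
    subprocess.run(build_aplay_cmd(wav_path, speaker), check=True)

# On the Pi itself, call run_audio_check() and talk for five seconds.
```

If you hear yourself clearly on playback, the mic array and speaker are both working before Wyoming Satellite ever enters the picture.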
The Wyoming Protocol in HA immediately detected the new satellite and provided a device and some very nice entities for it out-of-the-box. I didn’t need to mess with that configuration significantly as the Respeaker array also has some built-in noise suppression capabilities. I intend to continue to tune it over time, but the OOB experience was good enough to put it in my living room and not get yelled at by my wife for false positives or negatives from the unit (the ultimate test).
I have two Assist pipelines configured: one wired to an OpenAI integration per this tutorial, and the other a standard HA pipeline for controlling the home. Obviously the eventual goal is to merge them, but I haven’t found a solution I personally like for that yet (I know there are a few out there, and they look great, I just haven’t spent enough time exploring them yet). With the OpenAI pipeline, I did add some custom instructions limiting replies to 2-3 sentences so that Piper TTS doesn’t time out. Speaking of…
I found Piper to be really solid. Not absolutely perfect, but for a completely local TTS model, it’s been more than sufficient. Plus my younger kid likes playing with the various voice configurations, so that’s a nice little bonus.
I tried like crazy to get an acceptable Whisper model working locally and just couldn’t make it work. It was either too slow, or in the case of the OpenAI pipeline, just couldn’t understand enough of my speech to be useful. I decided to just go with a cloud-based STT solution and am unbelievably happy I did - honestly wish I had saved myself the hassle and just started with this approach, but you live and learn. The performance is excellent, it handles everything I’ve thrown at it, and despite hours and hours of testing this month already, I’ve racked up a grand total of $1.73 in charges. I’m using Google’s STT and get <1s latency and near-perfect results. I used this integration on HACS.
Now, it’s just a fun software challenge. Happy to share more on my sentence implementations or configurations if useful, but I’ve been having a blast “teaching” my assistant new tricks regularly. I still need to tweak the wake word parameters (a few too many false activations throughout the day, not crazy, but not perfect), tune the overall audio better (some sentences are occasionally misinterpreted), play with the LEDs on the mic array (they do automatic VAD and DOA visuals OOB, but I’d love to have them show wake/thinking as well), and improve my cable management behind the unit.
Some things I’m hopeful for from future releases:
- A ‘failover’ mode for Pipelines. If the standard HA pipeline doesn’t understand the command, kick it over to the OpenAI pipeline for further interpretation or response
- Ability to “hold” a conversation - e.g. receive responses and continue in the same conversation context
Those two alone would give me a clean path to a solution that’s fully competitive with the commercial units (Google Home/Alexa) that I refuse to have in my house.
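To make the failover wish a bit more concrete, here is a rough Python sketch of the logic I have in mind. To be clear, this is not how Home Assistant works today and none of these names are real HA APIs; `respond_with_failover` and both toy handlers are hypothetical stand-ins for the standard pipeline and the OpenAI pipeline.

```python
from typing import Callable, Optional, Tuple

# Hypothetical handler shapes (not Home Assistant APIs):
# the local handler returns a reply, or None when no sentence/intent matched;
# the fallback handler always returns some reply.
LocalHandler = Callable[[str], Optional[str]]
FallbackHandler = Callable[[str], str]


def respond_with_failover(text: str, local: LocalHandler,
                          fallback: FallbackHandler) -> Tuple[str, str]:
    """Try the local pipeline first; hand off to the fallback only on a miss.

    Returns (reply, source) so the caller can see which path answered.
    """
    reply = local(text)
    if reply is not None:
        return reply, "local"
    return fallback(text), "fallback"


def toy_local_handler(text: str) -> Optional[str]:
    """Stand-in for the standard pipeline: knows exactly one sentence."""
    known = {"turn on the lights": "Turned on the lights."}
    return known.get(text.strip().lower())


def toy_llm_handler(text: str) -> str:
    """Stand-in for the OpenAI pipeline: just echoes what it would be asked."""
    return f"(LLM would answer: {text!r})"


if __name__ == "__main__":
    for utterance in ("Turn on the lights", "What is the tallest mountain?"):
        print(respond_with_failover(utterance, toy_local_handler, toy_llm_handler))
```

The point of the design is that home control stays local and fast, and only the commands the standard pipeline can't parse ever leave the house.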
Huge thank you to @synesthesiam and team. I had been following Rhasspy for a couple of years (Mike helped me with some troubleshooting almost two years ago on that front!) and was over the moon when he joined the HA team. The future is very bright indeed!