Raspberry Pi as a HA Voice Assist CHAPTER 4 satellite

Alas, that is where I am currently as well :frowning:
I have swapped Whisper to the largest available model (“small”) to improve accuracy … but Jarvis on the test RasPi in my study is like a teenager finding so many inventive ways to misunderstand “turn on the light”, and there is a 15-second delay from when I finish speaking until it processes the command.

Admittedly I have put it aside for a couple of weeks, so these issues may have been resolved already.

I am now eagerly awaiting the next chapter in a couple of days.

Can I ask what hardware you tried this on?
I have heard many mention a 10-15 second delay after speaking. I have not witnessed this, so I am curious about it.
From the time I finish saying “what time is it?” to the time I get a response is about 3 seconds.

Oh! I see. The truth is that, as it currently works, I wouldn’t call my interactions natural.

I improved the behavior by using OpenAI. With OpenAI, interactions are way more natural, but HA refuses to interact with my home when OpenAI is being used.

Don, if you don’t mind, drop me a line here if you ever manage to get a better response from your system. I’m clearly in the same boat :slight_smile:

LOL this is funny. In my case my voice assistant refuses to tell me what time it is, when I ask that :slight_smile:

I wonder what I may have messed up in my assistant pipeline… I can turn lights on and off but time won’t work :slight_smile:

I am running HAOS in the only VM under Proxmox on a used Optiplex 7050 (i5-7500T). The VM uses 4 CPU threads and 16 GB RAM. Wyoming, Whisper, Piper and OpenWakeWord are all installed as add-ons. ESPHome, Mosquitto, Node-RED and Rhasspy Assistant 2.5.11 are also installed as add-ons. Whisper uses the largest model available to us (ironically called “small” :wink: ), yet the whole VM rarely goes over 15% CPU usage even when processing voice.

At the satellite end, it’s a RasPi 3B with Raspberry Pi OS Lite 32-bit on a fresh micro-SD card. I am using a decent USB mic, and headphones connected to the 3.5mm audio jack. Both homeassistant-satellite and wyoming-openwakeword are running as services.

The hardware seems well up to the job. My guess is that there’s some setting not exposed in wyoming-openwakeword for the length of silence to signal the end of command.

I’m guessing that these Voice Assist satellites can’t also act as media players… Are you able to confirm?

I’m running homeassistant-satellite on a Raspberry Pi 3B+ which also acts as a shairport-sync player. That is, I’m able to stream music to it, and it detects the wake word, plays confirmation sounds and then responds. So yes, you can.

When I try to get the list of mics/speakers I don’t see any. I am using this kit/HAT (version one) with a Pi 2.
This looks like a really nice kit for a voice assistant if we can find the drivers.

This is now old and archived. Maybe someone can point me to the drivers?

Year of the Voice chapter 5 has changed to using wyoming-satellite instead of homeassistant-satellite.

homeassistant-satellite will of course continue to work for those who have it installed, but apparently the newer Voice Assist features are only available with wyoming-satellite.

I personally am about to wipe my microSD card and do a clean install per Chapter 5. See you on the other side …

Is there a link to the updated RPi satellite setup using wyoming-satellite?
I set up my RPi as a satellite a while ago but haven’t used it, and I would like to update it before using it again.

Personally I just wiped my SD card and started again using the documentation and Mike’s excellent tutorial to build a satellite using a Raspberry Pi Zero 2 W and a ReSpeaker 2Mic HAT.

My notes are at Raspberry Pi as a CHAPTER 5 voice assistant. So far it has been much better for me … but there is still work for me to do before rolling it out through the house.

Is it feasible to run a HA voice satellite as a service on a Raspi that is doing other things as well? I have a bunch of Volumio and Kodi clients, which is probably not the smartest idea given that there are potentially competing audio feeds. But what about my Retropie and/or PiMIGA clients, as long as they’re not emulating a game with audio (they’re basically idle 99 % of the time), or my Dietpi server?

Chapter 4 and Chapter 5 on RasPi both set up the Home Assistant satellite and the wake word detection as separate services running at the same time in the background, so yes, the computer can do other things as well, as long as the total of the running tasks does not cause a bottleneck in CPU, I/O, or network.

There have been conversations about using the same RasPi as media player and voice assistant satellite … and one day I will look into doing that myself. In one way it would help to have both audio out and audio in on the same device - echo cancellation subtracts the sound it is playing from what the mic is receiving, and the difference is what the human is speaking - but I understand there’s more work to get it to Alexa/Google quality. Also, if I set up my voice satellite in the kitchen to play music for my spouse, it will have to be good quality opera sound :wink:
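
To make the echo cancellation idea a bit more concrete, here is a minimal sketch of a textbook NLMS adaptive filter in plain Python. It is only an illustration of "subtract what is being played from what the mic hears": I am not claiming this is what any of the satellite software does, and the function name `cancel_echo` and its parameters (`taps`, `step`, `eps`) are made up for this example. Real echo cancellers deal with room reverb, delay drift and double-talk, which this does not.

```python
def cancel_echo(mic, playback, taps=4, step=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of `playback` from `mic` (NLMS).

    `mic` and `playback` are equal-rate sequences of float samples.
    Returns the residual signal: ideally the speech, with the echo removed.
    """
    weights = [0.0] * taps   # current estimate of the speaker-to-mic path
    history = [0.0] * taps   # most recent playback samples, newest first
    cleaned = []
    for mic_sample, play_sample in zip(mic, playback):
        history = [play_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        # What is left after removing the estimated echo.
        error = mic_sample - echo_estimate
        # Normalise the update by the playback power so the step size
        # behaves the same for loud and quiet music.
        power = sum(x * x for x in history) + eps
        weights = [w + step * error * x / power
                   for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned
```

If the mic only picks up a delayed, attenuated copy of the playback, the residual should shrink towards zero as the filter adapts. Anything that remains is, in principle, the person speaking.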

Excellent point!

Looking forward to future developments then!

A Raspi with a proper DAC/Amp HAT is a surprisingly hifi-capable renderer; check out some of the stuff on https://darko.audio/.

Maybe this is the right thread to ask about VAD (voice activity detection): my setup is basically working, but I’m struggling with command recordings that run too long (15 seconds most of the time) because it is not recognizing when my command is finished.
This of course leads to slow response times and also more processing time.
Is there anything I can do? I played around with --vad-threshold and --vad-trigger-level, but with no effect.
I also do not really understand what these parameters mean.

I’m not too knowledgeable in this area, but VAD is detecting that there is no volume when you stop talking.

I suggest you have a play with your microphone adjustments (volume, gain, etc.), including recording some samples and playing them back (with arecord/aplay or whatever you are using).

Possibly the volume from the mic is low enough that there’s not enough difference between talking and not talking; or there is a fair amount of background hiss or noise (which it thinks may be continued talking).

I have seen previous discussion about this, with the conclusion that it’s affected not just by the type/brand/model of microphone being used, but also by where the microphone is located and the audio characteristics of the room, including where you are speaking from and which direction you are facing.
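
To illustrate the point about hiss and low mic volume, here is a deliberately simplified sketch of an energy-threshold end-of-speech detector. This is an assumption-laden toy, not how the satellite software actually does VAD, and the names `threshold` and `silence_frames` are hypothetical: they are not meant to explain what --vad-threshold or --vad-trigger-level do. It only shows why a noisy input can keep a recording running until some timeout.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of 16-bit samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_end_of_command(frames, threshold, silence_frames):
    """Return the index of the frame where the command is judged finished.

    The command ends once speech has been heard and then `silence_frames`
    consecutive frames fall below `threshold`. Returns None if that never
    happens, in which case a real system would fall back to a timeout.
    """
    heard_speech = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= silence_frames:
                return i
    return None


# Synthetic example: 5 frames of room noise, 10 frames of speech, 20 of room noise.
clean = [[100] * 160] * 5 + [[3000] * 160] * 10 + [[100] * 160] * 20
noisy = [[1500] * 160] * 5 + [[3000] * 160] * 10 + [[1500] * 160] * 20

print(find_end_of_command(clean, threshold=1000, silence_frames=8))  # 22
print(find_end_of_command(noisy, threshold=1000, silence_frames=8))  # None
```

With the quiet mic the end of the command is found shortly after the speech stops. With the hissy one the level never drops below the threshold, so the detector never sees "silence" and the recording would run on.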

Thanks for your response.
I have it working now. Basically it really was a microphone issue. As soon as I switched on the debug recordings for all voice input, I could hear the difference.
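
In case it helps anyone who would rather look at numbers than listen, here is a small sketch that prints per-frame levels from a recording. It assumes the recording is a 16-bit mono WAV file (I have not verified that every setup writes that format), and `frame_levels` and the file name in the usage comment are just illustrative names.

```python
import array
import math
import sys
import wave


def frame_levels(path, frame_ms=30):
    """Return the RMS level of each frame in a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        samples_per_frame = wav.getframerate() * frame_ms // 1000
        levels = []
        while True:
            raw = wav.readframes(samples_per_frame)
            if not raw:
                break
            samples = array.array("h")
            samples.frombytes(raw)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV data is little-endian
            levels.append(math.sqrt(sum(s * s for s in samples) / len(samples)))
        return levels


# Usage (hypothetical file name):
#   levels = frame_levels("debug-recording.wav")
#   print(f"quietest frame: {min(levels):.0f}, loudest frame: {max(levels):.0f}")
```

If the quietest and loudest frames are close together, the mic is probably either too quiet or too noisy for the end of the command to be detected reliably.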

Hi there! I just took this same route today and wiped my SD card as well. Accuracy is still poor. Do you have your pipeline documented somewhere? Are you using English for STT?

See this output:

  • What time is it?
  • What date is today?

In both cases, the answer is: “I’m sorry but I don’t know any device by that name”

If you are using wyoming-satellite, my notes are at Raspberry Pi as a CHAPTER 5 voice assistant. Yes, I am using English.

Used your notes. Thanks for taking the time to document it!

What language model are you using? Do you have the Whisper configuration that you are using documented somewhere?

One community member posted his Whisper config in one of the responses in your thread: Raspberry Pi as a CHAPTER 5 voice assistant - #30 by Stratus

I have tested a few options and the language comprehension of my setup is pretty low. Spanish is quite a widespread language, so I’m a bit confused as to why my system is not understanding my prompts well enough. I’m actually using Mike’s suggestions to enhance the audio experience, but my results are still lacking…

Thanks!