Well, my tests with the integrated Assist have been quite nice: it answers quickly and processes everything locally (I used Vosk rather than Whisper, since I use it in French).
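For reference, here's a minimal sketch of what French recognition with Vosk looks like in Python. The model and file names are just examples, and Assist actually talks to Vosk through an STT server rather than calling this API directly:

```python
# Minimal Vosk sketch (not how Assist wires it up; Assist talks to an STT
# server). Model directory and WAV file names are placeholders.
import json
import wave

from vosk import KaldiRecognizer, Model

model = Model("vosk-model-small-fr-0.22")  # a French model from alphacephei.com
with wave.open("commande.wav", "rb") as wav:  # expects 16-bit mono PCM
    rec = KaldiRecognizer(model, wav.getframerate())
    while True:
        data = wav.readframes(4000)
        if not data:
            break
        rec.AcceptWaveform(data)
    print(json.loads(rec.FinalResult())["text"])
```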
Tested with the ReSpeaker 2 Lite board from Seeed Studio!
Piper, even on CPU, is blazing fast for TTS (only tested English). I use Piper via the Wyoming protocol.
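To give an idea of the Wyoming side, here's a rough sketch of requesting speech from a Piper server with the `wyoming` Python client library. The host, port, and text are assumptions (10200 is the usual wyoming-piper default), and Home Assistant normally does all of this for you:

```python
# Rough sketch of a Wyoming TTS request to a Piper server.
# Host/port are assumptions; Home Assistant normally handles this.
import asyncio

from wyoming.audio import AudioChunk, AudioStop
from wyoming.client import AsyncTcpClient
from wyoming.tts import Synthesize

async def speak(text: str) -> bytes:
    audio = b""
    async with AsyncTcpClient("localhost", 10200) as client:
        await client.write_event(Synthesize(text=text).event())
        while True:
            event = await client.read_event()
            if event is None or AudioStop.is_type(event.type):
                break
            if AudioChunk.is_type(event.type):
                audio += AudioChunk.from_event(event).audio
    return audio

raw_pcm = asyncio.run(speak("Hello from Piper"))
print(f"received {len(raw_pcm)} bytes of PCM audio")
```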
I use Whisper (CUDA-accelerated) for STT and it is fast: on par with Google and Alexa for response times, and totally local. I am also using the ReSpeaker Lite XMOS board with a XIAO ESP32-S3.
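For anyone curious, one common way to run Whisper with CUDA is faster-whisper (the engine behind wyoming-faster-whisper). A short sketch, where the model size and audio file are just examples, not my exact setup:

```python
# Sketch of CUDA-accelerated Whisper STT via faster-whisper.
# Model size and audio file name are examples.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe("command.wav", language="en")
print(f"detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```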
I have a custom setup from before this was easy to do, but now I think it's really straightforward. When I get home I'll find the repo and instructions.
It's a Docker-based solution that uses the NVIDIA Container Toolkit to pass the GPU through to the containers. It has options for different STT and TTS backends; some have CUDA acceleration, some don't.
It has wake word, STT, and TTS containers available. However, I would recommend an ESP32-S3-based assistant, which runs wake word detection on the device, rather than streaming audio 24/7 over the network to a wake word host (see the sketch below for what that host-side path involves).
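For contrast, this is roughly what host-side wake word scoring involves, sketched with openWakeWord (one of the engines available as a Wyoming container). The threshold is a tunable assumption, and the zero frame stands in for audio a satellite would have to stream continuously, which is exactly what on-device detection avoids:

```python
# Sketch of host-side wake word scoring with openWakeWord; the audio frame
# stands in for data a satellite would have to stream over the network 24/7.
import numpy as np
from openwakeword.model import Model
from openwakeword.utils import download_models

download_models()  # fetch the bundled pretrained models on first run
model = Model()    # loads the pretrained wake word models

# openWakeWord scores 80 ms frames: 1280 samples of 16 kHz 16-bit mono audio
frame = np.zeros(1280, dtype=np.int16)  # placeholder for a streamed frame
scores = model.predict(frame)
for name, score in scores.items():
    if score > 0.5:  # threshold is a tunable assumption
        print(f"wake word detected: {name}")
```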
Piper, no (CPU only, and it's still around 0.1 s of processing time, so there's not much to gain from acceleration); wake word, no, since the ESP32-S3 does wake word detection on device.
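If you want to check that ~0.1 s figure yourself, here's a rough timing sketch, assuming the piper-tts Python package and a voice model you've already downloaded (the file name is an example, not a prescription):

```python
# Rough timing sketch for CPU Piper synthesis, assuming the piper-tts
# package; the voice model file name is an example.
import time
import wave

from piper import PiperVoice

voice = PiperVoice.load("en_US-lessac-medium.onnx")  # voice downloaded beforehand
start = time.perf_counter()
with wave.open("out.wav", "wb") as wav_file:
    voice.synthesize("Local text to speech is quick on CPU.", wav_file)
print(f"synthesis took {time.perf_counter() - start:.3f}s")
```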
My GPU also does CUDA-accelerated H.264 decoding of RTSP streams for my ZoneMinder install, and runs my custom ML framework for ZoneMinder that does object detection with YOLO-NAS, YOLOv10, and YOLOv8. It goes to show that cheap, older hardware can get you all sorts of ML/DL goodness.
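My framework is custom, but as an illustration of how little code GPU object detection takes these days, here's a minimal Ultralytics YOLOv8 sketch (model and image names are examples):

```python
# Minimal GPU object-detection sketch with Ultralytics YOLOv8; only
# illustrates the general technique, not my custom ZoneMinder framework.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # small pretrained model, downloaded on first use
results = model("camera_frame.jpg", device=0)  # device=0 selects the first CUDA GPU
for result in results:
    for box in result.boxes:
        label = result.names[int(box.cls)]
        print(f"{label}: {float(box.conf):.2f}")
```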