Remote voice assist Pipeline - Whisper

In this post, I’d like to show you how to run Whisper on a different computer, outside your Home Assistant server.

Many users run Home Assistant on a Raspberry Pi or a small PC. Whisper can work on those setups, especially with the small, fast models, but transcription is often slow. Since the Wyoming protocol allows for a remote Whisper installation, I decided to test running Whisper on my M1 MacBook Pro. In theory, it should run much faster, won’t consume resources on my limited Home Assistant host, and will still operate locally within my network.

Some of the screenshots use the Spanish localization, but they should be easy enough to follow in other languages. Let’s get started!


Local baseline

First, I started with the Local pipeline as described in Getting started - Local - Home Assistant. I’m running Home Assistant on a computer with an Intel N100 (4 cores) and 16GB of RAM. Despite being more powerful than a Raspberry Pi, the results are not great:

slow local piper

The recording is about 3.2 seconds long, and it took 5.4 seconds to process.
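To put that number in perspective, here is a quick sketch of the real-time factor (processing time divided by audio length). The function name `rtf` is just for illustration, and it assumes `awk` is available:

```shell
# Real-time factor = processing time / audio duration.
# Values above 1 mean transcription takes longer than the speech itself.
rtf() {
  awk -v proc="$1" -v audio="$2" 'BEGIN { printf "%.2f\n", proc / audio }'
}

rtf 5.4 3.2   # N100 baseline measured above; prints 1.69
```

So the local pipeline needs roughly 1.7 seconds of processing for every second of speech.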

Remote setup

I wanted to run Whisper on my MBP, which is quite powerful (M1 Max CPU, 64GB RAM). This guide assumes you already have git and Python 3.12 installed. I tried different Python versions, but at the time of writing, only Python 3.12 worked.

First, clone rhasspy/wyoming-faster-whisper from GitHub, which is a Wyoming protocol server for the faster-whisper speech-to-text system:

git clone https://github.com/rhasspy/wyoming-faster-whisper
cd wyoming-faster-whisper

Next, set up the Python environment:

python3.12 -m venv .venv
source .venv/bin/activate

Then, install the dependencies:

python -m ensurepip --upgrade
pip install -r requirements.txt

Finally, start the server:

script/run --model tiny-int8 --language es --uri 'tcp://0.0.0.0:10300' --data-dir ./local --download-dir ./local

Note: This setup uses the tiny-int8 model. While this model is not great (at least for Spanish), I’ll use it for consistency to compare results later.

Keep the server running.
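Before switching to Home Assistant, it can help to confirm that port 10300 is reachable from another machine on the network. This is a minimal sketch, assuming `nc` (netcat) with `-z` and `-w` support is installed. The function name and the IP address in the usage comment are placeholders, not part of the original setup:

```shell
# Returns 0 if a TCP connection to host:port succeeds within 2 seconds.
# The port defaults to 10300, the one used when starting the server above.
wyoming_reachable() {
  local host="$1" port="${2:-10300}"
  nc -z -w 2 "$host" "$port" >/dev/null 2>&1
}

# Usage (192.168.1.50 is a placeholder for the Whisper machine's IP):
#   wyoming_reachable 192.168.1.50 && echo "reachable" || echo "not reachable"
```

If the check fails, a firewall on the Whisper machine is a likely suspect.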

Home Assistant configuration

Go to the Wyoming Protocol integration in Home Assistant and add a new service:

Enter the IP address of the computer running Whisper and use 10300 as the port:

Screenshot 2025-02-05 at 12.36.59

And that’s it! One last thing: I recommend renaming the integration. By default, it’s named faster-whisper, which is the same name as the local Whisper. Having both with the same name can be confusing.

Voice configuration

The next step is to create a Voice Assistant using the remote Whisper instance. Follow the tutorial in Getting started - Local - Home Assistant, specifically the section “Setup your assistant.” The only difference is that you should select the remote Whisper instance.

image

Testing it

Using the same sentence, I now get sub-second processing times:

image

Accuracy

With Whisper running on more powerful hardware, you can now use better models to improve accuracy. To change the model, modify the command used to start the server. For example:

script/run --model turbo --language es --uri 'tcp://0.0.0.0:10300' --data-dir ./local --download-dir ./local 

Of course, processing times will increase, but at least you can try different models to find the best balance between accuracy and speed for your needs.


And that’s it! I hope you find this guide helpful.



In this post, I’d like to show you how to run Piper…
while Piper (especially with fast and small models)
remote Piper installation, I decided to test running Piper on my MBP M1

It looks like you got sidetracked and made a typo in the component name in the first few paragraphs.


tysm for your article. Now I only need a way to switch the preferred assistant when my PC goes online or offline.

ok, I’m a total noob to this. I tried doing this in PowerShell, and when I paste the script/run command this comes up… How exactly am I supposed to run this, or are you using something else to run it?

Thanks for these instructions! Took a bit to figure a few minor things out but it works! I need to add a systemd startup file but otherwise it works like a champ!

Edit: Here’s my systemd file:

[Unit]
Description=Wyoming Whisperer
After=network-online.target

[Service]
Type=simple
User=tim
Restart=always
WorkingDirectory=/home/tim/wyoming-faster-whisper
ExecStart=/home/tim/wyoming-faster-whisper/run.sh

[Install]
WantedBy=multi-user.target
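The unit points at a run.sh that is not shown here. As a rough guess at what such a wrapper could look like, this sketch reuses the start command from the original post. The `WHISPER_MODEL` and `WHISPER_LANGUAGE` variables and the `whisper_args` function are hypothetical names introduced only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical run.sh: a guess at a wrapper around the start command from the
# original post, not the actual file referenced by ExecStart.

# Print the server arguments, one per line. Model and language can be overridden
# through the (assumed) WHISPER_MODEL and WHISPER_LANGUAGE environment variables.
whisper_args() {
  printf '%s\n' \
    --model "${WHISPER_MODEL:-tiny-int8}" \
    --language "${WHISPER_LANGUAGE:-es}" \
    --uri 'tcp://0.0.0.0:10300' \
    --data-dir ./local \
    --download-dir ./local
}

# Launch only when script/run is present, i.e. when started from the repository
# root (the unit's WorkingDirectory).
if [ -x script/run ]; then
  # The arguments contain no spaces, so unquoted word splitting is safe here.
  exec script/run $(whisper_args)
else
  echo "script/run not found; start this from the wyoming-faster-whisper directory" >&2
fi
```

Keeping the arguments in one function makes it easy to try another model by setting a variable instead of editing the command.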

This runs in a simple LXC container on my NAS, so if you are running outside of a container, you will probably want to adjust some things. Whisper should probably have its own system user, e.g.


Thank you for this tutorial, I followed it and it worked well

I will just add that I ran this in a Linux VM and it would only run on CPU despite the VM having the appropriate drivers and CUDA Toolkit installed, and even despite the fact that I was able to run Ollama on GPU on this very VM. If you run into this scenario, the answer lies in the faster-whisper readme:

GPU execution requires the following NVIDIA libraries to be installed:

On the surface it appears that cuBLAS is installed with the CUDA Toolkit, but apparently it is not. Installing these two by following those links and using NVIDIA’s official installation instructions for both cleared the problem right up for me.

I’m getting 2-2.8 seconds for Whisper small-int8 (a bigger model than tiny-int8) on CPU, and 0.1-0.7 seconds for the same model on GPU!

Oh yes, I almost forgot: to use the GPU you must also add the --device cuda flag to the command that starts the server. Otherwise, as you can see in the source code here, it defaults to CPU.
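As a small sketch of how the flag could be chosen automatically, the helper below prints `cuda` when `nvidia-smi` runs successfully and `cpu` otherwise. The function name `pick_device` is made up for this example, and a working `nvidia-smi` is only a rough hint: the NVIDIA libraries mentioned above still have to be installed for GPU execution to work.

```shell
# Print "cuda" when nvidia-smi exists and runs successfully, otherwise "cpu".
# Assumption: this only detects a working driver, not cuBLAS or the other
# libraries that faster-whisper needs for GPU execution.
pick_device() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo cuda
  else
    echo cpu
  fi
}

# Usage, from the repository root:
#   script/run --model small-int8 --language es --uri 'tcp://0.0.0.0:10300' \
#     --data-dir ./local --download-dir ./local --device "$(pick_device)"
```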

Did you ever figure this part out?

I’ve got a lightweight VM for running HAOS, and I’m currently putting together a separate rack server with an old Quadro in it for some lightweight local AI inference. I would like to use Whisper on that server when it’s powered up, but otherwise revert to the smallest, lightest version on the HAOS VM.