Run whisper on external server

What arch and what cpu are you running on?

What processing times are you getting with CPU Piper? Go to your Assist settings, click on your GPU-accelerated Assist pipeline, and in the top-right corner there is a three-dot menu; click that and hit “Debug”. That will bring up the debug view, which shows the processing time for each step.

The Docker build error doesn’t really need testing on other amd64 systems; it’s something to do with how piper links to the onnxruntime shared libraries. It links against a single onnxruntime shared lib in /usr/share/piper. When piper-phonemize is pulled in to build piper, it downloads an onnxruntime tarball and builds against that. So when CUDA is requested, piper loads that onnxruntime .so file, but the CUDA execution-provider shared lib was never pulled in during the build. When I manually link the onnxruntime-gpu CUDA provider shared lib (installed via pip install onnxruntime-gpu), I get a crash or a segfault with a core dump.

So my assumption is that the CUDA provider shared lib isn’t being bundled with piper-phonemize. I need to figure out how to get those shared libs bundled.
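A quick way to confirm which onnxruntime libraries piper actually resolves is `ldd`. A minimal sketch (the piper path is the one mentioned above; the demonstration at the end uses /bin/sh only so the snippet runs anywhere):

```shell
# Print the shared libraries a binary resolves, filtered by name.
links_lib() {
  ldd "$1" 2>/dev/null | grep -i "$2"
}

# On the piper host you would run something like:
#   links_lib /usr/share/piper/piper onnxruntime
# and then look for a CUDA execution-provider lib next to it:
#   ls /usr/share/piper | grep -i cuda

# Self-contained demonstration against a binary present on any Linux system:
links_lib /bin/sh libc && echo "resolved"
```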

I also run my docker host in a proxmox lxc with GPU passed through.

There are many variables that affect the end result.
But as for the 1070, according to my measurements it adds ~20 W to the system’s power draw, measured with a 550 W Gold PSU.


Hello, I noticed the same problem with my RTX 4070. Do you have any idea how we can fix it? Which part of the system is responsible for this?

Now I need to start speaking very quickly after the wake word, which is pretty annoying.

The GPU box has been out of action for a bit. I pulled it apart to add some bits, went to fire it up a few days later, and got no boot. Two days chasing hardware issues for what turned out to be a lurking GRUB issue. That’s what I get for using Debian Testing, I guess.

Should have it all back up and running in the next day or so…

Cheers.


For the folks that have whisper/piper/openwakeword running “outside” Home Assistant, how did you add the integration?

When I try to configure the integration it asks for a “Host” and a “Port”. No matter what I try, it is NOT liking it and it’s not connecting. Sadly, I can’t find any error or entry in any log that would help me troubleshoot this.

I use Home Assistant Blue.
I have whisper up and running (on a different machine but in the same LAN).
From the HA Blue (through the terminal integration) I can connect to the whisper port (nc -zv whisper.int.MYDOMAIN 10300), so I know DNS and network connectivity are not a problem.

I can start a new thread but my question seems related to this thread’s purpose.

It went through pretty easy for me…
Settings → Devices & Services → Wyoming Protocol → Add Entry

  • Enter hostname: whisperbox.network.home*
  • Enter port: 10300
  • Works!

My Compose file can be found here: https://github.com/Fraddles/Home-Automation/tree/main/Voice-Assistant
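For reference, a stripped-down service definition along those lines might look like this (the image name and flags are assumptions based on the rhasspy wyoming-whisper docs; treat it as a sketch, not a copy of the linked file):

```yaml
services:
  whisper:
    image: rhasspy/wyoming-whisper
    command: --model tiny-int8 --language en
    ports:
      - 10300:10300
    restart: unless-stopped
```

Point the Wyoming integration at the host running this container, port 10300.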

I am currently testing Unraid and have successfully deployed it on there with GPU support also. So far I have only tried this with Whisper…

Have you tried using the host’s IP address instead of hostname?
Checked that you are exposing the correct port?
Can the whisper host connect BACK to the HA host?
Can you get it working with everything running (slowly) on the HA Blue?

Cheers.


I could really use some help.
I am a newbie to Home Assistant and Docker.
Trawling through YouTube and this forum, I was able to get HA installed and running.
My machine is an AMD Epyc 7551p with 128GB and RTX 3080 ti.
It is running VMware ESXi. I have HAOS installed as a VM and Win11 installed as a VM with the GPU shared to Win11. On the Win11 VM I have LM Studio installed. I have a Voice Assistant set up and running well with the default Wyoming pieces in place; the GPU is being used by LM Studio.
Now I want to use wyoming-addons-gpu to get Whisper, Piper, and WakeWord running on the GPU as well. I have Docker Desktop installed on the Win11 VM and have followed the instructions provided by baudnew. The Whisper container log shows:

2024-02-08 15:36:32 main.py: error: the following arguments are required: --model, --uri, --data-dir
2024-02-08 15:36:32 /run.sh: line 3: --uri: command not found
2024-02-08 15:36:32 /run.sh: line 4: --data-dir: command not found
2024-02-08 15:36:32 /run.sh: line 5: --download-dir: command not found

The Piper and Wakeword logs are similar; the required arguments aren’t reaching the services. I have tried putting them in various places and rebuilding the containers, but I get the same error messages.

Like I said, I am a Docker newbie. So, simple and detailed help would be appreciated. Thanks.

That looks like a line-ending problem; I just had the same thing.
To check:
Open the wyoming-addons-gpu/whisper/run-gpu.sh file in a suitable editor (I suggest Notepad++) and check the line endings. I’d wager your git client was set to check out Windows-style line endings (CR LF) rather than Unix-style (LF). This causes each flag in the shell file to be interpreted as a separate command (which obviously won’t be found).

You can fix this either with your text editor (for Notepad++: Edit > EOL Conversion > Unix (LF)), or with git (from within the wyoming-addons-gpu directory):

git config core.autocrlf input
git config core.eol lf
git rm --cached -r .
git reset --hard

Warning: if you have any changes you’d like to keep, back them up before you start.

If you want those settings to be global and apply to all repositories you check out on that machine in the future, add the --global flag to the git config commands.

I would recommend the git approach, as more files may be affected, and this way you don’t have to find and fix them one by one.
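To see the problem from a shell instead of an editor, here is a self-contained illustration (the filename and contents are made up; `file` reports the line terminators, and stripping the `\r` characters is exactly what the EOL conversion does):

```shell
# Fabricate a small script with Windows (CR LF) line endings:
printf -- '--uri tcp://0.0.0.0:10300\r\n--data-dir /data\r\n' > demo-run.sh
file demo-run.sh        # reports "... with CRLF line terminators"

# Strip the carriage returns (same effect as Notepad++'s EOL conversion):
sed -i 's/\r$//' demo-run.sh
file demo-run.sh        # now plain text with LF endings
```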

After a LOT of pain and suffering I’ve learned that the faster-whisper 1.0.0 dependencies are a bit of a mess for CUDA support.

I’m running Nvidia driver 525 with CUDA 11.8, which seems to be the recommended environment for (wyoming-)faster-whisper, but when you use the --device cuda option rather than --device cpu, the service starts without throwing an error and then fails looking for libcublas.so.12, which is a CUDA 12 library.

I’m so close to getting local whisper and piper running in CUDA but this is grinding me into the ground with frustration. Has anyone found a way to overcome this issue?


You just need to get the required files and map them into the container.

My working example is here: Home-Automation/Voice-Assistant at main · Fraddles/Home-Automation · GitHub
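One common way to do the mapping is to bind-mount the CUDA 12 libraries into the container so the runtime can resolve libcublas.so.12. The host paths below are purely illustrative (here, libs installed by pip’s nvidia-cublas-cu12 / nvidia-cudnn-cu12 wheels); check the linked repo for a tested layout:

```yaml
services:
  wyoming-whisper:
    # ... image, command, ports as usual ...
    volumes:
      # Host paths are examples; point them at wherever your CUDA 12 libs live.
      - /opt/venv/lib/python3.11/site-packages/nvidia/cublas/lib:/opt/cuda-libs/cublas:ro
      - /opt/venv/lib/python3.11/site-packages/nvidia/cudnn/lib:/opt/cuda-libs/cudnn:ro
    environment:
      # Make the mounted libs visible to the dynamic linker inside the container.
      - LD_LIBRARY_PATH=/opt/cuda-libs/cublas:/opt/cuda-libs/cudnn
```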

Interesting to see that CUDA does not accelerate Piper significantly. I was about to begin the journey of Piper + CUDA12 on Docker (WSL2) but you may have changed my mind.

Do you know if Piper retains the model in VRAM?
Just trying to better understand why the performance difference is so small.

Depends on your CPU: more power = less time. I imagine by now they have all the plumbing figured out, so CUDA may be worth it now.

The easy way today: use the original whisper.cpp from ggerganov if you have a GPU, together with the OpenAI API integration for Home Assistant.

I tested and adopted it. No overhead, and it is very fast, really very fast.
Plugin and some instructions: GitHub - neowisard/ha_whisper.cpp_stt: Home Assistant Whisper.cpp API STT integration

For those who are running AMD hardware, I put together a container that runs faster-whisper with ROCm support. I don’t think this was possible until very recently, so I may be among the first to implement it for this kind of setup. I figured sharing it here may help some people get the best performance possible.

wyoming-faster-whisper-rocm


I hope the below helps someone, as it took me ages to find a fully working x86-64 version.
Make sure you have the NVIDIA Container Toolkit and drivers installed.

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

This docker-compose.yaml worked for me with CUDA acceleration and wyoming support.

services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    container_name: faster-whisper-cuda-linux
    runtime: nvidia
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/London
      - WHISPER_MODEL=medium-int8
      - WHISPER_BEAM=1 #optional
      - WHISPER_LANG=en #optional
    volumes:
      - /root/.cache/whisper:/config # beware, your path could be different
    ports:
      - 10300:10300
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2 # I have two CUDA capable cards
              capabilities:
                - gpu
networks: {}

Check my comment Run whisper on external server - #120 by alienatedsec , where the response time is around a second.

Has anyone used a Jetson, now that HA/Nabu has been working with them and has moved everything voice-related to GPU/CUDA, among other things? The 8 GB models have dropped in price, but I’m having a really hard time finding any feedback, and it’s obviously a bit technical to set up. I want to set up HA Core on a Jetson for testing; in the long run I would prefer to (eventually) run everything on one box, and you can run HA Core on the Jetson now.

For those of us using Intel machines with no GPU other than the integrated one (Intel 12th-gen Xe iGPU), what’s the fastest way of running Whisper?
The small-int8 model is the first one that kind of works; any model smaller than that is just crap, and even the small one is just meh.
A simple sentence like “Turn on the kitchen lights” takes 2.3 s on small-int8, which is on the verge of being usable, 3.3 s on small, which feels too slow, and 6.5 s on medium-int8, which is maddening.

I tried running it inside a dedicated container instead of as an add-on, but I didn’t see any noticeable speed improvement (~0.1 s).
Is there some Docker configuration, or an alternative image, that would run faster without an NVIDIA or AMD GPU?