Run whisper on external server

That looks like a line-ending problem; I just had the same thing.
To check:
Open the wyoming-addons-gpu/whisper/run-gpu.sh file in a suitable editor (I suggest Notepad++) and check the line endings. I’d wager your git client was set to check out Windows-style line endings (CRLF) rather than Unix-style (LF). This causes each flag in the shell file to be interpreted as a separate command (which obviously won’t be found).
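If you’d rather check from the command line, git can report the line endings it sees directly (a quick check, assuming you run it from inside the cloned repository):

git ls-files --eol whisper/run-gpu.sh   # "w/crlf" in the output means the working copy has Windows endings
file whisper/run-gpu.sh                 # prints "with CRLF line terminators" for affected files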

You can fix this either with your text editor (for Notepad++: Edit > EOL Conversion > Unix (LF)), or with git (from within the wyoming-addons-gpu directory):

git config core.autocrlf input   # convert CRLF to LF on commit, never rewrite on checkout
git config core.eol lf           # check text files out with LF endings
git rm --cached -r .             # clear the index so git re-evaluates every file
git reset --hard                 # re-checkout the working tree with the new settings

Warning: If you have any changes you’d like to keep, back them up before you start.

If you want those settings to be global and apply to all repositories you check out on that machine in the future, add the --global flag to the git config commands.
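That is:

git config --global core.autocrlf input
git config --global core.eol lf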

I would recommend the git approach, as more files may be affected, and this way you don’t have to find and fix them one by one.

After a LOT of pain and suffering I’ve learned that the faster-whisper 1.0.0 dependencies are a bit of a mess for CUDA support.

I’m running NVIDIA driver 525 with CUDA 11.8, which seems to be the recommended environment for (wyoming-)faster-whisper. But when you try to use the --device cuda option rather than --device cpu, the service starts without throwing an error and then fails looking for the library libcublas.so.12, which is a CUDA 12 lib.

I’m so close to getting local whisper and piper running in CUDA but this is grinding me into the ground with frustration. Has anyone found a way to overcome this issue?
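Edit: the faster-whisper README documents a Linux workaround for exactly this: install the CUDA 12 libraries via pip and point LD_LIBRARY_PATH at them before starting the service. A sketch, assuming a pip-based install:

pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
# expose the pip-installed CUDA 12 libs to CTranslate2
export LD_LIBRARY_PATH=$(python3 -c 'import os, nvidia.cublas.lib, nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))')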


You just need to get the required files and map them into the container.

My working example is here: Home-Automation/Voice-Assistant at main · Fraddles/Home-Automation · GitHub
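If you’d rather wire it up by hand, the idea is one bind mount per missing library. A hypothetical sketch (the host paths and the image placeholder are assumptions; check where your CUDA 12 libraries actually live):

docker run -d --gpus all -p 10300:10300 \
  -v /usr/local/cuda-12/lib64/libcublas.so.12:/usr/lib/x86_64-linux-gnu/libcublas.so.12:ro \
  -v /usr/local/cuda-12/lib64/libcublasLt.so.12:/usr/lib/x86_64-linux-gnu/libcublasLt.so.12:ro \
  <your-wyoming-whisper-image>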

Interesting to see that CUDA does not accelerate Piper significantly. I was about to begin the journey of Piper + CUDA12 on Docker (WSL2) but you may have changed my mind.

Do you know if Piper retains the model in VRAM?
Just trying to better understand why the performance difference is so small.

Depends on your CPU: more power = less time. I imagine by now they have all the plumbing figured out, so CUDA may be worth it now.

Easy way today: use the original whisper.cpp from ggerganov if you have a GPU, together with an OpenAI API plugin for Home Assistant.

I tested it and have adopted it now. No overhead, and it’s very fast, really very fast.
Plugin and some instructions: GitHub - neowisard/ha_whisper.cpp_stt: Home Assistant Whisper.cpp API SST integration
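For anyone who hasn’t built whisper.cpp before, the rough shape of getting its server running with CUDA looks like this (a sketch; build flags and binary names have changed between releases, so check the repo’s README for your version):

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_CUDA=1              # enable the CUDA backend
cmake --build build -j --config Release
./models/download-ggml-model.sh base.en   # grab a model to test with
./build/bin/whisper-server -m models/ggml-base.en.bin --host 0.0.0.0 --port 8080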

For those running AMD hardware, I put together a container that runs faster-whisper with ROCm support. I don’t think this was possible until very recently, so I may be among the earliest to implement it for my setup. I figured sharing it here may help some people get the best performance possible.

wyoming-faster-whisper-rocm


I hope the below helps someone, as it took me ages to find a fully working x86-64 version.
Make sure you have the NVIDIA Container Toolkit (nvidia-container-toolkit) and drivers installed.

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
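After installing, the remaining steps boil down to registering the runtime with Docker and sanity-checking that containers can see the GPU (the CUDA image tag below is just an example):

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# nvidia-smi should list your card(s) from inside a container
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi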

This docker-compose.yaml worked for me with CUDA acceleration and wyoming support.

services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    container_name: faster-whisper-cuda-linux
    runtime: nvidia
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/London
      - WHISPER_MODEL=medium-int8
      - WHISPER_BEAM=1 # optional
      - WHISPER_LANG=en # optional
    volumes:
      - /root/.cache/whisper:/config # beware, your path may be different
    ports:
      - 10300:10300
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2 # I have two CUDA-capable cards
              capabilities:
                - gpu
networks: {}
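Then bring it up and confirm the container actually sees the GPU (container name as set above):

docker compose up -d
docker exec faster-whisper-cuda-linux nvidia-smi   # should list your CUDA card(s)
docker logs -f faster-whisper-cuda-linux           # check that the service starts without CUDA errors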

Check my comment Run whisper on external server - #120 by alienatedsec, where the response time is around a second.

Has anyone used a Jetson since HA/Nabu started working with them and moved everything voice-related to GPU/CUDA, among other things? The 8GB models have dropped in price, but I’m really having a hard time finding any feedback, and it’s obviously a bit technical to set up. I want to set up HA Core on a Jetson for testing. In the long run I would prefer to (eventually) run everything on one box, and you can run HA Core on the Jetson now.

For those of us using Intel machines with no GPU other than the integrated one (Intel 12th-gen Xe iGPU), what’s the fastest way of running whisper?
The small-int8 model is the first one that kind of works; any model smaller than that is just crap. And even the small one is just meh.
A simple sentence like “Turn on the kitchen lights” takes 2.3s on small-int8, which is on the verge of being usable, 3.3s on small, which feels too slow, and 6.5s on medium-int8, which is maddening.

I tried running inside a dedicated container instead of as an add-on, but I didn’t see any noticeable speed improvement (~0.1s).
Is there some Docker configuration, or an alternative image, that would run faster without an NVIDIA or AMD GPU?

No, but the good news is that a GTX 1660 Ti works and is about $100 CAD used. It won’t do LLMs, but it’s good enough for this.

Yes, but the bad news is that my server is an Intel NUC, so adding a GPU is not an option.
I was hoping external M.2 accelerators like the Hailo-8 or similar boards would become popular enough.

You can always run a x1 PCIe lane to an external enclosure. Depends how badly you want it.

I don’t know yet. I also care a lot about power consumption. My server sips ~6-9 W when mostly idle (which is 98% of the time for a home server). I can imagine adding an NVIDIA GPU would easily 5x that number.

Again, depends how bad you want it.

Sometimes you have to accept a less-than-optimal setup to run bleeding-edge tech.

If you want to wait for power-efficient edge processors with full support for whatever stack you want to use, that’s fine.

You asked specifically whether it is possible, today, without a GPU. I simply gave you the information you requested. The Hailo-8 doesn’t seem to be supported for this use case, but I could be wrong (memory will be the biggest issue).

And I appreciate it. It is a shame that Intel Xe iGPUs are not supported; they are fairly decent, actually. Maybe there will be developments in the future. I’ve seen some info about a PyTorch extension with hardware acceleration for Intel Xe graphics.

Actually, it won’t increase energy consumption by much, because the GPU is idle most of the time and only works when you’re talking to your assistant. The standby power of a GPU is approximately 6-9 W. If you choose a GPU with 12 GB of VRAM, you can run high-quality STT and an LLM locally. The RTX 3060 is the best choice because it is very affordable and has plenty of VRAM.
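If you want to verify the idle draw on your own card rather than take the 6-9 W figure on faith, nvidia-smi reports it directly:

nvidia-smi --query-gpu=name,power.draw,power.limit --format=csv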