GPU support in wyoming-whisper docker

The wyoming-whisper docker currently does everything on the CPU even if a GPU is present. It would be great if GPU support could be added; the underlying software (faster-whisper) already supports it.

I guess it doesn’t really help on an HA system normally, because those usually run on light hardware without a GPU. But I offload Whisper onto another server that does have one, because it also does other AI tasks.

It already exists in Docker form, which is what you want since, as you said, it will likely run on a heavier server alongside your Ollama instance.

The vast majority of HA servers are not running GPUs, so there needs to be a version like the one we have. Making an add-on version of this is kinda pointless; the generic Docker image is a good way to go. Otherwise you add in all the GIANT overhead of drivers for all the possible GPUs plus the CUDA software. It’s currently out of scope.

Yeah, I agree an add-on like that is not useful; HA servers are almost always small. But does Speaches also have Wyoming support? I thought that was just standard faster-whisper, which doesn’t. It only has the OpenAI API.

The reason it’s kinda important is that Home Assistant is really slow if you use a local LLM. Mine is very fast (when I talk to it in the web interface I get a response within a second or two), but it still somehow takes HA 30 seconds to respond. So any speedup helps. I don’t want to use a cloud LLM or TTS/STT, of course.

Just so others who find this question are aware: instead of Speaches on its own I used this docker: GitHub - roryeckel/wyoming_openai: OpenAI-Compatible Proxy Middleware for the Wyoming Protocol

It provides the Speaches docker and also Kokoro for TTS. It works great and runs smoothly. It’s so much faster now that I can offload STT and TTS to a GPU! Home Assistant responds really quickly now.

This docker provides both Wyoming endpoints (as a proxy to the OpenAI-compatible backends) and the OpenAI endpoints themselves, so you can have both.
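
Roughly, the moving parts fit together like the sketch below. The image tags, ports, and environment variable names here are illustrative assumptions on my part; check the wyoming_openai, Speaches, and Kokoro READMEs for the exact values.

```yaml
# Sketch only: image tags, ports, and variable names are assumptions,
# not taken verbatim from the projects' documentation.
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cuda    # OpenAI-compatible STT backend (assumed tag)
    ports:
      - "8000:8000"

  kokoro:
    image: ghcr.io/remsky/kokoro-fastapi-gpu:latest    # OpenAI-compatible TTS backend (assumed tag)
    ports:
      - "8880:8880"

  wyoming_openai:
    image: ghcr.io/roryeckel/wyoming_openai:latest     # Wyoming <-> OpenAI proxy (assumed tag)
    ports:
      - "10300:10300"   # Wyoming STT endpoint Home Assistant connects to
      - "10200:10200"   # Wyoming TTS endpoint Home Assistant connects to
    environment:
      # Placeholder variable names: the point is that the proxy is told where
      # the two OpenAI-compatible backends live on the compose network.
      STT_OPENAI_URL: "http://speaches:8000/v1"
      TTS_OPENAI_URL: "http://kokoro:8880/v1"
```

Home Assistant then only needs the Wyoming integration pointed at the two proxy ports, while the OpenAI endpoints stay available for anything else (e.g. Open WebUI) that speaks that API.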


This is exactly what I’m trying to do as well, but I’m having trouble understanding the underlying architecture.

Ideally I think it would be nice to have each component separate and independent for updating purposes.

I was thinking like:

  • Ollama for models
  • Open WebUI for the interface
  • Whisper on CUDA
  • Kokoro on CUDA
  • Wyoming protocol

This way, I’d like to have Ollama, Open WebUI, Whisper, Kokoro, and Wyoming just running, but then all of them can leverage each other.

I guess in some way, I’m hoping to find the “Open WebUI” equivalent for the Wyoming protocol.

Looking at it, it would seem that Speaches is the abstraction layer, the Ollama equivalent, in that it can eventually hold different models for TTS and STT. I’m not sure how it can take on new models if/when they come up, the way Ollama does. So I think it would be fine if Speaches houses Whisper and Kokoro.

But I’m trying to find that Wyoming layer so that HA can call to it and it can call to Speaches. Is this what the wyoming_openai GitHub project is doing?

I’ve created a Docker container which includes faster-whisper + the Wyoming server and the download + install of the NVIDIA CUDA libraries → GitHub - mib1185/wyoming-faster-whisper-cuda: This takes the wyoming-faster-whisper and wraps it into an nvidia cuda supported container.
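
To actually hand the GPU through, the host still needs the NVIDIA Container Toolkit installed; a minimal compose entry could then look roughly like this (the image path/tag is an assumption here, and 10300 is just the usual Wyoming STT port):

```yaml
services:
  wyoming-faster-whisper-cuda:
    image: ghcr.io/mib1185/wyoming-faster-whisper-cuda:latest  # assumed image path/tag
    ports:
      - "10300:10300"            # Wyoming STT port Home Assistant connects to
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia     # needs the NVIDIA Container Toolkit on the host
              count: 1
              capabilities: [gpu]
```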

Thanks for that docker container!
I started with just getting Kokoro via FastAPI working, but of course there was no Wyoming protocol to connect it to Home Assistant.

Then I used the roryeckel wyoming_openai that @GP123 mentioned. I have Kokoro running as one of the dockers (removed the prior container). The wyoming_openai docker is working, but when I go to HA, I can’t see Kokoro or Speaches.

I’m editing the docker compose YAML so that it points to localhost and will try that out first. Otherwise, if that doesn’t work out, I’ll try yours just for the faster-whisper CUDA option.
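
One thing I’m keeping in mind while fiddling with this: inside a container, localhost refers to that container itself, so if Speaches and Kokoro run as separate compose services, the proxy’s URLs probably need the compose service names (or the host’s LAN IP) rather than localhost. A tiny sketch, with placeholder variable names rather than the real ones from the wyoming_openai README:

```yaml
services:
  wyoming_openai:
    environment:
      # localhost here would be the proxy container itself, not the backends:
      # STT_OPENAI_URL: "http://localhost:8000/v1"
      # compose service names resolve across the shared network instead:
      STT_OPENAI_URL: "http://speaches:8000/v1"
```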

roryeckel’s wyoming_openai containers work nicely.
I’m able to use Kokoro for my TTS, and Speaches activates Whisper large to good effect.

A couple of limitations:

  • I would love the generated text to be read by TTS as it is being generated. The overall speed of each piece is great independently, but with Gemma 3 and DeepSeek they generate about a paragraph of text quickly, yet not instantaneously, so waiting for the whole response makes it all feel quite slow.
  • I like to play with different-language voice packs speaking English; however, neither Open WebUI nor Home Assistant seems able to force English on the voices the way native Kokoro can. Would love it if someone knows how to solve that…

Thanks all!