The wyoming-whisper Docker image currently does everything on the CPU, even when a GPU is present. It would be great if GPU support could be added; the underlying software (faster-whisper) already supports it.
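For reference, exposing a GPU to the container is straightforward on the Docker side; the missing piece is an image built against CUDA-enabled faster-whisper. A hypothetical Compose sketch follows — the `latest-cuda` tag and the `--device` argument are assumptions for illustration, not options the current image actually ships:

```yaml
# Hypothetical sketch: assumes a CUDA-enabled build of the wyoming-whisper
# image and a --device flag passed through to faster-whisper.
services:
  whisper:
    image: rhasspy/wyoming-whisper:latest-cuda   # assumed tag, does not exist today
    command: --model small-int8 --language en --device cuda
    ports:
      - "10300:10300"   # Wyoming protocol port
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

The `deploy.resources.reservations.devices` section is the standard Compose syntax for NVIDIA GPU passthrough and requires the NVIDIA Container Toolkit on the host; everything else here is what a GPU-capable image would plausibly look like.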
I guess it doesn’t really help on a typical HA system, because those usually run on light hardware without a GPU. But I offload Whisper onto another server that does have one, because it also handles other AI tasks.
It already exists in Docker form, which is what you want because, as you said, it will likely run on a heavier server alongside your Ollama instance.
The vast majority of HA servers are not running GPUs, so there needs to be a version like the one we have. Making an add-on version of this is kinda pointless; the generic Docker image is the way to go. Otherwise you would have to bundle in the giant overhead of drivers for every possible GPU plus the CUDA software. It’s currently out of scope.
Yeah, I agree an add-on like that is not useful; HA servers are almost always small. But does speaches also have Wyoming support? I thought that was just standard faster-whisper, which doesn’t. It only has the OpenAI API.
The reason it’s kinda important is that Home Assistant is really slow if you use a local LLM. Mine is very fast (when I talk to it in the web interface I get a response within a second or two), yet somehow it still takes HA 30 seconds to respond. So any speedup helps. I don’t want to use a cloud LLM or cloud TTS/STT, of course.