I’m struggling to find a reliable way to cut the cord from my Google Home Minis.
I’ve spent time and money testing several platforms, but in the end the experience is still poor.
I mainly use a HA Voice PE for my tests, so at least the mic should be good.
For STT I tested:
Whisper = slow and poor recognition
Vosk = better experience, but still not enough
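To compare engines objectively instead of by feel, it helps to score each transcript against what you actually said using word error rate (WER). Here is a minimal pure-Python sketch (not tied to any particular STT engine; the example phrases are just illustrative):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One dropped word out of five -> WER 0.2
print(wer("accendi la luce in cucina", "accendi luce in cucina"))  # 0.2
```

Running a fixed set of Italian phrases through each engine and averaging the WER gives you a number to compare instead of a gut impression.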
I then moved the pipeline from my Celeron NUC to my Xeon media server (a Dell T20) and tried the two STT engines above again, but recognition was still poor.
I then added an Nvidia GeForce GTX 1660 6GB and used whisper:gpu, but recognition time is still high and quality is poor.
Lately I found WhisperX and other containers, but my knowledge is limited, so I’m not able to build a container from scratch.
Is there some good Samaritan who can support me in this journey? I don’t want to lose this fight.
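For what it’s worth, you usually don’t need to build a container from scratch: prebuilt Wyoming Whisper images exist. A sketch of how I’d run one with GPU access (assuming the NVIDIA Container Toolkit is installed on the host; image tag and flags are from the rhasspy/wyoming-whisper docs and may differ for your version):

```shell
# Run the Wyoming Whisper server, exposing the GPU to the container.
docker run -d --name whisper \
  --gpus all \
  -p 10300:10300 \
  -v whisper-data:/data \
  rhasspy/wyoming-whisper \
  --model large-v3 --language it --device cuda
```

Then point the Home Assistant Wyoming integration at port 10300 on that host.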
I get great results with whisper-large-v3-turbo, but you may not have enough VRAM. The large Whisper models are very good, and in my experience even a bit better with Italian than with English.
The installation method does not affect the quality of speech recognition: a local service, Docker, or a cloud provider running Whisper with the same model will produce identical results.
Your problem can be divided into two components.
The increased processing time is probably related to the installation method; it’s difficult to give advice on that remotely.
As for recognition quality, you can check it by temporarily using the cloud integration with the identical model. That way you can find out whether the problem is with Whisper in general, or only with your local installation…
Are you sure about that? My assumption was that the more computing power I have, the more accurately speech is transcribed.
For example, all the Italian users on the cloud report good recognition quality, and in my view the tool should be the same (Whisper), so the difference in results can only be a consequence of the configuration.
Hi man, I’m trying to set up Whisper with GPU on my computer running an RTX 3060.
How did you expose the GPU to the container?
I’ve been trying literally every night for 1–2 hours and all I get is: “CUDA failed with error named symbol not found”. I don’t understand what I’m missing here.
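A sanity check I’d run first (assuming the NVIDIA Container Toolkit is installed on the host) is whether the GPU is visible inside a plain CUDA container at all, before blaming the Whisper image:

```shell
# If this prints the usual nvidia-smi table, GPU passthrough itself works.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```

In reports I’ve seen, “named symbol not found” is often a mismatch between the host driver version and the CUDA version baked into the container, so if the check above fails, updating the NVIDIA driver or picking an image built against an older CUDA is worth trying.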
The installation method and the performance of the GPU are different things.
That’s a valid point, but your graphics card offers strong enough performance (processing short phrases in about one second) to deliver latency comparable to cloud-based solutions.