Vosk is far better than Whisper with the tiny French models, and it does real-time STT on an RPi 4.
Just a minute, you're all complaining that Whisper is slow on a Raspberry Pi CPU?
What did you expect? Running a billion-parameter PyTorch model on a tiny Pi CPU is going to be dead-dog slow. There is no hope of it ever running acceptably fast on the Pi. Even if you moved to an Intel i7, Whisper would be slow.
Whisper needs to run on a GPU. Even a low-end NVIDIA GTX 1070 runs about 175x faster than a Pi. Higher-end hardware is faster still.
People have benchmarked Whisper. You can run the small model in real time on an Intel i3 with 8 GB RAM, but even higher-end CPUs like the AMD Ryzen Pro can't run the medium or large model in real time. You need a CUDA-enabled GPU for that. Once you move to a GPU, you can process one second of speech in less than one second.
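The usual way to express this is the real-time factor (RTF): processing time divided by audio duration, where anything below 1.0 keeps up with live speech. A tiny sketch with illustrative numbers (not measurements from this thread):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time to transcribe / length of audio; < 1.0 means real time."""
    return processing_seconds / audio_seconds

# Illustrative numbers only: if a Pi takes 70 s to transcribe a 10 s
# clip, the RTF is 7.0 -- nowhere near real time.
print(real_time_factor(70.0, 10.0))  # 7.0
# A GPU handling the same clip in 0.4 s gives an RTF of 0.04.
print(real_time_factor(0.4, 10.0))  # 0.04
```

This is also why "175x faster" matters: it is the difference between an RTF far above 1 and one far below it.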
Look at every other voice assistant, like Siri or Google Assistant. They run on server farms, and those servers have $20k NVIDIA A100 GPUs installed.
https://github.com/rhasspy/hassio-addons/tree/master/vosk
I can confirm:
- Whisper: 7 to 9 seconds per recognition
- Vosk: < 0.3 seconds per recognition
- Language: ru
- Hardware: Ryzen 7 5800H, 16 GB (mini PC; VirtualBox: 6 cores, 8 GB RAM), ESP32-S3 + INMP441 (microWakeWord)
- Overall experience: instant response!
Vosk is indeed blazing fast! Thanks for the tip!
Minimal example for a Docker install (docker-compose; note the service is Vosk even though it fills Whisper's usual STT slot):

```yaml
# ...
  vosk:
    image: rhasspy/wyoming-vosk
    ports:
      - "10300:10300"
    volumes:
      - ./vosk-data:/data
    command: --data-dir /data
```
Just download the required model into ./vosk-data and you are all set; nothing more is needed.
It can be optimized even further with wyoming-vosk's corrected and limited sentences features.
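Under the hood, limiting sentences builds on a Vosk library feature: `KaldiRecognizer` accepts an optional grammar (a JSON list of allowed phrases), so recognition only ever picks from that list. A hedged sketch of that mechanism (requires `pip install vosk`; model and WAV paths below are placeholders, not from this thread):

```python
import json

def transcribe_with_grammar(model_path: str, wav_path: str, phrases: list[str]) -> str:
    """Recognize speech restricted to `phrases` (plus [unk] for anything else).

    Assumes a downloaded Vosk model directory and a 16 kHz mono 16-bit
    PCM WAV file; both paths here are placeholders.
    """
    from vosk import Model, KaldiRecognizer  # imported lazily; needs vosk installed

    model = Model(model_path)
    # Passing a JSON list as the third argument limits recognition to
    # exactly these phrases; "[unk]" soaks up anything else.
    rec = KaldiRecognizer(model, 16000, json.dumps(phrases + ["[unk]"]))
    with open(wav_path, "rb") as f:
        while chunk := f.read(4000):
            rec.AcceptWaveform(chunk)
    return json.loads(rec.FinalResult())["text"]

# Hypothetical usage (paths are examples only):
# text = transcribe_with_grammar(
#     "vosk-data/vosk-model-small-en-us-0.15", "command.wav",
#     ["turn on the light", "turn off the light"])
```

A small fixed grammar like this is a big part of why Vosk feels instant for voice-assistant commands: the decoder only has to choose between a handful of phrases instead of open vocabulary.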