Whisper: self-hosted performance for French

Vosk is far better than Whisper with tiny French models, and it does real-time STT on a Raspberry Pi 4.


Just a minute: are you all complaining that Whisper is slow on a Raspberry Pi CPU?

What did you expect? Running a billion-parameter PyTorch model on a tiny Pi CPU is going to be “dead dog slow”. There is no hope of it ever running acceptably fast on the Pi. Even if you moved to an Intel i7, Whisper would be slow.

Whisper needs to run on a GPU. Even a low-end Nvidia GTX 1070 runs it about 175x faster than a Pi. Higher-end hardware is even faster.

People have benchmarked Whisper. You can run the small model in real time on an Intel i3 with 8 GB RAM, but even higher-end CPUs like the AMD Ryzen Pro can’t run the medium or large model in real time. You need a CUDA-enabled GPU for that. Once you move to a GPU, you can process one second of speech in less than one second.
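“Process one second of speech in less than one second” is what benchmarks call a real-time factor (RTF) below 1.0. A minimal sketch of how you would measure it yourself; the `transcribe` function here is a hypothetical stand-in for whatever STT call you actually use (Whisper, Vosk, etc.):

```python
import time

def transcribe(audio_seconds: float) -> str:
    # Hypothetical stand-in for a real STT call; it just simulates
    # a small amount of work so the timing logic runs anywhere.
    time.sleep(0.01)
    return "transcript"

def real_time_factor(audio_seconds: float) -> float:
    """Processing time divided by audio duration.

    RTF < 1.0 means the engine keeps up with live speech.
    """
    start = time.perf_counter()
    transcribe(audio_seconds)
    return (time.perf_counter() - start) / audio_seconds

print(f"RTF: {real_time_factor(5.0):.3f}")
```

Swap the stub for a real engine call on a fixed audio clip and the same ratio tells you whether that hardware keeps up with live speech.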

Look at every other voice assistant like Siri or Google. They run on server farms, and those servers have $20k Nvidia A100 GPUs installed.

https://github.com/rhasspy/hassio-addons/tree/master/vosk

I can confirm:
Whisper: 7 to 9 seconds per recognition
Vosk: < 0.3 s per recognition
Language: ru
Hardware: Ryzen 7 5800H, 16 GB (mini PC; VirtualBox: 6 cores, 8 GB RAM), ESP32-S3 + INMP441 (microWakeWord)
Overall experience: instant response!
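Working out the speedup from the numbers in this post (7-9 s for Whisper vs. under 0.3 s for Vosk):

```python
# Reported recognition times from the post above (seconds).
whisper_times = (7.0, 9.0)  # Whisper: 7 to 9 s
vosk_time = 0.3             # Vosk: < 0.3 s

# Vosk is roughly 23-30x faster on this hardware.
speedups = tuple(round(t / vosk_time) for t in whisper_times)
print(speedups)  # (23, 30)
```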


Vosk is indeed blazing fast! Thanks for the tip!
Minimal example for a Docker install:

#...
  vosk:
    image: rhasspy/wyoming-vosk
    ports:
      - 10300:10300
    volumes:
      - ./vosk-data:/data
    command: --data-dir /data

Just download the required model into ./vosk-data and you are all set; nothing more is needed.
Accuracy can be improved further with corrected and limited sentences.
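At the raw Vosk API level, "limited sentences" corresponds to passing a JSON grammar to `KaldiRecognizer`, which constrains recognition to a known phrase list. A minimal sketch; the phrase list and model path are hypothetical, and the model-loading lines are commented out so the snippet stays self-contained:

```python
import json

# Hypothetical phrase list; "[unk]" lets Vosk reject out-of-grammar speech.
phrases = ["allume la lumière", "éteins la lumière", "[unk]"]
grammar = json.dumps(phrases, ensure_ascii=False)
print(grammar)

# With a model downloaded into ./vosk-data, the recognizer would be
# constrained like this (paths are illustrative):
# from vosk import Model, KaldiRecognizer
# model = Model("vosk-data/vosk-model-small-fr-0.22")
# rec = KaldiRecognizer(model, 16000, grammar)
```

Restricting the grammar this way is why a small model can stay accurate for home-automation commands: it only has to pick among the sentences you expect.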
