Hi all,
After receiving my Voice PE I found myself wondering which Faster Whisper model I should be using, but found no good information on their performance requirements. To understand what models I should be trying, I wrote a quick benchmark script.
Results
Reducing the beam width is not advisable, as it has an insignificant effect on speed.
The default base-int8 model in Home Assistant is a good choice, but I will personally experiment more with the base.en and small-int8 models.
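For context, the model variant and the beam width map onto the faster-whisper Python API that the Home Assistant Whisper add-on builds on: the int8 variants correspond to loading a model with int8 quantisation, and the beam width is the beam_size argument of transcribe(). A minimal sketch, assuming a local WAV file named recording.wav:

from faster_whisper import WhisperModel

# "base-int8" corresponds to the base model loaded with int8 quantisation.
model = WhisperModel("base", device="cpu", compute_type="int8")

# beam_size is the beam width discussed above.
segments, info = model.transcribe("recording.wav", beam_size=5, language="en")
print(" ".join(segment.text for segment in segments))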
Repository
Test Method
Six voice recordings were captured from a Home Assistant Voice Preview Edition.
To capture voice recordings, add the following to configuration.yaml:
assist_pipeline:
  debug_recording_dir: /share/assist_pipeline
The recordings are spoken in English with a Finnish accent and include the following phrases:
- Turn off the lights
- Set the lights to maximum brightness
- What's the temperature?
- What's the temperature of the heat pump?
- Turn the lights to half brightness
- Turn off the lights
These recordings were processed using the Whisper models available in the Home Assistant Voice pipeline, with various beam width settings. Note that the large and turbo models were too slow to be relevant for the results. The processing time for each configuration was averaged and recorded.
Test Systems
Hardware
- System 1:
  - Intel i7-1360P
  - 16 GB 4800 MT/s dual-channel RAM
- System 2:
  - Intel N100
  - 32 GB 3200 MT/s single-channel RAM
Host Environment
- Host OS: Proxmox
- Virtual Machine: Ubuntu Server 24.04
  - Full allocation of CPU cores
  - CPU type: Host
  - 12 GB of RAM