Home Assistant add-on that uses onnx-asr for speech-to-text.
Notably, it provides access to the NVIDIA NeMo Parakeet-TDT model, which should be significantly faster and more accurate than Whisper for English in most cases.
Faster and better speech to text
This addon provides an English-language voice recognition service which is (in theory) both better than the biggest Whisper models and nearly twice as fast as even the smallest Whisper model! The only drawback is that it needs around 2.5GB of RAM.
This addon also supports Whisper models, which can be used for other languages. It seems to be slightly faster than wyoming-faster-whisper for some models, particularly whisper-base.
This means it should be a drop-in replacement for most users!
The following benchmarks were performed on my Ryzen 5 5600X, with the English phrase “Turn on the living room lamp.”
| Model Size | Runtime | Model | Time |
|---|---|---|---|
| Parakeet | wyoming-onnx-asr | nemo-parakeet-tdt-0.6b-v2 | 0.26s |
| Tiny | wyoming-onnx-asr | onnx-community/whisper-tiny.en | 0.35s |
| | wyoming-faster-whisper | tiny | 0.4s |
| | wyoming-faster-whisper | tiny-int8 | 0.49s |
| | wyoming-onnx-asr | onnx-community/whisper-tiny | 0.5s |
| | wyoming-faster-whisper | Systran/faster-distil-whisper-tiny.en | 0.59s |
| Base | wyoming-onnx-asr | whisper-base | 0.6s |
| | wyoming-onnx-asr | onnx-community/whisper-base.en | 0.76s |
| | wyoming-faster-whisper | Systran/faster-whisper-base | 0.82s |
| | wyoming-faster-whisper | Systran/faster-whisper-base.en | 0.94s |
| Small | wyoming-faster-whisper | Systran/faster-distil-whisper-small.en | 1.4s |
| | wyoming-onnx-asr | onnx-community/whisper-small.en | 3.4s |
| Large | wyoming-onnx-asr | onnx-community/whisper-large-v3-turbo | 8.1s |
| | wyoming-faster-whisper | Systran/faster-distil-whisper-large-v3 | 10s |
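To put those timings in relative terms, here is a small Python sketch that divides each time by the Parakeet time. The numbers are copied from the benchmark above; the dictionary layout and the helper name `relative_to_parakeet` are only for illustration, and since this is one phrase on one machine the ratios are rough.

```python
# Benchmark times in seconds, copied from the table above
# (Ryzen 5 5600X, phrase "Turn on the living room lamp.").
times = {
    ("wyoming-onnx-asr", "nemo-parakeet-tdt-0.6b-v2"): 0.26,
    ("wyoming-onnx-asr", "onnx-community/whisper-tiny.en"): 0.35,
    ("wyoming-faster-whisper", "tiny"): 0.4,
    ("wyoming-faster-whisper", "tiny-int8"): 0.49,
    ("wyoming-onnx-asr", "onnx-community/whisper-tiny"): 0.5,
    ("wyoming-faster-whisper", "Systran/faster-distil-whisper-tiny.en"): 0.59,
    ("wyoming-onnx-asr", "whisper-base"): 0.6,
    ("wyoming-onnx-asr", "onnx-community/whisper-base.en"): 0.76,
    ("wyoming-faster-whisper", "Systran/faster-whisper-base"): 0.82,
    ("wyoming-faster-whisper", "Systran/faster-whisper-base.en"): 0.94,
    ("wyoming-faster-whisper", "Systran/faster-distil-whisper-small.en"): 1.4,
    ("wyoming-onnx-asr", "onnx-community/whisper-small.en"): 3.4,
    ("wyoming-onnx-asr", "onnx-community/whisper-large-v3-turbo"): 8.1,
    ("wyoming-faster-whisper", "Systran/faster-distil-whisper-large-v3"): 10.0,
}

PARAKEET = ("wyoming-onnx-asr", "nemo-parakeet-tdt-0.6b-v2")


def relative_to_parakeet(seconds: float) -> float:
    """Return how many times longer a run took than the Parakeet run."""
    return seconds / times[PARAKEET]


# Print every entry, fastest first, with its ratio to Parakeet.
for (runtime, model), seconds in sorted(times.items(), key=lambda kv: kv[1]):
    ratio = relative_to_parakeet(seconds)
    print(f"{runtime:24} {model:40} {seconds:5.2f}s  {ratio:5.1f}x")
```

On these numbers, the multilingual whisper-tiny on the same runtime takes a little under twice as long as Parakeet, while whisper-tiny.en is closer.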
About
The addon source can be found in onnx-asr-addon, which is an addon version of the wyoming-onnx-asr Python module, itself heavily based on wyoming-faster-whisper. This is all made possible through the work of the developer of onnx-asr, who ported the Parakeet model in the first place.
Installation
This addon can be installed from my repository:
And read the docs. By default the addon only sets up an English model, but it can be configured with both English and multilingual models. Once running, the Wyoming integration should be auto-detected in integrations.
If you’re using Home Assistant Container, wyoming-onnx-asr provides a drop-in replacement for the wyoming-faster-whisper container as well.
The library is of particular interest to Russian-speaking users, since it gives access to a quality local GigaAM model. If anyone is interested, I have already done a similar project (but the translation is only in the addon). At the same time, it delivers sufficient speech recognition speed on an N100 CPU. This is also true for Parakeet.
It would be nice to help the author add support for Canary, which would add a few more languages, but there seem to be problems when converting it to ONNX.
You can also manually create an ONNX model for FastConformer (available for several European languages), but no one has done a comparison on test datasets, so it’s not known whether this is better than Whisper.
Update: added the multilingual model nemo-parakeet-tdt-0.6b-v3 to my server.
In my test system I installed the addon and configured both model_en and model_multi to “auto”. The debug logs say the models are downloaded correctly.
I installed the Wyoming integration for the addon, and the STT is visible. However, when I configure a new Voice Assistant and want to select Onnx-asr as the engine, it is not possible: it is greyed out. What am I missing?
I can reproduce this, and it looks like an error in how I’m presenting the language codes for the multilingual model.
I’ll try and get a fix out soon.
While trying to troubleshoot I couldn’t find a way to change the voice assistant language of a pipeline: does Home Assistant even support this? Language is a drop-down on each item, but I can’t see where to add extra languages at the start!
I’ve released 0.1.3, which should resolve this, although I’m not sure what will happen if you have both en and multilingual enabled at the moment. I’ll fix that up later today.
Hi,
I’ve installed the addon, but cannot see it in the speech-to-text services.
I clicked your My link to add the Wyoming Protocol integration, but it wants me to enter a host and port number.
Where can I get this information?
Currently my Wyoming Protocol shows piper and whisper as services.
Thank you.
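For anyone with the same host and port question, one way to narrow it down is to test whether a candidate address accepts TCP connections at all. The sketch below uses only the Python standard library; the helper name `port_is_open` is made up for illustration, and it does not speak the Wyoming protocol, it only checks that something is listening.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `port_is_open("192.168.1.10", 10300)` would test a hypothetical Home Assistant address on port 10300. Both values are placeholders: 10300 is the port the Whisper add-on commonly uses, and this addon's actual port is an assumption to check against its configuration page or docs.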
Hey @dayshine, I got a notification for the new version, but can’t update the plugin.
Error: Failed to perform the action update/install. Error updating ONNX ASR: Can’t install ghcr.io/tboby/onnx-asr/amd64:0.2.0: 404 Client Error for http+docker://localhost/v1.51/images/create?tag=0.2.0&fromImage=ghcr.io%2Ftboby%2Fonnx-asr%2Famd64&platform=linux%2Famd64: Not Found (“manifest unknown”)
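The URL in that error is percent-encoded, which makes it hard to see which image is actually being requested. As a rough sketch, the standard library can decode it; the URL string is copied from the error above, and the helper name `requested_image` is only for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the error message above.
url = (
    "http+docker://localhost/v1.51/images/create"
    "?tag=0.2.0&fromImage=ghcr.io%2Ftboby%2Fonnx-asr%2Famd64"
    "&platform=linux%2Famd64"
)


def requested_image(docker_api_url: str) -> tuple[str, str]:
    """Return (image:tag, platform) decoded from a Docker image-create URL."""
    # parse_qs decodes percent-escapes and returns a list per parameter.
    query = parse_qs(urlsplit(docker_api_url).query)
    image = f"{query['fromImage'][0]}:{query['tag'][0]}"
    return image, query["platform"][0]


image, platform = requested_image(url)
print(image, platform)  # ghcr.io/tboby/onnx-asr/amd64:0.2.0 linux/amd64
```

A “manifest unknown” response usually means the registry has no image published under that tag, so the likely reading here is that the 0.2.0 image for amd64 was not available when the update ran.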