Hi all,
I’ve been using Vosk locally for a while via the HA app and the phone Assistant with no big issues.
It correctly gets speech transcribed most of the time.
Unfortunately, PE doesn’t seem to like it: 90% of the time “it’s sorry” that it can’t understand.
I’ve checked the audio saved from the pipeline and it’s very noisy compared to the phone recording (tested from the same distance). I would have expected at least the same quality. Maybe the saved audio is not yet “processed” by PE?
Surprisingly, PE worked okay via HA cloud, but I want to keep it local, and the response time using Vosk is okay for me.
I’ve searched and found something about the noisy audio, but not much else.
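To put a number on “very noisy”, the two saved recordings could be compared with a short script. This is only a rough sketch: it assumes both files are 16-bit mono PCM WAV, and the “SNR” is a crude heuristic (loud 20 ms frames minus quiet 20 ms frames), not something HA or PE reports. The function names are made up for illustration.

```python
import array
import math
import sys
import wave


def read_pcm16(path):
    """Load a 16-bit PCM WAV file; returns (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, rate


def level_dbfs(samples):
    """RMS level relative to 16-bit full scale, in dBFS."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square / 32768.0 ** 2)


def rough_snr_db(samples, rate, frame_ms=20):
    """Crude SNR estimate: loud-frame level minus quiet-frame level."""
    size = max(1, rate * frame_ms // 1000)
    levels = sorted(
        level_dbfs(samples[i:i + size])
        for i in range(0, len(samples) - size + 1, size)
    )
    if not levels:
        raise ValueError("recording is shorter than one frame")
    quiet = levels[len(levels) // 10]       # about the 10th percentile
    loud = levels[(len(levels) * 9) // 10]  # about the 90th percentile
    return loud - quiet
```

Calling `rough_snr_db(*read_pcm16("pe.wav"))` and the same for the phone recording (both file names are placeholders) gives two figures to compare. A clearly lower value for the PE file would back up what the ear already says, although it would not explain where the noise comes from.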
The audio is already processed by the DSP chip, so what you hear in the recording is the same sound the STT module receives.
The small Vosk model does not work well with distorted sound. That’s the price you pay for speed.
Local, high quality and fast, low cost: you can only choose two out of three. (That said, Whisper MLX may be a good solution, as the Mac mini has good power consumption numbers.)
You can also use Vosk in a similar way to StP if you enter all the required phrase variants into the dictionary, but this option precludes interaction with the LLM.
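As a rough illustration of the restricted-phrase idea, the Vosk Python API lets you pass a JSON list of allowed phrases to `KaldiRecognizer`. The sketch below uses that API directly; it is not how the HA add-on is configured, and the helper names, the model directory and the phrases are all placeholders. Grammar restriction is generally limited to the small models, which is the case here anyway.

```python
import json
import wave


def build_grammar(phrases):
    """Return a JSON grammar string: the allowed phrases plus "[unk]"."""
    unique = sorted({p.strip().lower() for p in phrases if p.strip()})
    return json.dumps(unique + ["[unk]"])


def transcribe_limited(wav_path, model_dir, phrases):
    """Transcribe a 16-bit mono WAV, allowing only the given phrases."""
    # Imported here so build_grammar() works without Vosk installed.
    from vosk import KaldiRecognizer, Model  # pip install vosk

    model = Model(model_dir)
    with wave.open(wav_path, "rb") as wav:
        rec = KaldiRecognizer(
            model, wav.getframerate(), build_grammar(phrases)
        )
        while True:
            data = wav.readframes(4000)
            if not data:
                break
            rec.AcceptWaveform(data)
    return json.loads(rec.FinalResult())["text"]
```

Anything outside the list comes back as `[unk]` or gets forced onto the nearest allowed phrase, which is why free-form requests to an LLM stop working in this mode.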
Yeah, unfortunately I’m running the HAOS voice stuff on a server that was not designed for it, so Vosk was a good compromise (it works fine with the app).
I knew the requirements for running STT locally with PE, but what I didn’t expect was the “poor” quality of the audio coming from it.