I’m looking for an integration that can use the openAI Whisper API, basically sending a recording and getting a text version (using openAI tokens). This would then popup in the “speech to text” box of the Voice Assistant configuration.
I looked everywhere but most of the results are about local whisper.
Looking for the same thing, I simply don’t have the horse power to run the local integration since it only starts understanding Swiss German at the medium model size.
Yes same I spent quite some time on this and it appears that the wav metadata are stripped from the stream passed to the stt component, you can add it with back with wave :
wav_stream = io.BytesIO(audio_data)
with wave.open(wav_stream, 'w') as wav_file:
wav_file.setnchannels(1)
wav_file.setsampwidth(2)
wav_file.setframerate(16000)
wav_file.writeframes(audio_data)
wav = wav_stream.getvalue() # wav file object
I’m working on a package with a few additional features.
Thank you so much! You have made my day better as my system is finally usable in my language unlike google/microsoft were pretty weak. Thank you once again and have a good day!
GroqCloud seems to uses a better whisper model (whisper-large-v3) than OpenAI API that uses whisper-large-v2. And not only it’s also a little bit faster … but I’ve only made a few requests.
And I forgot to mention that is free up to 28,800 audio seconds a day
Thanks for the suggestion @SyndicatedPillbug
Overall the OpenAI api is more stable over time the whisper-large-v2 is more than enough for the Home Assistant’s Assist, but it cost a few cents or more per month depending on the usage (which is a fair price).
GroqCloud on the other hand it is basically free (up to 28800 audio seconds per day) use a better model whisper-large-v3, probably overkill for this usage.
But from time to time it’s really slow at processing the transcription up to even 5 minutes, I don’t know what is causing this behavior I’m pretty sure I wasn’t hitting any kind of usage limits.
Choose wisely
PS: if anyone wants to contribute even with a simple translation I would really appreciate it
Hey, thanks for your great work. Both components work perfectly
While playing around with Groq, I saw that the chat models are also free, for now at least. Do you think it would be possible to create an add-on that replaces the conversation agent with Groq as well? I think it would be a killer add-on if it would let us use llama 3.1 and gemma2.