Looking for an OpenAI Whisper API integration

Hi,

I’m looking for an integration that can use the OpenAI Whisper API: basically sending a recording and getting back a text version (using OpenAI tokens). It would then show up in the “speech to text” box of the Voice Assistant configuration.

I looked everywhere but most of the results are about local whisper.

Is there something I could start from?

Thanks

Edit: I guess chatziko/ha-google-cloud-stt on GitHub (Use Google Cloud Speech-to-Text in Home Assistant) would make a good starting point.


Try this integration.

I already use this integration. It is a conversation agent, not an STT engine.


Looking for the same thing. I simply don’t have the horsepower to run the local integration, since it only starts understanding Swiss German at the medium model size.

I have tried this but I can’t make it work.

Yes, same here. I spent quite some time on this, and it appears that the WAV metadata is stripped from the stream passed to the STT component. You can add it back with wave:

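import io
import wave

# The STT component receives raw PCM without a WAV header, so write one.
wav_stream = io.BytesIO()  # start empty; wave writes the header and frames

with wave.open(wav_stream, 'wb') as wav_file:
    wav_file.setnchannels(1)      # mono
    wav_file.setsampwidth(2)      # 16-bit samples
    wav_file.setframerate(16000)  # 16 kHz
    wav_file.writeframes(audio_data)

wav = wav_stream.getvalue()  # complete WAV file as bytes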
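One way to check that the header really was added is to read the bytes back with wave. This is only a sketch: `pcm_to_wav` is a hypothetical helper name, and the mono / 16-bit / 16 kHz values are the ones from the snippet above, assumed to match what the pipeline delivers.

```python
import io
import wave


def pcm_to_wav(audio_data: bytes, channels: int = 1,
               sample_width: int = 2, rate: int = 16000) -> bytes:
    """Wrap raw PCM bytes in a WAV container (hypothetical helper)."""
    wav_stream = io.BytesIO()
    with wave.open(wav_stream, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(rate)
        wav_file.writeframes(audio_data)
    return wav_stream.getvalue()


# One second of silence as stand-in audio (assumption: 16-bit mono, 16 kHz).
pcm = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav(pcm)

# Read the result back to confirm the header carries the expected parameters.
with wave.open(io.BytesIO(wav_bytes), "rb") as check:
    params = (check.getnchannels(), check.getsampwidth(),
              check.getframerate(), check.getnframes())
print(params)
```

If the parameters read back differ from what the API expects, the transcription will usually come out garbled or empty, so this is a cheap sanity check before sending anything.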

I’m working on a package with a few additional features.

That is my attempt - lurebat/openai_stt (github.com)

I also created one using aiohttp, so the wav file is just sent to the Whisper API and the text is returned. Heavily inspired by openai_tts. For me it’s working in the Home Assistant Assist pipeline: GitHub - davidohne/ha_whisper-api_stt: Home Assistant Whisper API SST integration
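For anyone who wants to see the shape of the request, here is a rough sketch of posting a WAV file to OpenAI's transcription endpoint. It is not the code from the integration above: it uses the standard library's urllib instead of aiohttp so it runs without extra packages, and the function names are made up for illustration. The endpoint URL, the `whisper-1` model name, and the `text` field in the JSON response are taken from OpenAI's public API; double-check them against the current docs.

```python
import json
import urllib.request
import uuid

OPENAI_STT_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(wav_bytes: bytes, api_key: str,
                                model: str = "whisper-1",
                                url: str = OPENAI_STT_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the model name and the WAV file."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field: model name
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # Form field: the audio file itself
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode(),
        wav_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def transcribe(wav_bytes: bytes, api_key: str, **kwargs) -> str:
    """Send the request and return the transcribed text (performs a network call)."""
    request = build_transcription_request(wav_bytes, api_key, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["text"]
```

Usage would be something like `transcribe(wav_bytes, "sk-...")`. Inside Home Assistant you would want an async client such as aiohttp rather than a blocking call; the `url` and `model` parameters are there so the same request can be pointed at another OpenAI-compatible endpoint.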



Thank you so much! You have made my day better, as my system is finally usable in my language, whereas Google/Microsoft were pretty weak. Thank you once again and have a good day!

There is an addon available in the addon store: addons/whisper at master · home-assistant/addons (github.com)

Don’t suppose anyone has a version of this that would work with the Groq version of Whisper?


This one seems pretty decent:


Thanks!
I really appreciate it


I made it on the fly :smiley:

GroqCloud seems to use a better Whisper model (whisper-large-v3) than the OpenAI API, which uses whisper-large-v2. It’s also a little bit faster … but I’ve only made a few requests.

And I forgot to mention that it is free up to 28,800 audio seconds a day :exploding_head:
Thanks for the suggestion @SyndicatedPillbug
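To put the 28,800 seconds in perspective, here is the arithmetic as a quick sketch. The 5-second length of a typical voice command is my own assumption, not a measured value.

```python
FREE_SECONDS_PER_DAY = 28_800   # daily free allowance quoted above
ASSUMED_COMMAND_SECONDS = 5     # assumption: length of one typical voice command

# Convert the allowance to hours of audio per day.
free_hours = FREE_SECONDS_PER_DAY / 3600

# How many commands of the assumed length fit in the daily allowance.
commands_per_day = FREE_SECONDS_PER_DAY // ASSUMED_COMMAND_SECONDS

print(f"{free_hours} hours of audio, about {commands_per_day} commands per day")
```

That works out to 8 hours of audio a day, far more than a household is likely to speak at a voice assistant.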


Hi I’m still here with another report :sweat_smile:

Overall the OpenAI API is more stable over time, and whisper-large-v2 is more than enough for Home Assistant’s Assist, but it costs a few cents or more per month depending on usage (which is a fair price).

GroqCloud, on the other hand, is basically free (up to 28,800 audio seconds per day) and uses a better model, whisper-large-v3, which is probably overkill for this usage.
But from time to time it’s really slow at processing the transcription, taking up to 5 minutes. I don’t know what causes this behavior; I’m pretty sure I wasn’t hitting any usage limits.
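One possible way to live with the occasional slow response is to give the first provider a time budget and fall back to the other one when it is exceeded. This is only a sketch of the idea, not something the integration does: `transcribe_with_fallback` and the two transcriber callables are hypothetical, and the 10-second default is an arbitrary assumption.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature: takes WAV bytes, returns the transcribed text.
Transcriber = Callable[[bytes], Awaitable[str]]


async def transcribe_with_fallback(wav_bytes: bytes,
                                   primary: Transcriber,
                                   fallback: Transcriber,
                                   timeout_s: float = 10.0) -> str:
    """Try the primary provider; if it exceeds timeout_s, use the fallback."""
    try:
        return await asyncio.wait_for(primary(wav_bytes), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The primary call is cancelled by wait_for; ask the other provider.
        return await fallback(wav_bytes)


# Stand-in providers for demonstration only; real ones would call the APIs.
async def slow_provider(_: bytes) -> str:
    await asyncio.sleep(1)
    return "from slow provider"


async def fast_provider(_: bytes) -> str:
    return "from fast provider"


result = asyncio.run(
    transcribe_with_fallback(b"", slow_provider, fast_provider, timeout_s=0.05)
)
print(result)
```

The trade-off is that a timed-out request still costs the waiting time before the fallback starts, so the budget should be short enough to keep Assist responsive.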

Choose wisely :mage:

PS: if anyone wants to contribute even with a simple translation I would really appreciate it


Hey, thanks for your great work. Both components work perfectly :slight_smile:

While playing around with Groq, I saw that the chat models are also free, for now at least. Do you think it would be possible to create an add-on that replaces the conversation agent with Groq as well? I think it would be a killer add-on if it would let us use llama 3.1 and gemma2.