Looking for an OpenAI Whisper API integration

holyWorkshopKeys · January 31, 2024, 12:35pm

Hi,

I’m looking for an integration that can use the OpenAI Whisper API: basically, sending a recording and getting a text version back (using OpenAI tokens). It would then show up in the “speech to text” box of the Voice Assistant configuration.

I looked everywhere but most of the results are about local whisper.

Is there something I could start from?

Thanks

Edit: I guess chatziko/ha-google-cloud-stt on GitHub (Use Google Cloud Speech-to-Text in Home Assistant) would make a good starting point.

ddaniel · January 31, 2024, 1:56pm

Try this integration.

holyWorkshopKeys · January 31, 2024, 3:06pm

I already use this integration; it is a conversation agent, not an STT engine.

chixxi · January 31, 2024, 6:51pm

Looking for the same thing. I simply don’t have the horsepower to run the local integration, since it only starts understanding Swiss German at the medium model size.

sayanova · February 2, 2024, 3:47pm

I have tried this but I can’t make it work.

holyWorkshopKeys · February 3, 2024, 4:14am

Yes, same here. I spent quite some time on this, and it appears that the WAV metadata is stripped from the stream passed to the STT component. You can add it back with wave:

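import io
import wave

# audio_data: raw PCM bytes collected from the STT stream (16-bit mono, 16 kHz)
wav_stream = io.BytesIO()  # start empty; wave writes the header and the frames

with wave.open(wav_stream, 'wb') as wav_file:
    wav_file.setnchannels(1)      # mono
    wav_file.setsampwidth(2)      # 2 bytes per sample (16-bit)
    wav_file.setframerate(16000)  # 16 kHz
    wav_file.writeframes(audio_data)

wav = wav_stream.getvalue()  # complete WAV file as bytes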

I’m working on a package with a few additional features.
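
For anyone who wants to try the idea end to end, here is a minimal sketch that collects the chunks of an async byte stream and wraps them in a WAV header, then reads the result back to check it. The function name stream_to_wav and the fake_stream generator are hypothetical stand-ins, and the 16 kHz / 16-bit / mono defaults are simply the values from the snippet above, not something confirmed for every setup.

```python
import asyncio
import io
import wave
from typing import AsyncIterable


async def stream_to_wav(
    stream: AsyncIterable[bytes],
    sample_rate: int = 16000,
    channels: int = 1,
    sample_width: int = 2,
) -> bytes:
    """Collect raw PCM chunks and return a complete WAV file as bytes."""
    pcm = b"".join([chunk async for chunk in stream])
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


async def fake_stream():
    # Hypothetical stand-in for the audio stream: two chunks of silence,
    # 8000 16-bit samples each (one second in total at 16 kHz).
    yield b"\x00\x00" * 8000
    yield b"\x00\x00" * 8000


wav_bytes = asyncio.run(stream_to_wav(fake_stream()))

# Read the result back to confirm the header describes the audio correctly.
with wave.open(io.BytesIO(wav_bytes), "rb") as check:
    channels_read = check.getnchannels()
    rate_read = check.getframerate()
    frames_read = check.getnframes()
```

Reading the bytes back with wave is a cheap way to confirm the header is valid before sending anything to a paid API.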

lurebat · February 6, 2024, 10:25pm

That is my attempt - lurebat/openai_stt (github.com)

davidohne · February 9, 2024, 10:52am

I also created one using aiohttp, so the WAV file is just sent to the Whisper API and the transcription is returned. Heavily inspired by openai_tts. For me it’s working in the Home Assistant Assist pipeline: GitHub - davidohne/ha_whisper-api_stt: Home Assistant Whisper API SST integration
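
To illustrate what "sending the WAV file to the Whisper API" involves, here is a rough sketch of the HTTP request. It is not the code from either repository linked in this thread: it uses only the standard library instead of aiohttp so it runs without extra packages, and the helper names (build_multipart, build_request, transcribe) are made up for the example. The endpoint, the whisper-1 model name and the "text" field in the response are what the OpenAI transcription API uses as far as I know; treat them as assumptions and check the current API reference.

```python
import json
import urllib.request
import uuid
from typing import Optional

OPENAI_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(fields: dict[str, str], wav_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one WAV file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        f'Content-Type: audio/wav\r\n\r\n'.encode() + wav_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(
    wav_bytes: bytes,
    api_key: str,
    model: str = "whisper-1",
    language: Optional[str] = None,
) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    fields = {"model": model}
    if language:
        fields["language"] = language  # optional language hint, e.g. "de"
    body, content_type = build_multipart(fields, wav_bytes)
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": content_type}
    return urllib.request.Request(OPENAI_URL, data=body, headers=headers, method="POST")


def transcribe(wav_bytes: bytes, api_key: str, language: Optional[str] = None) -> str:
    """Send the WAV file and return the transcribed text (blocking call)."""
    request = build_request(wav_bytes, api_key, language=language)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["text"]
```

Inside Home Assistant a blocking call like transcribe() should not run on the event loop, which is one reason an async client such as aiohttp is the better fit there.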


emanuelbaltaretu · March 16, 2024, 5:17am

Thank you so much! You have made my day better, as my system is finally usable in my language, whereas Google/Microsoft were pretty weak. Thank you once again and have a good day!

kendonB · June 9, 2024, 5:04am

There is an addon available in the addon store: addons/whisper at master · home-assistant/addons (github.com)

SyndicatedPillbug · August 10, 2024, 9:33pm

Don’t suppose anyone has a version of this that would work with the Groq version of Whisper?

bungamungus · August 13, 2024, 2:02am

This one seems pretty decent:

fabio-garavini · August 14, 2024, 4:46pm

Thanks!
I really appreciate it

fabio-garavini · August 14, 2024, 6:18pm

I made it on the fly

GroqCloud seems to use a better Whisper model (whisper-large-v3) than the OpenAI API, which uses whisper-large-v2. Not only that, it’s also a little bit faster … but I’ve only made a few requests.

And I forgot to mention that it is free up to 28,800 audio seconds a day
Thanks for the suggestion @SyndicatedPillbug
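
As a rough illustration of why supporting both services in one integration is feasible: Groq offers an OpenAI-compatible transcription endpoint, so mostly the base URL and model name change. The sketch below is hypothetical (the SttProvider class and PROVIDERS table are invented for the example and are not taken from any integration in this thread), and the URLs and model names are my understanding of the two APIs, so verify them against the providers' documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SttProvider:
    """Settings that differ between OpenAI-compatible transcription services."""

    base_url: str
    model: str

    @property
    def transcriptions_url(self) -> str:
        return f"{self.base_url}/audio/transcriptions"


# Assumed values; check each provider's documentation before relying on them.
PROVIDERS = {
    "openai": SttProvider("https://api.openai.com/v1", "whisper-1"),
    "groq": SttProvider("https://api.groq.com/openai/v1", "whisper-large-v3"),
}

# The free allowance quoted in this thread, converted to hours of audio.
FREE_SECONDS_PER_DAY = 28_800
free_hours_per_day = FREE_SECONDS_PER_DAY / 3600

groq = PROVIDERS["groq"]
print(groq.transcriptions_url, groq.model, free_hours_per_day)
```

The quoted 28,800 seconds works out to 8 hours of audio per day, far more than short voice commands are likely to use.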

fabio-garavini · August 15, 2024, 9:53am

Hi I’m still here with another report

Overall, the OpenAI API is more stable over time, and whisper-large-v2 is more than enough for Home Assistant’s Assist, but it costs a few cents or more per month depending on usage (which is a fair price).

GroqCloud, on the other hand, is basically free (up to 28,800 audio seconds per day) and uses a better model, whisper-large-v3, which is probably overkill for this usage.
But from time to time it’s really slow at processing the transcription, taking up to 5 minutes. I don’t know what is causing this behavior; I’m pretty sure I wasn’t hitting any kind of usage limits.

Choose wisely

PS: if anyone wants to contribute, even with a simple translation, I would really appreciate it

HunorLaczko · August 24, 2024, 5:04pm

Hey, thanks for your great work. Both components work perfectly

While playing around with Groq, I saw that the chat models are also free, for now at least. Do you think it would be possible to create an add-on that replaces the conversation agent with Groq as well? I think it would be a killer add-on if it would let us use llama 3.1 and gemma2.