Looking for an OpenAI Whisper API integration

But this one can use your GPU and CUDA… (renamed from Faster Whisper)

This should show up where all the other voice choices are, right? Did you have to add much to the config YAML? I’m running a bunch of Whisper flavors on a separate server and I’m confused about why I’m striking out on this one. Any idea if there is a good tutorial somewhere?

Is there any integration that allows sending an audio file for Whisper to transcribe? Or sending audio directly to an AI? All the ones I’ve seen only work with Assist.

Interesting task, though I can’t imagine where to apply it. I made a rough version specifically for you.

Working with multimodal LLMs is a more complex task. The OpenAI and Google integrations let you send files for processing, while for other LLMs you would need to implement that yourself.


How does it work? Do I need to have anything else installed?

You will need the STT service and an audio file.

What I am trying to do is send a sound file that I record each time an alarm is triggered, and have an AI tell me (either using transcription or directly) whether it detects the siren or not. Is this possible with this integration? Does it return any output?

You send audio and receive text. This will not work with a siren.

And the text won’t say whether it sounded like a siren, right? Thanks anyway!

The request for speech recognition using Whisper confused me; for your task you can actually use the `google_generative_ai_conversation.generate_content` action.
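For anyone landing here with the same question, here is a rough sketch of how that could look. It assumes the Google Generative AI integration is already set up, that your Home Assistant version’s `generate_content` action accepts a `filenames` list and returns its answer under `text`, and the trigger entity and file path (`binary_sensor.alarm_triggered`, `/config/www/alarm_recording.wav`) are placeholders you would swap for your own:

```yaml
# Rough sketch, not a drop-in config. Check the generate_content fields and
# response format against the integration docs for your installed version.
automation:
  - alias: "Ask Gemini whether the alarm recording contains a siren"
    triggers:
      - trigger: state
        entity_id: binary_sensor.alarm_triggered      # hypothetical trigger entity
        to: "on"
    actions:
      # Hand the recorded audio file to the multimodal model with a yes/no prompt
      - action: google_generative_ai_conversation.generate_content
        data:
          prompt: >-
            Listen to the attached recording and answer with only "yes" or
            "no": can a siren be heard?
          filenames:
            - /config/www/alarm_recording.wav         # hypothetical recording path
        response_variable: siren_check
      # Forward the model's answer as a notification
      - action: notify.notify
        data:
          message: "Siren detected: {{ siren_check.text }}"
```

Asking the model for a constrained yes/no answer keeps the response easy to reuse afterwards, for example in a condition or a template.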


thanks very much!

Thank you very much :+1:

You should really create a separate thread for this integration.
I nearly missed it hidden in this thread. :wink: