After tinkering with Faster-Whisper for a while, I’ve come to the conclusion that local STT is crap. So, I’m sticking with Google until this dinosaur goes extinct.
Now, I’m building a shiny new Gemini Cloud STT integration to finally retire the ancient Faster-Whisper setup in Home Assistant.
Actually, I was shocked to find that Gemini can now transcribe different languages. When I developed the integration (a few days ago), it only supported English transcription. Now it basically works for all languages and even understands “sounds”. You can try the mic in the conversation. It should translate and transcribe your voice.
I was thinking that STT → INTENT → TTS, which burns three LLM API calls per request, is so… NOT ELEGANT. But I think I’ll build it anyway as a hands-on exercise before moving to a one-shot audio-in, audio-out Assist pipeline.
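To make the "three calls vs. one call" point concrete, here is a rough sketch in plain Python. Everything in it is hypothetical: `ChainedPipeline`, `OneShotPipeline`, and the `fake_*` stubs are names I made up for illustration, and the stubs just shuffle bytes and strings around instead of calling Gemini or Home Assistant. It only shows the shape of the two designs and counts how many remote calls each one would need.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChainedPipeline:
    """Hypothetical STT -> INTENT -> TTS chain: one remote call per stage."""
    stt: Callable[[bytes], str]      # audio in, transcript out
    intent: Callable[[str], str]     # transcript in, reply text out
    tts: Callable[[str], bytes]      # reply text in, audio out
    api_calls: int = 0

    def run(self, audio: bytes) -> bytes:
        text = self.stt(audio)
        self.api_calls += 1
        reply = self.intent(text)
        self.api_calls += 1
        speech = self.tts(reply)
        self.api_calls += 1
        return speech


@dataclass
class OneShotPipeline:
    """Hypothetical audio-in, audio-out design: a single remote call."""
    audio_to_audio: Callable[[bytes], bytes]
    api_calls: int = 0

    def run(self, audio: bytes) -> bytes:
        speech = self.audio_to_audio(audio)
        self.api_calls += 1
        return speech


# Stand-in stubs (assumptions, not real APIs): "audio" is just UTF-8 bytes here.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_intent(text: str) -> str:
    return f"OK: {text}"


def fake_tts(reply: str) -> bytes:
    return reply.encode("utf-8")


def fake_audio_to_audio(audio: bytes) -> bytes:
    # Same result as the chain, but modelled as one call.
    return fake_tts(fake_intent(fake_stt(audio)))


chained = ChainedPipeline(fake_stt, fake_intent, fake_tts)
one_shot = OneShotPipeline(fake_audio_to_audio)

chained_out = chained.run(b"turn on the light")
one_shot_out = one_shot.run(b"turn on the light")

print(chained.api_calls, one_shot.api_calls)
```

Both pipelines return the same audio for the same input; the only difference the sketch tracks is the call count. The upside of the chained version is that each stage can be swapped or debugged on its own, which is why it still seems worth doing as an exercise first.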