Hi everyone, I just got myself a Home Assistant Voice PE and I’m currently exploring the possibilities of using LLMs and STT/TTS models from Google, OpenAI and ElevenLabs.
Problem: Continuous Conversation only seems to work properly with ElevenLabs and Piper as TTS.
When using Gemini TTS (gemini-2.5-flash-preview-tts via Google Gemini Addon) or OpenAI TTS (OpenAI TTS 3.7 tts-1/gpt-4o-mini-tts), the microphone turns back on almost immediately, while the TTS audio is still playing.
I would really prefer to use Google Gemini or OpenAI because of the significantly lower costs while still maintaining high voice quality.
What can I do to fix this? Is there a way to tweak the pipeline or prevent the microphone from triggering too early when using these two TTS services?
I ran a test project, and I can say that synthesis using gemini-2.5-flash-preview-tts isn’t suitable for this task. It’s too slow, and when the text is split into separate sentences, the intonation is inconsistent from one sentence to the next.
I see what you mean. That’s a bit of a shame, because the Gemini voices actually sound amazing! And they are much better than the OpenAI voices!
My recent tests with Nabu Casa and continuous speech have been a bit hit-or-miss, as the whole setup isn’t running as smoothly as I’d like yet. It’s awesome that the voices here are almost as fast as ElevenLabs.
It works, thanks! The quality isn’t terrible, but it definitely feels like a step down from the actual Gemini TTS voices, despite having the same names. At the moment, I think I’ll have to stick with ElevenLabs.