HA Voice PE: Continuous Conversation not working with Gemini and OpenAI

Hi everyone, I just got myself a Home Assistant Voice PE and I’m currently exploring the possibilities of using LLMs and STT/TTS models from Google, OpenAI and ElevenLabs.

Problem: Continuous Conversation only seems to work properly with ElevenLabs and Piper as TTS.
When using Gemini TTS (gemini-2.5-flash-preview-tts via Google Gemini Addon) or OpenAI TTS (OpenAI TTS 3.7 tts-1/gpt-4o-mini-tts) the microphone basically turns back on immediately while the TTS audio is still playing.

I would really prefer to use Google Gemini or OpenAI because of the significantly lower costs while still maintaining high voice quality.

What can I do to fix this? Is there a way to tweak the pipeline or prevent the microphone from triggering too early when using these two TTS services?

Any help would be greatly appreciated!


Find a custom component that implements the _process_tts_stream method for TTS, or use an external gateway. For example, this project.

Thanks for the specific tip! It now works with OpenAI, but it doesn't seem very stable so far. Is there something similar available for Gemini?

I did a test project, and I can say that synthesis using gemini-2.5-flash-preview-tts isn’t suitable for this task. It’s too slow, and because the text is split into separate sentences that are synthesized one by one, the intonation is inconsistent from sentence to sentence.
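
To illustrate the sentence-by-sentence approach mentioned above, here is a minimal Python sketch. It is an assumption about how such a splitter could look, not the code of the test project or of any Home Assistant component. The names split_sentences, stream_tts and the synthesize callback are hypothetical; synthesize stands in for whatever call sends one sentence to the TTS service and returns audio bytes.

```python
import asyncio
import re
from typing import AsyncIterator, Awaitable, Callable

# A sentence ends at ".", "!" or "?" followed by whitespace (a deliberately simple rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


async def split_sentences(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    async for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be incomplete, so keep it for the next chunk.
        buffer = parts[-1]
    # Flush whatever is left once the text stream ends.
    if buffer.strip():
        yield buffer.strip()


async def stream_tts(
    chunks: AsyncIterator[str],
    synthesize: Callable[[str], Awaitable[bytes]],  # hypothetical TTS call
) -> AsyncIterator[bytes]:
    """Synthesize each sentence separately and yield its audio in order."""
    async for sentence in split_sentences(chunks):
        # Each request sees only one sentence, so the service cannot carry
        # intonation across sentence boundaries.
        yield await synthesize(sentence)
```

Because each sentence is a separate request, the total delay is roughly one service round trip per sentence, which is why a slow model hurts here, and the model never sees the neighbouring sentences, which would explain the uneven intonation.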


I see what you mean. That’s a bit of a shame, because the Gemini voices actually sound amazing! And they are much better than the OpenAI voices!

My recent tests with Nabu Casa and continuous speech have been a bit hit-or-miss, as the whole setup isn’t running as smoothly as I’d like yet. It’s awesome that the voices here are almost as fast as ElevenLabs.

Try Google’s TTS cloud service with this project.

Honestly, I don’t really understand why streaming synthesis hasn’t been added to the built-in Google Cloud component yet. Apparently, few people need it.


It works, thanks! The quality isn’t terrible, but it definitely feels like a step down from the actual Gemini TTS voices, despite having the same names. At the moment, I think I’ll have to stick with ElevenLabs.