I would like to express support for adding streaming to the Google Generative AI and Piper integrations.
Ideally, the LLM response could be streamed to Home Assistant and from Home Assistant into Piper TTS, so that the time to first spoken word is reduced.
I believe Google Gemini’s API is capable of text streaming, and I believe that some TTS services are capable of streaming as well.
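The pipeline I have in mind could be sketched roughly as follows. This is only an illustration, not Home Assistant code: the streaming source and the `speak` callback are stand-ins, and the sentence-boundary splitting is one plausible way to hand text to a TTS engine before the full reply exists.

```python
import re

def fake_llm_stream():
    """Stand-in for a streaming LLM API (e.g. Gemini's streaming mode);
    yields small text chunks as the model generates them."""
    for chunk in ["Once upon a ti", "me, there was a castle. ",
                  "It stood on a hill. ", "The end."]:
        yield chunk

def stream_to_tts(chunks, speak):
    """Buffer streamed chunks and hand each complete sentence to the TTS
    engine as soon as it ends, instead of waiting for the full reply."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Split off any complete sentence (ending in ., ! or ?).
        while True:
            match = re.search(r"[.!?]\s+|[.!?]$", buffer)
            if not match:
                break
            sentence, buffer = buffer[:match.end()].strip(), buffer[match.end():]
            speak(sentence)  # TTS can start on the first sentence immediately
    if buffer.strip():
        speak(buffer.strip())

spoken = []
stream_to_tts(fake_llm_stream(), spoken.append)
# The first sentence is spoken while the rest is still being generated.
```

With this shape, time to first spoken word depends on the length of the first sentence rather than the length of the whole reply.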
See the stated efforts for Ollama and OpenAI in Voice Chapter 9, quoted below:
Reducing the time to first word with streaming
When experimenting with larger models, or on slower hardware, LLMs can feel sluggish. They only respond once the entire reply is generated, which can take frustratingly long for lengthy responses (you’ll be waiting a while if you ask it to tell you an epic fairy tale).
In Home Assistant 2025.3 we’re introducing support for LLMs to stream their response to the chat, allowing users to start reading while the response is being generated. A bonus side effect is that commands are now also faster: they will be executed as soon as they come in, without waiting for the rest of the message to be complete.
Streaming is coming initially for Ollama and OpenAI.
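The command side effect described in the quote could be sketched like this. The `<cmd>` tag format and the helper names are entirely hypothetical, invented for illustration; the point is only that each command is dispatched the moment it is fully parsed out of the stream, rather than after the whole reply arrives.

```python
def fake_streamed_reply():
    # Stand-in for a streamed LLM reply mixing a command and prose.
    yield from ["<cmd>light.turn_on</cmd>", " Turning on ", "the lights for you."]

def run_stream(chunks, execute):
    """Dispatch each command as soon as its closing tag arrives,
    without waiting for the rest of the message to complete."""
    text, buffer = "", ""
    for chunk in chunks:
        buffer += chunk
        while "<cmd>" in buffer and "</cmd>" in buffer:
            start = buffer.index("<cmd>")
            end = buffer.index("</cmd>")
            execute(buffer[start + 5:end])          # run immediately
            buffer = buffer[:start] + buffer[end + 6:]
        if "<cmd>" not in buffer:                    # plain text so far
            text += buffer
            buffer = ""
    return text + buffer

executed = []
reply_text = run_stream(fake_streamed_reply(), executed.append)
# The light command runs before the spoken reply has finished streaming.
```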