Google Generative AI & Piper Response Streaming

I would like to express support for adding response streaming to the Google Generative AI and Piper integrations.

Ideally, the LLM response could be streamed to Home Assistant and from Home Assistant into Piper TTS, so that the time to first spoken word is reduced.

I believe Google Gemini's API is capable of text streaming, and I believe that some TTS engines can synthesize text as it arrives.
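For illustration, here is a minimal sketch of text streaming with the google-generativeai Python package. The API key handling and model name are placeholders of my own, not a statement about how a Home Assistant integration would do it:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")  # example model name

# stream=True yields partial responses as they are generated,
# instead of blocking until the full reply is complete.
response = model.generate_content("Tell me an epic fairy tale.", stream=True)
for chunk in response:
    print(chunk.text, end="", flush=True)
```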

See the reference to the stated efforts for Ollama and OpenAI in the Voice Chapter 9 blog post, quoted below:

Reducing the time to first word with streaming

When experimenting with larger models, or on slower hardware, LLMs can feel sluggish. They only respond once the entire reply is generated, which can take frustratingly long for lengthy responses (you’ll be waiting a while if you ask it to tell you an epic fairy tale).

In Home Assistant 2025.3 we’re introducing support for LLMs to stream their response to the chat, allowing users to start reading while the response is being generated. A bonus side effect is that commands are now also faster: they will be executed as soon as they come in, without waiting for the rest of the message to be complete.

Streaming is coming initially for Ollama and OpenAI.

Since Voice PE is the main product, the implementation of audio streaming to the device must come first. Piper has supported streaming from the beginning (though I think this option will not be used, and splitting the text into word groups or sentences will be enough; a sketch of that approach follows below). Receiving chunks from the LLM is already implemented. The team still needs to work with ESPHome, decide how many chunks to synthesize before sending audio to Voice PE, and implement the related tasks.
This work is ongoing and probably already has a fairly high priority.
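As an illustration of that sentence-level splitting, here is a minimal standalone sketch that buffers streamed LLM chunks, splits them on sentence boundaries, and pipes each complete sentence through the piper CLI. The model filename, the 22050 Hz sample rate, and playback via aplay are assumptions for a demo on a Linux box, not the Home Assistant/ESPHome implementation:

```python
import re
import subprocess

# Split after sentence-ending punctuation followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def speak(sentence: str) -> None:
    """Synthesize one sentence with the piper CLI and play the raw PCM audio.

    Assumes a medium-quality voice model (16-bit mono at 22050 Hz);
    the model path is a placeholder.
    """
    piper = subprocess.Popen(
        ["piper", "--model", "en_US-lessac-medium.onnx", "--output-raw"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    audio, _ = piper.communicate(sentence.encode())
    subprocess.run(
        ["aplay", "-r", "22050", "-f", "S16_LE", "-t", "raw", "-"],
        input=audio,
    )

def stream_to_tts(chunks) -> None:
    """Accumulate streamed LLM text chunks and speak each complete sentence.

    The first sentence starts playing as soon as it is complete, long
    before the LLM has finished the whole reply.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last split point is a finished sentence;
        # the remainder stays in the buffer until more text arrives.
        *sentences, buffer = SENTENCE_END.split(buffer)
        for sentence in sentences:
            if sentence.strip():
                speak(sentence)
    if buffer.strip():
        speak(buffer)  # flush the trailing partial sentence
```

The design choice here matches the suggestion above: rather than streaming Piper's audio itself, the text is chunked at sentence boundaries, which keeps the time to first spoken word close to the time the LLM takes to produce its first sentence.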
