TTS streaming support

Are there any TTS engines that work in streaming mode aside from piper?

I have the latest version, and streaming mode works with local piper: if the LLM gives a long answer, the assistant starts reading it before the whole output is on screen. However, there is a noticeable delay/latency. That is especially annoying if the response is short (it’s of course understandable, since synthesis is done on the local CPU).

It works much better with a GPU-accelerated piper. The slightly inconvenient part is that in the assistant menu there is no easy way to tell two piper instances apart, but it’s not the end of the world.

However the voice quality is not as good as some other TTS engines out there.

In particular, I like Kokoro, since it is a good compromise between voice quality and speed. It supports streaming if I go to its web interface. However, when used from HA, it waits until the whole output is on the screen. I’ve tried this implementation that provides a Wyoming proxy to the OpenAI API, and this implementation that adds an integration that can be pointed at an OpenAI API endpoint and then exposes it inside HA. In both cases it waits until the whole text is on the screen.

Looking at the diagram for the Wyoming proxy, it appears that the proxy intentionally waits for the whole text to appear before passing it on to the TTS engine.

Does anyone know about either other Wyoming proxies that support streaming or other integrations that can do it?

And is this something that needs to be additionally set up on the HA side or on the TTS side? I suspect the latter, but I could be missing something.

The TTS, or at least the Wyoming side of the TTS (in your case the Wyoming proxy), would have to implement support for this.

There are new events for SynthesizeStart, SynthesizeChunk, SynthesizeStop and SynthesizeStopped that the proxy has to react to, plus it has to register itself as supports_synthesize_streaming=True.

The changes are relatively straightforward, judging by the ones made in Piper for streaming support in this commit: Add streaming support · rhasspy/wyoming-piper@0cbbfde · GitHub

So in short, nothing to do on the HA side (if you have 2025.07), but the proxy would require changes.
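To make the event flow above more concrete, here is a minimal sketch of the buffering logic a streaming-capable proxy might use: reset on SynthesizeStart, collect text on each SynthesizeChunk, hand complete sentences to the TTS engine as soon as they are available, and flush the remainder on SynthesizeStop. The class name SentenceBuffer, its methods, and the naive sentence-splitting regex are my own assumptions for illustration. It deliberately does not import the wyoming library and is not how wyoming-piper itself does it, so treat it as a starting point only.

```python
import re
from typing import List

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
# Assumption for illustration only; it will also split on abbreviations like "Dr. Smith".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceBuffer:
    """Hypothetical helper: turns streamed text chunks into complete sentences."""

    def __init__(self) -> None:
        self._pending = ""

    def start(self) -> None:
        # Would be called when the proxy receives SynthesizeStart.
        self._pending = ""

    def add_chunk(self, text: str) -> List[str]:
        # Would be called for every SynthesizeChunk.
        # Returns the sentences that are now complete and can be synthesized.
        self._pending += text
        parts = _SENTENCE_END.split(self._pending)
        # The last part may still be an unfinished sentence, so keep it buffered.
        self._pending = parts.pop()
        return [p.strip() for p in parts if p.strip()]

    def stop(self) -> List[str]:
        # Would be called on SynthesizeStop: flush whatever text is left.
        rest = self._pending.strip()
        self._pending = ""
        return [rest] if rest else []


def sentences_from_stream(chunks: List[str]) -> List[str]:
    """Feed LLM-style text chunks through the buffer and collect the sentences."""
    buf = SentenceBuffer()
    buf.start()
    sentences: List[str] = []
    for chunk in chunks:
        sentences.extend(buf.add_chunk(chunk))
    sentences.extend(buf.stop())
    return sentences


if __name__ == "__main__":
    for sentence in sentences_from_stream(["Hello the", "re. How are", " you? Fi", "ne"]):
        # A real proxy would send each sentence to the TTS engine here,
        # stream the audio back, and finish with SynthesizeStopped.
        print(sentence)
```

In a real proxy, each returned sentence would be synthesized and its audio sent back immediately, which is what lets playback start before the LLM has finished its answer.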


I implemented a proxy for the legacy Wyoming servers as soon as this feature became available.

As for TTS integrations, depending on the synthesis method, you have to come up with the best solution. Among my repositories, there’s a fork of hass-edge-tts where I implemented new functionality.

The streaming seems to work, but I had to change the exec for my Docker image:
--streaming --voice en_US-libritts-high --speaker 8

I’ve made a small addon that is a wyoming protocol proxy for cloud TTS providers. For now it supports only Google Cloud and OpenAI, but it can be extended to other providers as well.

It basically gives you the same streaming functionality as you get with the latest wyoming-piper but with cloud TTS providers. It also works with Home Assistant Voice Preview Edition devices. Check it out here: GitHub - eslavnov/wyoming-cloud-streamer: Wyoming protocol server for cloud TTS engines.

The official Docker image 1.6.3 for docker.io/rhasspy/wyoming-piper now supports streaming and I’ve verified this (I run a Core setup in a VM with containers manually started on a machine with a GPU).

Proof:

Sep 15 10:23:00 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:synthesize: raw_text=Lily, filled with wonder, chose to stay and learn, beginning her journey as a guardian of the forest's magic., text='Lily, filled with wonder, chose to stay and learn, beginning her journey as a guardian of the forest's magic.'
Sep 15 10:23:00 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:input: {'text': "Lily, filled with wonder, chose to stay and learn, beginning her journey as a guardian of the forest's magic."}
Sep 15 10:23:00 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:/tmp/tmp94rx4z0d/1757931780044717722.wav
Sep 15 10:23:00 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:Completed request
Sep 15 10:23:00 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:Synthesizing stream sentence: As the seasons passed, Lily grew stronger, her connection to the forest deepening with each day. She learned to speak with the animals, heal the land, and protect the balance of nature.
Sep 15 10:23:00 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:Synthesize(text='As the seasons passed, Lily grew stronger, her connection to the forest deepening with each day. She learned to speak with the animals, heal the land, and protect the balance of nature.', voice=SynthesizeVoice(name='en_US-hfc_male-medium', language=None, speaker=None), context=None)
Sep 15 10:23:00 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:synthesize: raw_text=As the seasons passed, Lily grew stronger, her connection to the forest deepening with each day. She learned to speak with the animals, heal the land, and protect the balance of nature., text='As the seasons passed, Lily grew stronger, her connection to the forest deepening with each day. She learned to speak with the animals, heal the land, and protect the balance of nature.'
Sep 15 10:23:00 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:input: {'text': 'As the seasons passed, Lily grew stronger, her connection to the forest deepening with each day. She learned to speak with the animals, heal the land, and protect the balance of nature.'}
Sep 15 10:23:01 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:/tmp/tmp94rx4z0d/1757931780762582062.wav
Sep 15 10:23:01 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:Completed request
Sep 15 10:23:01 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:Synthesizing stream sentence: Her kindness inspired others in the village to care for the earth, and together they restored the forest to its former glory.
Sep 15 10:23:01 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:Synthesize(text='Her kindness inspired others in the village to care for the earth, and together they restored the forest to its former glory.', voice=SynthesizeVoice(name='en_US-hfc_male-medium', language=None, speaker=None), context=None)
Sep 15 10:23:01 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:synthesize: raw_text=Her kindness inspired others in the village to care for the earth, and together they restored the forest to its former glory., text='Her kindness inspired others in the village to care for the earth, and together they restored the forest to its former glory.'
Sep 15 10:23:01 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:input: {'text': 'Her kindness inspired others in the village to care for the earth, and together they restored the forest to its former glory.'}
Sep 15 10:23:01 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:/tmp/tmp94rx4z0d/1757931781214709285.wav
Sep 15 10:23:01 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:Completed request
Sep 15 10:23:01 roxanne.dragonfear wyoming-piper[3313113]: DEBUG:wyoming_piper.handler:Synthesize(text='And so, Lily became a legend, a symbol of harmony between humans and nature, forever remembered in the hearts of all who lived in the village.', voice=SynthesizeVoice(name='en_US-hfc_male-medium', language=None, speaker=None), context=None)

Worth noting: the phone app won’t stream, but the assist satellites at home all will and do stream.

For long LLM responses, this is truly a game changer.