TTS streaming support

Are there any TTS engines that work in streaming mode aside from piper?

I have the latest version, and streaming mode works with local piper: if the LLM gives a long answer, the assistant starts reading it before the whole output is on screen. However, there is a noticeable delay. That is especially annoying when the response is short (which is understandable, since synthesis runs on the local CPU).

It works much better with a GPU-accelerated piper. The slightly inconvenient part is that the assistant menu offers no easy way to tell two piper instances apart, but it’s not the end of the world.

However the voice quality is not as good as some other TTS engines out there.

In particular I like Kokoro, since it is a good compromise between voice quality and speed. It supports streaming if I go to its web interface. However, when used from HA it waits until the whole output is on the screen. I’ve tried this implementation that provides a Wyoming proxy to the OpenAI API, and this implementation that adds an integration that can be pointed at an OpenAI API endpoint and then exposes it inside HA. In both cases it waits until the whole text is on the screen.

Looking at the diagram for the Wyoming proxy, it appears that the proxy intentionally waits for the whole text before passing it on to the TTS engine.

Does anyone know about either other Wyoming proxies that support streaming or other integrations that can do it?

And is this something that needs to be set up on the HA side or on the TTS side? I suspect the latter, but I could be missing something.

The TTS, or at least the Wyoming side of the TTS (in your case the Wyoming proxy), would have to implement support for this.

There are new events for SynthesizeStart, SynthesizeChunk, SynthesizeStop and SynthesizeStopped that the proxy has to react to, plus it has to register itself as supports_synthesize_streaming=True.

The changes are relatively straightforward, judging by the ones made in Piper for streaming support in this commit: Add streaming support · rhasspy/wyoming-piper@0cbbfde

So in short, there is nothing to do on the HA side (if you have 2025.07), but the proxy would require changes.
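To make the event flow above a bit more concrete, here is a rough sketch of the buffering logic a streaming-aware proxy might use. It deliberately does not import the wyoming package: the `StreamingTtsSession` class, the `synthesize` callable and the sentence-splitting rule are all hypothetical, and the event type strings are assumed to be the wire names of the events mentioned above. A real proxy would use the event classes from the wyoming library, advertise supports_synthesize_streaming=True, and reply with SynthesizeStopped once the last audio has been sent.

```python
import re
from typing import Callable, List

# A sentence ends at ., ! or ? followed by whitespace (a simplification).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class StreamingTtsSession:
    """Hypothetical per-request state for a streaming-aware proxy."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        # Stand-in for the call to the real backend (e.g. an OpenAI-style API).
        self._synthesize = synthesize
        self._buffer = ""
        self.active = False

    def handle(self, event_type: str, text: str = "") -> List[bytes]:
        """Return audio for every sentence completed by this event."""
        audio: List[bytes] = []
        if event_type == "synthesize-start":
            # A new streaming request begins; reset any leftover text.
            self._buffer = ""
            self.active = True
        elif event_type == "synthesize-chunk" and self.active:
            # Append the new text and speak only the finished sentences.
            self._buffer += text
            parts = _SENTENCE_END.split(self._buffer)
            self._buffer = parts.pop()  # last part may be incomplete
            audio = [self._synthesize(p) for p in parts if p.strip()]
        elif event_type == "synthesize-stop" and self.active:
            # No more text is coming; speak whatever is still buffered.
            # A real server would then send SynthesizeStopped.
            if self._buffer.strip():
                audio.append(self._synthesize(self._buffer.strip()))
            self._buffer = ""
            self.active = False
        return audio
```

The point of the sketch is that the first sentence can go to the TTS engine as soon as it is complete, instead of waiting for the whole LLM response.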


I implemented a proxy for the legacy Wyoming servers as soon as this feature became available.

As for TTS integrations, depending on the synthesis method, you have to come up with the best solution. Among my repositories, there’s a fork of hass-edge-tts where I implemented new functionality.

The streaming seems to work, but I had to change the exec arguments for my Docker image:
--streaming --voice en_US-libritts-high --speaker 8