I’d rather not self-promote, as I’ve already shared a post in the relevant section.
But after using this method for a few days, I can confirm it’s useful, so I’m duplicating the information in this section.
Now my J4205 can easily play back responses from LLMs.
In the future, I expect the Wyoming protocol, local servers, and cloud integrations for various TTS engines to gain streaming support. At that point this method will become less relevant.
But for now, my integration might already prove useful to someone out there.
1. Copy the streaming_tts_proxy folder from this repository to your custom_components folder.
2. Restart Home Assistant.
3. Add the Streaming TTS Proxy integration.
4. Specify the host and port (use core-piper:10200 for the Piper add-on).
5. Specify the language and voice in the settings.
6. Start testing.
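If the integration can’t connect, it may help to first check that the Wyoming server is reachable from wherever Home Assistant runs. Below is a minimal Python sketch of such a check. It only opens a TCP connection and does not speak the Wyoming protocol. The function names and the 10200 default port (borrowed from the add-on example) are my own assumptions, and bracketed IPv6 addresses are not handled.

```python
import socket


def parse_host_port(value: str, default_port: int = 10200) -> tuple[str, int]:
    """Split "host:port" into its parts; use default_port if no port is given."""
    host, sep, port = value.rpartition(":")
    if not sep:
        # No colon at all: the whole string is the host name.
        return value, default_port
    return host, int(port)


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

For example, `is_reachable(*parse_host_port("core-piper:10200"))` should return True when run from a machine that can resolve the add-on’s host name.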
The only requirement is that your server must be able to generate 1 second of speech in no more than 1 second, i.e. a real-time factor (RTF) of at most 1, and preferably slightly below 1.
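To put numbers on the requirement: RTF is the synthesis time divided by the duration of the audio produced. If it goes above 1, playback catches up with synthesis and the response starts to stutter. A small Python sketch follows. The helper names and the 0.9 safety margin are hypothetical, and the audio format in the example below is only an illustration, since the real format depends on the voice.

```python
def audio_duration_seconds(num_bytes: int, rate: int, width: int, channels: int) -> float:
    """Duration of raw PCM audio: bytes / (sample rate * bytes per sample * channels)."""
    return num_bytes / (rate * width * channels)


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the speech produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def fast_enough(rtf: float, margin: float = 0.9) -> bool:
    """Hypothetical check: require RTF <= margin to leave some headroom below 1."""
    return rtf <= margin
```

For example, 44100 bytes of 22050 Hz, 16-bit (2-byte), mono audio is 1.0 s of speech. If that took 0.5 s to synthesize, the RTF is 0.5, which is comfortably fast enough.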
In the diagrams, I tried to show how this works with different satellite versions. They contain some inaccuracies, but the idea should be clear.
My implementation is designed for the Wyoming protocol.
However, developers of custom voice integrations can already adapt theirs to the new streaming method.
Each integration author will also need to decide how to segment the text. Some TTS services probably support true streaming input, which could give even better results.
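As an illustration of the segmentation problem, here is a naive Python sketch that buffers incoming LLM text deltas and emits each sentence as soon as it ends in ., ! or ? followed by whitespace. It is not taken from my integration or from Piper. A regex like this ignores abbreviations, decimal numbers and language-specific punctuation, and the function name is hypothetical.

```python
import re
from typing import Iterable, Iterator

# Naive, language-agnostic sentence boundary: ., ! or ? followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def segment_stream(deltas: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text deltas."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is a finished sentence; the last may still grow.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Feeding it `["Hello the", "re. How are", " you? Fine"]` yields "Hello there." and "How are you?" while text is still arriving, and "Fine" once the stream ends, so synthesis of the first sentence can start early.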
The integration does not affect how fast commands are executed, only how the voice response is generated. The benefit becomes apparent with long LLM responses.
It’s also worth noting that the development team set a 60-character threshold: the streaming synthesis mechanism only kicks in for responses longer than that.
This allows short standard responses to be cached, which is only possible with the old synthesis method. Streaming sends chunks directly to the satellite and does not create a cache.
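The rule can be sketched roughly like this in Python. Only the 60-character value comes from the description above. The function, its callbacks and the in-memory dict cache are hypothetical stand-ins, and whether the real check uses > or >= is an assumption on my part.

```python
from typing import Callable, Dict, Optional

STREAMING_THRESHOLD_CHARS = 60  # value from the post; the constant name is hypothetical


def speak(
    text: str,
    synthesize: Callable[[str], bytes],
    stream: Callable[[str], None],
    cache: Dict[str, bytes],
    threshold: int = STREAMING_THRESHOLD_CHARS,
) -> Optional[bytes]:
    """Hypothetical dispatcher: stream long responses, cache short ones."""
    if len(text) > threshold:
        # Long response: send chunks straight to the satellite; nothing is cached.
        stream(text)
        return None
    # Short response: synthesize once with the old method, then reuse the audio.
    if text not in cache:
        cache[text] = synthesize(text)
    return cache[text]
```

A short confirmation such as "Turned off the AC." is synthesized once and served from the cache afterwards, while a long LLM answer always goes down the streaming path.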
What I mean is that it starts answering right after the command is done.
I know that because when I say “turn off AC”, the AC beeps.
So it goes command, beep, answer, without any delay.
That’s correct: Wyoming will receive a library update in the next release. Piper is already prepared for this update and can be connected directly. @synesthesiam implemented a very smart text-processing solution that accounts for the nuances of many languages, and authors of other servers can use it as a reference.
After the 2025.7 release, I’ll make my integration handle the supports_synthesize_streaming key reported by the server, so that the text isn’t processed twice.
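Roughly, the idea is: if the server advertises supports_synthesize_streaming, pass the text through and let the server segment it; otherwise keep segmenting on the integration side. Here is a Python sketch, assuming the server info has already been parsed into a dict with a "tts" list of program entries. That layout and both function names are assumptions for illustration, not the Wyoming library’s API.

```python
from typing import Any, Callable, Iterable, Iterator, Mapping


def server_streams_natively(info: Mapping[str, Any]) -> bool:
    """True if any TTS program in the (assumed) info dict advertises streaming."""
    return any(
        program.get("supports_synthesize_streaming", False)
        for program in info.get("tts", [])
    )


def text_for_server(
    info: Mapping[str, Any],
    deltas: Iterable[str],
    segment: Callable[[Iterable[str]], Iterable[str]],
) -> Iterator[str]:
    """Choose who segments the text, so it is not processed twice."""
    if server_streams_natively(info):
        # The server segments the text itself: forward the deltas untouched.
        yield from deltas
    else:
        # Older server: segment on the integration side before sending.
        yield from segment(deltas)
```

The `segment` argument could be any sentence splitter; keeping it as a parameter means the local splitting can simply be bypassed for servers that already handle it.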
Another question: once Home Assistant, Wyoming, Piper, etc. implement streaming support, what will the use case for your solution be? A proxy that enables streaming for other TTS solutions that still don’t support the feature?