Piper TTS: no streaming w/LLM

I cannot get Piper to stream my local LLM output via Android app or browser.

  • AMD64 server (not HA hardware)
  • Android app
  • HA version 2025.7
  • Piper TTS version 1.6.2
  • HTTPS in place

I have a very nice end-to-end voice/LLM combo with Ollama as the conversation agent. Everything works smoothly.

Since streaming became available I’ve tried to add it in, but I just cannot get it to work.

I have streaming toggled ON in the Piper settings, and the Android app sees this. I added HTTPS support so that my browser allows voice access/TTS. I’ve restarted the HA server several times and reinstalled the app, but still no streaming.

I have seriously been researching this for days, reading docs, poking around in the settings, looking into community add-ons. Nothing I can find seems to be recent enough (HA 2025.7 + Piper 1.6.1/1.6.2).

I must be missing something, but I am drawing a total blank.

Any ideas?

What actions do you perform in the browser? What result do you get? Please describe in detail.

I have to put this on hold. After a hard reboot (powered off/on) I cannot connect to my server. I suspect the self-signed certificate I installed. Apologies and thanks!

edit: Sorry, I can’t post all the proper links as I’m restricted as a new forum signup.

This was one of the few threads I found online whilst searching high and low for the same info so I’m reposting here in case it helps anyone else. Perhaps someone more eloquent than me could write a guide.

The key info for me was in a thread on these forums where someone mentioned passing
“--streaming” to the Piper config. They were talking about Docker, but I just worked through learning how to edit files on native HAOS. Maybe search for “documentation-to-test-streaming-tts-feature”.
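For anyone running the Docker container rather than HAOS, passing that flag would look something like the sketch below. To be clear, the image tag, port, volume path, and voice name here are my assumptions, and per the later reply in this thread the flag only applies to Wyoming Piper 1.6.x; on 2.0+ it has been removed because streaming is the default.

```shell
# Hypothetical Wyoming Piper 1.6.x container invocation:
# append --streaming to the container's command line.
docker run -d --name piper \
  -p 10200:10200 \
  -v /srv/piper-data:/data \
  rhasspy/wyoming-piper \
  --voice en_US-lessac-medium \
  --streaming
```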

Here’s what I did to get streaming working.

In Piper 1.6.1 the release notes showed they had enabled much faster streaming of LLM responses. Essentially, speech generation begins before the LLM has fully formulated its response. The result is that longer responses and conversational interactions go from quite slow to usably fast. My experience prior to this was:

  1. Turning on a light might take up to 10 seconds or slightly longer.
  2. Anything not simplistic would take up to 90 seconds to start responding.

This was with “prefer local processing” enabled, and when waking my main PC to run the LLMs, the first query of the day required loading the LLM into memory, which also slowed things down. That aspect is still true, but you can tell Ollama to keep the LLM alive by setting its keep-alive timeout to -1 for infinite.
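The keep-alive can also be set per request against the Ollama API itself, which preloads the model so the first voice query of the day isn’t slow. This is just a sketch; the host/port is Ollama’s default and the model name is from my setup, so adjust both to yours:

```shell
# Ask Ollama to load the model and keep it in memory indefinitely
# (keep_alive: -1). An empty prompt loads the model without generating.
curl http://localhost:11434/api/generate \
  -d '{"model": "ministral-3:3b", "prompt": "", "keep_alive": -1}'
```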

Here’s my home setup:

Home Assistant Voice PE
Satellite1 from FutureProofHomes
Raspberry Pi 5 16Gig
Mid tier Local PC with LLM(s) running on RX 7900 XTX 24Gig
All using today’s very latest versions of software and firmware, except for HAOS which is 2025.11.3

I’ve not done any extensive benchmarks of different LLMs, but I did notice an improvement when using smaller models; of course YMMV depending on your needs and hardware. For managing a home and dealing with simple longer queries I’m using ministral-3:3b (ollama com /library/ministral-3), which is only a 3Gig download. I’ll probably move to an AMD 395+ mini PC when budget allows.

Step 1 - Make sure your HAOS is up to date and backed up. Back it up. I really mean it.

Step 2 - Read this article about editing configuration.yaml: home-assistant.io docs configuration. Then install the File Editor add-on:
home-assistant.io common-tasks os #installing-and-using-the-file-editor-add-on. They describe using Samba or Studio Code Server; if you already have one of these you could edit the same config with it. YMMV.

Step 3 - Open File Editor and, from the Home Assistant root, find configuration.yaml; click on it and an editor pane will appear to the right. See picture below.

Step 4 - Add the content from line 13 onwards; the rest was already there. Don’t edit anything else. Don’t worry if your configuration.yaml looks short, that’s normal.

Enable Streaming for Piper TTS

```yaml
tts:
  - platform: piper
    model: en_US-piper
    streaming: true
```

If you get it right there’s a green tick at the top right as an indicator. Click the adjacent save button and restart the whole of HAOS so all the scripts/add-ons get a fresh boot. If you then test a longer query, e.g. “How do I stroke a cat?” or “Tell me a short story”, you should get a response much faster than before. If your LLM isn’t loaded, the first attempt will have to wait for it to load, so you might want to set it to be held in memory with -1 on the Ollama keep-alive option.
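If you want to check the TTS entity directly without going through the whole voice pipeline, you can fire a TTS action from Developer Tools → Actions in YAML mode. This is a sketch; the entity IDs below are placeholders from my setup, so substitute your own Piper entity and media player:

```yaml
# Developer Tools → Actions, YAML mode.
# tts.piper and media_player.living_room are placeholders.
action: tts.speak
target:
  entity_id: tts.piper
data:
  media_player_entity_id: media_player.living_room
  message: "Tell me a short story about a cat."
```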

Wyoming Piper 2.0+ (the current versions of the add-on and container) uses streaming mode by default; no additional action is required.

The “--streaming” parameter is no longer available.


Curious… I’m running Wyoming Piper version 2.1.1.

Perhaps I’ve overwritten something on a backup/restore. Its definitely improved from the previous iteration of HA and voice pipelines I had setup. I’ll go back and look at the versions running on the old deployment, thanks.