How are responses sent to Piper?

Does anyone know how responses from Assist are sent to Piper? Are they
A) “streamed” to Piper as Assist is answering,
or
B) “queued” in Assist until the entire response is finished, and then sent?

The reason I ask is that I’m currently looking into upgrading my home server hardware, and I’d like to be able to run an LLM locally for the voice assistants in the house…
Big hardware like a 4090 is out of the question, but I see you can get reasonable-speed responses with something like Llama 3.2 on hardware like a 7845HS.
By reasonable speed I mean tokens arrive at about the pace you could read the text output… however, if you need to wait until the entire response is done, then obviously that’s a different ball game.

My thought was that if the response were (almost) live-streamed to Piper as the LLM answers, it would feel much more seamless, rather than waiting 30s for the voice assistant to speak :joy:
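To make the question concrete, here’s a rough sketch (purely illustrative — I have no idea how Assist actually does this internally) of what I mean by option A: buffer the LLM token stream and hand off each complete sentence to the TTS engine as soon as it ends, instead of waiting for the whole reply. The token list and the sentence splitting are made up for the example.

```python
import re

def stream_sentences(token_stream):
    """Yield complete sentences as soon as they finish, so TTS can
    start speaking while the LLM is still generating. The sentence
    detection here is a naive split on ., ! or ? followed by
    whitespace -- a real pipeline would be smarter about this."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush every complete sentence currently in the buffer.
        while True:
            match = re.search(r"[.!?]\s", buffer)
            if not match:
                break
            sentence = buffer[:match.end()].strip()
            buffer = buffer[match.end():]
            yield sentence
    # Whatever is left when the stream ends is the final sentence.
    if buffer.strip():
        yield buffer.strip()

# Fake LLM token stream, just for demonstration.
tokens = ["Hel", "lo! ", "The light ", "is now ", "on. ", "Anything else?"]
for sentence in stream_sentences(tokens):
    print(sentence)  # each line could be sent to TTS immediately
```

With something like this, the first sentence could be speaking while the rest of the answer is still generating — which is exactly why option A vs option B matters so much on modest hardware.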