Hello, is there a plan to implement a streaming feature for the voice assistant?
Similar to how the paid version of the ChatGPT voice assistant operates, it would be beneficial not to wait until the conversation agent’s response is fully generated before starting the text-to-speech (TTS) conversion. Instead, streaming the text of the reply while using the TTS streaming feature would enhance the experience.
Incorporating a streaming feature could reduce response times by 10-40 seconds per query for lengthy answers provided by GPT-4 as a conversation agent. This would align its performance with that of other commercial voice assistants.
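For illustration, here is a minimal sketch of the idea: buffer the streamed reply text and hand each sentence to TTS as soon as it is complete, instead of waiting for the whole answer. The function name `sentences_from_stream` and the simple punctuation-based splitting rule are assumptions made for this example, not part of any existing voice assistant API.

```python
import re
from typing import Iterable, Iterator

# Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks.

    Hypothetical helper: `chunks` stands in for the partial text pieces that a
    streaming conversation agent would emit while it is still generating.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # All parts except the last are complete sentences; the last part
        # may still be growing, so keep it in the buffer.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence could then be passed to a streaming TTS call, for example `for sentence in sentences_from_stream(reply_chunks): speak(sentence)`, where `reply_chunks` and `speak` are placeholders for whatever the conversation agent and TTS engine actually provide. The first sentence can start playing while the rest of the reply is still being generated.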
+1, I’m looking for this feature as well. It is especially useful when running local models with Ollama, because they are considerably slower than ChatGPT in my setup.
I believe this is what I would need to have an extended spoken exchange with an AI model. The OpenAI Assistants API has a streaming mode that can be configured to feel and behave very naturally and to begin responding while the reply is still being generated. For my use case, a streaming conversation would also make use of voice input to halt and interrupt the current response if it’s going the wrong way.
I am someone who best retains information learned by speaking, not by reading or listening [^hbmy], and I would love to switch from my bespoke widget at a computer to using my VoicePE pucks for brainstorming and thinking out loud with a feedback loop.
(I also spend a lot of time apart from my partner because I have young children, and sometimes I just need to say things to A Voice that acknowledges them and reflects them back, which I suppose means I like an LLM to prompt me sometimes?!)
(Somewhat related: for about six months, off and on, I have been building a household assistant that can plot optimal travel schedules and predict likely windows for as-yet-unscheduled appointments via an IMAP bridge, my text messages, and family and household calendars across two households. It’s crazy, and the craziest part is definitely the text messages. I am constantly astounded by how much my tuned models and RAG can divine from my life, 2012 to present. The first thing my dumbass did was ask who the most significant people in my life are, and for someone with what can only be called hemorrhagic ADHD, it easily squirrels me into a hundred directions. Worst of all, it’s mostly motivated by the possibility of actually knowing what I have missed, what went wrong, and what it cost me, so veer away from that black hole if you can. Long story short, I already have an assistant running in my house that knows entirely too much about me, and I would love to have those conversations via VoicePE rather than my phone or computer, especially during the work day.)
[^hbmy]: The Harvard Business Review essay collection on managing yourself has a banger early in the book about this exact concept. It was extremely illuminating for me, and I didn’t know this about myself until I was 40 years old, so you might imagine how urgent it feels for me to ensure I can talk through an idea or thought on demand, because I would benefit a lot from that and stop talking incessantly every time the housekeeper is here.