Hopefully this hasn’t been asked yet, though I couldn’t find it if it has.
When using a voice assistant with openwakeword detection and the Extended OpenAI Conversation integration, how can we add the ability to continue a conversation or give a follow-up response or question within the same conversation? Currently it seems that every time you say the wake word, it starts a brand new conversation and forgets about the previous interaction. This would be similar to Alexa waiting a few seconds for a follow-up question or instruction without you having to repeat the wake word. Ideally, if all of the interactions could be kept in some memory or storage so that the voice assistant could learn about its environment over time, that would be amazing…but one step at a time!
Thank you and if there is anything I can assist with, let me know. I’m still quite green in my dev and AI skills but I’m slowly learning.
This will make the Wyoming satellite the killer app for Home Assistant. Also the extension of this, which is allowing the Wyoming device to be woken by a service call, as opposed to the wake word.
An example use case: I have a smart button remote and I want to set the long-press action to wake my voice assistant, in case I'm not able to or don't want to use the wake word.
I think there are two separate issues here: (1) the assistant remembering the previous details from the conversation, and (2) needing to repeat the wake word to initiate any voice input from the user.
For me, (1) is the bigger issue right now. I would be happy to say the wake word every time to continue the conversation. For me, (1) already works when I use Assist via typing (the Assist button in Home Assistant). So somehow, the assistant is invoked differently between the “typing interface” and the voice interface.
This “remembering” is done via the “conversation ID” field. You can call the “conversation.process” service several times, passing the conversation ID from the previous call, and agents that support it will keep the context of previous requests, for example as in the sketch below.
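A minimal sketch of what that can look like in a script (the agent_id here is just an assumed placeholder for an Extended OpenAI agent; yours will differ); the second call reuses the conversation_id returned by the first call:

```yaml
script:
  follow_up_example:
    sequence:
      # First request: store the response (including its conversation_id)
      # in a response variable.
      - service: conversation.process
        data:
          text: "Turn on the lights in the kitchen"
          agent_id: conversation.openai   # hypothetical agent entity id
        response_variable: first
      # Follow-up request: passing the same conversation_id lets an agent
      # that supports it keep the context of the first request.
      - service: conversation.process
        data:
          text: "Actually, I meant the office"
          agent_id: conversation.openai
          conversation_id: "{{ first.conversation_id }}"
```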
AFAIK Assist doesn’t support the conversation ID (it’s always null, even if you pass it), and I don’t yet see how it could do that.
It seems AlexxIT’s StreamAssist is the solution? It allows you to call its run service to start directly from the STT phase, or even to start by having the AI ask you a question and then reply to it.
Until now I thought the same, that saying the wake word again starts a new conversation and everything before is forgotten. Because of that, I ran into a dead end more than once in the past where OpenAI asked something back and I was not able to reply. A few minutes ago I asked Assist to switch on my Yamaha receiver and LG TV and it asked back which room this receiver might be in. No clue why it asked that with only this ONE receiver, but that is not the problem here. I said the wake word again and just replied “the one in the living room”, and it happily switched on the TV and receiver. So it was still in the previous conversation and knew what it needed to do its job. The only thing I would still love to see is the assistant immediately waiting for my reply instead of making me say the wake word again, but all in all I was surprised that this worked so well.
The OpenAI integration has a context window (and AFAIK it is actually configurable).
However, Assist doesn’t. We want it.
Also, it would be great to make it listen right away after a question, eliminating the need for the wake word.
I’ve wanted the same thing, so I am working on modifying the default voice assistant code so that it turns wake word detection off at the end of processing the initial command and starts it again to listen for the next thing said.
An idle timer and an HA automation reset it after 20 seconds so it waits for the wake word again (see the sketch below). I’m planning to make that part of the ESPHome config instead.
I’ve got a thread about it here, but I think this is achievable with the currently available tools.
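For reference, a very rough sketch of the idle-timer part on the Home Assistant side (the entity ids are hypothetical placeholders for whatever your satellite actually exposes): once the satellite has been idle for 20 seconds, wake word detection is switched back on.

```yaml
automation:
  - alias: "Re-arm wake word after 20 s of idle"
    trigger:
      # Hypothetical sensor reflecting the satellite's current pipeline phase
      - platform: state
        entity_id: sensor.satellite_assist_phase
        to: "idle"
        for: "00:00:20"
    action:
      # Hypothetical switch the firmware exposes to toggle wake word detection
      - service: switch.turn_on
        target:
          entity_id: switch.satellite_use_wake_word
```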
I’m also interested in having this functionality in the Voice Assist pipeline. It’s already there in the app – when using the assistant from there, even with voice, it keeps the context of the conversation, and you can do things such as “Turn on the lights in the kitchen” → “The lights are now on” → “Ah, no, actually I meant the office” → and it works just fine. With the Wyoming-style voice integration, however, there seems to be no context saved anywhere. Are these completely different implementations, or could we perhaps reuse the pipeline from the app?