Hello everyone,
I have a voice pipeline set up with an ESP device, and it works perfectly for standard commands.
However, I’m now trying to automate a flow where the AI (LLM) initiates the conversation, asking something like: “Do you want me to turn on the lights?”
Here’s what I’m aiming for:
- Play a simple TTS prompt: “Do you want me to turn on the lights?” (this part works)
- Activate the ESP’s microphone, which triggers the default voice pipeline (this part works too)
- The user responds with any variation of a confirmation, e.g., “Yeah sure,” “Absolutely,” etc.
- The default voice pipeline uses the LLM to understand that response and continues the conversation or takes the appropriate action (rough sketch of this flow right after the list).
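
Here’s a rough sketch of what I mean, written as Python service calls from a custom integration or script helper. The entity ID is a placeholder, I’m assuming an assist_satellite entity exists for the ESP device, and step 2 is just whatever your setup already does to start listening:

```python
# Rough sketch of the intended flow. The entity ID is a placeholder and
# step 2 is left as a comment because it depends on the device setup.
from homeassistant.core import HomeAssistant


async def prompt_and_listen(hass: HomeAssistant) -> None:
    # 1) Speak the prompt on the ESP satellite (this part already works;
    #    I'm using assist_satellite.announce here, but any TTS path is fine).
    await hass.services.async_call(
        "assist_satellite",
        "announce",
        {
            "entity_id": "assist_satellite.esp_voice_kit",  # placeholder
            "message": "Do you want me to turn on the lights?",
        },
        blocking=True,
    )

    # 2) Activate the microphone so the default voice pipeline starts
    #    (this also works today; how it is triggered depends on the device).

    # 3) The user answers ("yeah sure", "absolutely", ...) and the default
    #    pipeline hands that answer to the LLM -- but the LLM has no memory
    #    of the question from step 1, which is the gap I want to close.
```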
My question is:
How can I best set this up so the LLM retains context (i.e., it knows the question it just asked via TTS) and proceeds based on the user’s answer, all while using the ESP device’s voice pipeline?
I’m wondering whether I could write a custom integration that injects chat history into the default voice pipeline, or whether simply passing a “conversation_id” around in the voice pipeline would let it retain the memory (rough sketch of that second idea below).
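
For the conversation_id idea, this is roughly what I picture: seed the LLM agent with the question via the built-in conversation.process service, grab the conversation_id from its response (as far as I can tell it is included), and somehow hand that same ID to the pipeline run that processes the spoken answer. That last hand-off is exactly the part I don’t know how to do:

```python
# Sketch of the conversation_id idea. conversation.process is the built-in
# service; handing the ID to the ESP's pipeline run is the open question,
# so that part is only a comment.
from homeassistant.core import HomeAssistant


async def seed_question(hass: HomeAssistant, agent_id: str) -> str | None:
    # Tell the LLM agent what question is about to be spoken, so the answer
    # ("yeah sure") lands in the same chat history.
    result = await hass.services.async_call(
        "conversation",
        "process",
        {
            "text": "You just asked the user: 'Do you want me to turn on the lights?'",
            "agent_id": agent_id,
        },
        blocking=True,
        return_response=True,
    )
    # As far as I can tell, the service response includes the conversation_id
    # for this exchange.
    conversation_id = result.get("conversation_id") if result else None

    # Open question: how do I make the ESP device's next pipeline run reuse
    # this same conversation_id (or otherwise inject the chat history)?
    return conversation_id
```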
Thank you!