First of all, I want to thank @NathanCu for the incredible explanatory work in the thread Friday's Party: Creating a Private, Agentic AI using Voice Assistant tools - #8 by NathanCu, where I was able to find and fully understand how to make Assist work with Google Generative AI as an agent by writing the correct prompts.
My goal:
My objective is to have my satellite (Home Assistant Voice PE) proactively start a conversation by asking a question and then, based on my response, execute the appropriate action.
For example, if I turn on a light during the day, the agent will ask me if I want to turn off that specific light. If I respond yes, it will proceed to turn it off.
First part: let’s create the question:
We need to create enough context for the model to ensure that when we respond positively, it can correctly understand what action it needs to perform.
Let’s create an automation that:
- Triggers when I turn on the kitchen light.
- Checks if it is daytime (sun after sunrise and before sunset).
- Uses conversation.process to ask me: “Do you want to turn off the kitchen light?”
- IMPORTANT: To provide the initial context, we will use conversation_id: question_ask.
- The question will be announced on the designated satellite.
alias: Asking for kitchen light
description: ''
triggers:
  - entity_id:
      - light.kitchen
    to: 'on'
    trigger: state
conditions:
  - condition: sun
    before: sunset
    after: sunrise
actions:
  - alias: Generate question with conversation.process
    action: conversation.process
    metadata: {}
    data:
      text: "Ask me without performing any actions unless I respond: Do you want to turn off the kitchen light? Do not perform any actions unless I respond positively. Example: 'Yo brother, do you want to turn off the kitchen light?'"
      agent_id: conversation.google_generative_ai
      conversation_id: question_ask
    response_variable: action_response
  - alias: Format the response
    variables:
      message: >
        {% if action_response and action_response.response and action_response.response.speech
           and action_response.response.speech.plain and action_response.response.speech.plain.speech %}
          {{ action_response.response.speech.plain.speech }}
        {% else %}
          I did not receive a clear response, but the command was executed.
        {% endif %}
  - alias: Announce the question
    action: assist_satellite.announce
    metadata: {}
    data:
      message: '{{ message }}'
    target:
      device_id: <<YOUR DEVICE ID>>
mode: single
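The template that fills the message variable walks a nested path because conversation.process returns a structured response rather than a plain string. As a rough sketch (the values are made up for illustration, and the exact set of fields may differ between Home Assistant versions), action_response looks something like this:

```yaml
# Illustrative shape of the `action_response` variable; values are made up.
response:
  speech:
    plain:
      speech: "Yo brother, do you want to turn off the kitchen light?"
      extra_data: null
  card: {}
  language: en
  response_type: action_done
  data:
    targets: []
    success: []
    failed: []
conversation_id: question_ask  # assumed: the ID passed in is echoed back
```

The text that gets announced is the value at response.speech.plain.speech, which is why the template checks each level before reading it.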
Second part: correct action by the model based on our answer
Now we need to ensure that if we respond with “yes,” “ok,” “alright,” etc., without any apparent context, the model can find the right conversation_id and execute the correct action if necessary.
Let’s create an intent script that, when we respond positively without apparent context, calls conversation.process with:
- text: “yes”
- conversation_id set to the same one used in the automation (I used “question_ask”)
Question_ask:
  description: >
    # This intent handles generic affirmative responses such as "yes," "ok," "alright," "exactly"
    # when they are not directly linked to a clear context.
    #
    # Functionality:
    # - If the user says "yes" without explicitly referring to a previous request,
    # a new `conversation.process` is automatically triggered with "yes" as the command.
    # - If the "yes" is part of an already structured conversation, the LLM follows the natural flow.
    #
    # Output:
    # - If the user's confirmation is recognized as independent, the system triggers:
    # conversation.process('text': 'yes', conversation_id: 'question_ask')
    # - The response generated by the conversation process is returned and announced by Assist.
    # - If the model does not generate a clear output, a predefined message is returned.
    #
    # Best Practices:
    # - For questions requiring confirmation, wait for an affirmative response before executing actions.
    # - If the context is unclear, treat the confirmation as generic and allow the system
    # to determine whether further clarification is needed.
    # - The LLM should always maintain the natural flow of conversation without asking
    # for unnecessary confirmations again.
  action:
    - action: conversation.process
      metadata: {}
      data:
        agent_id: conversation.google_generative_ai
        conversation_id: question_ask
        text: "yes"
      response_variable: action_response
    - stop: ""
      response_variable: action_response
  speech:
    text: >
      {%- if action_response and action_response.response and action_response.response.speech and action_response.response.speech.plain and action_response.response.speech.plain.speech %}
        {{ action_response.response.speech.plain.speech }}
      {%- else %}
        Ok, done.
      {%- endif %}
In my case, it works perfectly.
Now, if I turn on the kitchen light:
- My Assist satellite will announce: “Yo bro, do you want to turn off the kitchen light?” (It has Snoop Dogg’s personality).
- When I respond “yes”, without any apparent context, it will trigger the Question_ask intent, which will execute the correct action, taking the context from conversation_id: question_ask.
To start a conversation under other conditions, just create another automation with your own trigger and conditions.
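As a sketch of what such a second automation might look like, the example below reuses the same two actions with a different trigger. The entity light.garage, the 10-minute delay, and the prompt wording are hypothetical placeholders chosen only for illustration, and the fallback-message step is left out to keep it short.

```yaml
alias: Asking for garage light  # hypothetical example
description: ''
triggers:
  - trigger: state
    entity_id:
      - light.garage  # hypothetical entity
    to: 'on'
    for:
      minutes: 10  # assumed delay before asking
actions:
  - action: conversation.process
    data:
      text: "Ask me: Do you want to turn off the garage light? Do not perform any actions unless I respond positively."
      agent_id: conversation.google_generative_ai
      conversation_id: question_ask  # same ID the intent script uses
    response_variable: action_response
  - action: assist_satellite.announce
    data:
      message: "{{ action_response.response.speech.plain.speech }}"
    target:
      device_id: <<YOUR DEVICE ID>>
mode: single
```

Because it keeps conversation_id: question_ask, the same Question_ask intent script should be able to pick up the “yes” for this question too.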