Home Assistant Voice PE (AI Agent): Start Conversations with conversation.process and conversation_id

Musca · March 12, 2025, 11:36am

First of all, I want to thank @NathanCu for the incredible explanatory work in the thread Friday's Party: Creating a Private, Agentic AI using Voice Assistant tools - #8 by NathanCu, where I was able to find and fully understand how to make Assist work with Google Generative AI as an agent that generates the correct prompts.

My goal:

I want my satellite (Home Assistant Voice PE) to proactively start a conversation by asking a question and then, based on my response, carry out the appropriate action.
For example, if I turn on a light during the day, the agent will ask me if I want to turn off that specific light. If I respond yes, it will proceed to turn it off.

First part: let’s create the question:

We need to give the model enough context so that, when we respond positively, it understands which action it has to perform.

Let’s create an automation that:

Triggers when I turn on the kitchen light.
Checks if it is daytime (sun after sunrise and before sunset).
Uses conversation.process to ask me: “Do you want to turn off the kitchen light?”
IMPORTANT: To provide the initial context, we will use conversation_id: question_ask.
The question will be announced on the designated satellite.

alias: Asking for kitchen light
description: ''
triggers:
  - trigger: state
    entity_id:
      - light.kitchen
    to: 'on'
conditions:
  # Only ask during the day (after sunrise and before sunset).
  - condition: sun
    before: sunset
    after: sunrise
actions:
  # Ask the agent to phrase the question; conversation_id keeps the context.
  - alias: Generate question with conversation.process
    action: conversation.process
    metadata: {}
    data:
      text: "Ask me the following question without performing any actions: Do you want to turn off the kitchen light? Do not perform any actions unless I respond positively. Example: 'Yo brother, do you want to turn off the kitchen light?'"
      agent_id: conversation.google_generative_ai
      conversation_id: question_ask
    response_variable: action_response
  # Extract the spoken text from the agent's response, with a fallback message.
  - alias: Format the response
    variables:
      message: >-
        {% if action_response and action_response.response and action_response.response.speech
        and action_response.response.speech.plain and action_response.response.speech.plain.speech %}
        {{ action_response.response.speech.plain.speech }}
        {% else %}
        I did not receive a clear response, but the command was executed.
        {% endif %}
  # Speak the generated question on the satellite.
  - alias: Announce the question
    action: assist_satellite.announce
    metadata: {}
    data:
      message: '{{ message }}'
    target:
      device_id: <<YOUR DEVICE ID>>
mode: single

Second part: correct action by the model based on our answer

Now we need to ensure that if we respond with “yes,” “ok,” “alright,” etc., without an apparent context, the model can identify the appropriate conversation_id and execute the correct action if necessary.

Let’s create an intent script that, when we respond positively without apparent context, calls the conversation.process action with:

text: “yes”
conversation_id set to the same one used in the automation (I used “question_ask”)

Question_ask:
    description: >
       # This intent handles generic affirmative responses such as "yes," "ok," "alright," "exactly"  
       # when they are not directly linked to a clear context.  
       #  
       # Functionality:  
       #   - If the user says "yes" without explicitly referring to a previous request,  
       #     a new `conversation.process` is automatically triggered with "yes" as the command.  
       #   - If the "yes" is part of an already structured conversation, the LLM follows the natural flow.  
       #  
       # Output:  
       #   - If the user's confirmation is recognized as independent, the system triggers:  
       #       conversation.process('text': 'yes', conversation_id: 'question_ask')  
       #   - The response generated by the conversation process is returned and announced by Assist.  
       #   - If the model does not generate a clear output, a predefined message is returned.  
       #  
       # Best Practices:  
       #   - For questions requiring confirmation, wait for an affirmative response before executing actions.  
       #   - If the context is unclear, treat the confirmation as generic and allow the system  
       #     to determine whether further clarification is needed.  
       #   - The LLM should always maintain the natural flow of conversation without asking  
       #     for unnecessary confirmations again.  
  
    action:
      - action: conversation.process
        metadata: {}
        data:
          agent_id: conversation.google_generative_ai
          conversation_id: question_ask
          text: "yes"
        response_variable: action_response  
      - stop: ""
        response_variable: action_response  
    speech:
      text: >
        {%- if action_response and action_response.response and action_response.response.speech and action_response.response.speech.plain and action_response.response.speech.plain.speech %}
          {{ action_response.response.speech.plain.speech }}
        {%- else %}
          Ok, done.
        {%- endif %}
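
For anyone unsure where this block lives: intent scripts belong to the intent_script integration, which is configured in YAML rather than inside the automation. Below is a minimal sketch, assuming you define it directly in configuration.yaml. The shortened description and the simplified speech template are my own abbreviations, not a replacement for the full version above.

```yaml
# configuration.yaml (sketch; assumes intent scripts are defined directly in this file)
intent_script:
  Question_ask:
    # Abbreviated description; use the full text from the block above.
    description: "Handles generic affirmative responses such as yes, ok, alright."
    action:
      # Forward "yes" into the same conversation the automation started.
      - action: conversation.process
        data:
          agent_id: conversation.google_generative_ai
          conversation_id: question_ask
          text: "yes"
        response_variable: action_response
      # Return the agent's response so the speech template can use it.
      - stop: ""
        response_variable: action_response
    speech:
      text: >
        {%- if action_response and action_response.response %}
          {{ action_response.response.speech.plain.speech }}
        {%- else %}
          Ok, done.
        {%- endif %}
```

After editing the file, Home Assistant has to pick up the change (a restart is the safe option) before the intent becomes available.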

In my case, it works perfectly.

Now, if I turn on the kitchen light:

My Assist satellite will announce: “Yo bro, do you want to turn off the kitchen light?” (It has Snoop Dogg’s personality).
When I respond “yes”, without any apparent context, it will trigger the Question_ask intent, which will execute the correct action, taking the context from conversation_id: question_ask.

To start a conversation under other conditions, just create another automation with your own trigger and conditions.
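
As an illustration of that, here is a sketch of a second automation that reuses the same conversation_id. The entity light.living_room, the 30-minute duration, and the 23:00 threshold are hypothetical values chosen only for the example. The announce step reads the response directly and skips the fallback check from the first automation.

```yaml
alias: Asking for living room light
description: ''
triggers:
  # Hypothetical entity: fires once the light has been on for 30 minutes.
  - trigger: state
    entity_id:
      - light.living_room
    to: 'on'
    for: '00:30:00'
conditions:
  # Only ask late in the evening (from 23:00 until midnight).
  - condition: time
    after: '23:00:00'
actions:
  # Same agent and conversation_id as the kitchen example.
  - action: conversation.process
    data:
      text: "Ask me the following question without performing any actions: Do you want to turn off the living room light? Do not perform any actions unless I respond positively."
      agent_id: conversation.google_generative_ai
      conversation_id: question_ask
    response_variable: action_response
  # Announce the generated question (no fallback message in this sketch).
  - action: assist_satellite.announce
    data:
      message: '{{ action_response.response.speech.plain.speech }}'
    target:
      device_id: <<YOUR DEVICE ID>>
mode: single
```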

will35 · March 12, 2025, 12:49pm

hello

Do you have to say the wake word before responding yes or no? (assist_satellite.start_conversation is still not available for the satellite.)

Musca · March 12, 2025, 12:54pm

Yes, you must say the wake word to wake it up.
And because start_conversation is not available, I used conversation.process.

kitefan · May 1, 2025, 8:44pm

Basic question - does the intent script go into a new action or is this part of the same automation as the first step? I’m not clear where the intent should go.

wmaker · May 4, 2025, 9:11pm

I’ve been looking at this for a while and, though I’m not sure, I think a custom sentence intent is missing from the write-up. I suspect a custom sentence intent named “Question_ask” is needed that matches on the words “yes,” “ok,” “alright.” On a match, HA should then call the intent_script named “Question_ask” (which is shown above).
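
If that suspicion is right, a custom sentences file along these lines might fill the gap. This is only a sketch: the file name question_ask.yaml is hypothetical, and the sentence list is just the three words mentioned above. It may not be needed at all if the LLM agent already sees the intent script as a tool, which would explain why the original post works without it.

```yaml
# config/custom_sentences/en/question_ask.yaml (hypothetical file name)
language: "en"
intents:
  # Must match the name of the intent_script entry.
  Question_ask:
    data:
      - sentences:
          - "yes"
          - "ok"
          - "alright"
```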