Home Assistant Voice PE (AI Agent): Start Conversations with conversation.process and conversation_id

First of all, I want to thank @NathanCu for the incredible explanatory work in the thread Friday's Party: Creating a Private, Agentic AI using Voice Assistant tools - #8 by NathanCu, where I was able to find and fully understand how to make Assist work with Google Generative AI as an agent by writing the correct prompts.

My goal:

My objective is to enable my satellite (Home Assistant Voice PE) to proactively start a conversation by asking a question and then, based on my response, execute the appropriate action.
For example, if I turn on a light during the day, the agent will ask me if I want to turn off that specific light. If I respond yes, it will proceed to turn it off.

First part: let’s create the question:

We need to create enough context for the model to ensure that when we respond positively, it can correctly understand what action it needs to perform.

Let’s create an automation that:

  • Triggers when I turn on the kitchen light.
  • Checks if it is daytime (sun after sunrise and before sunset).
  • Uses conversation.process to ask me: “Do you want to turn off the kitchen light?”
  • IMPORTANT: To provide the initial context, we will use conversation_id: question_ask.
  • The question will be announced on the designated satellite.
alias: Asking for kitchen light
description: ''
triggers:
  - trigger: state
    entity_id:
      - light.kitchen
    to: 'on'
conditions:
  - condition: sun
    before: sunset
    after: sunrise
actions:
  # Let the agent phrase the question; conversation_id stores the context
  - alias: Generate question with conversation.process
    action: conversation.process
    metadata: {}
    data:
      text: >-
        Ask me, without performing any actions until I respond: Do you want
        to turn off the kitchen light? Do not perform any actions unless I
        respond positively. Example: 'Yo brother, do you want to turn off the
        kitchen light?'
      agent_id: conversation.google_generative_ai
      conversation_id: question_ask
    response_variable: action_response
  # Extract the spoken text from the agent response, with a fallback message
  - alias: Format the response
    variables:
      message: >
        {% if action_response and action_response.response and action_response.response.speech
           and action_response.response.speech.plain and action_response.response.speech.plain.speech %}
          {{ action_response.response.speech.plain.speech }}
        {% else %}
          I did not receive a clear response, but the command was executed.
        {% endif %}
  - alias: Announce the question
    action: assist_satellite.announce
    metadata: {}
    data:
      message: '{{ message }}'
    target:
      device_id: <<YOUR DEVICE ID>>
mode: single

Second part: correct action by the model based on our answer

Now we need to ensure that if we respond with “yes,” “ok,” “alright,” etc., without any apparent context, the model can pick up the appropriate conversation_id and execute the correct action if necessary.

Let’s create an intent script that, when we respond positively without apparent context, calls the conversation.process action with:

  • text: “yes”
  • conversation_id set to the same one used in the automation (I used “question_ask”)
Question_ask:
  description: >
    # This intent handles generic affirmative responses such as "yes," "ok," "alright," "exactly"
    # when they are not directly linked to a clear context.
    #
    # Functionality:
    #   - If the user says "yes" without explicitly referring to a previous request,
    #     a new `conversation.process` is automatically triggered with "yes" as the command.
    #   - If the "yes" is part of an already structured conversation, the LLM follows the natural flow.
    #
    # Output:
    #   - If the user's confirmation is recognized as independent, the system triggers:
    #       conversation.process(text: 'yes', conversation_id: 'question_ask')
    #   - The response generated by the conversation process is returned and announced by Assist.
    #   - If the model does not generate a clear output, a predefined message is returned.
    #
    # Best Practices:
    #   - For questions requiring confirmation, wait for an affirmative response before executing actions.
    #   - If the context is unclear, treat the confirmation as generic and allow the system
    #     to determine whether further clarification is needed.
    #   - The LLM should always maintain the natural flow of conversation without asking
    #     for unnecessary confirmations again.
  action:
    # Send "yes" into the same conversation that asked the question
    - action: conversation.process
      metadata: {}
      data:
        agent_id: conversation.google_generative_ai
        conversation_id: question_ask
        text: "yes"
      response_variable: action_response
    # Return the agent's response so the speech template below can use it
    - stop: ""
      response_variable: action_response
  speech:
    text: >
      {%- if action_response and action_response.response and action_response.response.speech and action_response.response.speech.plain and action_response.response.speech.plain.speech %}
        {{ action_response.response.speech.plain.speech }}
      {%- else %}
        Ok, done.
      {%- endif %}

In my case, it works perfectly.

Now, if I turn on the kitchen light:

  • My Assist satellite will announce: “Yo bro, do you want to turn off the kitchen light?” (It has Snoop Dogg’s personality).
  • When I respond “yes”, without any apparent context, it will trigger the Question_ask intent, which will execute the correct action, taking the context from conversation_id: question_ask.

To start a conversation under other conditions, just create another automation with your own trigger and conditions.
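
As a rough sketch of what such a second automation could look like: the entity switch.coffee_machine and the 30-minute duration are made-up placeholders, not something from my setup. The agent and the conversation_id are the same as above, so the Question_ask intent script can pick up the answer.

```yaml
alias: Asking for coffee machine
description: ''
triggers:
  # Hypothetical trigger: the switch has been on for 30 minutes
  - trigger: state
    entity_id:
      - switch.coffee_machine   # placeholder entity, replace with your own
    to: 'on'
    for: '00:30:00'
actions:
  # Same agent and conversation_id as in the kitchen light automation
  - action: conversation.process
    data:
      text: >-
        Ask me: Do you want to turn off the coffee machine? Do not perform
        any actions unless I respond positively.
      agent_id: conversation.google_generative_ai
      conversation_id: question_ask
    response_variable: action_response
  # Speak the generated question on the satellite
  - action: assist_satellite.announce
    data:
      message: '{{ action_response.response.speech.plain.speech }}'
    target:
      device_id: <<YOUR DEVICE ID>>
mode: single
```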

Hello,

Do you have to say the wake word before responding yes or no? (assist_satellite.start_conversation is still not available for the satellite.)

Yes, you must say the wake word to wake it up.
And because start_conversation is not available, I used conversation.process.

Basic question - does the intent script go into a new action or is this part of the same automation as the first step? I’m not clear where the intent should go.

I’ve been looking at this for a while, and I’m not sure, but I think a custom sentence intent is missing from the write-up. I suspect that a custom sentence intent named “Question_ask” is needed, matching on words such as “yes,” “ok,” and “alright.” If there is a match, HA would then call the intent_script with the name “Question_ask” (shown above).

No, you don’t need a custom sentence if you use Gemini as the agent. You only need it when using the Home Assistant agent.
If you write a good description for the intent script and describe how to use it in the system prompt, it will work.
I’m using Gemini with no custom sentences set.
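
For anyone on the built-in Home Assistant agent, a custom sentence file along these lines would be the piece that maps the words to the intent. This is only a sketch: the file name and location (config/custom_sentences/en/question_ask.yaml) and the exact word list are assumptions for illustration, so adjust them to your language and setup.

```yaml
# config/custom_sentences/en/question_ask.yaml (assumed file name)
language: "en"
intents:
  Question_ask:          # must match the name of the intent script
    data:
      - sentences:
          - "yes"
          - "ok"
          - "alright"
```

With an LLM agent such as Gemini this file is not needed, because the model chooses the intent from its description.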

Let me ask, I don’t see an option to expose an intent script, so how does Gemini see it? … does HA automatically expose intent scripts?

All intent scripts are loaded from the YAML configuration at startup.
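
To make the placement concrete, here is a minimal sketch of where the intent script from the first post would sit. It assumes everything lives directly in configuration.yaml; the description is shortened here for readability.

```yaml
# configuration.yaml
intent_script:          # top-level key, separate from any automation
  Question_ask:         # intent name, as used in the first post
    description: >
      Handles generic affirmative responses such as "yes" or "ok"
      that are not linked to a clear context.
    action:
      - action: conversation.process
        data:
          agent_id: conversation.google_generative_ai
          conversation_id: question_ask
          text: "yes"
        response_variable: action_response
      - stop: ""
        response_variable: action_response
    speech:
      text: "{{ action_response.response.speech.plain.speech }}"
```

So it is not a step inside the automation; the automation only asks the question, and the intent script handles the answer.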

Hi Nathan,
For sure, (and I have used these myself) but in this case who is the user/caller of the intent script?
From what I have seen, the intent script is used with a sentence intent (particularly a custom sentence intent, and that’s how I’ve used them), so if a sentence intent is not being used here, it’s a mystery to me how the intent script is being called.

The LLM reads the description of the intent script and its parameters and fills them in itself. No direct sentence match is required. That’s the fundamental difference between Assist and an LLM.

In Friday’s Party I go into excruciating detail on building LLM context.

OK!
But that gets back to my earlier question, how does the LLM see the intent script? For entities including regular scripts, one has to expose it to the LLM (often exposed by default). Since an intent script is not an entity, this led me to ask how does the LLM see the intent script …does HA expose all intent scripts under the covers?

Intent scripts are exposed by default

Ah… thanks 🙂

You can create an AI gateway using Cloudflare to log the LLM conversation. The intents are submitted as tool use. Something like this:

{
  "messages": [
    {
      "role": "system",
      "content": "Current time is 14:42:29. Today's date is 2025-04-28.\nYou are a voice assistant for Home Assistant.\nAnswer questions about the world truthfully.\nAnswer in plain text. Keep it simple and to the point.\nWhen controlling Home Assistant always call the intent tools. Use HassTurnOn to lock and HassTurnOff to unlock a lock. When controlling a device, prefer passing just name and domain. When controlling an area, prefer passing just area name and domain.\nWhen a user asks to turn on all devices of a specific type, ask user to specify an area, unless there is only one device of that type.\nThis device is not able to start timers.\nAn overview of the areas and the devices in this smart home:\n- names: Computer_Button\n  domain: input_button\n  state: '2025-04-28T06:06:15.441782+00:00'\n  areas: Bedroom\n- names: esp32-camera RF Gate Button\n  domain: button\n  state: '2025-04-26T11:38:03.126492+00:00'\n  areas: Living Room\n- names: esp8266-ir-3 On Panasonic\n  domain: switch\n  state: unavailable\n- names: esp8266-ir-3 On Aeha Panasonic\n  domain: switch\n  state: unavailable\n- names: Forecast Home\n  domain: weather\n  state: cloudy\n  attributes:\n    temperature: 31.0\n    temperature_unit: °C\n    humidity: '68'\n- names: Light_Switch\n  domain: switch\n  state: 'off'\n  areas: Bedroom\n  attributes:\n    device_class: outlet\n- names: Haos progress list\n  domain: todo\n  state: '3'\n- names: “WOL PC”\n  domain: switch\n  state: unavailable\n- names: Office AC\n  domain: climate\n  state: unavailable\n- names: Screen\n  domain: light\n  state: unavailable\n- names: jkm lx2 115\n  domain: media_player\n  state: unavailable\n"
    },
    {
      "role": "user",
      "content": "hi"
    }
  ],
  "model": "google/gemini-2.0-flash-exp:free",
  "max_tokens": 30000,
  "temperature": 1,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "HassTurnOn",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "area": {
              "type": "string"
            },
            "floor": {
              "type": "string"
            },
            "domain": {
              "type": "array",
              "items": {
                "type": "string"
              }
            },
            "device_class": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "awning",
                  "blind",
                  "curtain",
                  "damper",
                  "door",
                  "garage",
                  "gate",
                  "shade",
                  "shutter",
                  "window",
                  "water",
                  "gas",
                  "tv",
                  "speaker",
                  "receiver",
                  "outlet",
                  "switch"
                ]
              }
            }
          },
          "required": []
        },
        "description": "Turns on/opens a device or entity"
      }
    },
    {
      "type": "function",
      "function": {
        "name": "HassTurnOff",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "area": {
              "type": "string"
            },
            "floor": {
              "type": "string"
            },
            "domain": {
              "type": "array",
              "items": {
                "type": "string"
              }
            },
            "device_class": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "awning",
                  "blind",
                  "curtain",
                  "damper",
                  "door",
                  "garage",
                  "gate",
                  "shade",
                  "shutter",
                  "window",
                  "water",
                  "gas",
                  "tv",
                  "speaker",
                  "receiver",
                  "outlet",
                  "switch"
                ]
              }
            }
          },
          "required": []
        },
        "description": "Turns off/closes a device or entity"
      }
    },
    {
      "type": "function",
      "function": {
        "name": "HassCancelAllTimers",
        "parameters": {
          "type": "object",
          "properties": {
            "area": {
              "type": "string"
            }
          },
          "required": []
        },
        "description": "Cancels all timers"
      }
    },
    {
      "type": "function",
      "function": {
        "name": "HassBroadcast",
        "parameters": {
          "type": "object",
          "properties": {
            "message": {
              "type": "string"
            }
          },
          "required": [
            "message"
          ]
        },
        "description": "Broadcast a message through the home"
      }
    },
    {
      "type": "function",
      "function": {
        "name": "HassPressButton",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "area": {
              "type": "string"
            },
            "floor": {
              "type": "string"
            },
            "domain": {
              "type": "array",
              "items": {
                "type": "string"
              }
            },
            "device_class": {
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          },
          "required": []
        },
        "description": "Execute Home Assistant HassPressButton intent"
      }
    },
    {
      "type": "function",
      "function": {
        "name": "HassLightSet",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "area": {
              "type": "string"
            },
            "floor": {
              "type": "string"
            },
            "domain": {
              "type": "array",
              "items": {
                "type": "string"
              }
            },
            "color": {
              "type": "string"
            },
            "temperature": {
              "type": "integer",
              "minimum": 0
            },
            "brightness": {
              "type": "integer",
              "minimum": 0,
              "maximum": 100
            }
          },
          "required": []
        },
        "description": "Sets the brightness or color of a light"
      }
    },
    {
      "type": "function",
      "function": {
        "name": "HassMediaUnpause",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "area": {
              "type": "string"
            },
            "floor": {
              "type": "string"
            },
            "domain": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "media_player"
                ]
              }
            },
            "device_class": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "tv",
                  "speaker",
                  "receiver"
                ]
              }
            }
          },
          "required": []
        },
        "description": "Resumes a media player"
      }
    },
    {
      "type": "function",
      "function": {
        "name": "HassMediaPause",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "area": {
              "type": "string"
            },
            "floor": {
              "type": "string"
            },
            "domain": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "media_player"
                ]
              }
            },
            "device_class": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "tv",
                  "speaker",
                  "receiver"
                ]
              }
            }
          },
          "required": []
        },
        "description": "Pauses a media player"
      }
    },
    {
      "type": "function",
      "function": {
        "name": "HassMediaNext",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "area": {
              "type": "string"
            },
            "floor": {
              "type": "string"
            },
            "domain": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "media_player"
                ]
              }
            },
            "device_class": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "tv",
                  "speaker",
                  "receiver"
                ]
              }
            }
          },
          "required": []
        },
        "description": "Skips a media player to the next item"
      }
    },
    {
      "type": "function",
      "function": {
        "name": "HassMediaPrevious",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "area": {
              "type": "string"
            },
            "floor": {
              "type": "string"
            },
            "domain": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "media_player"
                ]
              }
            },
            "device_class": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "tv",
                  "speaker",
                  "receiver"
                ]
              }
            }
          },
          "required": []
        },
        "description": "Replays the previous item for a media player"
      }
    },
    {
      "type": "function",
      "function": {
        "name": "HassSetVolume",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "area": {
              "type": "string"
            },
            "floor": {
              "type": "string"
            },
            "domain": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "media_player"
                ]
              }
            },
            "device_class": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "tv",
                  "speaker",
                  "receiver"
                ]
              }
            },
            "volume_level": {
              "type": "integer",
              "minimum": 0,
              "maximum": 100
            }
          },
          "required": [
            "volume_level"
          ]
        },
        "description": "Sets the volume of a media player"
      }
    },
    {
      "type": "function",
      "function": {
        "name": "HassListAddItem",
        "parameters": {
          "type": "object",
          "properties": {
            "item": {
              "type": "string"
            },
            "name": {
              "type": "string"
            }
          },
          "required": [
            "item",
            "name"
          ]
        },
        "description": "Add item to a todo list"
      }
    },
    {
      "type": "function",
      "function": {
        "name": "HassClimateSetTemperature",
        "parameters": {
          "type": "object",
          "properties": {
            "temperature": {
              "type": "number"
            },
            "area": {
              "type": "string"
            },
            "name": {
              "type": "string"
            },
            "floor": {
              "type": "string"
            }
          },
          "required": [
            "temperature"
          ]
        },
        "description": "Sets the target temperature of a climate device or entity"
      }
    }
  ],
  "top_p": 1,
  "user": "01JSXJ41MNP3HEH4GMBE"
}

You may create a custom intent in addition to the built-in ones for your Voice Assist, for instance “HassPressButton”.

  • Append the following to configuration.yaml:
intent_script:
  HassPressButton:
    description: Press or Click a button entity
    speech:
      text: "Sure—pressing {{ domain[0] }} of {{ name }} now."
    slots:
      domain:
        type: domain
        domain:
          - input_button
          - button
      name:
        type: text
    action:
      # This assumes the target entity ID follows the naming convention derived from the name
      - service: "{{ domain[0] }}.press"
        target:
          entity_id: "{{ domain[0] }}.{{ name | lower | replace(' ', '_') | replace('-', '_') }}"

Now you can instruct the Conversation Agent to press or click the buttons.
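
A quick way to try it without speaking is to call conversation.process from Developer Tools. This is just a sketch: the agent_id below is a placeholder for whatever conversation agent you use, and the button name is taken from the entity list in the logged request above.

```yaml
action: conversation.process
data:
  agent_id: conversation.your_agent   # placeholder, replace with your agent's entity ID
  text: "Press the Computer_Button"
```

If the agent picks the intent, the response should contain the speech text defined in the intent script.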

I’ve been using debug logs to see what is being sent to (in my case) OpenAI. When I first analyzed it a few months back, I didn’t really think about the IntentTool functions being correlated with the built-in intents, but I now see that most of the built-in intent commands (along with their short “description” text) as well as my custom intent script command (along with its description text) are being sent.
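
For reference, debug logging can be switched on with something like the following in configuration.yaml. Treat it as a sketch: the logger names assume the OpenAI Conversation integration, so swap in the integration of your own agent, and what exactly gets logged depends on the integration and the Home Assistant version.

```yaml
# configuration.yaml
logger:
  default: warning
  logs:
    # Assumption: the OpenAI integration; use your own agent's integration here
    homeassistant.components.openai_conversation: debug
    homeassistant.components.conversation: debug
```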
