Passing variables or response data to Conversation Process LLM

I’ve been struggling for a couple of weeks to get something working that I feel should be pretty simple. I am trying to create tools for LLM-based assist agents using scripts. I understand how to create and expose scripts, and I understand how to use fields to make them interactive for the assist agent. What I am trying to figure out is how to give the LLM assist agent the ability to use scripts to fetch and read external information.

For example, I am using the LLM Vision integration to analyze weather radar animations and radar simulation model guidance. Ideally I want to take a response from this integration and pass it back to the ongoing conversation with my ChatGPT-based assist agent.

I want to be able to ask my assistant things like:

What does the current weather radar look like?
Will it rain later today?

My latest attempt was to pass the LLM Vision response to the set_conversation_response action, which I hoped would expose the response to the currently invoked assist conversation agent, but this does not seem to work.

Here is the exposed script YAML along with the latest trace. I can see the LLM agent is calling the script when prompted; it’s just not getting any feedback from the LLM Vision response. Thanks in advance to anyone who has ideas on how to achieve this!

alias: Analyze NAM-HIRES Weather Radar Simulation Model Guidance
sequence:
  - action: llmvision.image_analyzer
    metadata: {}
    data:
      provider: OpenAI
      model: gpt-4o
      include_filename: false
      detail: high
      max_tokens: 300
      temperature: 0.6
      image_file: |-
        /config/www/weather_radar/nam/2024090706-nam-003.gif
        /config/www/weather_radar/nam/2024090706-nam-006.gif
        /config/www/weather_radar/nam/2024090706-nam-009.gif
        /config/www/weather_radar/nam/2024090706-nam-012.gif
        /config/www/weather_radar/nam/2024090706-nam-015.gif
        /config/www/weather_radar/nam/2024090706-nam-018.gif
        /config/www/weather_radar/nam/2024090706-nam-021.gif
      message: >-
        The sequence of images attached is the current NAM-HIRES precipitation
        simulation model guidance for North East US.

        You should see EDT time in the top right corner, that is our time zone.
        The current date and time is {{ now().strftime('%B %-d, %Y, %-I:%M %p') }}


        Each consecutive image represents a specific hour and the projected
precipitation across the country. We are located in central New Jersey, so
        focus your response to this region.


        Can you tell me if I should expect any precipitation in the near future?
        Do not describe each image, consider the image frames of an animation or
        video sequence forecasting the next several hours. Concisely summarize
        the conditions I should be expecting.
    response_variable: nam_hires
    alias: >-
      Analyze radar simulation sequence of images using LLM Vision Image
      Analyzer and the latest OpenAI ChatGPT multi-modal model.
  - set_conversation_response: "{{ nam_hires.response_text }}"
    alias: Precipitation forecast
description: >-
  Analyze latest NAM model guidance animation and report back projected
  precipitation to be expected over the next several hours. This is good for
  answering questions about near-term weather and precipitation expectations.

I had the same problem. It seems that, as of now, script return variables are not available to the LLM.

A workaround that works for me is to add an intent script, which calls the script and passes the script return value into the speech data.

You can find it here: Exposing HA Scripts to Assist API: Questions on Script Results Access by LLMs - #3 by super-qua
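Roughly, the pattern from that post looks like this (untested sketch; the intent name and script name below are placeholders, and the called script needs to return its result via a `stop` step's `response_variable` rather than `set_conversation_response`):

```yaml
# Untested sketch of the intent_script workaround.
# "AnalyzeWeatherRadar" and the script name are placeholders.
intent_script:
  AnalyzeWeatherRadar:
    description: Answers questions about the near-term radar forecast.
    action:
      # Call the analysis script and capture its returned response
      - action: script.analyze_nam_hires_weather_radar
        response_variable: radar
    speech:
      # Pass the script's return value into the spoken/text response
      text: "{{ radar.response_text }}"
```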


Maybe the ‘remember’ feature added in v1.3 helps. It exposes responses as calendar events.


I want to do something similar, but to ask Assist about a certain camera.

There may be other ways to do this but I would approach it similarly to how I approached the weather radar script. Use LLM Vision to analyze the snapshot or video clip and then pass the written analysis to your agent for contextual summarization or whatever you want to do with the feedback.
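As a rough illustration of that pattern, a minimal script could look like this (entity ID, provider, and model are placeholders; adapt them to your setup):

```yaml
# Minimal sketch: analyze one camera and hand the text back to the agent.
# camera.front_door and the provider/model values are placeholders.
alias: Describe Front Camera
sequence:
  - action: llmvision.image_analyzer
    data:
      provider: OpenAI
      model: gpt-4o
      image_entity:
        - camera.front_door
      message: Briefly describe what you currently see on this camera.
      max_tokens: 150
    response_variable: analysis
  # Surface the written analysis to the invoking conversation
  - set_conversation_response: "{{ analysis.response_text }}"
```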

Here is my script. It is very brute-force, but it works! The problem is that I need a more natural way to talk to the agent, and the agent should be smart enough to understand when it needs to pull from the cameras.

The script below will be useful for some people. Enjoy!

alias: Smart Camera Ask
description: >
  Provide an overview of home cameras when prompted by Assist, using wildcard
  triggers and a multi-step user response.
triggers:
  - command:
      - check {extra} camera
      - check {extra} cameras
      - check {extra} camera {extra2}
      - check {extra} cameras {extra2}
      - "{extra} check {extra2} camera"
      - "{extra} check {extra2} cameras"
      - "{extra} check {extra2} camera {extra3}"
      - "{extra} check {extra2} cameras {extra3}"
      - check cameras {extra}
      - check camera {extra}
      - "{extra} check cameras {extra2}"
      - "{extra} check camera {extra2}"
      - check and see {extra}
      - "{extra} camera outside"
      - "{extra} cameras outside"
      - "{extra} camera outside {extra2}"
      - "{extra} cameras outside {extra2}"
      - "{extra} camera {extra2} outside {extra3}"
      - "{extra} cameras {extra2} outside {extra3}"
      - "{extra} check and see {extra2}"
      - "{extra} any {extra2} package"
      - "{extra} any {extra2} package {extra3}"
      - "{extra} a package {extra2}"
      - "{extra} any {extra2} packages"
      - "{extra} any {extra2} packages {extra3}"
      - "{extra} check {extra2} packages {extra3}"
      - "{extra} check {extra2} package {extra3}"
      - check {extra} package {extra2}
      - check {extra} packages {extra2}
      - do you see {extra}
      - "{extra1} do you see {extra2}"
      - any {extra} outside
      - "{extra} see {extra2} camera"
      - "{extra} any {extra2} outside"
      - "{extra} anyone {extra2} outside"
      - "{extra} anyone outside"
      - "{extra} anyone outside {extra2}"
      - "{extra} anyone {extra2} outside {extra3}"
      - "{extra} anything outside"
      - "{extra} anything outside {extra2}"
      - any {extra} driveway
      - "{extra} any {extra2} driveway"
      - "{extra} any {extra2} driveway{extra3}"
      - "{extra} my driveway"
      - "{extra} have {extra2} package"
      - "{extra} parked {extra2} outside"
      - "{extra} parked {extra2} outside {extra3}"
      - "{extra} parked outside"
      - "{extra} parked outside {extra2}"
    trigger: conversation
actions:
  - set_conversation_response: This will take a minute...
    enabled: true
  - variables:
      user_input: "{{ trigger.sentence | lower | default('') }}"
      matched_cameras: >-
        {# Define synonyms for different camera areas #} {% set synonyms = {
          'front': ['doorbell', 'front', 'door'],
          'porch': ['porch', 'back porch'],
          'garage': ['garage', 'driveway', 'trash', 'bin'],
          'side_yard': ['side yard', 'yard', 'front yard', ' side'],
          'packages': ['package', 'delivery', 'deliveries', 'parcel', 'fedex', 'ups', 'amazon', 'mail'],
          'car': ['car', 'truck', 'vehicle', 'park'],
          'suspicious': ['someone', 'person', 'suspicious', 'intruder', 'lurking', 'trespasser']
        } %}

        {# Initialize a namespace for cams #} {% set ns = namespace(cams=[]) %}

        {# Define all available cameras #} {% set all_cameras = [
          'camera.front_door_camera_mainstream',
          'camera.back_porch_camera_mainstream',
          'camera.garage_left_mainstream',
          'camera.garage_right_mainstream',
          'camera.front_yard_camera_mainstream'
        ] %}

        {# Check if "all cameras" is mentioned #} {% if 'all cameras' in
        user_input or ('all' in user_input and 'cameras' in user_input) %}
          {% set ns.cams = all_cameras %}
        {% else %}
          {# Iterate through each category and its synonyms #}
          {% for key, synonyms_list in synonyms.items() %}
            {% for synonym in synonyms_list %}
              {% if synonym in user_input %}
                {% if key == 'front' %}
                  {% set ns.cams = ns.cams + ['camera.front_door_camera_mainstream'] %}
                {% elif key == 'porch' %}
                  {% set ns.cams = ns.cams + ['camera.back_porch_camera_mainstream'] %}
                {% elif key == 'garage' %}
                  {% set ns.cams = ns.cams + ['camera.garage_left_mainstream', 'camera.garage_right_mainstream'] %}
                {% elif key == 'side_yard' %}
                  {% set ns.cams = ns.cams + ['camera.front_yard_camera_mainstream'] %}
                {% elif key == 'packages' %}
                  {% set ns.cams = ns.cams + ['camera.front_door_camera_mainstream', 'camera.garage_left_mainstream', 'camera.garage_right_mainstream'] %}
                {% elif key == 'car' %}
                  {% set ns.cams = ns.cams + ['camera.garage_left_mainstream', 'camera.garage_right_mainstream'] %}
                {% elif key == 'suspicious' %}
                  {% set ns.cams = ns.cams + all_cameras %}
                {% endif %}
                {# Stop checking synonyms once a match is found for the current category #}
                {% break %}
              {% endif %}
            {% endfor %}
          {% endfor %}

          {# If no specific cameras matched, default to all cameras #}
          {% if ns.cams | length == 0 %}
            {% set ns.cams = all_cameras %}
          {% endif %}
        {% endif %}

        {# Ensure the list of cameras is unique and properly formatted #} {{
        ns.cams | unique | list }}
      message: |-
        {% if 'quick' in user_input %}
          Analyze the images from this CCTV footage around a house and answer the user input in details.
          Only if there is something out of the ordinary, point it out, otherwise focus only on answering the user input.
          Ignore any neighboring homes and focus on areas around the door and white garage belonging to the house of the CCTV footage.
          If you need to refer to a certain camera, use friendly name and not its id and drop the keyword mainStream.
          User input: "{{ user_input }}"
        {% else %}
          Analyze the images from this CCTV footage around a house and answer the user input.
          User input: "{{ user_input }}".
          Use humor, melancholy tone and some sarcasm with your answers.
          If you are asked about a certain thing, you need to go in excruciating details for that thing only.
          Respond in plain text without any asterisk.
          Ignore any neighboring homes and focus on areas around the door and white garage belonging to the house of the CCTV footage.
          If you need to refer to a certain camera, use friendly name and not its id and drop the keyword mainStream.
        {% endif %}
    enabled: true
  - action: llmvision.image_analyzer
    data:
      provider: 01JFM798JHK3EEEDM7M40VXNDS
      model: gemini-1.5-pro
      remember: true
      include_filename: true
      max_tokens: 150
      temperature: 0.3
      image_entity: "{{ matched_cameras }}"
      message: "{{ message }}"
    response_variable: response
    enabled: true
  - set_conversation_response: "{{ response.response_text }}"
    enabled: true
mode: restart