LLM Vision: Let Home Assistant see!

Not sure if this is the right place or not, but I am hoping someone might be able to provide some guidance. I have a script using LLM Vision that appears to run correctly, but I can't get the response to return and be spoken by my voice satellite. If anyone has any ideas, please let me know.

Here is my script code:

sequence:
  - action: camera.snapshot
    metadata: {}
    data:
      filename: /config/www/tmp/snapshot-backyard.jpg
    target:
      entity_id: camera.reolink_bakyard_camera_fluent
  - delay:
      hours: 0
      minutes: 0
      seconds: 4
      milliseconds: 0
  - action: llmvision.image_analyzer
    metadata: {}
    data:
      remember: false
      include_filename: false
      target_width: 1280
      max_tokens: 100
      temperature: 0.2
      generate_title: false
      expose_images: false
      expose_images_persist: false
      provider: 01JMJ53RZV1AA9RZJ194FFN7E9
      model: gpt-4o-mini
      message: >-
        Analyze the backyard camera snapshot. Report anything of interest like
        packages, cars, trucks, people. Also report the weather conditions and
        backyard status like ice, rain, snow etc. Be witty
      image_file: /config/www/tmp/snapshot-backyard.jpg
    response_variable: response
  - variables:
      response:
        query_backyard: "{{ response.response_text }}"
  - stop: ""
    response_variable: response
alias: Backyard Security
description: >-
  Check the backyard camera to determine what is happening outside. Also check
  the weather conditions.

Here are the trace details; it appears to be working, but I just can't get the voice response to work:

Executed: February 21, 2025 at 10:02:43 AM
Result:
params:
  domain: llmvision
  service: image_analyzer
  service_data:
    remember: false
    include_filename: false
    target_width: 1280
    max_tokens: 100
    temperature: 0.2
    generate_title: false
    expose_images: false
    expose_images_persist: false
    provider: 01JMJ53RZV1AA9RZJ194FFN7E9
    model: gpt-4o-mini
    message: >-
      Analyze the backyard camera snapshot. Report anything of interest like
      packages, cars, trucks, people. Also report the weather conditions and
      driveway status like ice, rain, snow etc. Be witty
    image_file: /config/www/tmp/snapshot-backyard.jpg
  target: {}
running_script: false
response:
  response_text: >-
    In this snapshot from the backyard camera, it looks like winter has decided
    to throw a little party! The ground is covered in a blanket of snow, giving
    the scene a serene, albeit chilly, vibe. 


    As for the backyard, it appears to be quite the winter wonderland—no cars,
    trucks, or surprise packages in sight. Just a couple of empty chairs and a
    table, likely longing for warmer days and a good barbecue. 


    The weather is crisp and clear, with bright sunlight illuminating

Hi, I'm requesting support for Tuya Smart video. I have a video intercom connected to Tuya. LLM Vision sees the camera, but when testing with AI, the AI can't see through the camera video and gives this error:
The llmvision.stream_analyzer action could not be performed. No image input provided
I think some permission is needed to be able to review the video from the video intercom.
I can view the video intercom from Home Assistant without any problems.

Two things: 1. You don't have to create a snapshot yourself; image_analyzer does that automatically if you pass the camera entity as image_entity. 2. I can't see where you call a tts action. Your response variable is correct, but you need to pass it to a tts entity. For example:

action: tts.speak
data:
  media_player_entity_id: media_player.example_speaker
  message: "{{ response.response_text }}"
target:
  entity_id: tts.home_assistant_cloud
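
For the first point, the snapshot and delay steps can be dropped; here is a rough sketch of the same analyzer call, assuming image_entity accepts the camera directly (provider, model and entity reused from the script above):

action: llmvision.image_analyzer
data:
  provider: 01JMJ53RZV1AA9RZJ194FFN7E9
  model: gpt-4o-mini
  message: >-
    Analyze the backyard camera snapshot. Report anything of interest like
    packages, cars, trucks, people. Be witty
  # Pass the camera entity directly; LLM Vision takes the snapshot itself
  image_entity:
    - camera.reolink_bakyard_camera_fluent
  target_width: 1280
  max_tokens: 100
  temperature: 0.2
response_variable: response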

See also the ‘Image to TTS’ example in the gallery:

Thanks for responding. I seem to have it working by using an automation, but I could never get it working via a script. It makes total sense to add the tts.speak action. I was trying to follow an example someone else had provided but could never get the response. I'll try this out as well.

Just wondering, though (maybe it doesn't really matter): is it better or more efficient to do this through a script or an automation? It seems like it might not matter.

The difference between scripts and automations is that automations can have one (or multiple) triggers, while scripts have to be called by the user (or by another automation).

In your case an automation would make more sense, as you would probably want to announce automatically when a person is detected.
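
As a rough sketch of that kind of automation (the person sensor and speaker entities are made up; the provider, model and camera are taken from the script above):

alias: Announce backyard person
triggers:
  - trigger: state
    entity_id: binary_sensor.backyard_person  # hypothetical person/occupancy sensor
    to: "on"
actions:
  - action: llmvision.image_analyzer
    data:
      provider: 01JMJ53RZV1AA9RZJ194FFN7E9
      model: gpt-4o-mini
      message: Describe who or what is in the backyard. Be witty.
      image_entity:
        - camera.reolink_bakyard_camera_fluent
      max_tokens: 100
    response_variable: response
  - action: tts.speak
    data:
      media_player_entity_id: media_player.example_speaker
      message: "{{ response.response_text }}"
    target:
      entity_id: tts.home_assistant_cloud
mode: single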

I'm not sure why I no longer get any pictures in my notifications. I only get the text. Any clue?

alias: AI deurbel Frigate
description: ""
use_blueprint:
  path: valentinfrlch/event_summary.yaml
  input:
    mode: Frigate
    notify_device:
      - f5fc620b4132f43fde52fcfc86def4ae
      - 93870dccbfbc2d6bf7d381d0f2714284
    cooldown: 5
    provider: 01JBPTWC6XXGZM2E1JNQFZW2GH
    camera_entities:
      - camera.camera
    frigate_url: 192.168.1.118:5000
    important: false
    motion_sensors:
      - binary_sensor.reolink_video_doorbell_wifi_persoon
    message: >-
      Summarize in one sentence of at most 10 words. Do not describe the
      scene! If there is a person, describe what they are doing and criticize
      them in a funny way.
    duration: 5
    remember: true
    preview_mode: Snapshot
    max_frames: 3
    temperature: 0.5
    max_tokens: 20
    input_mode: Frigate
    required_zones:
      - Oprit
      - Weg
    object_type:
      - person
    model: gemini-2.0-flash

How can I setup the Memory feature? Memory | LLM Vision | Getting Started


Hi @MicNeo!

Just add a new LLM Vision integration entry and choose Memory in the list.

It's not showing in my list either. In fact, I don't have Timeline OR Memory…

I’ve been trying to add it for the last hour. Looks like there may be some kind of issue. (Yes, I updated LLM Vision yesterday because I saw it was a feature of the latest builds)

@valentinfrlch known issue?

It is in beta (v1.4.0). I have now marked it as such in the docs.


Ahhhh, yeah, that was not apparent at all in the docs. ETA? (Read: am I going to blow anything up running this beta? Heck with it, the ability to edit the motion-detected subject line alone is worth it ;) )

Thank you! I can’t wait for that feature :slight_smile:

There will be a second beta by the end of this week. The first one is already out and as far as I can tell there aren’t any major bugs currently. Check the discussion if you’re interested in the beta!


I currently have Frigate events being run through my Node-RED setup, so I'm not looking to manage all of my Frigate events via LLM Vision. However, what I would like to do is enrich some of my existing Node-RED events by calling the LLM Vision service with a URL to an image (either relative or a full URI) and have the LLM Vision service handle the base64 encoding and send the request to the vision provider as usual.

I'm not sure if this is currently possible, so here are some examples of what I mean:

action: llmvision.image_analyzer
data:
  include_filename: false
  target_width: 1280
  max_tokens: 100
  temperature: 0.2
  provider: [LOCAL_OLLAMA_UID]
  model: llama3.2-vision:11b
  message: Is there a package
  image_url: >-
    /api/frigate/notifications/1740403924.78291-7cotr08/snapshot.jpg
  # A relative path would be nice, with LLM Vision adding the Home Assistant http(s)://[host]:[port] itself

action: llmvision.image_analyzer
data:
  include_filename: false
  target_width: 1280
  max_tokens: 100
  temperature: 0.2
  provider: [LOCAL_OLLAMA_UID]
  model: llama3.2-vision:11b
  message: Is there a package
  image_url: >-
    https://[host]:[port]/api/frigate/notifications/1740403924.78291-7cotr08/snapshot.jpg
# Or just always expect the user to give the full URI

I really appreciate the work you have done.

Cheers,

I am getting this error on 1.3.8

I am having this issue, but when I went to the URL like you said, the clip loads fine.

OK, it is no longer giving me that error, but now I am getting:

Error: unexpected server status: llm server loading model

I think it has to do with JSON-encoding the response. I am using llava:7b.

I saw this issue when searching:

How do I specify a local Ollama provider?

hello!

I see this has been added to 1.4.0, awesome!

Added a run_condition: You can now add custom conditions to the blueprint. All conditions have to be true in order for the blueprint to run. Use this to only run the automation when no one is home or when the alarm is enabled.
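
If it works like standard blueprint conditions, that field would take ordinary Home Assistant conditions; a hedged sketch (the entities are made up):

# All of these have to be true for the automation to run
- condition: state
  entity_id: zone.home
  state: "0"  # nobody home
- condition: state
  entity_id: alarm_control_panel.home_alarm  # hypothetical alarm entity
  state: armed_away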

How do I actually use this now? I don’t see the option on the blueprint (v 1.3.8)?

thanks!

EDIT: Oh, nvm, I see that I need the beta blueprint.

LLM Vision 1.4.0: Major Update with Exciting New Features

LLM Vision 1.4.0 is here with major improvements and new features, but also some breaking changes. This update enhances usability, adds powerful memory capabilities, and improves event tracking with the new Timeline and Timeline Card. Read on to see what's new!

:rotating_light: Important Updates & Breaking Changes

Major Restructure & Frigate Compatibility

The blueprint has been significantly restructured, meaning you'll need to set up some parts again. Frigate mode has been removed due to recurring issues when fetching clips. The blueprint remains compatible using Frigate's camera and binary_sensor ("occupancy") entities.
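
In practice that presumably means pointing the blueprint at the entities the Frigate integration already creates; a rough sketch, assuming the input keys keep the names from the blueprint example earlier in this thread (entity names are placeholders):

use_blueprint:
  path: valentinfrlch/event_summary.yaml
  input:
    camera_entities:
      - camera.front_door  # camera entity provided by the Frigate integration
    motion_sensors:
      - binary_sensor.front_door_occupancy  # Frigate "occupancy" binary_sensor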

Deprecated Features

  • input_mode has been removed
  • detail has been removed. Use target_width to control the resolution of images.
  • expose_images_persist has been deprecated. Images are now saved in /www/llmvision/<uid>-<key_frame>.jpg, and the path is returned in the response as key_frame (see the sketch after this list).
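
A sketch of how that returned path might be used, assuming expose_images: true still saves the key frame and that the path needs rewriting to a /local/ URL for a notification (the notify service and camera are placeholders):

- action: llmvision.image_analyzer
  data:
    provider: 01JMJ53RZV1AA9RZJ194FFN7E9
    model: gpt-4o-mini
    message: What is at the door?
    image_entity:
      - camera.front_door  # hypothetical camera
    expose_images: true
  response_variable: response
- action: notify.mobile_app_example_phone  # hypothetical notify target
  data:
    message: "{{ response.response_text }}"
    data:
      # key_frame holds the saved file path; the exact rewrite to a /local/ URL
      # may need adjusting depending on what the integration returns
      image: "{{ response.key_frame | replace('/config/www', '/local') }}"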

:sparkles: New Features & Enhancements

:framed_picture: Timeline Replaces Event Calendar

Introducing the LLM Vision Timeline Card—a new dashboard card to display events. Install it now: GitHub

:brain: Memory: Store & Use Reference Images

  • Save reference images with descriptions to provide additional context to the model for more accurate results.
  • Syncs across all providers (except Groq).
  • Use Memory Toggle: Control memory usage in all actions (see the sketch after this section).

Disclaimer: AI may make mistakes—do not use for critical security tasks!
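
The per-call toggle might look something like this; use_memory is an assumed parameter name, not confirmed here:

action: llmvision.image_analyzer
data:
  provider: 01JMJ53RZV1AA9RZJ194FFN7E9
  model: gpt-4o-mini
  message: Is the person at the door someone from memory?
  image_entity:
    - camera.front_door  # hypothetical camera
  use_memory: true  # assumed name of the memory toggle
response_variable: response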

:memo: Customizable Prompts

  • System Prompt: Improves accuracy and applies to all providers.
  • Title Prompt: Customize the prompt for event title generation.

:fire: New Actions

  • remember: Manually store custom events in the Timeline.
  • cleanup: Remove unused images from /www/llmvision

:gear: Blueprint Enhancements

  • Run Condition: Define conditions (e.g., only run when no one is home or the alarm is enabled).
  • Notification Delivery Options:
    • Dynamic: Sends an initial live camera preview notification, then silently updates with the summary. (Live Preview not compatible with Android)
    • Consolidated: Sends a single notification when the summary is ready.
  • Use Memory: Enables access to stored information.

:pushpin: How to Update

  • Re-import the updated blueprint.
  • Install the new LLM Vision Timeline Card to use the updated Timeline.

This update brings powerful improvements that enhance usability and performance. Read the full release notes here, and check out the website as well.

Happy automating!
