Anyway, for some reason it doesn’t work for me.
Edit 3: Got it working by using image_entity; I’ll post details at the bottom. I think I’d still like to get it working with image_file, but this will suffice for now.
Edit 2: It seems like images work when I use image_entity in the YAML, but not when I use image_file. I think if I can figure out how to get from a trigger entity_id to an image_entity (camera name) then I can make this work.
This update looks awesome! Is there a way to diagnose why images aren’t showing up in the timeline card? Events are being added to the calendar and they’re showing up with text descriptions in the timeline card, but I’m not getting the keyframe image.
When I look at the script timeline under changed variables I see:
generated_content:
  response_text: No obvious motion detected.
  key_frame: /config/www/llmvision/46c782f3-0.jpg
That image exists inside that path (I can view it from the file plugin editor), but it’s not showing in the card.
I did notice the documentation says "expose_image" but the YAML configuration offers "expose_images". I tried both, and only expose_images seems to show the key_frame in the generated_content variable, so I don’t know if that matters.
Edit: I set up debug logging and I don’t see any errors in the logs.
To get this working with image_entity, I needed to get the name of the camera sub entity from the device ID of the automation trigger. I’m using this to pass the “sub_entity” name to the script:
sub_entity: >-
  {{ device_entities(device_id(trigger.entity_id)) | select('match', '^camera.*sub$') | list | join('') }}
That results in something like “camera.living_room_camera_sub” depending on the trigger. Then in the script, I use:
image_entity:
  - "{{ sub_entity }}"
And now my images are showing up in the timeline card.
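In case it helps anyone copying this, here is a rough, untested sketch of the automation side that passes the variable into the script. The trigger entity and the script name (script.llm_vision_event) are placeholders for whatever you use; the template is the same one as above.
triggers:
  - trigger: state
    entity_id: binary_sensor.living_room_camera_motion  # placeholder motion sensor
    to: "on"
actions:
  - action: script.llm_vision_event  # placeholder script name
    data:
      sub_entity: >-
        {{ device_entities(device_id(trigger.entity_id)) | select('match', '^camera.*sub$') | list | join('') }}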
Thanks for the detailed post!
This is a bug - it should work with image_file as well. expose_image is a typo in the docs. The correct parameter name is expose_images.
Such an awesome update @valentinfrlch! I am having a play with the integration like this, but for some reason I can’t see anything in the timeline entity. Am I missing something?
alias: LLM Testing
description: ""
triggers: []
conditions: []
actions:
  - action: llmvision.image_analyzer
    metadata: {}
    data:
      include_filename: false
      max_tokens: 100
      temperature: 0.2
      provider: BLABLABLAETC
      image_entity:
        - camera.garden_camera
      message: Describe the image.
      remember: true
      expose_images: true
    response_variable: response
  - action: persistent_notification.create
    metadata: {}
    data:
      title: Front Camera Motion
      message: "{{response.response_text}}"
mode: single
@valentinfrlch Loving the updates you recently made to LLM Vision! Especially the new timeline card that will show the images next to the response.
I have a couple of questions / feature requests though. Do you have a place we should submit them? Adding below in case this is the best place.
- Allow you to select which timeline entity to use. This is helpful if you have different types of events you want to remember/capture (e.g. security camera motion events vs. road cam data analyzer events).
- Timeline Card - Ability to filter what events are displayed based on the entity ID associated with the event (or even the name or something similar).
- Ability to Remember Data Analyzer events. This is essential when you want to then use assist to ask about those data analysis operations later on. (Understood that you can use the Remember action here but it’s a few extra steps to get that data into the remember call)
Thanks again for all your work on this! It’s been invaluable in so many ways!
There is a Git repository for the project - submit an issue of type "feature request" (you’ll see the option when you click New issue).
The only thing I can see is that there are no triggers. If it still doesn’t work, please create an issue here and attach some logs or a trace of the automation.
Yea - just manually triggering the automation whilst having a play/testing. Have submitted a bug report. Thanks!
I have this up and running VERY well now and love it. Thank you for an awesome integration.
Honestly, the key for me was not using the blueprint (although I haven’t tried the updated one).
Using the integration in a custom automation is doing exactly what I want. I just have a question.
When testing prompts in Open WebUI I tend to get better responses; when I pass the same photo and prompt from HA, the responses are good, but not as good as when prompting directly.
I noticed that in Open WebUI the default temperature is 0.8. I adjusted this in my automation, and it is a little better.
Just wasn’t sure if there were some other settings or adjustments I should look into?
Absolutely love this integration.
I use Frigate and Ollama on Unraid with a GTX 1080 SC 8GB and the prompts fire off quickly. (I had to add the environment variable OLLAMA_KEEP_ALIVE=24h.)
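For reference, here is a rough sketch of what that keep-alive setting looks like as a docker-compose service; on Unraid it is added as an extra container variable in the app template instead, and the image tag and port below are just the Ollama defaults.
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    environment:
      # Keep loaded models in memory for 24h instead of the 5-minute default
      - OLLAMA_KEEP_ALIVE=24h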
For anyone else, here is a simple automation you can use if you’re having trouble with the blueprint; maybe it can help someone going forward:
I have been experimenting with playing the description over TTS in my office, if I am in the office when the detection happens. Unfortunately, I have not even touched voice assistants in HA yet to improve upon this.
alias: AI Image Description Front Door - Sean
description: ""
triggers:
  - trigger: state
    entity_id:
      - image.doorbell_camera_person
    to: null
conditions: []
actions:
  - action: llmvision.image_analyzer
    metadata: {}
    data:
      include_filename: true
      target_width: 1280
      max_tokens: 100
      temperature: 0.8
      generate_title: true
      expose_images: true
      model: llava:7b
      message: >-
        PROMPT
      provider: Select your provider
      remember: true
      image_entity:
        - image.doorbell_camera_person
    response_variable: description
  - action: notify.mobile_app_iphone15
    data:
      data:
        image: /api/image_proxy/image.doorbell_camera_person
      title: "{{ description.title }}"
      message: "{{ description.response_text }}"
  - condition: state
    entity_id: input_boolean.office_occupancy
    state: "on"
    enabled: false
  - action: tts.google_translate_say
    metadata: {}
    data:
      entity_id: media_player.rk3326
      message: " {{ description.response_text }} "
    enabled: false
mode: single
I use the LLM Vision card and I see the response but no image. Does it have something to do with my settings in the automation?
I have a new Tapo C200 camera. I’ve set up an automation that triggers when the state of “cell motion detection” becomes “on”.
I want to use the stream or a picture from the camera (I don’t know what’s best - please recommend) together with LLM Vision. Does anyone have an example of how to do this? I can only find examples that use a button as a trigger.
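For reference, an untested sketch along the lines of the other automations in this thread might look like the following; the entity IDs and provider ID are placeholders for whatever the Tapo integration creates on your system, and it passes the camera entity (rather than the stream) to the analyzer.
alias: C200 motion description
triggers:
  - trigger: state
    entity_id: binary_sensor.c200_cell_motion_detection  # placeholder motion sensor
    to: "on"
conditions: []
actions:
  - action: llmvision.image_analyzer
    data:
      provider: YOUR_PROVIDER_ID  # placeholder provider entry ID
      image_entity:
        - camera.c200  # placeholder camera entity
      message: Describe what triggered the motion detection.
      max_tokens: 100
      remember: true
      expose_images: true
    response_variable: response
  - action: persistent_notification.create
    data:
      title: C200 motion
      message: "{{ response.response_text }}"
mode: single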
You could modify the system prompt. You can change it in Memory settings (Memory provider needs to be set up).
The blueprint has been completely rewritten, as there have been lots of problems with Frigate clips specifically. From what I can tell, it is pretty stable now, a lot faster, and also very customizable with the new run_conditions.
@Jovink There is a known bug for image_file inputs. It should work for entities.
@danwie Check the blueprint. I think it does exactly what you want.
Is it possible to do this with a script after every reboot?
What could be the reason that the text description is cut off?
Probably increase max_tokens.
Quick question: it’s been a week or two since my prompt stopped working as expected.
Whenever LLM Vision is triggered, the answer I get starts with my question being rephrased, like “Of course, here’s a description of the people in the shot: tall man standing near blablabla”, when I only want the part with “tall man standing near blablabla”. LLM used: Gemini.
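As a crude, untested workaround sketch (assuming the unwanted preamble always ends at the first colon), the notification template could strip everything up to that colon instead of using response_text directly:
message: >-
  {{ response.response_text.split(':', 1) | last | trim }}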
Right now I am testing Gemma 3 with Home Assistant. I configured it as an assistant and pointed the LLM Vision setup to the same model. What is curious: when I use the Assist function of HA, Ollama loads up Gemma 3 and answers my request. When I initiate an LLM Vision request, Ollama unloads the model and loads it again. So HA and LLM Vision constantly play ping-pong by loading and unloading the same model, which makes the response time very slow.
Also, LLM Vision really needs to pass the keep_alive variable. In HA I can set it to -1 (forever), but when I initiate an LLM Vision request, keep_alive is set to the default (5 min). Yes, I could set it in the Ollama settings, but I don’t want to keep all my models loaded all the time.
Sorry for the lame question: I did the Groq setup, but where does the provider ID come from? I can’t find it.
The LLM Vision integration entry for Groq has no devices or entities.
Is Danish support coming for the UI and Icons?
I’m trying to use an example script of yours, but I get a template error.
The script:
alias: Kenteken check
sequence:
  - service: llmvision.image_analyzer
    data:
      max_tokens: 100
      provider: 01JP73JVPR5YRERKGM6RKVR9FM
      image_file: /config/www/snapshot_oprit/oprit.jpg
      model: gpt-4o
      target_width: 512
      temperature: 0.5
      detail: low
      include_filename: false
      message: >-
        Please check if there is a car in the driveway with the license plate
        "JX-820-Z" and respond with a JSON object. The JSON object should have
        a single key, "car_in_driveway," which should be set to true if - and
        only if - there is a car with the license number provided above in the
        driveway and false otherwise.
    response_variable: response
  - choose:
      - conditions:
          - condition: template
            value_template: >-
              {{ response.response_text | from_json.car_in_driveway }}
            enabled: true
        sequence:
          - service: input_boolean.turn_on
            target:
              entity_id: input_boolean.car_in_driveway
            data: {}
    default:
      - service: input_boolean.turn_off
        target:
          entity_id: input_boolean.car_in_driveway
        data: {}
    enabled: true
mode: single
The error that I get:
Message malformed: invalid template (TemplateAssertionError: No filter named 'from_json.car_in_driveway'.) for dictionary value @ data['sequence'][1]['choose'][0]['conditions'][0]['value_template']
Due to the error I can’t even save the script. I copied and pasted the code, so a typo shouldn’t be in there.
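For what it’s worth, the error seems to come from the filter chain being read as a single filter name. An untested sketch of that condition, with the from_json result parenthesized before reading the key, would be:
value_template: >-
  {{ (response.response_text | from_json).car_in_driveway }}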
It doesn’t work, and I don’t know if it’s a bug. The trace stops when it complains about a “timeline”. I can see the calls getting to Google Cloud but have no idea what that message means. There’s nothing meaningful in the HA documentation about the “timeline” the script is complaining about.
This is what Google is registering around the same time.