LLM Vision: Let Home Assistant see!

I’ve got some estimates and real-world numbers on that in the Friday party thread, if you’re interested. I don’t want to veer off LLM Vision here in this thread.

When using stream_analyzer, how do the parameters recording_duration and max_frames relate to the number of streams selected? For example, if I choose multiple streams, is each of the streams recorded in parallel for the recording_duration? And how are the max_frames then selected?


I tried installing it today, but I get the following error:

Even when I choose a different release, e.g. 1.4.3, the same thing happens… does anyone have a tip?

Hi everyone,

I wanted to share a simple automation I’ve set up for those looking to get longer, more descriptive notifications via Telegram from AI analysis.

I still have a lot to refine and improve, but as a first step it works: Telegram allows longer descriptions than the iOS or Android notification limits. :slight_smile:

On another note, I have a couple of questions to see if they’re possible:

  1. Is there a way to retrieve recent images and make a request to the AI to analyze them all in a block, perhaps for a more global daily summary?
  2. Is there a way for me to trigger analysis only when there’s movement or a change in a specific area of a video (like a distant door of a house), and if so, to analyze the scene for several seconds afterward?
  3. Is it possible to send the image + description out of HA into a sort of RAG within n8n or Flowise, so I can ask more questions about the historical data?

Here’s the automation code:

alias: AI Gemini Analysis (using Telegram)
description: ""
triggers:
  - type: value
    device_id: your camera device
    entity_id: your person count entity
    domain: sensor
    trigger: device
    above: 0
conditions: []
actions:
  - action: llmvision.image_analyzer
    metadata: {}
    data:
      remember: true
      use_memory: true
      include_filename: true
      target_width: 1280
      max_tokens: 100
      generate_title: true
      expose_images: true
      provider: Select your provider
      image_entity:
        - camera.garden
      message: >-
        YOUR PROMPT
    response_variable: geminiresponse
  - action: telegram_bot.send_photo
    metadata: {}
    data:
      file: "{{ geminiresponse.key_frame }}"
      caption: |-
        *{{ geminiresponse.title.strip() }}*
        {{ geminiresponse.response_text }}
      config_entry_id: your telegram entry id
    enabled: true
mode: single
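One caveat with the Telegram route: the Bot API caps photo captions at 1024 characters, so a very long LLM response can still get cut off. A minimal Python sketch of the splitting idea (a hypothetical helper, not part of the integration): send the first chunk as the caption and the remainder as a follow-up text message.

```python
# Hedged sketch: Telegram photo captions are limited to 1024 characters,
# so split a long LLM response into (caption, remainder). The remainder
# would be sent as a separate telegram_bot.send_message call.
TELEGRAM_CAPTION_LIMIT = 1024

def split_caption(text: str, limit: int = TELEGRAM_CAPTION_LIMIT):
    """Return (caption, remainder); remainder is "" if the text fits."""
    if len(text) <= limit:
        return text, ""
    cut = text.rfind(" ", 0, limit)  # prefer breaking on a word boundary
    if cut == -1:
        cut = limit                  # no space found: hard cut at the limit
    return text[:cut], text[cut:].lstrip()
```

In the automation above, this would translate to a second `telegram_bot.send_message` action whenever the remainder is non-empty.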

Looking forward to your thoughts and suggestions!



Thank you for creating and sharing this!

  1. When using the timeline, you can access the titles and descriptions for the 10 most recent events via the calendar entity. Check the HA developer tools > states to see the attribute names. Note: I am working on a “today summary” that would summarize all events from the current day. Not sure how soon it’ll be ready though.
  2. You’d have to find a way to compare two given frames and calculate the visual difference between them. LLM Vision does include this functionality to find the most relevant frames, but it can’t be used (yet) to decide whether to do an analysis in the first place. Sounds like a good idea for a future update, feel free to create a feature request on GitHub!
  3. Similar to (1), the calendar entity has the file path to each image (/www/llmvision/<id>.jpg). This image would be accessible outside of HA (if you know the image id). This way you can access it from n8n or similar.
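For anyone who wants to experiment with the frame-difference idea in (2) before it lands in the integration, here is a minimal sketch. It is pure Python with frames as 2D lists of grayscale values (0–255); in practice you would decode camera snapshots first, e.g. with Pillow, and the threshold is a made-up placeholder you would tune per camera.

```python
# Sketch of the frame-difference idea: only trigger an (expensive) LLM
# analysis when two frames differ enough. Frames are plain 2D lists of
# grayscale pixel values; the threshold value is a hypothetical default.
def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0

def motion_detected(frame_a, frame_b, threshold=10.0):
    """True if the scene changed enough to warrant an analysis."""
    return mean_abs_diff(frame_a, frame_b) > threshold
```

Restricting the comparison to a cropped region of each frame would cover the “distant door” case: only changes inside that region would pass the threshold.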

Cheers, Valentin


I had this working for a while with my Ring MQTT setup, but it’s been dead for a while now. Does anybody still have it working with Ring?

Hi Valentin, first of kudos to the great work you are doing.

Since yesterday it is not working anymore, and my suspicion is that it’s related to the HA 2025.8.1 update. However, I’ve removed everything LLM Vision related, configured everything as before, and it’s still not working.

Below is the information I’m getting in the system log; maybe you can help resolve it, as it was working flawlessly before. I also changed over the weekend from http to https with a self-signed cert, but that shouldn’t be the root cause, as it stopped working yesterday morning after the HA 2025.8.1 update.

Logger: homeassistant.components.automation.ai_event_summary_v1_5_0
Source: components/automation/init.py:663
Integration: Automation (documentation, issues)
First occurred: 10:43:41 (1 occurrence)
Last logged: 10:43:41

Error rendering variables: UndefinedError: 'dict object' has no attribute 'entity_id'

Logger: homeassistant.helpers.template
Source: helpers/template.py:2990
First occurred: 10:43:41 (2 occurrences)
Last logged: 10:43:41

Template variable warning: 'camera' is undefined when rendering '{{ camera }} Snapshot'
Template variable error: 'dict object' has no attribute 'entity_id' when rendering '{% if motion_sensors_list and not trigger.entity_id.startswith("camera") %} {% set index = motion_sensors_list.index(trigger.entity_id) %} {{ camera_entities_list[index] }} {% else %} {{ trigger.entity_id }} {% endif %}'

Cheers
efkay

Hi efkay, thanks for the kind words!
I think you’re right, the error logs suggest the issue is with the blueprint, not a certificate issue.
The v1.5.0 blueprint has some issues. Yours might be related to this: Blueprint v1.5.0 shows wrong camera · Issue #403 · valentinfrlch/ha-llmvision · GitHub.

Just a quick update for anyone who was having issues with Gemini after LLMV update to 1.5, and Google deprecating Gemini 1.5.

I tried moving to Gemini 2.0 after this, but not only would it quickly eat up the 200 RPD (1.5 never did this) with 10+ requests per event, it would also take 3+ minutes after the event trigger to actually get the notification. I tried a bunch of things, but just couldn’t get it running like when I was using Gemini 1.5.

So I tried the GPT API with 4o mini, but this was also using a bunch of requests, and the cost was adding up. For something like this, I wasn’t looking to spend a bunch of money.

So then I thought, OK, let’s see what Groq is about. I didn’t realize at first that there is both Groq and Grok, but anyway… I signed up for an account with Groq and got the API key in seconds. I loaded it into the LLMV integration and was up and running in less than a minute. I tested a bunch through the blueprint, and not only was it lightning fast (I tried 2 models; even the more accurate, slower one was super quick), it used 1, maybe 2 requests per event (1,000 daily vs. Google’s 200 on the free tier), and it followed prompts very well. So after a few days of use, I’m really liking Groq a lot.

TL;DR: If you want to use this awesome integration with free LLM access, without having to set anything up locally… Groq is easily the best way to go.


Thanks Valentin. Unfortunately, this is also not working with the fork. I created both blueprints, yours and the fork’s, and they come back with exactly the same result. So something must have changed significantly in the new update, I suppose.

I’d like to briefly share my experience with my setup. I’m using Frigate in Docker on a Lenovo ThinkCentre with Hailo-8 hardware. Ollama runs on a Mac Mini M4, using the qwen2.5vl:latest model. I’ve tried a lot of things, but I’m very surprised by how well this model can describe situations in the recordings. I’ll keep an eye on it for a while. Thanks for this great integration.


Hi. I’ve been using Gemini and it hasn’t been working for a while. I’ve just tried Groq and I can’t get it working. Could you share your working code, please?

@Dreamoffice On average, how long does it take for the qwen model on your Mac Mini M4 to return a response for each image?

Yeah, so I signed up on Groq and got the API key. Popped it into the LLMV integration. I tested the blueprint automations and noticed they didn’t work when I had the duration and max frames set higher, like I had previously. When I set duration to 3 and max frames to 2, it worked, with the same quality. I set Groq as the provider and used:

meta-llama/llama-4-maverick-17b-128e-instruct as the model. The working models were a little unclear, because LLMV’s providers page mentions one model which doesn’t actually work. So I did some digging and found the one above, which does work. There’s also:

meta-llama/llama-4-scout-17b-16e-instruct, which is supposed to be a little faster but less accurate. I didn’t notice any speed difference between the two, so I went with the more accurate Maverick.

Hope this helps!
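For anyone curious what LLMV is doing under the hood with these models: Groq’s API is OpenAI-compatible, so an image-analysis call is a standard chat completion with an image part. A hedged sketch of the request body (model names as tested above; verify the endpoint and limits against Groq’s own docs before relying on this):

```python
# Hedged sketch of an OpenAI-compatible vision request for Groq.
# This only builds the JSON body; sending it (e.g. via the openai
# client pointed at Groq's base URL) is left out. Model name is the
# one reported working in the post above.
import base64

def build_vision_request(prompt: str, jpeg_bytes: bytes,
                         model: str = "meta-llama/llama-4-maverick-17b-128e-instruct"):
    """Return a chat-completion payload with one text and one image part."""
    b64 = base64.b64encode(jpeg_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 100,  # matches the max_tokens used in the automation above
    }
```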

Thanks for that. It works when using the developer tool to trigger it so will see how it goes over the next few days.

It takes some minutes, but I don’t care; I only do this when I’m not at home, and getting these messages 2 minutes earlier is not that important to me. I think it needs 2–4 minutes until those 15 images are analysed. I get the snapshot analysis pretty soon after the event, and then after a few minutes the analysis of the clip.


Hi, I thought I had a similar problem, but honestly I have never gotten this to work at all, despite following the instructions to the letter. I don’t know what I am missing. I used Groq and can confirm it’s getting the API call when I run the automation manually (it uses 11 tokens for input and approx. 26 for output), but I see nothing on my end, and the timeline card shows nothing. I have Ring cameras and can see live views in my camera cards, but with this integration nothing seems to happen. I know I am missing something, because I am somewhat of a novice user, although I have added several automations and have configured basic YAML updates. Is there anything someone has to do that’s not in the instructions? Create variables? Any other YAML updates or scripts that aren’t documented that someone HA-fluent would know? Figured I would ask since your name was Nice… haha. Also, I added the blueprint, created the automation from the blueprint, etc.

I’d like to say THANK YOU for this integration, it’s fantastic!
I’m using Azure OpenAI for my processing as I have a bunch of free credits and no GPU, so may as well!

I’m wondering, if there was some possibility of adding facial recognition into this at some point?

I am using Frigate with 3 cameras, and I configured the LLM summary blueprint 1.5.0. I don’t get why I need to set the trigger state (‘recording’ by default) if I only want to use Frigate’s occupancy binary_sensors.

How can I disable the ‘recording’ trigger state? And why trigger the automation on going from idle to recording? I get 3 notifications at once when I set my cameras to start recording when I leave home. I only want to run the automation when Frigate detects occupancy.

Hey, I’m not really sure here. I just tried messing with the settings (lowering duration and max frames) and that seemed to work. I’m also using Reolink cams, so I’m not sure if Ring has anything to do with it. The only thing was that the model mentioned on LLMV’s providers page was wrong; the ones I listed in my previous comment I tested, and both worked. And everything was just set up through the blueprint. Sorry there’s not much here, but I was lucky enough for it to pretty much just work with minimal effort after getting the API key.