LLM Vision: Let Home Assistant see!

Or indeed the answer to my original ask is in 1.2 - the new stream analyzer - which avoids the intermediate file. I'll test that when I get a chance - brilliant, thank you!! Much easier than having fragments of files all over the place, and it hooks into ring_mqtt nicely.

One thing I would like to do is submit images as well as video. Can I include image files in video analyzer requests? (I found a great prompt someone used where they include photos of key people alongside the video and ask Gemini to call people by their names if it recognises them.)

And if you can (and I appreciate I could just test it to find out), more importantly, can I do so with stream analyzer? (Which I can't test yet, as I haven't got the pre-release installed.)

Thanks!

This is not yet possible with either stream analyzer or video analyzer. I will add it to both later, as it sounds like a good idea!

If you have any feedback for v1.2 please let me know.
Thanks!

Hi,

I wonder if there's an option to use text input for changing the message to the AI. For example: I want to receive a notification only if someone with a white shirt has been detected in the picture, and after that change the input text to look for someone with a blue shirt. For the trigger I want to use Frigate person detection and get the message only if the picture matches what I asked for.

Thank you

That’s great thank you! It is accepted in config, I assume it’s just not doing anything.

I've paused this anyway atm, as I need to find out why ring_mqtt is murdering my camera batteries with continuous usage. By the time I trust ring_mqtt and/or have guardrails in place in automations to prevent heavy unintentional streaming, it ought to be about time for 1.2!


My input text is not responding in the vertical-stack card example I tried to recreate from the wiki. I only get "response.response_text" for the text field.

my yaml:

alias: LLM doorbell
sequence:
  - action: llmvision.image_analyzer
    metadata: {}
    data:
      provider: OpenAI
      model: gpt-4o-mini
      detail: low
      max_tokens: 60
      temperature: 0.5
      message: Describe what you see. Be short and brief.
      image_entity:
        - camera.ring_front_doorbell_snapshot
    response_variable: response
  - action: input_text.set_value
    metadata: {}
    data:
      value: "{{response.response_text}}"
    target:
      entity_id: input_text.llmvision_response
  - action: notify.mobile_app_luffy1987
    data:
      data:
        image: url
        entity_id: camera.ring_doorbell_camera
        actions:
          - action: URI
            title: View Doorbell
            uri: /dashboard-mobile/security
      title: Someone is at the Front Door!
      message: "{{response.response_text}}"
description: ""
icon: phu:ring

I think the problem is the card, not the automation. What does the yaml of your card look like? Did you put "{{states('input_text.llmvision_response')}}" in it?

Also, I think you might want to delete the url from the yaml, it is publicly accessible.

You can use an input_text helper to store the variable prompt. In your automation, every time a person with a white shirt has been detected, you could change the input_text helper to the new prompt with instructions to look for a blue shirt; a rough sketch of that idea is below.
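
A minimal sketch, assuming a helper called input_text.llmvision_prompt and borrowing the camera, notify service and provider from the doorbell script above (all of these are placeholders you would swap for your own):

alias: Variable prompt via input_text
sequence:
  # Ask the model using whatever prompt is currently stored in the helper
  - action: llmvision.image_analyzer
    data:
      provider: OpenAI
      model: gpt-4o-mini
      detail: low
      max_tokens: 60
      temperature: 0.5
      message: "{{ states('input_text.llmvision_prompt') }}"
      image_entity:
        - camera.ring_front_doorbell_snapshot
    response_variable: response
  # Only notify if the model confirms what the prompt asked for
  - condition: template
    value_template: "{{ 'yes' in (response.response_text | lower) }}"
  - action: notify.mobile_app_luffy1987
    data:
      message: "{{ response.response_text }}"
  # Swap the prompt so the next run looks for something else
  - action: input_text.set_value
    target:
      entity_id: input_text.llmvision_prompt
    data:
      value: "Is anyone in the picture wearing a blue shirt? Answer only yes or no."

This only works if the stored prompt asks a yes/no style question; adapt the template check to however you phrase your prompts.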

Tested successfully (in beta.6) on the screenshots from HASS.Agent yesterday, working well :smiley:


The card yaml:


type: custom:vertical-stack-in-card
cards:
  - type: tile
    entity: script.llm_doorbell
    icon_tap_action:
      action: toggle
    vertical: false
  - type: markdown
    content: "{{states(\"input_text.llmvision_response\")}}"

v1.2 Stream Analyzer and new Provider Configurations

:warning:Breaking Changes

  • :warning: Unfortunately, due to the changes in how provider configurations are stored, providers may have to be set up again!
  • provider now requires a config entry. Use the UI to pick your provider configuration!
  • include_filename is now a required parameter. Make sure you include it in all your scripts and automations!

Changelog

  • v1.2 adds stream_analyzer: it records for a set duration and analyzes frames at a given interval (much like video_analyzer). This is faster because it avoids writing files to disk and reading them back; an example call is included after this changelog.

  • The setup has been rewritten: You can now have multiple configurations for each provider and a restart is no longer necessary to delete configurations.

    This could be useful if you host multiple models on different servers (e.g. a small LLM on a Raspberry Pi and a larger model on a PC)

    :warning:Not all provider configurations will migrate automatically, so you may have to do the setup again.
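
For anyone updating existing scripts, here is a minimal sketch of a stream_analyzer call under the new setup. The provider value is the config entry ID that the UI selector fills in for you; the ID and camera entity below are placeholders:

  - action: llmvision.stream_analyzer
    data:
      provider: "01JAP03NNG4W9J7Y7J6QP7QSD4"  # example config entry ID; pick yours via the UI selector
      model: gpt-4o-mini
      duration: 5              # record for 5 seconds
      interval: 2              # analyze one frame every 2 seconds
      include_filename: false  # now a required parameter
      target_width: 1280
      max_tokens: 100
      message: Describe what happens in the clip in a few words.
      image_entity:
        - camera.front_door    # placeholder; use your own camera entity
    response_variable: response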

Thanks to everyone who helped test this release!


Automation Blueprints

Writing scripts and automations can be intimidating.
To make it easier to get started with LLM Vision, I have gathered some useful automations using LLM Vision and converted them into blueprints so you can easily import and customize them for your needs.

For example, everyone using Frigate can now easily get event notifications summarized:

[GIF: example of a summarized Frigate event notification]

You can find this and other blueprints in the examples wiki, with more to come. If you have an automation, script or blueprint you'd like to share, please post it here and I'll add it to the wiki!
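
For anyone curious what such a Frigate automation roughly does under the hood, here is a hand-rolled sketch rather than the actual blueprint; the payload fields come from Frigate's frigate/events MQTT topic, and the camera entity, notify service and provider ID are assumptions you would adapt:

alias: Summarize Frigate person events (sketch)
trigger:
  - platform: mqtt
    topic: frigate/events
condition:
  - condition: template
    value_template: >
      {{ trigger.payload_json['type'] == 'new'
         and trigger.payload_json['after']['label'] == 'person' }}
action:
  - action: llmvision.stream_analyzer
    data:
      provider: "01JAP03NNG4W9J7Y7J6QP7QSD4"  # your provider config entry
      model: gpt-4o-mini
      duration: 5
      interval: 2
      include_filename: false
      max_tokens: 100
      message: Summarize in one sentence what the person is doing.
      image_entity:
        - camera.front_door              # assumed camera entity
    response_variable: response
  - action: notify.mobile_app_your_phone   # assumed notify service
    data:
      title: Person detected
      message: "{{ response.response_text }}"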

I keep getting this error 500.
I can press run again and it works, but I still get this error every now and again.

Executed: 21 October 2024 at 09:15:45
Error: Fetch failed with status code 500
Result:
params:
  domain: llmvision
  service: stream_analyzer
  service_data:
    interval: 2
    duration: 1
    include_filename: false
    target_width: 1280
    detail: high
    max_tokens: 100
    temperature: 0.1
    provider: 01JAP03NNG4W9J7Y7J6QP7QSD4
    model: gpt-4o-mini
    message: describe what you see in a few words
    image_entity:
      - camera.camera1
  target: {}
running_script: false

Did you get all this set up? I have no clue what I'm doing and just want to display the response to an automation on a card.

The error occurs when trying to fetch the latest frame recorded by the camera.
Here's something you can try: in Home Assistant, go to Developer Tools > States. Search for your camera and check its attributes. It should have entity_picture. Copy this and append it to your Home Assistant URL (e.g. http://homeassistant.local:8123/api/camera_proxy/camera.entity_id?token=<accesstoken>).

This should show you the latest camera frame in your browser. Try refreshing a few times. Will this also produce a 500 error?
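
If you just want that attribute quickly, a one-liner for Developer Tools > Template (assuming the camera from your trace, camera.camera1):

{{ state_attr('camera.camera1', 'entity_picture') }}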

OK yeah, so it's only the first time it tries to load. I can keep refreshing and it's fine, but if I leave it a few seconds I get the error 500 again.

This means Home Assistant doesn’t keep the stream running and it takes too long to load it. You can force the camera stream to stay active in the background. This will however take up some resources on the server. Here’s how you can do that:

  1. Find your camera entity in Settings > Devices & services > Entities
  2. In the camera preview, click the settings cog
  3. Enable the setting called ‘preload camera stream’
  4. Check your CPU usage

For more information, see this: Camera - Home Assistant

Update: This should be fixed in the next version. If a request fails it will try again.

With v1.2.1 the blueprint has gotten some big upgrades:
The AI understands what happens in the video, decides whether you should be notified, and sends you a notification with a preview and summary of what happened.
Using LLM Vision has never been easier!

[GIF: example notification with preview and summary]
Check out the post in Blueprint Exchange.

I'm trying to configure the LLM Vision component with Anthropic. I'm getting an invalid key error during setup. The key starts with "sk-ant-api03…". I'm on the evaluation plan. Do I have to purchase a plan to use this?

Just realized that my remaining balance is 0.00. But the key should still work, shouldn't it?

If I remember correctly you need to add some funds. I think you can get $5 for free though if you confirm your phone number.

I added some funds and the key was accepted.
Thanks
