LLM Vision: Let Home Assistant see!

Robert_Shed · October 14, 2024, 1:43pm

my input text is not responding on the vertical stack text example I tried to recreate from the wiki. I only get “response.response_text” for the text field.

my yaml:

alias: LLM doorbell
sequence:
  - action: llmvision.image_analyzer
    metadata: {}
    data:
      provider: OpenAI
      model: gpt-4o-mini
      detail: low
      max_tokens: 60
      temperature: 0.5
      message: Describe what you see. Be short and brief.
      image_entity:
        - camera.ring_front_doorbell_snapshot
    response_variable: response
  - action: input_text.set_value
    metadata: {}
    data:
      value: "{{response.response_text}}"
    target:
      entity_id: input_text.llmvision_response
  - action: notify.mobile_app_luffy1987
    data:
      data:
        image:url
        entity_id: camera.ring_doorbell_camera
        actions:
          - action: URI
            title: View Doorbell
            uri: /dashboard-mobile/security
      title: Someone is at the Front Door!
      message: "{{response.response_text}}"
description: ""
icon: phu:ring

valentinfrlch · October 15, 2024, 8:14am

I think the problem is the card not the automation. How does the yaml of your card look like? Did you put “{{input_text.llmvision_response}}”?

Also, I think you might want to delete the url from the yaml, it is publicly accessible.

valentinfrlch · October 15, 2024, 8:16am

You can use an input_text helper to store the variable prompt. In your automation, every time a person with a white shirt has been detected, you could change the text_input helper to the new prompt with instructions to look for a blue shirt.

slack1 · October 15, 2024, 8:59pm

Tested successfully (in beta.6) on the screen shots from HASS.Agent yesterday, working well

Robert_Shed · October 15, 2024, 10:17pm

The card yaml:


type: custom:vertical-stack-in-card
cards:
  - type: tile
    entity: script.llm_doorbell
    icon_tap_action:
      action: toggle
    vertical: false
  - type: markdown
    content: "{{states(\"input_text.llmvision_response\")}}"

valentinfrlch · October 16, 2024, 11:08am

v1.2 Stream Analyzer and new Provider Configurations

Breaking Changes

Unfortunately, due to the changes in how provider configurations are stored, providers may have to be set up again!
provider now requires a config entry. Use the UI to pick your provider configuration!
include_filename is now a required parameter. Make sure you include it in all your scripts and automations!

Changelog

v1.2 adds stream_analyzer: It records for a set duration and analyzes frames in a given interval (much like video_analyzer). This is faster as it avoids writing the files to disk and reading them again.
The setup has been rewritten: You can now have multiple configurations for each provider and a restart is no longer necessary to delete configurations.

This could be useful if you host multiple models on different servers (e.g. a small LLM on a Raspberry Pi and a larger model on a PC)

Not all provider configurations will migrate automatically, so you may have to do the setup again.

Thanks to everyone who helped test this release!

valentinfrlch · October 18, 2024, 3:48pm

Automation Blueprints

Writing scripts and automations can be intimidating.
To make it easier to get started with LLM Vision, I have gathered some useful automations using LLM Vision and converted into blueprints so you can easily import and customize them for your needs.

For example, everyone using Frigate can now easily get event notifications summarized:

frame_111_delay-0.04s

You can find this and other blueprints in the examples wiki with more to come. If you have an automation, script or blueprint you’d like to share, please post them here and I’ll add them to the wiki!

karlhfr · October 21, 2024, 8:16am

i keep getting this error 500.
i can press run again and it works, but still get this error every now and again

Executed: 21 October 2024 at 09:15:45
Error: Fetch failed with status code 500
Result:
params:
domain: llmvision
service: stream_analyzer
service_data:
interval: 2
duration: 1
include_filename: false
target_width: 1280
detail: high
max_tokens: 100
temperature: 0.1
provider: 01JAP03NNG4W9J7Y7J6QP7QSD4
model: gpt-4o-mini
message: describe what you see in a few words
image_entity:
- camera.camera1
target: {}
running_script: false

karlhfr · October 21, 2024, 8:18am

did you get all this set up? have no clue what im doing and just want to display the response to an automation on a card

valentinfrlch · October 21, 2024, 8:58am

The error occurs when trying to fetch the latest frame recorded by the camera.
Here’s something you can try: In Home Assistant go to developer tools > States. Search for your camera and check its attributes. It should have entity_picture. Copy this and append to your Home Assistant url (e.g. http://homeassistant.local:8123/api/camera.entity_id?token=<accesstoken>

This should show you the latest camera frame in your browser. Try refreshing a few times. Will this also produce a 500 error?

karlhfr · October 21, 2024, 1:54pm

ok yeah, so its only on the first time it tries to basically load. i can keep refreshing and its fine but if i leave a few seconds i get the error 500 again

valentinfrlch · October 21, 2024, 3:49pm

This means Home Assistant doesn’t keep the stream running and it takes too long to load it. You can force the camera stream to stay active in the background. This will however take up some resources on the server. Here’s how you can do that:

Find your camera entity in Settings Devices & services > Entities
In the camera preview, click the settings cog
There should be a setting called ‘preload camera stream’
Check your CPU usage

For more information, see this: Camera - Home Assistant

Update: This should be fixed in the next version. If a request fails it will try again.

valentinfrlch · October 22, 2024, 8:04am

With v1.2.1 the blueprint has gotten some big upgrades:
AI understands what happens in the video, decides whether you should be notified and sends you notifications with a preview and summary of what happened.
Using LLM Vision has never been easier!

notification
Check out the post in Blueprint Exchange.

starob · October 22, 2024, 12:33pm

I’m trying to configure the llm vision component with Anthropic. I’m getting an invalid key error during setup. The key starts with “sk-ant-api03…”. I’m on the evaluation plan. Do I have to purchase a plan to use this?

Just realized that my remaining balance is 0.00. But the key should still work or not?

valentinfrlch · October 22, 2024, 1:21pm

If I remember correctly you need to add some funds. I think you can get $5 for free though if you confirm your phone number.

starob · October 22, 2024, 3:16pm

I added some funds and the key was accepted.
Thanks

GavinCampbell · October 23, 2024, 1:01pm

Thanks for this project. I’ve been having some fun with it.

I have an automation where at night when the alarm is armed it would wake me up if a person was detected on the driveway. However this was always a pain as rain or even reflections would trigger it even though I have blueiris setup nicely. Could never get it to work properly.

Now I have it setup so that if blueiris thinks there is a person, it will ping the llm to double check before waking me up to let me know. Works 100% more reliably.

starob · October 23, 2024, 3:27pm

I had the same idea. I just implemented this using Frigate and LLM Vision.

zuzzy · October 28, 2024, 12:26pm

In case others have Ring cameras and use this, some notes on its use (having just worked through it all with @valentinfrlch)

TL;DR: to get Ring + LLM Vision working together use an action in an automation to save a short video clip in a local folder then point LLM Vision at it. I always use the same file name so it’s easy then to point the LLM Vision action at the file

First of all, you need to have Ring cameras and the Ring native HA plugin and Ring-MQTT HACS addon here installed. You need to have streaming video configured using Generic Camera(s) as per instructions here and that all needs to be working. If it’s not, I’m sure @valentinfrlch will agree that this thread isn’t the right place, ask in the dedicated thread for it here

You also need to have rights to save files in your home assistant file store somewhere. If you choose anything but the config folder you may need to permit writing to those folders. How this is done varies depending on how you run it but start here

Now to the point of the post.

I have got LLM Vision working with Ring cameras but the fact that they are not typical CCTV cameras and thus don’t go through a system such as Frigate presents some nuances to how compatible certain aspects of LLM Vision are with them. In short:

Images
Video
Streaming

The problems come from Ring not LLM Vision because of how Ring itself works. Detail below, but in summary (so far as I can tell) there is no way to get Ring to snapshot images on demand, which means that the snapshot image is useless for realtime analysis like this

Streaming can’t ever work because LLM Vision relies on taking several frames from the video using entity_image attribute of the camera, rather than processing a stream of video. But the Generic Camera entity doesn’t quite work like that when created to host the ring-mqtt rtsp stream. As per the instructions you set the live stream to point at the rtsp stream exposed by the ring-mqtt addon but the snapshot attribute is pointed at the offical ring addon snapshot attribute.

And hence images doesn’t work either, because the snapshot on ring is not related to motion, it’s configured in the ring app as an image taken every x minutes regardless of activity. The camera.snapshot action has no effect as ring doesn’t have a “take snapshot” function within it (and equally therefore entity_image that LLM Vision Streaming function requires has the same limitation)

Some other pointers:

You could call past recorded events not live stream if you want by changing the path you query from ring - see ring-mqtt docs
Be aware that it can take a few seconds for ring video to start to live stream from the point the stream is ‘viewed’ (including using actions like camera.record). Part of this is delay in connecting to ring live streaming, some is buffering within HA itself. You can tell streaming to start using an action, which might shave a bit off that time
Within Ring any HA live camera activity is shown as live viewing and is recorded like any other ring activity. This means it is subject to the same limitations as the Ring app in terms of duration.
Beware if you have a battery Ring camera (or you are precious about your bandwidth) as once you start a live stream it may not always reliably stop (the forums are full of examples, but it’s unclear why it happens). You can issue a stop live stream action in HA which helps prevent this happening.
If you want to include an image in any notification following LLM Vision being run, it’s not included in the response variable LLM Vision returns. Normally you would include a snapshot for anything else but Ring, however since Ring snapshots are periodic it’s not going to have any relevance to what was submitted for analysis by LLM Vision. As of writing I don’t know a workaround however I believe @valentinfrlch is potentially going to help Ring users as a by-product of including a debug function that can write the first submitted frame to a folder - which Ring users can use as a workaround for this problem as that frame can be our notification image we can’t otherwise get.

johboh · November 1, 2024, 4:04pm

I’m using LLM Vision + gpt4o-mini to identify how my 3D printing is going, works great!

For another project, I want to send a history graph for a sensor value (weight of person in bed during sleep, indicating sleep “quality” and movements). ~~Do anyone know if it possible to grab a graph/history graph for a sensor and feed it to llmvision.image_analyzer?~~
edit: I solved this using my existing Grafana setup, where I can easily download a graph as PNG: Generating PNG images from a Grafana chart – Correct URL, Settings and Authentication | j3t.ch | Julien Perrochet's Blog

//Johan