LLM Vision: Let Home Assistant see!

I’ve searched and searched, and while I can run queries, what I’m trying to do is have a variable updated periodically (or on a trigger) with the number of cars, whether the garage door is open, etc. But I’m having trouble parsing the response!

Here’s my script, but it’s stumbling with:
Error: Error rendering data template: UndefinedError: 'str object' has no attribute 'number_of_cars'

sequence:
  - data:
      max_tokens: 100
      image_entity:
        - camera.g4_doorbell_pro_poe_high_resolution_channel
      provider: 01JE9RZWENF675FC56ZHH9WZD8
      target_width: 512
      temperature: 0.5
      detail: low
      include_filename: false
      message: >-
        Please count the number of cars in view and respond with a JSON object.
        The JSON object should have a single key, "number_of_cars" which should
        be the number of cars visible on the driveway.
    response_variable: response
    action: llmvision.image_analyzer
  - action: counter.set_value
    metadata: {}
    data:
      value: >
        '{{ response.response_text['number_of_cars'] | int }}'
    target:
      entity_id: counter.cars
alias: Car Count
mode: single
description: ""

I’ve managed to fix it by skipping the JSON object step - can someone explain what I was doing wrong with the JSON? (My own guess is below, after the working YAML.)

Working YAML:

sequence:
  - data:
      max_tokens: 100
      image_entity:
        - camera.g4_doorbell_pro_poe_high_resolution_channel
      provider: 01JE9RZWENF675FC56ZHH9WZD8
      target_width: 512
      temperature: 0.5
      detail: low
      include_filename: false
      message: >-
        Please count the number of cars in view and respond with a number.
    response_variable: response
    action: llmvision.image_analyzer
  - action: counter.set_value
    metadata: {}
    data:
      value: |
        {{ response.response_text | int }}
    target:
      entity_id: counter.cars
alias: Car Count
mode: single
description: ""
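
My best guess at what I was doing wrong in the JSON version: response_text comes back as a plain string rather than a parsed object, so indexing it with ['number_of_cars'] fails (the extra quotes I had wrapped around the template probably didn’t help either). Assuming the model returns bare JSON without markdown fences, I suspect something like this would have worked - an untested sketch using the from_json filter:

  - action: counter.set_value
    data:
      value: >
        {{ (response.response_text | from_json)['number_of_cars'] | int }}
    target:
      entity_id: counter.cars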

@valentinfrlch Wow! Thanks so much for all your hard work on this, it’s amazing and I’ve been having lots of fun implementing it over the last few weeks. I’ve been using your blueprint to provide summaries of Frigate camera footage when someone is spotted on my driveway, and it’s working great.

I took a look at my OpenAI usage stats after only 14 days and was surprised to see how many tokens it was using. I’m utilising the cheaper gpt-4o-mini model, have my maximum tokens parameter set to 50 in the blueprint, and am only sending 3 frames with target width 1280. In 429 requests I have used over 25 million tokens (26,250k input tokens and only 10k output tokens). 23 output tokens per request makes sense looking at the parameters, but is there any way I can lower the input tokens from 60k per request? Is this figure normal? It seems massively disproportionate, and it looks like it will be costing me around $10/month, which isn’t the end of the world but seems high when reading forums where people claim they only pay a few cents per month.

Glad you like the integration! The input token figure is so high because of the images. You can either lower the number of images sent (though I think 3 is reasonable), or you can lower the resolution (I’d recommend 720).

To get the cost down, I would mostly focus on the number of activations - I think 429 in just two weeks is very high. Try increasing the cooldown to avoid multiple activations for the same event, and use run_conditions to add some custom logic. For example, you could only run the automation when you’re not home. If you have Frigate installed, you might also want to use the binary sensors for the zones you have set up in Frigate within run_conditions.
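
A rough sketch of what that could look like in the blueprint’s run_conditions (the entity IDs are placeholders for your own person and Frigate zone entities):

run_conditions:
  - condition: state
    entity_id: person.me
    state: not_home
  - condition: state
    entity_id: binary_sensor.driveway_person_occupancy
    state: "on"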

Of course you can also switch your AI provider to Google Gemini (which has a free tier). There are some drawbacks to this, e.g. that Google might use your input for training. There is also currently an issue where requests to Gemini might be rejected because Google’s servers are overloaded (documented here).

Excellent, thanks for the advice and quick reply. I’m more at ease knowing I’m not missing something glaringly obvious; I’ll have a play around with resolutions and reducing the number of hits. I just couldn’t get my head around why we were capping output replies at 50 tokens when inputs were costing 60,000 tokens each.

Love this - it’s helping me turn off notifications from other smart apps and go fully HA. However, I am using Ring cameras. I used the blueprint on the LLM Vision page as an automation, and it seems to be working well despite a lot of posts about issues with it. The only issue is that not all of the cameras are notifying me.

I selected both the camera stream and snapshot entities for the camera entities - I wasn’t sure which to use, so I grabbed both for each camera. I then use the camera motion entity for the trigger. Out of the three cameras, the garage sends me the notification 100% of the time; the other two, not so much. From the backyard I have gotten two notifications, both of which were wind moving things, but if the dog or kids go out back, I get no notification from LLM Vision. The front door is right next to the garage, and I’m assuming that since the garage just notified me, the cooldown stops the front door from doing the same thing at about the same time.

I read in one of the posts that after an update, LLM Vision now decides whether it should notify you. Could that be what’s happening?

Any other tips? If I can get this to work 100% of the time, I’ll be able to turn Ring notifications off, rely solely on HA, and complete my goal of using one app for everything.

Awesome integration!
I get an error trying to analyze a JPEG (a meteogram): Failed to perform the action llmvision.image_analyzer. Error: cannot write mode P as JPEG

Here is the YAML:

action: llmvision.image_analyzer
data:
  include_filename: false
  max_tokens: 100
  temperature: 0.2
  message: >-
    In the provided meteogram, what are the most important events, for example
    wind gusts above 20 km/h, cloud cover above 50%, maximum temperatures above
    25 degrees, or minimum temperatures below 15 degrees?
  provider: 01JK16EC4B9WVQN7H5SH7QEV6T
  model: cogito
  target_width: 1280
  image_file: /config/www/meteogram_marina-di-schiavonea_italy_2524258.jpg

The file is found with /config/www/; if I try /local/ instead, it isn’t found, so the path seems OK.

I did not find anything in the logs; please help me narrow down this problem.

Thank you,
Piero

Try it without /config …

image_file: /www/meteogram_marina-di-schiavonea_italy_2524258.jpg

Error: File /www/meteogram_marina-di-schiavonea_italy_2524258.jpg does not exist

Couldn’t help but notice everyone using GPT - Gemini is free and good enough, lol. Anyone using Grok? Is the API available yet? I like that it can say swear words 🙂 and it’s also spot on when I need help figuring out complex scripts/integrations, where GPT would often make some b.s. up and then have me chasing b.s. for a few hours until I realize it’s b.s.

This sounds like a problem with the image format. Images should be converted automatically, but it seems like that doesn’t work in this case. Could you create a new issue on GitHub?

@zodyking I live in an RV. I’m using Llama-3.3-70b - via Cerebras when internet is connected, or (slowly) inferenced on a LAN computer when I’m off-cloud. It’s nice to use the same LLM both with a fast, free cloud provider and locally when off-cloud.

@valentinfrlch The test meteogram was originally a PNG; after receiving similar errors I converted it to JPG using the “standard” Arch Linux convert command.
I will proceed with opening an issue, thank you,
Piero

Edit: btw, this is where the meteogram comes from: Meteogram Trento - meteoblue

Hey, I’ve been reading the thread but I didn’t see anyone ask this question. On the LLM Vision website there is a prompt, “Who’s at the door?”, which hints that it’s possible to identify the person in the image instead of just describing them. Is my interpretation correct, and if so, how do you train it to know who the person is?

The memory feature allows you to provide samples. It’s not true training; it’s sample matching. It works if you get a clear face shot, but it’s not “training”.

Thanks @NathanCu, I will give it a try.

How’s the performance of it? I plan to do things with it that would cost me an arm and a leg via the official API.

The same thing is happening to me using Gemma 3. Did you ever figure this out?

Hi, welcome to the forum!

Please don’t expect an immediate response and then post the same question again within 5 hours.

Trying to setup Memory feature. Got one file path and description setup. When I try to setup 3 more, I get an error message saying “one or more image paths are invalid”. Anyone else have this issue?