LLM Vision: Let Home Assistant see!

Sorry guys, possibly a stupid question: I have CodeProject.AI running. Is this LLM integration compatible with it?

Thanks and best regards

Hey everyone! I’ve followed every guide I can get my hands on, but when using this with Gemini I just keep getting an unknown error when trying to trigger anything LLM-related. Any ideas? Cheers

Looking for some guidance as well. Very new to Home Assistant; I set it up just for LLM Vision.

I’m running into this error: Error: can only concatenate str (not "NoneType") to str

Processing: 1000007779.jpg…

Quick overview:
Running Home Assistant in Docker (Synology)
Using the Hubitat integration from HACS (no devices, cameras, motion sensors, etc. added directly in HA)
Set up to use “camera”, live preview, Gemini, motion activated…
Used the blueprints provided.
The flow seems to work up until the error.

Any ideas where I can look to resolve this?
thanks!

How do I get the full AI description in the notification? In my case it just triggered “Motion Detected” with the name of the camera. I’m using the LLM blueprint 1.3.1 with Google Gemini.

But in Developer Tools → Actions I can trigger it manually, and the AI response is the full description as requested in the prompt.

Does this only happen with Gemini? You could try another provider (Groq is free). Also make sure you see your camera streams in Home Assistant.

Doesn’t matter, still the same:

What did you use for max_tokens? You might have to increase it, as it determines how long the generated response can be.
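For reference, max_tokens is passed directly in the llmvision.image_analyzer call; a minimal sketch (the provider ID and camera entity below are placeholders):

```yaml
action: llmvision.image_analyzer
data:
  provider: YOUR_PROVIDER_ID   # placeholder: config entry ID of your provider
  image_entity:
    - camera.front_door        # placeholder camera entity
  message: "Describe what is happening."
  max_tokens: 100              # raise this if responses get cut off mid-sentence
response_variable: response
```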

I’m not sure if it’s only Gemini. To test, I tried to set up Groq, but I can’t seem to get the API key to submit during setup. I get:
“Could not connect to the server. Check your API key or IP and port”
and looking at the logs on the Groq side:
“The model gemma-7b-it has been decommissioned and is no longer supported. Please refer to GroqCloud for a recommendation on which model to use instead.”

I see in the documentation that llama3.2 is supported; I’m not sure how to authenticate against that offhand. In Groq studio, I use the dropdown in the top right to pick llama3.2 11b-preview-vision, but I get the same error when entering the API key in the LLM Vision HA entity setup.


Getting a little bit closer: it seems to be tied to the ‘motion sensor’ entity type. When I have it set as a Hubitat virtual motion sensor, it does not seem to fire (I get the error in the workflow). I switched that ‘motion sensor’ entity to a physical Hubitat sensor and it now seems to work, at least the sending for analysis.

v1.3.5-beta.2 is out and should fix Gemini and Groq. The changelog is here: Release genAI Titles & Bug fixes · valentinfrlch/ha-llmvision · GitHub.

Feedback is appreciated!

v1.3.5 is out!
It adds an option to generate a title based on the response. These titles will also serve as event titles when both remember and generate_title are enabled.
There are also a few bug fixes (e.g. Google and Groq should now set up correctly). Check out the full changelog here: v1.3.5 Release notes
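A minimal sketch of the new option in an image_analyzer call (provider ID and camera entity are placeholders); with both remember and generate_title enabled, the generated title is also used as the event title:

```yaml
action: llmvision.image_analyzer
data:
  provider: YOUR_PROVIDER_ID   # placeholder config entry ID
  image_entity:
    - camera.front_door        # placeholder camera entity
  message: "Describe what is happening."
  max_tokens: 100
  generate_title: true         # generate a short title from the response
  remember: true               # store the event; the title becomes the event title
response_variable: response
```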

LLM Vision now also has a website to better showcase its features and how it works.

Happy New Year, everyone!


I still get this “two pass” behaviour on iOS (I have updated to 1.3.5), but I guess that is expected? However, it does not happen silently: I get two sounds and two buzzes but only one notification, so the second one updates the first.

Also: I have a problem that is driving me nuts. I seem to get an old image as part of the notification, even though I cannot see this image in /config/www/llmvision, where old images are being overwritten with new ones.

I wonder where it picks up this old image from?

As part of this “two pass” notification: when it sends the first notification, does it already have the new images from the camera? Should it wait to include the image until the second notification, after the response from the AI?

I use Gemini, the latest 1.3.5 blueprint, and Camera with snapshot mode on a UniFi camera, so not Frigate.
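For what it’s worth, the usual Home Assistant pattern for this kind of two-step notification is to send both messages with the same tag, so the second replaces the first, and to mark the update as silent on iOS. A hedged sketch (the notify service and tag names are placeholders):

```yaml
# First notification: sent immediately when motion is detected
- action: notify.mobile_app_your_phone   # placeholder notify service
  data:
    message: "Motion detected"
    data:
      tag: llmvision-event               # same tag -> the update replaces this one

# Second notification: sent once the AI response arrives
- action: notify.mobile_app_your_phone
  data:
    message: "{{ response.response_text }}"
    data:
      tag: llmvision-event
      push:
        sound: none                      # iOS: deliver the update silently
```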

I need help because I can’t get it working, or I’ve misunderstood something. My camera works with Frigate and a Coral. The path is the following: image_file: /config/media/frigate/clips/ (except that the image names are random, e.g. frigate1-1735750557.182234-2cehsy.jpg), and I’m also missing the Frigate tap-action URL to navigate to on my iOS mobile.
Help please

alias: test vision
description: ""
triggers:
  - trigger: state
    entity_id:
      - binary_sensor.frigate1_all_occupancy
conditions: []
actions:
  - action: llmvision.image_analyzer
    metadata: {}
    data:
      remember: false
      include_filename: false
      target_width: 1280
      max_tokens: 100
      temperature: 0.2
      generate_title: true
      expose_images: true
      expose_images_persist: true
      provider: XXXXXXXXXXXXXX
      model: gpt-4o-mini
      message: |-
        Summarize what is happening in the camera feed (one sentence maximum).
        Do not describe the scene! If there is a person, describe what they are
        doing and what they look like.
      image_file: /config/media/frigate/?
      image_entity:
        - camera.frigate1
    response_variable: response
  - action: notify.mobile_app_iphone_de_younes
    metadata: {}
    data:
      message: "Motion detected by Frigate. Analysis result: {{ response.response_text }}"
      data:
        url: ?
mode: single
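One common way around the random clip filenames (sketched here under the assumption that you use the Frigate HA integration, with placeholder topic and service names) is to trigger on Frigate’s MQTT events topic instead; the event ID then gives stable URLs for both the notification image and the tap action:

```yaml
# Hypothetical sketch: trigger on Frigate's MQTT events and build media URLs
triggers:
  - trigger: mqtt
    topic: frigate/events
actions:
  - variables:
      event_id: "{{ trigger.payload_json['after']['id'] }}"
  - action: notify.mobile_app_iphone_de_younes
    data:
      message: "Motion detected by Frigate"
      data:
        image: "/api/frigate/notifications/{{ event_id }}/snapshot.jpg"
        url: "/api/frigate/notifications/{{ event_id }}/snapshot.jpg"   # tap action
```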

How did you get openrouter to work with it?
I tried with the “https://openrouter.ai/api/v1/chat/completions” endpoint and it lets me add it as a Custom provider, but I’m getting errors when I try to use it instead of the OpenAI API which works flawlessly.

websocket_api script: Error executing script. Unexpected error for call_service at pos 1: 'str' object has no attribute 'get'
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 526, in _async_step
    await getattr(self, handler)()
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 764, in _async_call_service_step
    response_data = await self._async_run_long_action(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
    )
    ^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 727, in _async_run_long_action
    return await long_task
           ^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2802, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2845, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/llmvision/__init__.py", line 325, in image_analyzer
    response = await request.call(call)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/llmvision/providers.py", line 198, in call
    response_text = await provider_instance.vision_request(call)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/llmvision/providers.py", line 279, in vision_request
    return await self._make_request(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/llmvision/providers.py", line 340, in _make_request
    response = await self._post(url=self.endpoint.get('base_url'), headers=headers, data=data)
                                    ^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get'

Executed: January 4, 2025, 18:40:04
Error: Groq does not support videos or streams
Result:
params:
  domain: llmvision
  service: image_analyzer
  service_data:
    include_filename: false
    target_width: 1280
    max_tokens: 50
    temperature: 0.2
    expose_images: true
    expose_images_persist: true
    message: |
      Identify whether it is a woman or a man, and describe them
    image_file: /media/frigate/clips/frigate1-1735867068.682389-13tazr.jpg
    image_entity:
      - camera.frigate1
    provider: xxxxxxxxxxxxxx
    generate_title: true
  target: {}
running_script: false

All the screenshots that are shown are taken too fast, so 99% of the time no one is in the picture. Is there any way to wait 1-3 s before it takes the screenshot?

Using a UniFi G4 Doorbell

delay:
  hours: 0
  minutes: 0
  seconds: 5
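The delay above would go in the automation’s actions, right before the analyzer call, e.g. (a sketch, with placeholder provider ID and camera entity):

```yaml
actions:
  - delay:
      hours: 0
      minutes: 0
      seconds: 5                   # wait so the subject is actually in frame
  - action: llmvision.image_analyzer
    data:
      provider: YOUR_PROVIDER_ID   # placeholder
      image_entity:
        - camera.g4_doorbell       # placeholder entity ID
      message: "Describe what is happening."
      max_tokens: 100
    response_variable: response
```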