I don’t think the Snapshot Preview Mode works how I would expect it to. It seems to constantly send a new snapshot of the camera instead of the event. Because of this, if I don’t check a notification immediately it just shows me the current view of the camera instead of the event.
Snapshot mode will store analyzed images in /config/www/llmvision. Images are named after the camera that captured them. This means if there is a new event, the old image will be overwritten by the new one. For your notifications, this means that older notifications might display the preview of a newer event, even though the summary will still refer to the original event.
This is done to avoid filling your filesystem with these images.
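For illustration, with two hypothetical camera names (the exact naming scheme may differ slightly), the folder would look roughly like this:

/config/www/llmvision/front_door.jpg   # latest analyzed front door event, overwritten on each new event
/config/www/llmvision/driveway.jpg     # latest analyzed driveway event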
Unfortunately no, this has not been my experience.
When I get a notification, I essentially see a very low-FPS live view of the camera. If I check an older notification, it is the same: a low-FPS live view.
@valentinfrlch Any thoughts on the Speech API and support in LLM Vision? As in providing a convenient way of calling llmvision.image_analyzer and getting back an mp3 of the generated text in a selected voice.
Something like:
- action: llmvision.image_analyzer
  metadata: {}
  data:
    max_tokens: 8192
    model: gpt-4o-mini
    include_filename: false
    temperature: 1.0
    provider: <provider_id>
    voice: alloy
    format: mp3
    message: You are a bot for a home automation system. Tell me what is in this photo.
    image_file: /path/to/file
  response_variable: response
- action: media_player.play_media
  data:
    media_content_id: "{{ response.response_path | string }}"
    media_content_type: audio/mpeg
    announce: true
  target:
    entity_id: media_player.livingroom
That sounds as if Live Preview is being used instead of Snapshot mode. Did you install the latest version of the integration (v1.3) as well?
Update: I have identified the issue and it will be fixed in the next update. Thanks for reporting it!
I think this is already quite easy to set up in an automation (see the sketch below).
The only reason this could be useful is if there were a vision-to-audio model that skips text generation and in turn improves latency. To my knowledge this is not available in the API currently (correct me if I’m wrong!). So, at least right now, this would only complicate the integration (which already has many options) for only a very small benefit.
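A minimal sketch of how this could look today, assuming a TTS entity (tts.home_assistant_cloud here) and a media player are already configured, and that the analyzer's response variable exposes the generated text as response_text (check the actual key in Developer Tools → Actions):

- action: llmvision.image_analyzer
  data:
    provider: <provider_id>
    model: gpt-4o-mini
    message: Describe what is in this photo in one short sentence.
    image_file: /path/to/file
    max_tokens: 100
  response_variable: response
- action: tts.speak
  target:
    entity_id: tts.home_assistant_cloud   # assumption: any configured TTS entity
  data:
    media_player_entity_id: media_player.livingroom
    message: "{{ response.response_text }}"

This keeps text generation and speech synthesis as two separate steps, which is exactly the latency trade-off mentioned above.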
Blueprint (v1.3) isn’t working properly here. Everything (including the event calendar) is properly configured as far as I can see, but notifications only contain the text “Person”.
This error originated from a custom integration.
Logger: custom_components.llmvision.request_handlers
Source: custom_components/llmvision/request_handlers.py:444
integration: LLM Vision (documentation, issues)
First occurred: 7 November 2024 at 21:02:10 (15 occurrences)
Last logged: 06:16:17
Failed to fetch http://192.168.1.10:8123/api/frigate/notifications/1731014693.044706-tkpuik/clip.mp4 after 2 retries
Failed to fetch http://192.168.1.10:8123/api/frigate/notifications/1731034059.027834-rj9sat/clip.mp4 after 2 retries
Failed to fetch http://192.168.1.10:8123/api/frigate/notifications/1731030480.475938-4g1j3f/clip.mp4 after 2 retries
Failed to fetch http://192.168.1.10:8123/api/frigate/notifications/1731035607.413499-x95a79/clip.mp4 after 2 retries
Failed to fetch http://192.168.1.10:8123/api/frigate/notifications/1731040278.032353-1ekusx/clip.mp4 after 2 retries
and:
Logger: homeassistant.components.automation.ai_event_summary_llm_vision_v1_3
Source: helpers/script.py:2032
integration: Automation (documentation, issues)
First occurred: 7 November 2024 at 19:35:08 (56 occurrences)
Last logged: 06:16:17
AI Event Summary (LLM Vision v1.3): Error executing script. Error for choose at pos 4: Failed to fetch frigate clip 1731030480.475938-4g1j3f
AI Event Summary (LLM Vision v1.3): Analyze event: choice 1: Error executing script. Error for call_service at pos 1: Failed to fetch frigate clip 1731035607.413499-x95a79
AI Event Summary (LLM Vision v1.3): Error executing script. Error for choose at pos 4: Failed to fetch frigate clip 1731035607.413499-x95a79
AI Event Summary (LLM Vision v1.3): Analyze event: choice 1: Error executing script. Error for call_service at pos 1: Failed to fetch frigate clip 1731040278.032353-1ekusx
AI Event Summary (LLM Vision v1.3): Error executing script. Error for choose at pos 4: Failed to fetch frigate clip 1731040278.032353-1ekusx
Thanks for the detailed logs!
The problem occurs when LLM Vision (the integration) tries to fetch the clip from frigate. It does that through the Frigate Integration for Home Assistant. Have you installed the integration?
You can verify the Frigate integration works by navigating to http://192.168.1.10:8123/api/frigate/notifications/1731040278.032353-1ekusx/clip.mp4 on your computer. You should see the clip.
Frigate integration is up and running, but if I access http://192.168.1.10:8123/api/frigate/notifications/1731040278.032353-1ekusx/clip.mp4 then I am getting a play button with a strike through it (see screenshot)
Not sure, but it might be related to the Frigate beta. In the release notes of 0.15.0 Beta 1 they mention that they migrated to a new API (FastAPI instead of Flask), so I think this is likely the problem.
v1.3 of the blueprint seems to have a few issues.
I have updated and hopefully fixed them in v1.3.1 (still in beta). Feel free to test and provide feedback.
Because it is not in the main branch, you’ll need to import it again and then overwrite the existing blueprint (your existing configuration will remain).
Just paste this URL: https://github.com/valentinfrlch/ha-llmvision/blob/547ae32b56f25c0f3ff5ab2130997fa5ac56c6a9/blueprints/event_summary.yaml
Multi-device support has also been added so you can now have the notification sent to multiple devices (via HA mobile app).
In this particular case I asked it to verify and acknowledge whether any persons were visible, and if so, to describe them, which it did correctly. So I wonder why this would be a case of ‘no keywords found in the response’ …
Makes me wonder what the ‘keywords’ are, then?
Right now, titles of notifications and events are not generated by AI; only the body is. The title is simply label + " seen", where label is ‘Person’, ‘Car’, etc., picked by matching keywords in the summary.
For example, the title is “Person seen” if the summary contains ‘person’, ‘man’, ‘woman’ or ‘individual’.
I am working on AI-generated titles, but this is how it works for now.
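As an illustration only (not the blueprint’s actual code; the variable name and keyword lists here are assumptions), the described behaviour amounts to something like this Jinja template:

title: >-
  {%- set summary = response.response_text | lower -%}
  {%- if 'person' in summary or 'man' in summary or 'woman' in summary or 'individual' in summary -%}
    Person seen
  {%- elif 'car' in summary or 'vehicle' in summary -%}
    Car seen
  {%- else -%}
    Nothing seen
  {%- endif -%}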
I’m afraid this is NOT ‘how it works for now’
Consider this : the response I got was “A man and a woman are on a porch. The woman is wearing a light-colored, sleeveless top and a skirt. The man is wearing a dark-colored shirt and light-colored shorts.”
Yet it was labeled ‘Nothing seen’
But as you’re working on a better titling system anyhow, let’s not make a fuss about it …
Today’s update adds a new action to seamlessly update sensors based on image/video input. Just describe what data you want to extract and select a sensor to update. You can use Helpers to create virtual sensors.
Supported sensor types are number, text, boolean and select. Data types and the available options for select sensors are recognized automatically.
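A rough sketch of how such a call might look, assuming the action is exposed as llmvision.data_analyzer and accepts a sensor_entity field (check the integration’s documentation for the exact action and parameter names); input_number.cars_in_driveway is a Helper created for this purpose:

- action: llmvision.data_analyzer          # assumption: exact action name may differ
  data:
    provider: <provider_id>
    model: gpt-4o-mini
    message: How many cars are parked in the driveway? Answer with a number only.
    image_file: /path/to/file
    sensor_entity: input_number.cars_in_driveway   # assumption: a number Helper to receive the value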
Okay, this is embarrassing. I’ve installed Ollama on the same Linux box on which Frigate and HA are running. The latter two run in Docker containers; Ollama was installed directly.
I verified correct installation by entering http://127.0.0.1:11434/ in a browser on that box. The response was: Ollama is running
Problem
The problem arises when, from another box on the same local network, I try to specify the Ollama server address during the integration set-up. After entering the same domain name used to access HA, the entry is not accepted.
- port 11434 was selected
- port 11434 has been forwarded on my router
- the https option was activated
Interestingly, when I ask Ollama to confirm the port, this is what she spits out:
ollama run llava-phi3
>>> What port is the Ollama server running on this machine?
The Ollama server is currently running on the localhost at port 3001.
Can somebody please point out what I've missed? It's probably something woefully basic.