I don’t think the Snapshot Preview Mode works how I would expect it to. It seems to constantly send a new snapshot of the camera instead of the event. Because of this, if I don’t check a notification immediately it just shows me the current view of the camera instead of the event.
Snapshot mode will store analyzed images in /config/www/llmvision. Images are named after the camera that captured them. This means if there is a new event, the old image will be overwritten by the new one. For your notifications, this means that older notifications might display the preview of a newer event, even though the summary will still refer to the original event.
This is done to avoid filling your filesystem with these images.
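For illustration, with two hypothetical camera names (the exact naming scheme may differ slightly), the folder would look roughly like this:

/config/www/llmvision/front_door.jpg   # latest analyzed front door event, overwritten on each new event
/config/www/llmvision/driveway.jpg     # latest analyzed driveway event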
Unfortunately no, this has not been my experience.
When I get a notification, I essentially see a very low-FPS live view of the camera. If I check an older notification, it is the same: a low-FPS live view.
@valentinfrlch Any thoughts on the Speech API and support in LLM Vision? As in providing a convenient way of calling llmvision.image_analyzer and getting back an mp3 of the generated text in a selected voice.
Something like:
- action: llmvision.image_analyzer
  metadata: {}
  data:
    max_tokens: 8192
    model: gpt-4o-mini
    include_filename: false
    temperature: 1.0
    provider: <provider_id>
    voice: alloy
    format: mp3
    message: You are a bot for a home automation system. Tell me what is in this photo.
    image_file: /path/to/file
  response_variable: response
- action: media_player.play_media
  data:
    media_content_id: "{{ response.response_path | string }}"
    media_content_type: audio/mpeg
    announce: true
  target:
    entity_id: media_player.livingroom
That sounds as if Live Preview is being used instead of Snapshot mode. Did you install the latest version of the integration (v1.3) as well?
Update: I have identified the issue and it will be fixed in the next update. Thanks for reporting it!
I think this is already quite easy to set up in an automation (see the sketch below).
The only reason this could be useful is if there were a vision-to-audio model that skips text generation and in turn improves latency. To my knowledge this is not available in the API currently (correct me if I’m wrong!). So, at least right now, this would only complicate the integration (which already has many options) for only a very small benefit.
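A minimal sketch of how this could look today, assuming a TTS entity (tts.home_assistant_cloud here) and a media player are already configured, and that the analyzer's response variable exposes the generated text as response_text (check the actual key in Developer Tools → Actions):

- action: llmvision.image_analyzer
  data:
    provider: <provider_id>
    model: gpt-4o-mini
    message: Describe what is in this photo in one short sentence.
    image_file: /path/to/file
    max_tokens: 100
  response_variable: response
- action: tts.speak
  target:
    entity_id: tts.home_assistant_cloud   # assumption: any configured TTS entity
  data:
    media_player_entity_id: media_player.livingroom
    message: "{{ response.response_text }}"

This keeps text generation and speech synthesis as two separate steps, which is exactly the latency trade-off mentioned above.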
Blueprint (v1.3) isn’t working properly here. Everything (including the event calendar) is properly configured as far as I can see, but notifications only contain the text “Person”.
This error originated from a custom integration.
Logger: custom_components.llmvision.request_handlers
Source: custom_components/llmvision/request_handlers.py:444
integration: LLM Vision (documentation, issues)
First occurred: 7 November 2024 at 21:02:10 (15 occurrences)
Last logged: 06:16:17
Failed to fetch http://192.168.1.10:8123/api/frigate/notifications/1731014693.044706-tkpuik/clip.mp4 after 2 retries
Failed to fetch http://192.168.1.10:8123/api/frigate/notifications/1731034059.027834-rj9sat/clip.mp4 after 2 retries
Failed to fetch http://192.168.1.10:8123/api/frigate/notifications/1731030480.475938-4g1j3f/clip.mp4 after 2 retries
Failed to fetch http://192.168.1.10:8123/api/frigate/notifications/1731035607.413499-x95a79/clip.mp4 after 2 retries
Failed to fetch http://192.168.1.10:8123/api/frigate/notifications/1731040278.032353-1ekusx/clip.mp4 after 2 retries
and:
Logger: homeassistant.components.automation.ai_event_summary_llm_vision_v1_3
Source: helpers/script.py:2032
integration: Automation (documentation, issues)
First occurred: 7 November 2024 at 19:35:08 (56 occurrences)
Last logged: 06:16:17
AI Event Summary (LLM Vision v1.3): Error executing script. Error for choose at pos 4: Failed to fetch frigate clip 1731030480.475938-4g1j3f
AI Event Summary (LLM Vision v1.3): Analyze event: choice 1: Error executing script. Error for call_service at pos 1: Failed to fetch frigate clip 1731035607.413499-x95a79
AI Event Summary (LLM Vision v1.3): Error executing script. Error for choose at pos 4: Failed to fetch frigate clip 1731035607.413499-x95a79
AI Event Summary (LLM Vision v1.3): Analyze event: choice 1: Error executing script. Error for call_service at pos 1: Failed to fetch frigate clip 1731040278.032353-1ekusx
AI Event Summary (LLM Vision v1.3): Error executing script. Error for choose at pos 4: Failed to fetch frigate clip 1731040278.032353-1ekusx
Thanks for the detailed logs!
The problem occurs when LLM Vision (the integration) tries to fetch the clip from frigate. It does that through the Frigate Integration for Home Assistant. Have you installed the integration?
You can verify the Frigate integration works by navigating to http://192.168.1.10:8123/api/frigate/notifications/1731040278.032353-1ekusx/clip.mp4 on your computer. You should see the clip.
Frigate integration is up and running, but if I access http://192.168.1.10:8123/api/frigate/notifications/1731040278.032353-1ekusx/clip.mp4 then I am getting a play button with a strike through it (see screenshot)
Not sure, but it might be related to the Frigate beta. In the release notes of 0.15.0 Beta 1 they mention that they migrated to a new API (FastAPI instead of Flask), so I think this is likely the problem.
v1.3 of the blueprint seems to have a few issues.
I have updated and hopefully fixed them in v1.3.1 (still in beta). Feel free to test and provide feedback.
Because it is not in the main branch, you’ll need to import it again and then overwrite the existing blueprint (your existing configuration will remain).
Just paste this URL: https://github.com/valentinfrlch/ha-llmvision/blob/547ae32b56f25c0f3ff5ab2130997fa5ac56c6a9/blueprints/event_summary.yaml
Multi-device support has also been added so you can now have the notification sent to multiple devices (via HA mobile app).
In this particular case I asked it to verify and acknowledge whether any persons were visible, and if so, to describe them, which it did correctly. So I wonder why this would be a case of ‘no keywords found in the response’ …
Makes me wonder what the ‘keywords’ are, then?
Right now, titles of notifications and events are not generated by AI; only the body is. The title is simply label + " seen", where label is ‘Person’, ‘Car’, etc., picked by matching keywords in the summary.
For example, the title is “Person seen” if the summary contains ‘person’, ‘man’, ‘woman’ or ‘individual’.
I am working on AI-generated titles, but this is how it works for now.
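As an illustration only (not the blueprint’s actual code; the variable name and keyword lists here are assumptions), the described behaviour amounts to something like this Jinja template:

title: >-
  {%- set summary = response.response_text | lower -%}
  {%- if 'person' in summary or 'man' in summary or 'woman' in summary or 'individual' in summary -%}
    Person seen
  {%- elif 'car' in summary or 'vehicle' in summary -%}
    Car seen
  {%- else -%}
    Nothing seen
  {%- endif -%}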
I’m afraid this is NOT ‘how it works for now’
Consider this : the response I got was “A man and a woman are on a porch. The woman is wearing a light-colored, sleeveless top and a skirt. The man is wearing a dark-colored shirt and light-colored shorts.”
Yet it was labeled ‘Nothing seen’
But as you’re working on a better titling system anyhow, let’s not make a fuss about it …
Today’s update adds a new action to seamlessly update sensors based on image/video input. Just describe what data you want to extract and select a sensor to update. You can use Helpers to create virtual sensors.
Supported sensor types are number, text, boolean and select. Data types and the available options for select sensors are recognized automatically.
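A rough sketch of how such a call might look, assuming the action is exposed as llmvision.data_analyzer and accepts a sensor_entity field (check the integration’s documentation for the exact action and parameter names); input_number.cars_in_driveway is a Helper created for this purpose:

- action: llmvision.data_analyzer          # assumption: exact action name may differ
  data:
    provider: <provider_id>
    model: gpt-4o-mini
    message: How many cars are parked in the driveway? Answer with a number only.
    image_file: /path/to/file
    sensor_entity: input_number.cars_in_driveway   # assumption: a number Helper to receive the value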
Okay, this is embarrassing. I’ve installed Ollama on the same Linux box on which Frigate and HA are running. The latter two run in Docker containers; Ollama was installed directly.
I verified correct installation by entering http://127.0.0.1:11434/ in a browser on that box. The response was: Ollama is running
Problem
The problem arises when, from another box on the same local network, I try to specify the Ollama server address during the integration set-up. After entering the same domain name used to access HA, the entry is not accepted.
- port 11434 was selected
- port 11434 has been forwarded on my router
- the https option was activated
Interestingly, when I ask Ollama to confirm the port, this is what she spits out:
ollama run llava-phi3
>>> What port is the Ollama server running on this machine?
The Ollama server is currently running on the localhost at port 3001.
Can somebody please point out what I've missed? It's probably something woefully basic.