For scripts you need to provide a response variable so that Home Assistant knows where to store the response and so you can use it in subsequent actions.
You need to add this line to your service call:
response_variable: response
The response is then stored in response.response_text.
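For example, a minimal script sketch (assuming the llmvision.image_analyzer action; the camera and notify entities are placeholders):

sequence:
  - service: llmvision.image_analyzer
    data:
      provider: OpenAI
      image_entity:
        - camera.front_door
      message: Describe what you see.
    response_variable: response
  - service: notify.mobile_app_phone
    data:
      message: "{{ response.response_text }}"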
Check out the examples page, which already has scripts you can use as inspiration.
LLM Vision was created to analyze images, mostly in the context of smart homes. Even if it could read PDFs, there are already tools out there that do a far better job at this and have crucial features (like citations). I recommend taking a look at tools like private-gpt.
Well, Home Assistant is a smart home platform, which has very little to do with PDFs. I would really suggest you try out some other tools. Most run as Docker containers, which are really quite simple to set up if you follow the instructions. Docker also has a GUI (Docker Desktop), which might be less intimidating than using the terminal directly.
This looks awesome. I can see how I can take one image from my Ring via the camera entity, but I was wondering if there is a way to extract a few images a second apart from that sensor, or even better a brief video segment from the RTSP stream (provided through ring-mqtt).
In my head there are two ways, assuming it's not already a feature: either create a short segment using one script (no idea how!) that writes to a temp file that I then reference in an LLM Vision call, or pipe the RTSP stream into LLM Vision somehow.
The challenge I have is that the longer after the event I make the call, the easier it gets, as I can probably use recorded events, or mimic the Frigate integration somehow to allow a few static images over a period of time, stage them and make the call. However, what I want is to take 1-2s of video and get that sent off, so that when I get a response the person who walked up to my door is actually still there and hasn't left several minutes before, which makes the exercise pointless (a notification that a DHL courier is looking bored and waiting at the door is very unhelpful five minutes later).
Any ideas? I can't see any examples that match my use case …
What you're looking for is the video_analyzer action.
It works much the same as image_analyzer, except that it takes in one or multiple videos. In addition to video files, Frigate is also supported! All you need is the event_id, which you can find in Frigate's MQTT events.
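For example, a sketch of a Frigate-based call (the event_id value is made up, and parameters like max_frames follow the image_analyzer pattern, so treat them as assumptions):

service: llmvision.video_analyzer
data:
  provider: OpenAI
  # event_id comes from the frigate/events MQTT topic
  event_id: "1718468514.964478-abcdef"
  message: Describe what happens in this clip.
  max_frames: 3
response_variable: response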
Thanks, but as per my post I don't have the video files. I want to point the video analyzer at a streaming camera entity, not a file. Or … I don't have Frigate, I have Ring, but I want to replicate the function and send a collection of images taken from a video stream.
I have now found that camera entities have a record action, which I didn't know about. My plan is to capture a short video, always to the same file name (to avoid filling my Pi up with thousands of videos), then point the video analyzer at that file.
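Something like this, as far as I can tell (the entity ID is a placeholder, and the target folder has to be listed under allowlist_external_dirs in configuration.yaml):

service: camera.record
target:
  entity_id: camera.front_door
data:
  # fixed name, so each recording overwrites the last
  filename: /config/www/doorbell_latest.mp4
  duration: 2   # seconds to record
  lookback: 2   # seconds of already-buffered stream to prepend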
Thanks for this. Still using the old version (gpt4vision), but it imports my meter readings into HA daily. I created a separate sensor which sets itself to the value of the helper (input_number.gas_meter_reading). I had a spare camera lying around, dirt cheap from AliExpress, hence doing this rather than AI-on-the-edge via an ESP32.
alias: Gas Meter GPT Reading
description: ""
trigger:
  - platform: time
    at: "14:00:00"
action:
  - service: gpt4vision.image_analyzer
    data:
      provider: Anthropic
      model: claude-3-5-sonnet-20240620
      include_filename: false
      target_width: 512
      detail: high
      max_tokens: 500
      temperature: 0.5
      image_entity:
        - camera.alicam1proxy
      message: >-
        what is the 5 digit number shown in this image? don't reply with any
        other words, only the number.
    response_variable: gpt_response
  - service: input_number.set_value
    target:
      entity_id: input_number.gas_meter_reading
    data:
      value: |
        {{ gpt_response.response_text | int }}
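The separate sensor mentioned above can be a simple template sensor that mirrors the helper; a sketch (the name, unit, and state class are assumptions, adjust to your meter):

template:
  - sensor:
      - name: "Gas Meter Reading"
        unit_of_measurement: "m³"
        device_class: gas
        state_class: total_increasing
        state: "{{ states('input_number.gas_meter_reading') }}"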
I've got a case where I'd like to be able to use this integration, but it seems that my images are in PNG format rather than JPG, so the integration throws an error:
Logger: homeassistant.helpers.script.websocket_api_script
Source: helpers/script.py:525
First occurred: September 29, 2024 at 6:17:45 PM (4 occurrences)
Last logged: 7:44:41 PM
websocket_api script: Error executing script. Unexpected error for call_service at pos 1: cannot write mode RGBA as JPEG
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/PIL/JpegImagePlugin.py", line 639, in _save
rawmode = RAWMODE[im.mode]
~~~~~~~^^^^^^^^^
KeyError: 'RGBA'
The image is coming from HASS.Agent, where it's using the screenshot sensor to bring a screenshot image into HA as a camera entity.
Would it be best for this integration to be able to handle PNG images? Or should that agent integration change to use JPG format for the screenshots? I'm not sure if there's a standard/expectation of JPG for camera entities in HA or not.
It looks like OpenAI at least supports PNG upload as well as JPEG.
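For what it's worth, the usual fix on the integration side is to flatten the alpha channel before encoding as JPEG; a sketch with Pillow (file names are placeholders):

from PIL import Image

img = Image.open("screenshot.png")
if img.mode in ("RGBA", "P"):
    # JPEG has no alpha channel, so convert to plain RGB first
    img = img.convert("RGB")
img.save("screenshot.jpg", "JPEG")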
You will probably have to download it manually in HACS, since it is still a beta.
You can do so in HACS > … > Redownload, then select the latest version (v1.2.0-beta.5).
Or indeed the answer to my original ask is in 1.2: the new stream analyzer, which avoids the intermediate file. I'll test that when I get a chance. Brilliant, thank you!! Much easier than having fragments of files all over the place, and it hooks into ring-mqtt nicely.
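From the release notes I'm expecting the call to look roughly like this (untested on my side, so treat the parameter names as assumptions):

service: llmvision.stream_analyzer
data:
  provider: OpenAI
  image_entity:
    - camera.front_door   # the ring-mqtt camera entity
  duration: 2             # seconds of stream to sample
  max_frames: 3
  message: Describe the person at the door.
response_variable: response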
One thing I would like to do is submit images as well as video. Can I include image files in video analyzer requests? (I found a great prompt someone used where they include photos of key people alongside the video and ask Gemini to call people by their names if it recognises them.)
And if I can (and I appreciate I could just test it to find out), more importantly, can I do so with the stream analyzer? (Which I can't test yet, as I haven't got the prerelease installed.)
This is not yet possible with stream analyzer. It is also not possible with video analyzer. I will add that later (to both stream and video analyzer) as it sounds like a good idea!
If you have any feedback for v1.2 please let me know.
Thanks!
I wonder if there's an option to use a text input for changing the message to the AI. For example: I want to receive a notification only if someone with a white shirt has been detected in the picture, and after that change the input text to look for someone with a blue shirt. For the trigger I want to use Frigate person detection, and get the message only if the picture matches what I asked for.
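In my head it would look something like this (just a sketch: all entity IDs are invented, and the Frigate payload filtering is omitted):

trigger:
  - platform: mqtt
    topic: frigate/events
action:
  - service: llmvision.image_analyzer
    data:
      provider: OpenAI
      image_entity:
        - camera.front_door
      message: >-
        Answer only yes or no: {{ states('input_text.detection_prompt') }}
    response_variable: response
  # only notify when the model confirms the prompt in the text helper
  - condition: template
    value_template: "{{ 'yes' in (response.response_text | lower) }}"
  - service: notify.mobile_app_phone
    data:
      message: "Match for: {{ states('input_text.detection_prompt') }}"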
That's great, thank you! It is accepted in config, I assume it's just not doing anything.
I've paused this anyway atm, as I need to find out why ring-mqtt is murdering my camera batteries with continuous usage. By the time I trust ring-mqtt and/or have guardrails in place in automations to prevent heavy unintentional streaming, it ought to be about time for 1.2!