For now you can take a look at these examples:
These can be triggered when a person is detected by frigate, for example.
The next release will include support for image entities for better frigate support.
For all the frigate users:
v0.3.8 has just been released, which now supports entities of the image domain. If you have the frigate integration installed and have object recognition enabled, frigate will provide such image entities for every camera and every object type.
You can even use the new image_entity input together with the existing image_file input to submit multiple images from both frigate and the file system in one call. This could be useful if you want to compare a reference photo with how the scene looks right now.
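For example, a call along these lines (the entity ID and file path are placeholders) would send the latest frigate image together with a reference photo from the file system in a single request:

- service: gpt4vision.image_analyzer
  data:
    provider: OpenAI
    model: gpt-4o
    detail: low
    message: Compare the reference photo with the current view and describe any differences.
    image_entity:
      - image.front_door_person   # image entity provided by frigate (placeholder name)
    image_file: /config/www/reference/front_door.jpg
  response_variable: response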
Cheers!
-valentin
Awesome work with this integration! Thanks for making it! I have it responding to what is currently happening around the house via snapshots, and it gives very accurate descriptions.
Any ideas on getting this to work with Unifi cameras and Extended-OpenAI-Conversation, so it can look at past motion events from specific cameras and I can ask things like “when was the last time someone was in the back yard with a blue shirt on?” or “was there a white car in the driveway, and if so, at what time?”
I’m sure it would cut down on processing time if only car, animal, and person detections from those specific cameras were processed, rather than all detections from all cameras over x amount of time.
This is not currently possible as it requires sending the filename with the request. But I will add an option for that in the next release.
Save each snapshot into a folder named after the detected object, e.g. a car folder:
- snapshots
  - person
    - 2024-06-22-07:42.jpg
    - ...
  - car
    - ...
  - package
  - bicycle
  - ...
There should be a folder in snapshots with the exact name of each object your integration can detect.
You then need two automations:
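One of these could be the automation that saves each snapshot into the matching folder. A minimal sketch, assuming a hypothetical detection sensor and that the snapshots path is listed under allowlist_external_dirs (the trigger and the detected object depend entirely on your camera/NVR integration):

alias: Save categorized snapshot
trigger:
  - platform: state
    entity_id: binary_sensor.backyard_person_detected   # hypothetical detection sensor
    to: "on"
action:
  - variables:
      detected_object: person   # placeholder; derive this from your NVR's event data
  - service: camera.snapshot
    target:
      entity_id: camera.backyard
    data:
      # save into the folder matching the detected object, named by timestamp
      filename: >-
        /config/snapshots/{{ detected_object }}/{{ now().strftime('%Y-%m-%d-%H:%M') }}.jpg
mode: single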
To make a request, your spec should have two inputs: object and prompt.
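As a rough sketch (the function name and descriptions are placeholders), an Extended OpenAI Conversation function definition with those two inputs could look like this:

- spec:
    name: analyze_snapshots
    description: Answer a question about saved snapshots of one object type
    parameters:
      type: object
      properties:
        object:
          type: string
          description: Object type, e.g. person, car or package
        prompt:
          type: string
          description: The question to answer about those snapshots
      required:
        - object
        - prompt
  function:
    type: script
    sequence: []   # the gpt4vision call sketched further below goes here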
To get the folder contents you could use the built-in folder sensor:
sensor:
  - platform: folder
    folder: "/config/snapshots/"
With the following template you can get the contents of your snapshots folder:
{%- set folders = namespace(value=[]) %}
{%- for folder in state_attr("sensor.snapshots", "file_list") | sort %}
{%- set folders.value = folders.value + [folder] %}
{%- endfor %}
Then you send all files in the object folder.
The include_filename option is needed because even if your snapshots have timestamps on them, LLMs are too prone to hallucination to read them reliably; the filename gives the model the timestamp directly.
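Putting that together, a hedged sketch of the call inside the function's script sequence might look like this. It assumes you configure one folder sensor per object folder (e.g. sensor.snapshots_person for /config/snapshots/person/), that image_file accepts one path per line, and that Extended OpenAI Conversation passes the object and prompt arguments as script variables:

- service: gpt4vision.image_analyzer
  data:
    provider: OpenAI
    model: gpt-4o
    detail: low
    include_filename: true   # the filenames carry the timestamps, see above
    message: "{{ prompt }}"
    # one path per line, taken from the folder sensor for the requested object
    image_file: >-
      {{ state_attr('sensor.snapshots_' ~ object, 'file_list') | sort | join('\n') }}
  response_variable: response

You would then still need to hand response.response_text back to the conversation agent; check the Extended OpenAI Conversation documentation for how to return script results to the model.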
I hope this is helpful. If you do end up writing the spec, please share it, as I think this is a very interesting use case.
And if you need help, feel free to reach out.
Thanks for the detailed reply!
I’m trying to figure out if I can get recognition images from the NVR that are already categorized. It does this on the Unifi end somehow, at least for people, cars, animals and packages (and just movement).
I had everything working great a month ago, and now I get a message saying “Error running action” - OpenAI provider is not configured.
I didn’t change a thing other than upgrade the integration. Any ideas where to change it so it works?
I uninstalled the OpenAI integration and re-added it and it worked.
This is probably due to a variable name change. A while back, gpt4vision was only compatible with OpenAI, so I renamed a variable from API_KEY to OPENAI_API_KEY for more consistency. I should have put this into the changelog as a warning, sorry! Glad it’s working again!
Thank you for creating this! I’m having trouble getting a script to work when I specify the image_entity, but it works perfectly when I specify the path to the image_file.
This works:
alias: write what you see
sequence:
  - service: gpt4vision.image_analyzer
    data:
      max_tokens: 100
      provider: OpenAI
      model: gpt-4o
      target_width: 512
      temperature: 0.5
      detail: low
      message: Please describe the scene
      image_file: /config/www/test/corridor.jpg
    response_variable: response
  - service: system_log.write
    metadata: {}
    data:
      level: error
      message: log the {{ response.response_text }}
mode: single
but this does not (same code, except I’m specifying my camera entity instead of the image file):
alias: write what you see (streaming)
sequence:
  - service: gpt4vision.image_analyzer
    data:
      max_tokens: 100
      provider: OpenAI
      model: gpt-4o
      target_width: 512
      temperature: 0.5
      detail: low
      message: Please describe the scene
      image_entity:
        - camera.corridor
    response_variable: response
  - service: system_log.write
    metadata: {}
    data:
      level: error
      message: log the {{ response.response_text }}
mode: single
My possibly inaccurate understanding is that I can just use the image_entity of my camera, and it will grab a frame from the stream at the time the script is called? When I try running the script I get the following error:
“Failed to call service script/write_what_you_see_streaming. cannot access local variable ‘client’ where it is not associated with a value”
Or maybe I have to use the entity_picture of my camera as the image_entity? These are the attributes from my camera: (token is cut off so I don’t think my security is compromised by pasting this)
Thank you for your help!
You are right, there is a mistake in the code: some results are not properly awaited, which is why you get this error. The fix has been pushed to the latest dev release, so you can test it right now if you want. I think it will be ready to push as a public release tomorrow.
Your understanding is correct: if you submit an image or camera entity, the integration will fetch the latest frame.
v0.4.7 is out now which should fix this issue.
Thanks.
If I want the response to be a notification on my mobile screen, what should I do? And another question: can I use the response on a dashboard card?
Tnx
To get the response as a notification on your phone or tablet use the notify service in the automation or script where you call the gpt4vision service:
service: notify.mobile_app_your_phone
data:
  title: Front Door
  message: "{{response.response_text}}"
(Assuming your response_variable is response.)
To use the response on your dashboard you could create an input_text helper to store the response in. You can set its value in your script/automation with:
service: input_text.set_value
data:
  value: "{{response.response_text}}"
target:
  entity_id: input_text.gpt4vision_response
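If you don’t have that helper yet, you can create it in the UI or in configuration.yaml; the entity name below just matches the example above and is otherwise arbitrary:

input_text:
  gpt4vision_response:
    name: gpt4vision response
    max: 255   # note: input_text values are limited to 255 characters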
I have also put together a quick vertical-stack-in-card to run your script and display the result using a markdown card:
type: custom:vertical-stack-in-card
cards:
  - type: tile
    entity: script.analyse_front_door
    icon_tap_action:
      action: toggle
    vertical: false
  - type: markdown
    content: '{{states("input_text.gpt4vision_response")}}'
Note: Requires vertical-stack-in-card
Thank you so much!
I’m getting an error - status code 500.
I have tried the vertical-stack-in-card above (thank you) and got this:
Here is the script -
alias: Script Chiller card
sequence: