GPT-4 vision image analysis in Home Assistant

For now you can take a look at these examples:

These can be triggered when a person is detected by Frigate, for example.
The next release will include image entity support for better Frigate integration.


For all the frigate users:
v0.3.8 has just been released which now supports entities of the image domain. If you have the frigate integration installed and have object recognition enabled, frigate will provide such image entities for every camera and every object type.

You can even use this new image_entity input together with the existing image_file to submit multiple images from both frigate and the file system in one call.

This could be useful if you want to add a reference photo vs how it looks right now.

Cheers!
-valentin


Awesome work with this integration! Thanks for making it! I have it responding to what is currently happening around the house via snapshots, and it gives very accurate descriptions.

Any ideas on getting this to work with Unifi cameras and Extended-OpenAI-Conversation, to look at specific camera past motion events, so I can ask things like “when was the last time someone was in the back yard with a blue shirt on?” or “was there a white car in the driveway, if so, what time?” ?

I’m sure it would cut down on the processing time if things like car, animal, and person detections from those specific cameras could be processed vs all detections from all cameras, over x amount of time.

@WouterN Thanks very much for sharing the doorbell prompt, seems to work perfectly!

This is not currently possible as it requires sending the filename with the request. But I will add an option for that in the next release.

Some prerequisites

  • Object recognition: This is not a must, but the results will be much better with it.
  • Snapshots folder: I propose the folder structure described below.
    This is to reduce the number of images that need to be analyzed. If the LLM determines you want to know about a ‘white car’, it will only look at images in the car folder.
- snapshots
	- person
		- 2024-06-22-07:42.jpg
		- ...
	- car
		- ...
	- package
	- bicycle
	- ...

There should be a folder in snapshots for each object your integration can detect, named exactly after that object.

You then need two automations:

  • One that is triggered whenever there is an object detection and saves the snapshot in the corresponding folder.
  • One that deletes old files so these folders don’t grow too large; use Delete Files Home Assistant or a similar integration.
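
The first automation could be sketched like this. This is a hedged example, not a drop-in config: the MQTT topic and payload fields follow Frigate’s events topic, but `camera.front_door` is a placeholder, and you should adapt the trigger to however your setup reports detections.

```yaml
# Sketch: save a snapshot into the matching object folder
# whenever Frigate reports a new detection over MQTT.
automation:
  - alias: Save detection snapshot
    trigger:
      - platform: mqtt
        topic: frigate/events
    condition:
      # Only act on newly created events, not updates or ends
      - condition: template
        value_template: "{{ trigger.payload_json['type'] == 'new' }}"
    action:
      - service: camera.snapshot
        target:
          entity_id: camera.front_door  # placeholder camera
        data:
          # Frigate's event payload carries the detected label
          # (person, car, package, ...) under after.label
          filename: >-
            /config/snapshots/{{ trigger.payload_json['after']['label'] }}/{{ now().strftime('%Y-%m-%d-%H:%M') }}.jpg
```

The `label` from the event payload picks the destination folder, so the folder names stay in sync with whatever objects your detector reports.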

Making a request

To make a request, your spec should have two inputs: object and prompt.
To get the folder contents you could use the built-in folder sensor:

sensor:
  - platform: folder
    folder: "/config/snapshots/"

With the following template you can get the contents of your snapshots folder:

{%- set folders = namespace(value=[]) %}
{%- for folder in state_attr("sensor.snapshots", "file_list") | sort %}
    {%- set folders.value = folders.value + [folder] %}
{%- endfor %}
{{ folders.value }}
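
The spec can then narrow this down to the folder for the requested object. A hedged sketch, assuming the folder sensor above and an `object` input such as "car" (in the real spec, `object` would come from the LLM’s function call, not be hardcoded):

```jinja
{# Pick the snapshot folder matching the requested object.
   Home Assistant templates support the `search` test for
   substring/regex matching on each list item. #}
{%- set object = "car" %}
{%- set folder = state_attr("sensor.snapshots", "file_list")
      | select("search", object)
      | first %}
{{ folder }}
```

This returns the single matching folder path, which the spec can use to collect the image files to submit.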

Then you send all files in the object folder.
The include file_name option is needed because even if you have timestamps on your snapshots, LLMs are too susceptible to hallucination to read them reliably.

I hope this is helpful. If you do end up writing the spec, please share it, as I think this is a very interesting use case.
And if you need help, feel free to reach out.

Thanks for the detailed reply!

I’m trying to figure out if I can get recognition images from the NVR that are already categorized. It does this on the Unifi end somehow, at least for people, cars, animals and packages (and just movement).

I had everything working great a month ago, and now I get a message saying “Error running action” - OpenAI provider is not configured.

I didn’t change a thing other than upgrade the integration. Any ideas where to change it so it works?

I uninstalled the OpenAI integration and re-added it and it worked.

This is probably due to a variable name change. A while back, gpt4vision was only compatible with OpenAI, so I renamed a variable from API_KEY to OPENAI_API_KEY for more consistency. I should have put this into the changelog as a warning, sorry! Glad it’s working again!

Thank you for creating this! I’m having trouble getting a script to work when I specify the image_entity, but it works perfectly when I specify the path to the image_file.

This works:

alias: write what you see
sequence:
  - service: gpt4vision.image_analyzer
    data:
      max_tokens: 100
      provider: OpenAI
      model: gpt-4o
      target_width: 512
      temperature: 0.5
      detail: low
      message: Please describe the scene
      image_file: /config/www/test/corridor.jpg
    response_variable: response
  - service: system_log.write
    metadata: {}
    data:
      level: error
      message: log the {{ response.response_text  }}
mode: single

but this does not (same code, except I’m specifying my camera entity instead of the image file):

alias: write what you see (streaming)
sequence:
  - service: gpt4vision.image_analyzer
    data:
      max_tokens: 100
      provider: OpenAI
      model: gpt-4o
      target_width: 512
      temperature: 0.5
      detail: low
      message: Please describe the scene
      image_entity:
        - camera.corridor
    response_variable: response
  - service: system_log.write
    metadata: {}
    data:
      level: error
      message: log the {{ response.response_text  }}
mode: single

My possibly inaccurate understanding is that I can just use the image_entity of my camera and it can parse the stream for the image at the time the script is called? When I try running the script I get the following error:

“Failed to call service script/write_what_you_see_streaming. cannot access local variable ‘client’ where it is not associated with a value”

Or maybe I have to use the entity_picture of my camera as the image_entity? These are the attributes from my camera: (token is cut off so I don’t think my security is compromised by pasting this)

Thank you for your help!