GPT-4 vision image analysis in Home Assistant

For now you can take a look at these examples:

These can be triggered when a person is detected by Frigate, for example.
The next release will include image entity support for better Frigate integration.


For all the frigate users:
v0.3.8 has just been released which now supports entities of the image domain. If you have the frigate integration installed and have object recognition enabled, frigate will provide such image entities for every camera and every object type.

You can even use this new image_entity input together with the existing image_file to submit multiple images from both frigate and the file system in one call.

This could be useful if you want to add a reference photo vs how it looks right now.

Cheers!
-valentin


Awesome work with this integration! Thanks for making it! I have it responding to what is currently happening around the house via snapshots, and it gives very accurate descriptions.

Any ideas on getting this to work with Unifi cameras and Extended-OpenAI-Conversation, to look at specific camera past motion events, so I can ask things like “when was the last time someone was in the back yard with a blue shirt on?” or “was there a white car in the driveway, if so, what time?” ?

I’m sure it would cut down on the processing time if things like car, animal, and person detections from those specific cameras could be processed vs all detections from all cameras, over x amount of time.

@WouterN Thanks very much for sharing the doorbell prompt, seems to work perfectly!

This is not currently possible as it requires sending the filename with the request. But I will add an option for that in the next release.

Some prerequisites

  • Object recognition: This is not a must, but the results will be much better with it.
  • Snapshots folder: I propose the folder structure described below.
    This is to reduce the number of images that need to be analyzed. If the LLM determines you want to know about a ‘white car’, it will only look at images in the car folder.
- snapshots
	- person
		- 2024-06-22-07:42.jpg
		- ...
	- car
		- ...
	- package
	- bicycle
	- ...

There should be a folder in snapshots for each object your integration can detect, named exactly after that object.

You then need two automations:

  • One that is triggered whenever there is an object detection and saves the snapshot in the corresponding folder.
  • One that deletes old files so these folders don’t grow too large; use Delete Files Home Assistant or a similar integration.
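
The first automation could be sketched like this. This is a hedged example, not a drop-in config: the MQTT topic and payload fields follow Frigate’s events topic, but `camera.front_door` is a placeholder, and you should adapt the trigger to however your setup reports detections.

```yaml
# Sketch: save a snapshot into the matching object folder
# whenever Frigate reports a new detection over MQTT.
automation:
  - alias: Save detection snapshot
    trigger:
      - platform: mqtt
        topic: frigate/events
    condition:
      # Only act on newly created events, not updates or ends
      - condition: template
        value_template: "{{ trigger.payload_json['type'] == 'new' }}"
    action:
      - service: camera.snapshot
        target:
          entity_id: camera.front_door  # placeholder camera
        data:
          # Frigate's event payload carries the detected label
          # (person, car, package, ...) under after.label
          filename: >-
            /config/snapshots/{{ trigger.payload_json['after']['label'] }}/{{ now().strftime('%Y-%m-%d-%H:%M') }}.jpg
```

The `label` from the event payload picks the destination folder, so the folder names stay in sync with whatever objects your detector reports.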

Making a request

To make a request, your spec should have two inputs: object and prompt.
To get the folder contents you could use the built-in folder sensor:

sensor:
  - platform: folder
    folder: "/config/snapshots/"

With the following template you can get the contents of your snapshots folder:

{%- set folders = namespace(value=[]) %}
{%- for folder in state_attr("sensor.snapshots", "file_list") | sort %}
    {%- set folders.value = folders.value + [folder] %}
{%- endfor %}
{{ folders.value }}
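
The spec can then narrow this down to the folder for the requested object. A hedged sketch, assuming the folder sensor above and an `object` input such as "car" (in the real spec, `object` would come from the LLM’s function call, not be hardcoded):

```jinja
{# Pick the snapshot folder matching the requested object.
   Home Assistant templates support the `search` test for
   substring/regex matching on each list item. #}
{%- set object = "car" %}
{%- set folder = state_attr("sensor.snapshots", "file_list")
      | select("search", object)
      | first %}
{{ folder }}
```

This returns the single matching folder path, which the spec can use to collect the image files to submit.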

Then you send all files in the object folder.
The include file_name option is needed because even if you have timestamps on your snapshots, LLMs are too susceptible to hallucination to read them reliably.

I hope this is helpful. If you do end up writing the spec, please share it, as I think this is a very interesting use case.
And if you need help, feel free to reach out.

Thanks for the detailed reply!

I’m trying to figure out if I can get recognition images from the NVR that are already categorized. It does this on the Unifi end somehow, at least for people, cars, animals and packages (and just movement).

I had everything working great a month ago, and now I get a message saying “Error running action” - OpenAI provider is not configured.

I didn’t change a thing other than upgrade the integration. Any ideas where to change it so it works?

I uninstalled the OpenAI integration and re-added it and it worked.

This is probably due to a variable name change. A while back, gpt4vision was only compatible with OpenAI, so I renamed a variable from API_KEY to OPENAI_API_KEY for more consistency. I should have put this into the changelog as a warning, sorry! Glad it’s working again!

Thank you for creating this! I’m having trouble getting a script to work when I specify the image_entity, but it works perfectly when I specify the path to the image_file.

This works:

alias: write what you see
sequence:
  - service: gpt4vision.image_analyzer
    data:
      max_tokens: 100
      provider: OpenAI
      model: gpt-4o
      target_width: 512
      temperature: 0.5
      detail: low
      message: Please describe the scene
      image_file: /config/www/test/corridor.jpg
    response_variable: response
  - service: system_log.write
    metadata: {}
    data:
      level: error
      message: log the {{ response.response_text  }}
mode: single

but this does not (same code, except I’m specifying my camera entity instead of the image file):

alias: write what you see (streaming)
sequence:
  - service: gpt4vision.image_analyzer
    data:
      max_tokens: 100
      provider: OpenAI
      model: gpt-4o
      target_width: 512
      temperature: 0.5
      detail: low
      message: Please describe the scene
      image_entity:
        - camera.corridor
    response_variable: response
  - service: system_log.write
    metadata: {}
    data:
      level: error
      message: log the {{ response.response_text  }}
mode: single

My possibly inaccurate understanding is that I can just use the image_entity of my camera and it can parse the stream for the image at the time the script is called? When I try running the script I get the following error:

“Failed to call service script/write_what_you_see_streaming. cannot access local variable ‘client’ where it is not associated with a value”

Or maybe I have to use the entity_picture of my camera as the image_entity? These are the attributes from my camera: (token is cut off so I don’t think my security is compromised by pasting this)

Thank you for your help!