GPT-4o vision capabilities in Home Assistant

Hey all,

ha-gpt4vision is a custom component available in HACS. It brings GPT-4o’s vision capabilities to Home Assistant as a service.

Responses are returned as response variables for easy use in automations. This opens up a wide range of uses: you could request a car’s license plate number when one is detected, create custom delivery announcements, or trigger an alarm when suspicious activity is detected.

Installation Instructions

Installation via HACS

For the most up-to-date instructions, check ha-gpt4vision on GitHub.

  1. Add the repository’s URL (GitHub - valentinfrlch/ha-gpt4vision: Image Analyzer for Home Assistant using GPT-4o) to HACS under custom repositories.
  2. Install through HACS.
  3. Add the following code to your configuration.yaml:
# gpt4vision service setup
gpt4vision:
  api: "[Your OpenAI API key]"
  4. Restart Home Assistant.

Manual Installation

  1. Download and copy the folder gpt4vision from the GitHub repository to your custom_components folder.
  2. Add the following code to your configuration.yaml:
# gpt4vision service setup
gpt4vision:
  api: "[Your OpenAI API key]"
  3. Restart Home Assistant to load the gpt4vision custom_component.
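
If you would rather not keep the API key in plain text in configuration.yaml, Home Assistant's standard secrets mechanism should work here as well, since it applies to any YAML value. A minimal sketch, assuming a secrets.yaml file sitting next to configuration.yaml; the secret name openai_api_key is just an example, not something the component requires:

```yaml
# --- configuration.yaml ---
# gpt4vision service setup, with the key pulled from secrets.yaml
gpt4vision:
  api: !secret openai_api_key  # example name (assumption), pick your own

# --- secrets.yaml (separate file in the same folder) ---
openai_api_key: "[Your OpenAI API key]"
```

This keeps the key out of any configuration you share or back up to a public repository.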

Service call and usage

After restarting, the gpt4vision.image_analyzer service will be available. You can test it under Developer Tools in Home Assistant. To get GPT’s analysis of a local image, use the following service call:

service: gpt4vision.image_analyzer
data:
  message: [Prompt message for AI]
  model: [model]
  image_file: [path for image file]
  target_width: [Target width for image downscaling]
  max_tokens: [maximum number of tokens]

The parameters message, max_tokens and image_file are required. The model and target_width parameters are optional. For available models, check this page: https://platform.openai.com/docs/models.
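
For reference, here is roughly what a filled-in call could look like when pasted into Developer Tools. The prompt, the image path, and the target_width value of 1280 are illustrative assumptions; gpt-4o is used as the model since that is the one this component is built around:

```yaml
service: gpt4vision.image_analyzer
data:
  # Required: the prompt sent along with the image
  message: "What is the license plate number of the car in this image?"
  # Optional: which OpenAI model to use
  model: gpt-4o
  # Required: path to a local image (example path, adjust to your setup)
  image_file: /config/www/tmp/test.jpg
  # Optional: downscale the image to this width (illustrative value)
  target_width: 1280
  # Required: upper limit on the length of the response
  max_tokens: 100
```

A smaller target_width should reduce the amount of image data sent, while a larger one keeps more detail for things like small text.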

Automation Example

In automations, the response can be accessed as {{response.response_text}} (if your response variable name is response):

sequence:
  - service: gpt4vision.image_analyzer
    metadata: {}
    data:
      message: Describe the person in the image
      image_file: /config/www/tmp/test.jpg
      max_tokens: 100
    response_variable: response
  - service: tts.speak
    metadata: {}
    data:
      cache: true
      media_player_entity_id: media_player.entity_id
      message: "{{response.response_text}}"
    target:
      entity_id: tts.tts_entity
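
The sequence above assumes the image already exists on disk. One way to wire it into a complete automation is sketched below: a motion sensor triggers a camera snapshot, the snapshot is analyzed, and the result is shown as a persistent notification. The entity IDs binary_sensor.front_door_motion and camera.front_door are hypothetical placeholders, and the snapshot path must be one Home Assistant is allowed to write to:

```yaml
alias: Describe visitor on motion
trigger:
  - platform: state
    entity_id: binary_sensor.front_door_motion  # hypothetical entity
    to: "on"
action:
  # Save a still image from the camera to a local file
  - service: camera.snapshot
    target:
      entity_id: camera.front_door  # hypothetical entity
    data:
      filename: /config/www/tmp/test.jpg
  # Send the saved image to GPT-4o and store the answer
  - service: gpt4vision.image_analyzer
    data:
      message: Describe the person in the image
      image_file: /config/www/tmp/test.jpg
      max_tokens: 100
    response_variable: response
  # Show the description in the Home Assistant UI
  - service: persistent_notification.create
    data:
      title: Front door
      message: "{{ response.response_text }}"
```

You can swap the last step for tts.speak as in the example above if you prefer a spoken announcement.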

Links

ha-gpt4vision on GitHub

GitHub - filipecanedo/ha-gpt4vision: Image Analyzer using GPT-4 Turbo with vision and Home Assistant (forked)
