[Custom Component] extended_openai_conversation: Let's control entities via ChatGPT

I just tested it. It's really impressive!
I am trying to write a spec to ask OpenAI to analyze the camera


I’m waiting, friend :smiling_face_with_tear:

awesome, thank you so much

And… we can now ask Assist to analyze the images / cameras!!!

Here you go, works perfectly! :smiley:

Requires gpt4vision, created by @valentinfrlch

This is the spec:

- spec:
    name: vision
    description: Analyze images
    parameters:
      type: object
      properties:
        request:
          type: string
          description: Analyze the images as requested by the user
      required:
      - request
  function:
    type: script
    sequence:
    - service: gpt4vision.image_analyzer
      data:
        max_tokens: 400
        message: "{{request}}"
        # replace with the path(s) to your camera snapshots
        image_file: |-
          /config/www/tmp/snapshot.jpg
        provider: OpenAI
        model: gpt-4-vision-preview
        target_width: 1280
        temperature: 0.5
      response_variable: _function_result

You have to provide the path to your camera's snapshots (you might want to create an automation to take snapshots every x minutes).
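As a minimal sketch, such an automation could look like the following — the camera entity (`camera.front_door`), the 5-minute interval, and the file path are all placeholders of mine; adjust them to your setup:

```yaml
# Hypothetical example: refresh a camera snapshot on a schedule
- alias: "Snapshot front door camera every 5 minutes"
  trigger:
    - platform: time_pattern
      minutes: "/5"
  action:
    - service: camera.snapshot
      target:
        entity_id: camera.front_door
      data:
        filename: /config/www/tmp/front_door.jpg
```

The filename here must match the path you put in the spec's `image_file`.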

I also added to the prompt template:

If I ask you to analyze an image or a camera, use the spec vision

In action:


Thanks for sharing!
May I add your script to the wiki of the gpt4vision repository? I think this would be a great source of inspiration for others.


Absolutely :smiley:


One thing I would like to do, and I don't know if it's possible:
I would like to ask Assist something and say: do it in 20 minutes

“Please turn the lights off in 20 minutes”

Do you think it's possible?
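It might be, since Extended OpenAI Conversation function specs run Home Assistant scripts and scripts support `delay`. An untested sketch (the spec name, parameters, and entities here are my own placeholders, not from this thread; note the script keeps running during the delay, so a timer-based approach may be more robust for long waits):

```yaml
- spec:
    name: delayed_light_off
    description: Turn off a light after a delay in minutes
    parameters:
      type: object
      properties:
        entity_id:
          type: string
          description: The light entity to turn off
        minutes:
          type: integer
          description: How many minutes to wait before turning the light off
      required:
      - entity_id
      - minutes
  function:
    type: script
    sequence:
    - delay:
        minutes: "{{ minutes }}"
    - service: light.turn_off
      target:
        entity_id: "{{ entity_id }}"
```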


I have Ollama set up on WSL on a different machine than the one running Home Assistant, but within the same network. I want to use this extension with my Ollama setup, but I am not having luck. Is it possible to use Ollama with this?

I have tried many different base URLs, but they are all some variant of the IP address Ollama is on.

Can you access port 11434 from your Home Assistant (or any other device on the network)? If you haven't already, you need to set the OLLAMA_HOST environment variable inside WSL so Ollama listens on the network. You can test whether Ollama is accessible by opening that address and port in a browser; it should say something like "Ollama is running".

I think there might be some problems with the endpoints. They are not the same: OpenAI uses /v1/chat/completions whereas Ollama uses /api/chat.
The JSON sent has the same keys as far as I can tell, but the responses are different again.

Yes, I can access it from devices on my network and get the "Ollama is running" confirmation, and I have set OLLAMA_HOST. No matter what IP address or combination I put in Extended OpenAI Conversation when adding a service, I get "Failed to Connect".

wow :heart_eyes:
Finally, someone understood what I meant!
When I said that I don’t want to write a separate script for each prompt and I want the prompt to be variable, I meant exactly that.
I checked it and the truth is that it works great :no_mouth:

I wrote an automation that takes a snapshot from the camera whenever the wake word is said and saves it
But there is a small problem that I really want to fix :smile:
50% of the time when I ask it, for example, "what do you see on the camera?" or "how many people do you see on the camera?", the answer from the voice agent is that it does not have the ability to analyze images, but it can turn the camera on or off for me.
I'm new, but I feel the problem is that "GPT doesn't understand that it should use the function you wrote and doesn't call the ha-gpt4vision service at all".

I tried it in the prompt:
I wrote that Extended OpenAI Conversation should use the ha-gpt4vision service if there is a request about the camera, but it had no effect…
However, it still understands well 50% of the time and answers our questions about the camera analysis, and I am very grateful to you. :heart_eyes: :heartbeat: @Simone77
I just got to know the big world of ha and I still can’t think of an idea to solve this problem. How about you?
The problem is minor, but it does not understand exactly when to use this function.


Extended OpenAI Conversation probably validates the IP and port by sending a request to a specific endpoint which may not exist on Ollama. It is unlikely that Extended OpenAI Conversation works with Ollama out of the box.
But why would it anyway? It is not advertised anywhere that Ollama is supported…

The closest you’re going to get is probably the official Ollama integration.
If you’re interested in multi-modal conversations (images for now) with Ollama then this might be for you: gpt4vision.

OK, thank you. I just read comments here in this thread from a few people who say they have it working with Ollama, so I thought I would have at it. I know Ollama recently added an API, so I was hoping it would work as a replacement for the OpenAI API, as others have indicated LocalAI can do. I will take a look at the suggestions as an alternative.


Asking OpenAI Extended about security cameras

The script by @Simone77 works well, but it requires a script to capture the images every x minutes, which means they'll likely be out of date by the time you ask about them.

So I improved on the spec. The LLM dynamically decides which camera entities to include (you need to expose them via Assist).
It then captures a snapshot from each of the cameras and passes them all into one single call to gpt4vision:

:bulb: Requires gpt4vision (HACS custom component)

Example: “Is someone at the front door?”
The LLM understands that you want to know about the front door and therefore only passes your front door camera to gpt4vision.

Or: “What’s happening around the house?”
The LLM will pass all available cameras to gpt4vision and respond appropriately.

- spec:
    name: describe_camera_feed
    description: Get a description of what's happening on security cameras around the house
    parameters:
      type: object
      properties:
        message:
          type: string
          description: The prompt for the image analyzer
        entity_ids:
          type: array
          description: List of camera entities
          items:
            type: string
            description: Entity id of the camera
      required:
      - message
      - entity_ids
  function:
    type: script
    sequence:
    - repeat:
        sequence:
          - service: camera.snapshot
            metadata: {}
            data:
              filename: /config/www/tmp/{{repeat.item}}.jpg
              entity_id: "{{repeat.item}}"
        for_each: "{{ entity_ids }}"
    - service: gpt4vision.image_analyzer
      metadata: {}
      data:
        provider: Ollama
        max_tokens: 100
        target_width: 1000
        temperature: 0.3
        image_file: |-
          {%for camera in entity_ids%}/config/www/tmp/{{camera}}.jpg
          {%endfor%}
        message: "{{message}}"
      response_variable: _function_result

Hope this helps!


I added the following to the config area. During any false activations it will respond with "Cancelled" instead of a long response about how it's here to help us with our smart homes… This has been working well.
If a request seems like it may be an accidental prompt, or makes no sense, do nothing and respond with "Cancelled"

I'm getting an error:
Unexpected error during intent recognition
I know it's taking the snapshot; I can see it in the folder. I know gpt4vision is working; I've tested it in Developer Tools.
Where you have Ollama as the provider, should I be using something else?
Edit: I got it. I changed Ollama to OpenAI.


I have added a detailed setup guide in the wiki: here.


The image is successfully created in the tmp folder, but I keep getting "Something went wrong: invalid_image_path" as a response. I am using GPT-4o.

I think this may be because I have camera entities that don't support the snapshot function, like my Nest doorbell and the map from my vacuum. Could you add an option to list only the cameras the spec can use?

If the image has actually been created, then this is not the issue. Have you modified the paths in any way? If so, did you change them in both places in the spec?