[Custom Component] extended_openai_conversation: Let's control entities via ChatGPT

I just tested it. It's really impressive!
I am trying to write a spec to ask OpenAI to analyze the camera


I’m waiting, friend :smiling_face_with_tear:

awesome, thank you so much

And… we can now ask Assist to analyze the images / cameras!!!

Here you go, works perfectly! :smiley:

Requires gpt4vision, created by @valentinfrlch

This is the spec:

- spec:
    name: vision
    description: Analyze images
    parameters:
      type: object
      properties:
        request:
          type: string
          description: Analyze the images as requested by the user
      required:
      - request
  function:
    type: script
    sequence:
    - service: gpt4vision.image_analyzer
      data:
        max_tokens: 400
        message: "{{request}}"
        # replace with the path(s) to your camera snapshots
        image_file: |-
          /config/www/tmp/snapshot.jpg
        provider: OpenAI
        model: gpt-4-vision-preview
        target_width: 1280
        temperature: 0.5
      response_variable: _function_result

You have to provide the path to your camera's snapshots (you might want to create an automation to take snapshots every x minutes).
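As a minimal sketch, such an automation could look like the following — the camera entity (`camera.front_door`), the 5-minute interval, and the file path are all placeholders of mine; adjust them to your setup:

```yaml
# Hypothetical example: refresh a camera snapshot on a schedule
- alias: "Snapshot front door camera every 5 minutes"
  trigger:
    - platform: time_pattern
      minutes: "/5"
  action:
    - service: camera.snapshot
      target:
        entity_id: camera.front_door
      data:
        filename: /config/www/tmp/front_door.jpg
```

The filename here must match the path you put in the spec's `image_file`.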

I also added to the prompt template:

If I ask you to analyze an image or a camera, use the spec vision

In action:


Thanks for sharing!
May I add your script to the wiki of the gpt4vision repository? I think this would be a great source of inspiration for others.


Absolutely :smiley:


One thing I would like to do, and I don't know if it's possible:
I would like to ask Assist something and say: do it in 20 minutes

“Please turn the lights off in 20 minutes”

Do you think it's possible?
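It might be, since Extended OpenAI Conversation function specs run Home Assistant scripts and scripts support `delay`. An untested sketch (the spec name, parameters, and entities here are my own placeholders, not from this thread; note the script keeps running during the delay, so a timer-based approach may be more robust for long waits):

```yaml
- spec:
    name: delayed_light_off
    description: Turn off a light after a delay in minutes
    parameters:
      type: object
      properties:
        entity_id:
          type: string
          description: The light entity to turn off
        minutes:
          type: integer
          description: How many minutes to wait before turning the light off
      required:
      - entity_id
      - minutes
  function:
    type: script
    sequence:
    - delay:
        minutes: "{{ minutes }}"
    - service: light.turn_off
      target:
        entity_id: "{{ entity_id }}"
```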


I have Ollama set up on WSL on a different machine than the one running Home Assistant, but within the same network. I want to use this extension with my Ollama setup, but I am not having luck. Is it possible to use Ollama with this?

I have tried many different base URLs, but they are all some variant of the IP address Ollama is on.

Can you access port 11434 from your Home Assistant (or any other device on the network)? If you haven't already, you need to set the OLLAMA_HOST environment variable inside WSL so Ollama listens on the network. You can test whether Ollama is accessible by opening that address and port in a browser; it should say something like "Ollama is running".

I think there might be some problems with the endpoints. They are not the same: OpenAI uses /v1/chat/completions whereas Ollama uses /api/chat.
The JSON sent has the same keys as far as I can tell, but the responses are different again.

Yes, I can access it from devices on my network and get the "Ollama is running" confirmation, and I have set OLLAMA_HOST. No matter what IP address or combination I put in Extended OpenAI Conversation when adding a service, I get "Failed to Connect".

wow :heart_eyes:
Finally, someone understood what I meant!
When I said that I don’t want to write a separate script for each prompt and I want the prompt to be variable, I meant exactly that.
I checked it and the truth is that it works great :no_mouth:

I wrote an automation that takes a snapshot from the camera whenever the wake word is said and saves it
But there is a small problem that I really want to fix :smile:
50% of the time when I ask it, for example, "what do you see on the camera?" or "how many people do you see on the camera?", the answer from the voice agent is that it does not have the ability to analyze images, but it can turn the camera on or off for me.
I'm new, but I feel the problem is that "GPT doesn't understand that it should use the function you wrote and doesn't call the ha-gpt4vision service at all".

I tried it in the prompt:
I wrote that Extended OpenAI Conversation should use the ha-gpt4vision service if there is a request about the camera, but it had no effect…
However, it still understands well 50% of the time and answers our questions about the camera analysis, and I am very grateful to you. :heart_eyes: :heartbeat: @Simone77
I just got to know the big world of ha and I still can’t think of an idea to solve this problem. How about you?
The problem is minor, but it does not understand exactly when to use this function.


Extended OpenAI Conversation probably validates the IP and port by sending a request to a specific endpoint which may not exist on Ollama. It is unlikely that Extended OpenAI Conversation works with Ollama out of the box.
But why would it anyway? It is not advertised anywhere that Ollama is supported…

The closest you’re going to get is probably the official Ollama integration.
If you’re interested in multi-modal conversations (images for now) with Ollama then this might be for you: gpt4vision.

OK, thank you. I just read comments here in this thread from a few people who say they have it working with Ollama, so I thought I would have at it. I know Ollama recently added an API, so I was hoping it would work as a replacement for the OpenAI API, as others have indicated LocalAI can do. I will take a look at the suggestions as an alternative.


Asking OpenAI Extended about security cameras

The script by @Simone77 works well, but it requires a script to capture the images every x minutes, which means they'll likely be out of date by the time you ask about them.

So I improved on the spec. The LLM dynamically decides which camera entities to include (you need to expose them via Assist).
It then captures a snapshot from each of the cameras and passes them all into one single call to gpt4vision:

:bulb: Requires gpt4vision (HACS custom component)

Example: “Is someone at the front door?”
The LLM understands that you want to know about the front door and therefore only passes your front door camera to gpt4vision.

Or: “What’s happening around the house?”
The LLM will pass all available cameras to gpt4vision and respond appropriately.

- spec:
    name: describe_camera_feed
    description: Get a description of what's happening on security cameras around the house
    parameters:
      type: object
      properties:
        message:
          type: string
          description: The prompt for the image analyzer
        entity_ids:
          type: array
          description: List of camera entities
          items:
            type: string
            description: Entity id of the camera
      required:
      - message
      - entity_ids
  function:
    type: script
    sequence:
    - repeat:
        sequence:
          - service: camera.snapshot
            metadata: {}
            data:
              filename: /config/www/tmp/{{repeat.item}}.jpg
              entity_id: "{{repeat.item}}"
        for_each: "{{ entity_ids }}"
    - service: gpt4vision.image_analyzer
      metadata: {}
      data:
        provider: Ollama
        max_tokens: 100
        target_width: 1000
        temperature: 0.3
        image_file: |-
          {%for camera in entity_ids%}/config/www/tmp/{{camera}}.jpg
          {%endfor%}
        message: "{{message}}"
      response_variable: _function_result

Hope this helps!


I added the following to the config area. During any false activations it will respond with "Cancelled" instead of a long response about how it's here to help us with our smart homes… This has been working well.
If a request seems like it may be an accidental prompt, or makes no sense, do nothing and respond with "Cancelled"

I'm getting an error:
Unexpected error during intent recognition
I know it's taking the snapshot; I can see it in the folder. I know gpt4vision is working; I've tested it in Developer Tools.
Where you have Ollama as the provider, should I be using something else?
Edit: I got it. I changed Ollama to OpenAI.


I have added a detailed setup guide in the wiki: here.


The image is successfully created in the tmp folder, but I keep getting "Something went wrong: invalid_image_path" as a response. I am using GPT-4o.

I think this may be because I have camera entities that don't support the snapshot function, like my Nest doorbell and the map from my vacuum. Could you add an option to list only the cameras the spec can use?

If the image has actually been created, then this is not the issue. Have you modified the paths in any way? If so, did you change them in both places in the spec?