[Custom Component] extended_openai_conversation: Let's control entities via ChatGPT

I would need to create one per person, right? Is there a way to use the ChatGPT request so it knows which phone I want to send a message to?

OK, I modified the script so that you can specify the service in the request or in the prompt.
I tested it by adding this to the prompt and it works:

To send a notification to someone, use the following services and the spec notify_smartphone:
Simone notify.mobile_app_redmi_note_8_pro
Alessia notify.mobile_app_redmi_11_pro_alessia

This is the updated script:

- spec:
    name: notify_smartphone
    description: Send a notification to a smartphone
    parameters:
      type: object
      properties:
        notification:
          type: string
          description: The notification message to be sent
        service:
          type: string
          description: The notify service to use, e.g. notify.mobile_app_redmi_note_8_pro
      required:
      - notification
      - service
  function:
    type: script
    sequence:
    - service: '{{service}}'
      data:
        message: '{{notification}}'

I just tested it. It's really impressive!
I am trying to write a spec to ask OpenAI to analyze the camera.

2 Likes

I’m waiting, friend :smiling_face_with_tear:

awesome, thank you so much

And… we can now ask Assist to analyze the images / cameras!

Here you go, works perfectly! :smiley:

Requires gpt4vision, created by @valentinfrlch

This is the spec:

- spec:
    name: vision
    description: Analyze images
    parameters:
      type: object
      properties:
        request:
          type: string
          description: The user's request for the image analysis
      required:
      - request
  function:
    type: script
    sequence:
    - service: gpt4vision.image_analyzer
      data:
        max_tokens: 400
        message: "{{request}}"
        image_file: |-
          /media/Allarme_Camera.jpg
          /media/Allarme_Sala1.jpg
          /media/Snapshot_Giardino1_20240425-090813.jpg
        provider: OpenAI
        model: gpt-4-vision-preview
        target_width: 1280
        temperature: 0.5
      response_variable: _function_result

You have to provide the paths to your cameras' snapshots (you might want to create an automation that takes snapshots every x minutes).
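For reference, here's a minimal sketch of such an automation (the camera entity and the snapshot path are placeholders; adjust them to your setup):

# Hypothetical example: refresh a snapshot every 5 minutes.
# camera.giardino1 and the filename are placeholders.
alias: Refresh camera snapshot
trigger:
  - platform: time_pattern
    minutes: "/5"
action:
  - service: camera.snapshot
    target:
      entity_id: camera.giardino1
    data:
      filename: /media/Snapshot_Giardino1.jpg
mode: single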

I also added this to the prompt template:

If I ask you to analyze an image or a camera, use the spec vision

In action:

5 Likes

Thanks for sharing!
May I add your script to the wiki of the gpt4vision repository? I think this would be a great source of inspiration for others.

1 Like

Absolutely :smiley:

2 Likes

One thing I would like to do, and I don't know if it's possible:
I would like to ask Assist for something and say: do it in 20 minutes

like:
“Please turn the lights off in 20 minutes”

Do you think it's possible?
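In my head it's a spec roughly like this (an untested sketch; the spec name and parameters are hypothetical, and the templated delay means the script keeps running until it fires):

- spec:
    name: delayed_light_off
    description: Turn off a light after a delay in minutes
    parameters:
      type: object
      properties:
        entity_id:
          type: string
          description: The light entity to turn off
        minutes:
          type: integer
          description: How many minutes to wait
      required:
      - entity_id
      - minutes
  function:
    type: script
    sequence:
    # The script stays running for the whole delay, so a HA restart cancels it
    - delay: "{{ minutes | int * 60 }}"
    - service: light.turn_off
      target:
        entity_id: "{{entity_id}}"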

1 Like

I have Ollama set up on WSL on a different machine than the one I am using with Home Assistant, but within the same network. I want to use this extension with my Ollama setup, but I am not having any luck. Is it possible to use Ollama with this?

I have tried many different base URLs, but they are all some variant of the IP address Ollama is on: http://192.168.5.16/v1

Can you access port 11434 from your Home Assistant (or any other device on the network)? If you haven't already, you need to set OLLAMA_HOST to 0.0.0.0 inside WSL. You can test whether Ollama is accessible by going to 192.168.5.16:11434. It should say something like "Ollama is running".
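If you want to test reachability from inside Home Assistant itself, here's a minimal sketch (the command name is hypothetical) you could add to configuration.yaml and call from Developer Tools:

# Hypothetical: a simple GET against the Ollama root endpoint;
# it should return "Ollama is running" if the server is reachable.
rest_command:
  ping_ollama:
    url: "http://192.168.5.16:11434"
    method: get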

I think there might be some problems with the endpoints; they are not the same. OpenAI uses /v1/chat/completions, whereas Ollama uses /api/chat.
The JSON sent has the same keys as far as I can tell, but the responses are different again.

Yes, I can access 192.168.5.16:11434 on devices in my network and get the "Ollama is running" confirmation, and I have set OLLAMA_HOST to 0.0.0.0. But no matter what IP address or combination I put into Extended OpenAI Conversation when adding a service, I get "Failed to Connect".

wow :heart_eyes:
Finally, someone understood what I meant!
When I said that I don’t want to write a separate script for each prompt and I want the prompt to be variable, I meant exactly that.
I checked it and the truth is that it works great :no_mouth:

I wrote an automation that takes a snapshot from the camera whenever the wake word is said and saves it
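Roughly like the sketch below (the trigger is an assumption: an Assist satellite entity whose state changes to listening when the wake word is heard; the entity names and the path are placeholders):

# Hypothetical sketch: take a snapshot when the satellite starts listening.
# assist_satellite.living_room, camera.camera1 and the path are placeholders.
alias: Snapshot on wake word
trigger:
  - platform: state
    entity_id: assist_satellite.living_room
    to: listening
action:
  - service: camera.snapshot
    target:
      entity_id: camera.camera1
    data:
      filename: /media/Allarme_Camera.jpg
mode: single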
But there is a small problem that I really want to fix :smile:
50% of the time, when I ask it, for example, "What do you see on the camera?" or "How many people do you see on the camera?", the voice agent answers that it does not have the ability to analyze images, but that it can turn the camera on or off for me.
I'm new, but I feel the problem is that GPT doesn't understand that it should use the function you wrote, and it doesn't call the ha-gpt4vision service at all.

I tried it in the prompt:
I told Extended OpenAI Conversation to use the ha-gpt4vision service if there is a request about the camera, but it had no effect…
However, it still understands 50% of the time and answers our questions about the camera analysis, and I am very grateful to you. :heart_eyes: :heartbeat: @Simone77
I have only just gotten to know the big world of HA, and I can't yet think of an idea to solve this problem. How about you?
The problem is minor, but it does not understand exactly when to use this function.

1 Like

Extended OpenAI Conversation probably validates the IP and port by sending a request to a specific endpoint which may not exist on Ollama. It is unlikely that Extended OpenAI Conversation works with Ollama out of the box.
But why would it anyway? It is not advertised anywhere that Ollama is supported…

The closest you’re going to get is probably the official Ollama integration.
If you’re interested in multi-modal conversations (images for now) with Ollama then this might be for you: gpt4vision.

OK, thank you. I just read comments here in this thread from a few people who say they have it working with Ollama, so I thought I would have at it. I know Ollama recently added an API, so I was hoping it would work as a replacement for the OpenAI API, as others have indicated LocalAI can do. I will take a look at the suggestions as an alternative.

1 Like

Asking OpenAI Extended about security cameras

The script by @Simone77 works well, but it requires an automation to capture the images every x minutes, which means they'll likely be out of date by the time you ask about them.

So I improved on the spec. The LLM dynamically decides which camera entities to include (you need to expose them via Assist).
It then captures a snapshot from each of those cameras and passes them all in a single call to gpt4vision:

:bulb: Requires gpt4vision (HACS custom component)

Example: “Is someone at the front door?”
The LLM understands that you want to know about the front door and therefore only passes your front door camera to gpt4vision.

Or: “What’s happening around the house?”
The LLM will pass all available cameras to gpt4vision and respond appropriately.

- spec:
    name: describe_camera_feed
    description: Get a description of what's happening on the security cameras around the house
    parameters:
      type: object
      properties:
        message:
          type: string
          description: The prompt for the image analyzer
        entity_ids:
          type: array
          description: List of camera entities
          items:
            type: string
            description: Entity id of the camera
      required:
      - message
      - entity_ids
  function:
    type: script
    sequence:
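    # First, take a fresh snapshot of each camera entity the LLM selected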
    - repeat:
        sequence:
          - service: camera.snapshot
            metadata: {}
            data:
              filename: /config/www/tmp/{{repeat.item}}.jpg
            target:
              entity_id: "{{repeat.item}}"
        for_each: "{{ entity_ids }}"
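    # Then pass all snapshots to gpt4vision in a single image_analyzer call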
    - service: gpt4vision.image_analyzer
      metadata: {}
      data:
        provider: Ollama
        max_tokens: 100
        target_width: 1000
        temperature: 0.3
        image_file: |-
          {%for camera in entity_ids%}/config/www/tmp/{{camera}}.jpg
          {%endfor%}
        message: "{{message}}"
      response_variable: _function_result

Hope this helps!

6 Likes

I added the following to the config area. During any false activations it will respond with "Cancelled" instead of a long response about how it's here to help us with our smart homes… This has been working well.
If a request seems like it may be an accidental prompt, or makes no sense, do nothing and respond with "Cancelled"

I'm getting an error:
Unexpected error during intent recognition
I know it's taking the snapshot; I can see it in the folder. I know gpt4vision is working; I've tested it in Developer Tools.
Where you have Ollama as the provider, should I be using something else?
Edit: I got it. I changed Ollama to OpenAI.

1 Like

I have added a detailed setup guide in the wiki: here.

1 Like

The image is successfully created in the tmp folder, but I keep getting "Something went wrong: invalid_image_path" as a response. I am using GPT-4o.