[Custom Component] extended_openai_conversation: Let's control entities via ChatGPT

Is there a better way to send a consolidated/smaller payload than sending all of the entities every time? My bill would be huge if I used this properly!

1 Like

Is there a way to make the notification work with the Home Assistant mobile app? I would love to say “notify xxxx that the food is ready” and have it know who xxxx is.

Silly question - I see there are lots of functions that are being shared for really interesting things. Can these be combined? If so, how is that done? Just concatenate the -spec underneath the previous one in the UI and modify the prompt to describe why/when the new function is used?

Yes, just use the notify.YOURDEVICE service created by the companion app. For example, notify.mobile_app_redmi_note_8_pro is my phone's entity, and it sends notifications to my Home Assistant app.

Yes, just copy and paste the functions you want, one after the other.
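For example, the Functions field would simply contain one entry after another in the same list. Both entries below are illustrative sketches (the notify entity is the one mentioned above, the rest is made up), not something posted in this thread:

- spec:
    name: add_shopping_item
    description: Add an item to the shopping list
    parameters:
      type: object
      properties:
        item:
          type: string
          description: The item to add
      required:
      - item
  function:
    type: script
    sequence:
    - service: shopping_list.add_item
      data:
        name: '{{item}}'
- spec:
    name: notify_my_phone
    description: Send a notification to my phone
    parameters:
      type: object
      properties:
        message:
          type: string
          description: The notification text to send
      required:
      - message
  function:
    type: script
    sequence:
    - service: notify.mobile_app_redmi_note_8_pro
      data:
        message: '{{message}}'

As the question above suggests, a line in the prompt describing when each function should be used helps the model pick the right one.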

Hello
What I have in mind is that GPT could analyze the images from the cameras connected to Home Assistant, for example:
How many people do you see on the camera?
Or what is the color of their clothes?
Do they look suspicious?

As a first step, I tried setting its language model to gpt-4o in the Extended OpenAI Conversation settings.
As a result:
The response speed is relatively better.
But when I asked it to analyze the camera images, it replied that it doesn't have access to cameras or the ability to process images.

After a little searching, I found this: GPT-4o vision capabilities in Home Assistant
I installed it, and after a day I got it working!
It works like this: when I say to OpenAI Conversation “what do you see?”:
1- My automation or script is executed
2- A photo is taken from the camera I specified
3- That photo is sent to ha-gpt4vision
4- The response from ha-gpt4vision is converted to speech with TTS
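Roughly, steps 1 to 4 can be glued together in a single Home Assistant script along these lines. This is only a sketch: the camera and TTS entities and the file path are placeholders, and the exact response field returned by gpt4vision may differ:

camera_what_do_you_see:
  sequence:
  # 1. Take a snapshot from the camera
  - service: camera.snapshot
    target:
      entity_id: camera.front_door            # placeholder camera entity
    data:
      filename: /media/what_do_you_see.jpg
  # 2. Send the snapshot to ha-gpt4vision for analysis
  - service: gpt4vision.image_analyzer
    data:
      message: What do you see in this image?
      image_file: /media/what_do_you_see.jpg
      provider: OpenAI
      max_tokens: 300
    response_variable: vision_result
  # 3. Read the answer aloud with TTS
  - service: tts.speak
    target:
      entity_id: tts.home_assistant_cloud     # placeholder TTS entity
    data:
      media_player_entity_id: media_player.living_room   # placeholder speaker
      message: "{{ vision_result.response_text }}"        # response field name may differ by gpt4vision version

The script is then triggered by whatever automation reacts to the voice request.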

To be honest, the result is good. lol :)
But it has a lot of problems.
For example, it is very limited.
And sometimes its TTS audio interferes with OpenAI Conversation's TTS (both sounds play at the same time).

I also have to write a lot of scripts to run ha-gpt4vision (for example: if the word x is said, take a picture and analyze it;
if the word b is said, take a picture and say what the object is used for;
if the word c is said, take a picture and tell whether the person in it looks suspicious or not).
This way you end up writing a separate script for every different kind of photo analysis.

I'm looking for a way to avoid writing all these scripts.
For example, Extended OpenAI Conversation could access the cameras directly, so that when we ask, for example, “what do you see on the camera?”, it analyzes the camera image in real time with GPT-4o.

In the end, I hope I have explained this correctly and that you understand, because I used Google Translate :heart:

I would need to create one per person, right? Is there a way to have ChatGPT work out which phone I want to send the message to?

OK, I modified the script so that you can specify the notify service in the request or in the prompt.
I tested it by adding this to the prompt and it works:

To send a notification to someone, use the following services and the spec notify_smartphone:
Simone notify.mobile_app_redmi_note_8_pro
Alessia notify.mobile_app_redmi_11_pro_alessia

This is the updated script:

- spec:
    name: notify_smartphone
    description: Send a notification to the smartphone
    parameters:
      type: object
      properties:
        notification:
          type: string
          description: The notification message to be sent
        service:
          type: string
          description: The notify service to use, for example notify.mobile_app_redmi_note_8_pro
      required:
      - notification
      - service
  function:
    type: script
    sequence:
    - service: '{{service}}'
      data:
        message: '{{notification}}'
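With that name-to-service mapping in the prompt, a request like “notify Alessia that dinner is ready” should (if the model cooperates) produce arguments roughly like these, which the script passes straight through to the notify service:

notification: Dinner is ready
service: notify.mobile_app_redmi_11_pro_alessia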

I just tested it, it's really impressive!
I am trying to write a spec to ask OpenAI to analyze the camera.

2 Likes

I’m waiting, friend :smiling_face_with_tear:

awesome, thank you so much

And… we can now ask Assist to analyze the images / cameras!!!

Here you go, works perfectly ! :smiley:

Requires GPT-4 Vision, created by @valentinfrlch

This is the spec:

- spec:
    name: vision
    description: Analyze images
    parameters:
      type: object
      properties:
        request:
          type: string
          description: Analyze the images as requested by the user
      required:
      - request
  function:
    type: script
    sequence:
    - service: gpt4vision.image_analyzer
      data:
        max_tokens: 400
        message: "{{request}}"
        image_file: |-
          /media/Allarme_Camera.jpg
          /media/Allarme_Sala1.jpg
          /media/Snapshot_Giardino1_20240425-090813.jpg
        provider: OpenAI
        model: gpt-4-vision-preview
        target_width: 1280
        temperature: 0.5
      response_variable: _function_result

You have to provide the path to your camera’s snapshots (you might want to create an automation to create snapshots every x minutes)
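A minimal snapshot automation could look like this. The camera entity and interval are placeholders; the filename matches the first path in the spec above, and overwriting the same file keeps that path pointing at the latest image (the folder must be one Home Assistant is allowed to write to):

- alias: Snapshot camera for vision analysis
  trigger:
  - platform: time_pattern
    minutes: "/5"                             # every 5 minutes, adjust as needed
  action:
  - service: camera.snapshot
    target:
      entity_id: camera.allarme_camera        # placeholder camera entity
    data:
      filename: /media/Allarme_Camera.jpg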

I also added this to the prompt template:

If I ask you to analyze an image or a camera, use the spec vision

In action:

5 Likes

Thanks for sharing!
May I add your script to the wiki of the gpt4vision repository? I think this would be a great source of inspiration for others.

1 Like

Absolutely :smiley:

2 Likes

One thing I would like to do, and I don't know if it's possible:
I would like to ask Assist for something and say: do it in 20 minutes

like:
“Please turn the lights off in 20 minutes”

Do you think it's possible?

1 Like

I have Ollama set up on WSL on a different machine than the one running Home Assistant, but on the same network. I want to use this integration with my Ollama setup but I am not having any luck. Is it possible to use Ollama with this?

I have tried many different base URLs, but they are all some variant of the IP address Ollama is on, e.g. http://192.168.5.16/v1

Can you access port 11434 from your Home Assistant (or any other device on the network)? If you haven't already, you need to set OLLAMA_HOST to 0.0.0.0 inside WSL. You can test whether Ollama is accessible by going to 192.168.5.16:11434; it should say something like “Ollama is running”.

I think there might be some problems with the endpoints; they are not the same. OpenAI uses /v1/chat/completions whereas Ollama uses /api/chat.
The JSON sent has the same keys as far as I can tell, but the responses are different again.

Yes, I can access 192.168.5.16:11434 from devices on my network and get the “Ollama is running” confirmation, and I have set OLLAMA_HOST to 0.0.0.0. But no matter what IP address or combination I put into Extended OpenAI Conversation when adding the service, I get “Failed to Connect”.

wow :heart_eyes:
Finally, someone understood what I meant!
When I said that I don’t want to write a separate script for each prompt and I want the prompt to be variable, I meant exactly that.
I checked it and the truth is that it works great :no_mouth:

I wrote an automation that takes a snapshot from the camera whenever the wake word is said and saves it
But there is a small problem that I really want to fix :smile:
About 50% of the time, when I ask it, for example, “what do you see on the camera?” or “how many people do you see on the camera?”, the voice agent answers that it does not have the ability to analyze images, but that it can turn the camera on or off for me.
I'm new to this, but I feel the problem is that GPT doesn't understand that it should use the function you wrote, so it doesn't call the ha-gpt4vision service at all.

I tried it in the prompt: I told Extended OpenAI Conversation to use the ha-gpt4vision service whenever there is a request about the camera, but it had no effect…
However, it still understands well about 50% of the time and answers our questions about the camera analysis, and I am very grateful to you :heart_eyes: :heartbeat: @Simone77
I have only just discovered the big world of Home Assistant, and I still can't think of a way to solve this problem. How about you?
The problem is minor, but it does not understand exactly when to use this function.

1 Like

Extended OpenAI Conversation probably validates the IP and port by sending a request to a specific endpoint which may not exist on Ollama. It is unlikely that Extended OpenAI Conversation works with Ollama out of the box.
But why would it anyway? It is not advertised anywhere that Ollama is supported…

The closest you’re going to get is probably the official Ollama integration.
If you’re interested in multi-modal conversations (images for now) with Ollama then this might be for you: gpt4vision.