[Custom Component] extended_openai_conversation: Let's control entities via ChatGPT

I just tested it. It's really impressive!
I am trying to write a spec to ask OpenAI to analyze the camera


I’m waiting, friend :smiling_face_with_tear:

awesome, thank you so much

And… we can now ask Assist to analyze the images / cameras!!!

Here you go, works perfectly! :smiley:

Requires gpt4vision, created by @valentinfrlch

This is the spec:

- spec:
    name: vision
    description: Analyze images
    parameters:
      type: object
      properties:
        request:
          type: string
          description: Analyze the images as requested by the user
      required:
      - request
  function:
    type: script
    sequence:
    - service: gpt4vision.image_analyzer
      data:
        max_tokens: 400
        message: "{{request}}"
        # replace with the path(s) to your camera snapshots
        image_file: |-
          /config/www/tmp/snapshot.jpg
        provider: OpenAI
        model: gpt-4-vision-preview
        target_width: 1280
        temperature: 0.5
      response_variable: _function_result

You have to provide the path to your camera's snapshots (you might want to create an automation to take snapshots every x minutes).
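As a minimal sketch, such an automation could look like the following — the camera entity (`camera.front_door`), the 5-minute interval, and the file path are all placeholders of mine; adjust them to your setup:

```yaml
# Hypothetical example: refresh a camera snapshot on a schedule
- alias: "Snapshot front door camera every 5 minutes"
  trigger:
    - platform: time_pattern
      minutes: "/5"
  action:
    - service: camera.snapshot
      target:
        entity_id: camera.front_door
      data:
        filename: /config/www/tmp/front_door.jpg
```

The filename here must match the path you put in the spec's `image_file`.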

I also added to the prompt template:

If I ask you to analyze an image or a camera, use the spec vision

In action:


Thanks for sharing!
May I add your script to the wiki of the gpt4vision repository? I think this would be a great source of inspiration for others.


Absolutely :smiley:


One thing I would like to do, and I don't know if it's possible:
I would like to ask Assist something and say: do it in 20 minutes

“Please turn the lights off in 20 minutes”

Do you think it's possible?
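It might be, since Extended OpenAI Conversation function specs run Home Assistant scripts and scripts support `delay`. An untested sketch (the spec name, parameters, and entities here are my own placeholders, not from this thread; note the script keeps running during the delay, so a timer-based approach may be more robust for long waits):

```yaml
- spec:
    name: delayed_light_off
    description: Turn off a light after a delay in minutes
    parameters:
      type: object
      properties:
        entity_id:
          type: string
          description: The light entity to turn off
        minutes:
          type: integer
          description: How many minutes to wait before turning the light off
      required:
      - entity_id
      - minutes
  function:
    type: script
    sequence:
    - delay:
        minutes: "{{ minutes }}"
    - service: light.turn_off
      target:
        entity_id: "{{ entity_id }}"
```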


I have Ollama set up on WSL on a different machine than the one running Home Assistant, but within the same network. I want to use this extension with my Ollama setup, but I am not having luck. Is it possible to use Ollama with this?

I have tried many different base URLs, but they are all some variant of the IP address Ollama is on.

Can you access port 11434 from your Home Assistant (or any other device on the network)? If you haven't already, you need to set the OLLAMA_HOST environment variable inside WSL so Ollama listens on the network. You can test whether Ollama is accessible by opening that address and port in a browser; it should say something like "Ollama is running".

I think there might be some problems with the endpoints. They are not the same: OpenAI uses /v1/chat/completions whereas Ollama uses /api/chat.
The JSON sent has the same keys as far as I can tell, but the responses are different again.

Yes, I can access it from devices on my network and get the "Ollama is running" confirmation, and I have set OLLAMA_HOST. No matter what IP address or combination I put in Extended OpenAI Conversation when adding a service, I get "Failed to Connect".

wow :heart_eyes:
Finally, someone understood what I meant!
When I said that I don’t want to write a separate script for each prompt and I want the prompt to be variable, I meant exactly that.
I checked it and the truth is that it works great :no_mouth:

I wrote an automation that takes a snapshot from the camera whenever the wake word is said and saves it
But there is a small problem that I really want to fix :smile:
50% of the time when I ask it, for example, "what do you see on the camera?" or "how many people do you see on the camera?", the answer from the voice agent is that it does not have the ability to analyze images, but it can turn the camera on or off for me.
I'm new, but I feel the problem is that "GPT doesn't understand that it should use the function you wrote and doesn't call the ha-gpt4vision service at all".

I tried it in the prompt:
I wrote that Extended OpenAI Conversation should use the ha-gpt4vision service if there is a request about the camera, but it had no effect…
However, it still understands well 50% of the time and answers our questions about the camera analysis, and I am very grateful to you. :heart_eyes: :heartbeat: @Simone77
I just got to know the big world of ha and I still can’t think of an idea to solve this problem. How about you?
The problem is minor, but it does not understand exactly when to use this function.


Extended OpenAI Conversation probably validates the IP and port by sending a request to a specific endpoint which may not exist on Ollama. It is unlikely that Extended OpenAI Conversation works with Ollama out of the box.
But why would it anyway? It is not advertised anywhere that Ollama is supported…

The closest you’re going to get is probably the official Ollama integration.
If you’re interested in multi-modal conversations (images for now) with Ollama then this might be for you: gpt4vision.

OK, thank you. I just read comments here in this thread from a few people who say they have it working with Ollama, so I thought I would have at it. I know Ollama recently added an API, so I was hoping it would work as a replacement for the OpenAI API, as others have indicated LocalAI can do. I will take a look at the suggestions as an alternative.


Asking OpenAI Extended about security cameras

The script by @Simone77 works well, but it requires a script to capture the images every x minutes, which means they'll likely be out of date by the time you ask about them.

So I improved on the spec. The LLM dynamically decides which camera entities to include (you need to expose them via Assist).
It then captures a snapshot from each of the cameras and passes them all into one single call to gpt4vision:

:bulb: Requires gpt4vision (HACS custom component)

Example: “Is someone at the front door?”
The LLM understands that you want to know about the front door and therefore only passes your front door camera to gpt4vision.

Or: “What’s happening around the house?”
The LLM will pass all available cameras to gpt4vision and respond appropriately.

- spec:
    name: describe_camera_feed
    description: Get a description of what's happening on security cameras around the house
    parameters:
      type: object
      properties:
        message:
          type: string
          description: The prompt for the image analyzer
        entity_ids:
          type: array
          description: List of camera entities
          items:
            type: string
            description: Entity id of the camera
      required:
      - message
      - entity_ids
  function:
    type: script
    sequence:
    - repeat:
        sequence:
          - service: camera.snapshot
            metadata: {}
            data:
              filename: /config/www/tmp/{{repeat.item}}.jpg
              entity_id: "{{repeat.item}}"
        for_each: "{{ entity_ids }}"
    - service: gpt4vision.image_analyzer
      metadata: {}
      data:
        provider: Ollama
        max_tokens: 100
        target_width: 1000
        temperature: 0.3
        image_file: |-
          {%for camera in entity_ids%}/config/www/tmp/{{camera}}.jpg
          {%endfor%}
        message: "{{message}}"
      response_variable: _function_result

Hope this helps!


I added the following to the config area. During any false activations it will respond with "Cancelled" instead of a long response about how it's here to help us with our smart homes… This has been working well.
If a request seems like it may be an accidental prompt, or makes no sense, do nothing and respond with "Cancelled"

I'm getting an error:
Unexpected error during intent recognition
I know it's taking the snapshot; I can see it in the folder. I know gpt4vision is working; I've tested it in Developer Tools.
Where you have Ollama as the provider, should I be using something else?
Edit: I got it. I changed Ollama to OpenAI.


I have added a detailed setup guide in the wiki: here.


The image is successfully created in the tmp folder, but I keep getting "Something went wrong: invalid_image_path" as a response. I am using GPT-4o.

I think this may be because I have camera entities that don't support the snapshot function, like my Nest doorbell and the map from my vacuum. Could you add an option to list only the cameras the spec can use?

If the image has actually been created, then this is not the issue. Have you modified the paths in any way? If so, did you change them in both places in the spec?