[Custom Component] extended_openai_conversation: Let's control entities via ChatGPT

I wanted to ask that as well. I like the additional control we get but I don’t know how we would configure additional tool calls in the official integration. Can anyone else chime in?

What can you do with the extended one that you can't do in the official one?

Off the top of my head:

  1. Ability to point to alternative API endpoints for things like local models or other OpenAI API compatible services (like Groq)
  2. Control over what function calls/tools are available to the model
  3. Ability to set the max context length of a conversation thread (less useful than the first two)

I also think you get more control over how data is passed to the model, like the ability to use variables (don’t know if the official one supports this, it may).

Is there no way to let the agent read a PDF file? I've tried putting the instruction in the prompt, like 'read the file located here and tell me what you read', but it says it does not have file access. The file is located in the www directory, so it should be accessible.

Same issue here.
Have you solved it?

This is what I have found:

How to Use ChatGPT to Read PDF in 3 Easy Ways? | UPDF

You can just drop the PDF into the ChatGPT interface and it can read it. I want to know how I can pass the file through the integration.

Well, it looks like I'm having some difficulty controlling lights. Sometimes it says it cannot change the colour, and sometimes it says the colour has been changed when it hasn't.
Oh, and sometimes I get:

Something went wrong: Action light.set_color not found

Maybe I need a function for changing the colour?
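
In case it's useful, here is a rough sketch of a script-type function that would give the model an explicit way to change colours via light.turn_on. It's untested; the name set_light_color and the color_name parameter are just illustrative, not something from the integration's docs:

- spec:
    name: set_light_color
    description: Change the colour of a light. Use this instead of calling any light.set_color action.
    parameters:
      type: object
      properties:
        entity_id:
          type: string
          description: The light to change, e.g. light.bedroom
        color_name:
          type: string
          description: A CSS colour name such as red or blue
      required:
        - entity_id
        - color_name
  function:
    type: script
    sequence:
      # illustrative sketch: assumes the script executor exposes the
      # function arguments as template variables, like the repo's examples
      - service: light.turn_on
        data:
          entity_id: "{{ entity_id }}"
          color_name: "{{ color_name }}"

Whether the model reliably picks this instead of hallucinating light.set_color probably depends on the description wording, so that part may need tuning.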

Hi All,

I can't get my head around an issue I'm experiencing with the database functions found here: extended_openai_conversation/examples/function/sqlite at main · jekalmin/extended_openai_conversation · GitHub.

None of them work for me because HASS complains that it cannot open the database. Now, I am running MariaDB (MySQL) instead of SQLite, but when I replace "sqlite" with "mysql" it says it's an incorrect function. Rewriting the SQLite queries/examples for MySQL doesn't help either.

So, for example, I want to use this function:

- spec:
    name: query_histories_from_db
    description: >-
      Use this function to query histories from Home Assistant SQLite database.
      Example:
        Question: When did bedroom light turn on?
        Answer: SELECT datetime(s.last_updated_ts, 'unixepoch', 'localtime') last_updated_ts FROM states s INNER JOIN states_meta sm ON s.metadata_id = sm.metadata_id INNER JOIN states old ON s.old_state_id = old.state_id WHERE sm.entity_id = 'light.bedroom' AND s.state = 'on' AND s.state != old.state ORDER BY s.last_updated_ts DESC LIMIT 1
        Question: Was livingroom light on at 9 am?
        Answer: SELECT datetime(s.last_updated_ts, 'unixepoch', 'localtime') last_updated, s.state FROM states s INNER JOIN states_meta sm ON s.metadata_id = sm.metadata_id INNER JOIN states old ON s.old_state_id = old.state_id WHERE sm.entity_id = 'switch.livingroom' AND s.state != old.state AND datetime(s.last_updated_ts, 'unixepoch', 'localtime') < '2023-11-17 08:00:00' ORDER BY s.last_updated_ts DESC LIMIT 1
    parameters:
      type: object
      properties:
        query:
          type: string
          description: A fully formed SQL query.
  function:
    type: sqlite

HASS then complains as follows:

Logger: homeassistant.components.assist_pipeline.pipeline
Source: components/assist_pipeline/pipeline.py:1015
integration: Assist pipeline (documentation, issues)
First occurred: 11 August 2024 at 23:11:34 (5 occurrences)
Last logged: 12:47:13

Unexpected error during intent recognition
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/pipeline.py", line 1015, in recognize_intent
    conversation_result = await conversation.async_converse(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/conversation/agent_manager.py", line 108, in async_converse
    result = await method(conversation_input)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/extended_openai_conversation/__init__.py", line 196, in async_process
    query_response = await self.query(user_input, messages, exposed_entities, 0)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/extended_openai_conversation/__init__.py", line 380, in query
    return await self.execute_function_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/extended_openai_conversation/__init__.py", line 406, in execute_function_call
    return await self.execute_function(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/extended_openai_conversation/__init__.py", line 432, in execute_function
    result = await function_executor.execute(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/extended_openai_conversation/helpers.py", line 711, in execute
    with sqlite3.connect(db_url, uri=True) as conn:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: unable to open database file

I appreciate any hints, suggestions, advice what I can do to solve this (for me) big riddle. Thanks!

EDIT: just wanted to thank @jekalmin for this very cool integration - it really opens up many more possibilities. Please continue this great work!

Do any of you get inconsistent actions? Sometimes it says it did a thing but it hasn't, or it says it can't and then the next time it does it.
I'm using 4o-mini; maybe it's better with 4o?

Go to System > Logs, click “Load full logs”, then CTRL+F for “custom_components.extended_openai_conversation”.

That should show you exactly what the integration sent to and received from the model, and might give you a bit more information. 4o-mini may not be perfect with tool calls, but adjusting the descriptions of each of the function calls you have (the ones listed under "- spec:") for more clarity may help improve reliability. Making sure to keep your exposed device list clean could also help (a bigger prompt increases the chances of the model misunderstanding its instructions).

I can find nothing in the logs (except for one token-limit-exceeded error). Maybe I have to change the log level somewhere?

If I create more than one assistant, can I expose different entities to each assistant?

The full log (viewed or exported) should contain those entries, assuming you've used an assistant with one of the 'extended openai conversation' entries as the conversation agent since your last reboot. I haven't changed my log levels and they're visible for me, but maybe it differs between installation types? I'm running the latest HAOS.
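
If they're not showing up for you, forcing debug level for the component in configuration.yaml is another option (standard Home Assistant logger config; adjust the levels to taste):

logger:
  default: info
  logs:
    # log everything this custom component sends/receives at debug level
    custom_components.extended_openai_conversation: debug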

I don’t believe there is a direct way to expose different entities per assistant at the moment.

I had to enable debug logging on the integration's configuration page.

For exposing different entities, just write what you want to expose in each assistant's prompt (like the entity loop in the default prompt), for example as shown below.
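
For instance, in an assistant that should only see lights, the entity table in the prompt could be cut down to something like this (an untested sketch based on the default prompt's exposed_entities loop):

{# sketch: assumes exposed_entities provides entity_id/name/state as in the default prompt #}
entity_id,name,state
{% for entity in exposed_entities if entity.entity_id.startswith('light.') -%}
{{ entity.entity_id }},{{ entity.name }},{{ entity.state }}
{% endfor -%}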

Can I iterate over entities like the default prompt does, but in a spec instead of in the prompt itself? That way I can filter entities and not spam the prompt with useless info.

Hi all, I am starting to use Voice Assist. Part of my setup is that if you ask for the weather, it tells you the weather and automatically navigates to the weather view, but we have to set it up with a specific intent and not use AI, or it won't switch screens; that's just one use case.
Is there a way to use AI but also do something else on specific intents, or when you mention a keyword like "weather"?

I see you've worked on token saving; isn't it possible to make a spec that filters entities on request? I mean, I don't give any entity information in the prompt, but I instruct GPT that if I ask something about a light it has to call a specific function to get a list of light entities. I'm trying, but I'm getting lost…
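
One way this might look is a template-type function that the model is told to call before doing anything with lights, so the entity list never has to live in the prompt. This is an untested sketch; the function name and description wording are made up, and it assumes the template executor behaves like the repo's template examples:

- spec:
    name: get_light_entities
    description: >-
      Returns the entity_id, name and state of every light.
      Always call this before turning lights on/off or changing colours.
    parameters:
      type: object
      properties: {}
  function:
    type: template
    # sketch only: iterates all light entities, not just the exposed ones
    value_template: >-
      {% for s in states.light -%}
      {{ s.entity_id }},{{ s.name }},{{ s.state }}
      {% endfor -%}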

I've been thinking about how we could do this without bypassing the exposed entities list, but I have to assume that a function call is indeed possible for this purpose. I wish we had more documentation on the exact way information is handled, though. A great way to reduce token cost is also to keep the conversation open for a period of time. For example, when you use the chat interface on the dashboard, I think it preserves the bulk of the information on the destination server instead of resending your system prompt and full entity list with every message. I assume this because follow-up messages are way faster than the first message. When you close that window it resets, and all of that information has to be sent again on your next request. As long as the conversation ID can be preserved in whatever integration you're using, that can be hundreds to thousands of tokens saved per message.

If we can stretch that further and only provide a list of entities and their static attributes in the initial system prompt, then use function calls for their states and dynamic attributes, we could theoretically keep the same context active until you hit the window limit after a few weeks/months. Were it not for my job I’d be working on making that happen :smiling_face_with_tear:
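
A minimal sketch of the "states on demand" half of that idea, again with a template-type function (untested; the name get_entity_state is made up for illustration):

- spec:
    name: get_entity_state
    description: Get the live state of a single entity. Call this instead of relying on states in the prompt.
    parameters:
      type: object
      properties:
        entity_id:
          type: string
          description: The entity to look up, e.g. light.bedroom
      required:
        - entity_id
  function:
    type: template
    # sketch only: returns just the state string for the requested entity
    value_template: "{{ states(entity_id) }}"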

Looking at the logs, it seems that for every line of conversation, all of the history (original prompt + request + GPT response) is sent again, increasing the token count. Every line of conversation is bigger than the previous one, which is why it could be very useful to have a conversation ID to keep the history.

In the prompt, the entities passed are not all the exposed ones but just the ones you list in your prompt. The default prompt of the integration has the loop that passes all the exposed entities, but you can do whatever you want; that's why I want to make a spec or a function to pass just what is needed, only when it's needed. I'm trying to, but I'm not an expert or a programmer.
We definitely need someone who can maintain this integration.

If I wanted to trigger an automation based on the contents of an image, how would I accomplish this?
For example, I would want a notification if a black SUV pulls into the driveway.
I can easily trigger a snapshot when a car pulls into the driveway; I'm just not sure how to trigger an automation based on a specific car type.