Ask OpenAI questions from your default conversation agent!

Hi everyone. I wanted to use my default voice assistant (the one I use to control HA entities) to also query OpenAI, so I can ask random trivia questions from my Wear OS smartwatch. Normally you would have to switch the voice assistant to the OpenAI one for this to work. However, I found a workaround that lets me do both with the same assistant.
Here’s a video that shows it working:

I am now able to ask, for instance: “Ask OpenAI who is Barack Obama?” and get the answer back.
For this to work I use this awesome custom component which extends the conversation agent with regex capabilities and allows us to retrieve specific values from our conversation queries.

In the example above the conversation agent would use the “who is Barack Obama” bit and forward that to the target conversation agent and store the response.

Setup instructions:

  1. Install Yarvis via HACS by adding the repository as a custom repository and then installing it.
  2. Set up Yarvis via the integrations menu and set the intents to trigger on, e.g.:
AskOpenAI:
  sentences:
    - Ask OpenAI to\s(?P<query>.+)
    - Ask Open AI to\s(?P<query>.+)
    - Ask OpenAI\s(?P<query>.+)
    - Ask Open AI\s(?P<query>.+)
  3. Add a voice assistant (Settings > Voice assistants) or change your current one to use the Yarvis conversation agent:

  4. Set up an intent_script (and a template sensor) in your configuration.yaml and replace the agent_id:
    The agent_id can be found in the Assist debug page for your target conversation agent:
    (Settings > Voice assistants > OpenAI (or whatever you called the OpenAI assistant) > three-dot menu > Debug)

intent_script:
  AskOpenAI:
    action:
      # Forward the captured query to the target (OpenAI) conversation agent
      - service: conversation.process
        data:
          text: "{{ query }}"
          language: EN
          agent_id: 50005158a4b775603223d530315c184e
        response_variable: agent
      # Fire an event so the template sensor below can store the full response in an attribute
      - event: openai_response_received
        event_data:
          full_response: "{{ agent.response.speech.plain.speech }}"
    speech:
      text: "{{ states['sensor.openai_query_response'].attributes.full_response }}"

template:
  - trigger:
      platform: event
      event_type: openai_response_received
    sensor:
      # The full response is stored in an attribute, which is not limited to 255 characters
      - name: "OpenAI query response"
        state: "{{ now() }}"
        attributes:
          full_response: "{{ trigger.event.data.full_response }}"

Restart HA or reload your YAML configuration to finish.
You should now be able to use your default voice assistant (the one that has Yarvis set as the conversation agent) to both control your HA entities and ask OpenAI questions without switching.
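
If you do not get a response, you can test the target agent on its own from Developer Tools > Services with a call like the one below (a minimal sketch reusing the example agent_id from above; substitute your own). Recent Home Assistant versions display the returned response data directly under the service call.

service: conversation.process
data:
  text: "who is Barack Obama"
  agent_id: 50005158a4b775603223d530315c184e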

Changelog

  • 2023-07-23: Initial version
  • 2023-07-24: Used template sensor triggered by custom event to hold full response instead of input_text which is limited to 255 characters.

Thanks, I tried to do this recently but couldn’t figure out how to trigger a wildcard sentence.

Small steps on the road to the ideal smart assistant. I think HA will need to deal with this issue soon — how to combine the locality and privacy of HA Assist/intents, with the desire for a conversational (meaning: GPT-powered) assistant, without manually switching back and forth.
Adding wake word processing emphasizes this issue, since a wake word can only trigger one assistant. For now, “Ask OpenAI” is great, but it’s too similar to Alexa’s antiquated paradigm of “ask [skill] to do [x]”.

Have you tried saving the answer as an attribute of a template sensor? That way you wouldn’t have the 255-character limit. You could use the conversation trigger in a template sensor, so it fires on the response. Just off the top of my head, but it could be worth a try. :slight_smile:

Great guide btw. :+1: Much appreciated! :wave:

Oh my gosh, thank you for reaching out on my post. I was already halfway through trying to implement exactly this before IRL obligations took over. So excited to try this out!!

Great idea. I have edited the post to change the input_text to a template sensor with an attribute holding the full response from OpenAI.

Is there any way to use Yarvis as a fallback to the default agent? I want sentences that are specific to Home Assistant to be checked first, and then send all unmatched requests to OpenAI to get closer to a natural conversation.

E.g. “Hey Hal, turn on the lights” > matches an Assist sentence and turns on the lights
“Hey Hal, open the pod bay doors, please” > forwarded to the GPT API > response: “I’m sorry, Dave, I’m afraid I can’t do that.”

No, I don’t think that’s possible currently. There is no way to combine or fall back across multiple voice pipelines.

I agree 1000% that this is needed!

Well, actually there is: I’m doing just that, with the help of a custom intent and Node-RED.

In short, I created a custom intent with a wildcard, passing everything Assist can’t handle on to Node-RED. In Node-RED I’m then using my own OpenAI script and sending the response back to HA.

I’m currently working on doing it with the Extended OpenAI Conversation integration, so it should be possible with only HA. I’m just not done yet.

The reason I’m going with Assist first is the faster response time, and to save money on the OpenAI API.

That sounds great! Could you show us the custom intent with the wildcard? How do you know when there is no match? Is that in the ESP32 firmware or in the YAML intent?

This will help with the custom intent with a wildcard, not with how to know when there is no match. I made one change: for the sentence it is “chat {question}”, so it will only use OpenAI when I start a sentence with “chat.”
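
For anyone who wants a concrete starting point, a wildcard slot in a custom sentences file looks roughly like this (a minimal sketch: the file path custom_sentences/en/chat.yaml and the intent name AskChatGPT are illustrative, and you would pair it with an intent_script similar to the AskOpenAI example earlier in the thread, using question instead of query):

# config/custom_sentences/en/chat.yaml (illustrative path and intent name)
language: "en"
intents:
  AskChatGPT:
    data:
      - sentences:
          - "chat {question}"
lists:
  question:
    # a wildcard slot captures arbitrary free text instead of matching a fixed list of values
    wildcard: true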

Hello, I have seen this post and liked the idea, but like most of you I’d prefer not to have a special trigger word for OpenAI. I liked the idea of having a fallback for the default agent.
So I gave it a try and implemented it as a custom component. I just finished this 10 minutes ago and didn’t have much time for testing, but I thought you should see this as soon as possible :slight_smile:

Just have a look at the code or try it out as you like. For me it seems to work quite well. Please read the README before setting it up, to understand how it works.

Essentially it just lets you set up a list of agents which are called one after the other until one of them returns a successful result.


UPDATE: I have changed the repo URL and updated it in this post as well.


This looks super cool! :smile: I’m really interested in your approach to creating a fallback mechanism for the default agent without the need for a special trigger word for OpenAI. I’ve had a look at your GitHub repo, and the concept of having a list of agents that are called sequentially until a successful result is obtained seems very practical.

I do have a question about how fast the fallback to OpenAI occurs. Does the system have to wait for a complete response from the initial assistant before it switches to OpenAI, and if so, does this mean there might be a significant delay before receiving an answer? I’m curious about how this impacts the overall responsiveness.

Great job on this, and thanks for sharing!

Hi, thanks for your feedback.
As I understand the agent API, I need to wait for the agent’s response to get its response object, including the error property.

Fortunately the default agent seems to be quite fast when it comes to intent recognition, or at least at finding that there is no match.
So I don’t notice any delay. I could add some debug logging to get an overview of the delays.

However, in my opinion, the delay of the default agent is negligible compared to the performance of ChatGPT itself.

——————
Update:

  1. If performance is that critical for you, one could call all agents in parallel and take the first (in order) response. But I don’t find that practical, and it could create strange side effects and extra costs (OpenAI API).

  2. Another issue I have seen is that the OpenAI integration keeps its own conversation history by saving all input and output messages into a history object. With my component, the OpenAI agent doesn’t get all messages, only those that are passed to it as a fallback. So you could not ask OpenAI things related to the history (“what was the last thing I turned off?”).

Great work! This should be a core feature.
That way you could add multi-language support too.

That’s a great idea, @DonNL!

I have improved the integration a bit and just released version 0.1.2.
Unfortunately there was a breaking change when I switched from the “main” branch to GitHub Releases, and additionally I renamed the Git repository (from hass to hacs) :slight_smile:

So please make sure to remove the integration and delete the repo from HACS. Then add it again using the following repo. From now on, I’ll try to reduce breaking changes as much as possible :smiley:

For anyone else who wants to get more examples, here is my last Assist test conversation.
(I have told OpenAI to answer as a pirate :slight_smile:)

You can find all the instructions to set this up in the README.md of the repo.

Funnily enough, I made the same thing a couple weeks ago!

I implemented it a little differently, mainly due to this being the first custom integration I’ve ever written.

Great minds think alike I guess!

One important feature I included was debugging: I wanted the ability to see which conversation agent responded, and even what each conversation agent’s failures were, right in the final response when debugging is enabled.

Thought you might find my implementation interesting as well.

Just wanted to chime in and thank you both, this was one of the last remaining blockers from my perspective in getting to a fully functional VA model for the house.

I ended up trying both repos linked here last night and noticed one behavioral difference I thought would be worth raising, as I’m not sure whether it’s working as designed (WAD) or not.

For @t0bst4r’s implementation, when I use a custom sentence I’ve implemented that utilizes a wildcard (e.g. “play classical radio on the living room speakers”, where ‘classical radio’ is a wildcard slot), I found the chained agent skips Home Assistant and goes right to the backup agent (OpenAI in my case), failing to play the music as expected. All other custom sentences that do not have wildcard slots seem to work as expected.

When using @marisa’s implementation, the behavior is as expected: a request to “play classical music on the living room speakers” is recognized as an HA intent and the appropriate script is executed.

Not sure if anyone else is seeing this behavior or whether it’s WAD, but I wanted to share. Again, thank you both for sharing your work here!

If I had to guess at the reason, I assume it’s because @t0bst4r’s implementation doesn’t pass a language along, whereas my implementation does. Intents are based on language, and custom intents probably just don’t trigger unless a language is provided.

Interesting! I’m fairly certain my other custom intents that did not have wildcard slots triggered fine, so I wonder if it’s tied to that wildcard? Either way, I appreciate the work and thought!