Google Gemini 2.0 and script execution

Up to now I have been using OpenAI (gpt-4o-mini) as my conversation agent. It is generally awesome at correctly interpreting commands, but the overall latency feels just a tad too slow. This week I experimented with Gemini 2.0 Flash, and it is noticeably faster for me. Based on the Assist debug timings, most commands complete in 1-2s, whereas gpt-4o-mini usually takes 2.5-3.5s. It also handles all my own benchmark sentences and commands very well.

However, it seems Gemini handles scripts differently than OpenAI. With OpenAI I only need to say the name of the script (or its alias) and it knows to execute the script. With Gemini, I have to say “enable” or “turn on” first, which is a bit annoying.

To make this concrete, imagine a “script.listen_to_the_radio”. With OpenAI, I could simply say “Listen to radio” and it would execute that script. Now with Gemini I have to say “Enable listen to the radio”, which is a bit counterintuitive. And if I have to teach my family to remember trigger words, we could just use local handling :smiley:

Is there any way for Gemini, maybe some proper prompt tuning, to execute a script when it recognizes the script name or alias?

I don’t mean to say your way is bad, but this problem is easier to solve with intent_script.
Neural networks will change, but your commands will always work in a stable, predictable and high-priority way.
Add a custom sentence:

language: "en"
intents:
  runcommand:
    data:
      - sentences:
          - "{script_name}"
lists:
  script_name:
    values:
      - in: "Listen to radio"
        out: "listen_to_the_radio"
      - in: "Do something else"
        out: "script_name"
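
If the spoken phrase varies a bit (for example “listen to radio” versus “listen to the radio”), the sentence templates can absorb that. The following is only a sketch: the file name is hypothetical, the verbs in the second sentence are made up for illustration, and it assumes that list values accept the same optional-word syntax as sentences.

```yaml
# config/custom_sentences/en/run_script.yaml  (hypothetical file name)
language: "en"
intents:
  runcommand:
    data:
      - sentences:
          # Bare script name, e.g. "listen to the radio"
          - "{script_name}"
          # Optional leading verb; the verbs here are illustrative only
          - "(start|run) {script_name}"
lists:
  script_name:
    values:
      # [the] marks an optional word, so both phrasings should match
      # (assumption: list values are parsed as sentence templates)
      - in: "listen to [the] radio"
        out: "listen_to_the_radio"
```

Each `out` value is what ends up in the `script_name` slot, so it has to match the part of the entity ID after `script.`.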

The matching intent_script action:

intent_script:
  runcommand:
    action:
      action: "script.{{ script_name }}"
    speech:
      text: "blablabla"
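
A possible variant, sketched under the assumption that the `script_name` slot holds the object ID of an existing script: call `script.turn_on` with a templated target instead of templating the action name, and echo the name back in the response. The response wording is made up for illustration.

```yaml
# configuration.yaml (sketch, not a drop-in config)
intent_script:
  runcommand:
    action:
      # script.turn_on starts the script and returns immediately
      - action: script.turn_on
        target:
          entity_id: "script.{{ script_name }}"
    speech:
      # Illustrative response: "listen_to_the_radio" becomes "listen to the radio"
      text: "Started {{ script_name | replace('_', ' ') }}"
```

The difference from calling `script.<name>` directly is that `script.turn_on` does not wait for the script to finish, so the spoken response should come back right away even for long-running scripts.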

If you specifically need an LLM for this task, try giving it an example in the prompt of how to map a request to a script name. Somewhere on the forum I’ve seen something like this before, but I’m not sure for which integration.

Thanks for the tip. I didn’t look into custom intents yet, but that doesn’t look too complicated. I’ll give it a go.

I wonder if this problem could be improved by adding an instruction to the prompt to check scripts first when given a command. I’ve noticed this issue occasionally too.

I wish we had access to the exact file HA passes to the LLM. I also wish I understood what Gemini remembers from previous conversations, and how.