Up to now I have been using OpenAI (gpt-4o-mini) as my conversation agent. It is generally excellent at correctly interpreting commands, but I feel the overall latency is just a tad too slow. This week I experimented with Gemini 2.0 Flash, and it is noticeably faster for me. Based on the assist debug timings, most commands complete in 1-2s, whereas gpt-4o-mini usually falls between 2.5-3.5s. It also handles all my own benchmark sentences and commands very well.
However, it seems Gemini handles scripts differently than OpenAI. With OpenAI I only need to say the name of the script (or its alias) and it knows to execute the script. With Gemini, I have to say "enable" or "turn on" first, which is a bit annoying.
To make this concrete, imagine a "script.listen_to_the_radio". With OpenAI, I could simply say "Listen to radio" and it would execute that script. With Gemini I have to say "Enable listen to the radio", which is counterintuitive. And if I have to teach my family to remember trigger words, we could just use local handling.
Is there any way to get Gemini, perhaps through some prompt tuning, to execute a script when it recognizes the script name or alias?
I don't mean to say your way is bad, but this problem is more easily solved with intent_script.
Neural networks will change, but your commands will always work in a stable, predictable, high-priority way.
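For example, here is a minimal sketch of the intent_script approach for the "listen to the radio" case above. The intent name, file name, and sentence variants are my own assumptions, so adjust them to your setup. First, a custom sentence file:

```yaml
# custom_sentences/en/radio.yaml (hypothetical example)
language: "en"
intents:
  ListenToRadio:
    data:
      - sentences:
          - "listen to [the] radio"
          - "[turn on] [the] radio"
```

Then wire the intent to the script in configuration.yaml:

```yaml
# configuration.yaml
intent_script:
  ListenToRadio:
    action:
      - service: script.listen_to_the_radio
    speech:
      text: "Playing the radio."
```

With this in place, "listen to the radio" is matched locally before any LLM is involved, so it works the same no matter which conversation agent you use.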
If you specifically need an LLM for this task, try giving it an example in the prompt of how to map a request to a script name. I've seen something like this on the forum before, but I'm not sure for which integration.
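As a sketch, an instruction like the following could be added to the conversation agent's prompt template. The wording and the example mapping are my own assumptions, not a tested recipe:

```text
When the user's request matches the name or an alias of a script
(e.g. "listen to the radio" -> script.listen_to_the_radio),
execute that script directly. Do not require trigger words such
as "enable" or "turn on" before the script name.
```

Whether this helps will depend on how the integration exposes scripts to the model, so it may take some experimentation.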
I wonder if this could be improved by adding an instruction to the prompt to check scripts first when given a command. I've noticed this issue occasionally too.
I wish we had access to the exact prompt HA passes to the LLM. I also wish I understood what and how Gemini remembers from previous conversations.