Why doesn’t the Home Assistant team want to find a solution to this problem?
The voice assistant gives errors to almost all my questions.
I suspect the main reason is that the Home Assistant team is currently focused on developing their own fully local voice assistant. Their priority is offline processing, device control without cloud dependencies, and building their own ecosystem of voice models.
With Gemini Flash 2.5, I found the problem is that it doesn’t handle thinking correctly. If you change the model settings and put the text /no_think on a new line at the end, you’ll find that Flash 2.5 works fine.
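To make the placement clear: this goes in the integration’s prompt/instructions template, with /no_think on its own line at the very end. The surrounding prompt text below is just illustrative, not the integration’s actual default:

```text
You are a voice assistant for Home Assistant.
Answer questions and control devices as asked.
/no_think
```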
It took me a while to figure this out as I’d have assumed that the default works…
It makes sense that it broke between 2 and 2.5 as flash 2.5 added thinking.
With the newly released Gemini Flash 3, I am getting the INVALID_ARGUMENT error, though.
I just tried adding /no_think and it didn’t work. Still got “Unable to get response”.
I decided to switch to Gemini 2 for now, and it seems more stable.
That’s too bad. 2.5 has been solid for me with /no_think, whereas it would fail for most requests previously. With the newly released 3.0 Flash, I hit the same INVALID_ARGUMENT error others have reported.
{ "code": 400, "message": "Please ensure that function response turn comes immediately after a function call turn.", "status": "INVALID_ARGUMENT" }
Giggles.
Even with AI, you have to faithfully follow the latest rules and documentation, without leaving room for guesswork. Looks like backwards compatibility may be an issue.
Who’d have thunk it?
I enabled billing on my Gemini project, and the problems have been gone since then.
The paid Gemini 3 Flash API could cost you pennies!
I use it for fallback from local. No more than a couple calls per day. I think my high expense day was 3 cents.
Paid may have better QOS.
I have not had any HA problems with 3 pro or 3 flash
This is actually one of the solutions. We usually yell at it and swear a lot until it correctly follows instructions and turns the lights on.
My opinion: if you can, go local with Ollama on a PC with a decent GPU. A quantized Qwen3 4B (Q4) doesn’t require too many resources, it’s super smart with Home Assistant, and it’s super fast and fun. Don’t waste time and money on these cloud services.
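For anyone wanting to try this, the rough setup is just a couple of commands, assuming Ollama is already installed and the `qwen3:4b` tag (which ships 4-bit quantized by default) is still current:

```shell
# Pull the quantized Qwen3 4B model from the Ollama library
ollama pull qwen3:4b

# Quick sanity check from the command line before wiring it into HA
ollama run qwen3:4b "Reply with the single word: ok"
```

After that, point Home Assistant’s Ollama integration at the server (by default it listens on port 11434) and select the model there.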
I just hope the Home Assistant folks create a system that uses Speech-to-Phrase for all the standard commands and routes all the other, non-standard commands and questions to the AI. You would get zero latency (and less tedium, because after a while an AI that chats with you just to turn on a light becomes boring) plus the power of the AI when it’s necessary. Such a system is entirely offline; furthermore, the standard commands would always be available even when the AI server is not reachable.
… Well, that’s exactly how it works with the default conversation integration and Assist’s local-first option turned on. So…
Hi Nathan, not really; maybe I explained myself badly. If you’re referring to the “Prefer local command” function, from what I’ve seen it’s true that the system favors “automation” commands, and in fact typed chat commands are very fast. But the problem, and therefore the latency and, unfortunately, sometimes the inability to recognize voice commands properly, is that Whisper is used for voice commands, not Speech-to-Phrase.
I would like to see Speech-to-Phrase used for all standard voice commands, and only then Whisper for the non-standard phrases to be sent to the AI.
Personally, I’d hate that. Speech-to-Phrase is WAY too limiting, even more than simple intents. I don’t even use local-first, because it’s also too limiting and misses WAY too much on South Texas American accents. The LLM fills the gap.
Don’t get me wrong, it’s nice for very low-power environments, but no.
If you have the hardware for it, just pull voice local and build your own. You can pull Parakeet for Wyoming (that’s what I use) and then build up intents for anything you need. No way I’m doing Speech-to-Phrase with a rig like that.
Optimize for good STT, make sure your intents work well, and then go local-first.
Also, when you think of your LLM as a tool-using assistant instead of something that just turns the lights on and off, suddenly those limits matter. For instance, I have to go tell Friday to load yesterday’s grocery run into inventory and then run a check on the hot tub chemistry. You aren’t speech-to-phrasing that. You need good tools with a competent LLM.
It is clear that we have different experiences of use.
I speak with my systems in Italian, so I assume it is even worse than with English speech, whatever accent it is.
I’ve tried the Whisper approach in many ways, even with a dedicated local Docker server for Whisper on a powerful PC, but it’s still hit or miss: sometimes it works well, and sometimes it doesn’t. I’d be very happy if STT always worked well with low latency, but right now it’s not the best solution. The current compromise is to rely on Groq in the cloud with Large v3 Turbo, but even then it’s not satisfactory at all; the problem is probably not even Whisper itself, but its Italian support.
With Speech-to-Phrase, however, it’s rock solid; every Italian command hits the mark quickly, and that’s truly very satisfying and gratifying.
So I took the simplest route from my point of view: automations everywhere and a lot of preset phrases, and I’m happy (also because I find the presets work more reliably than what an AI would do). I just need the approach to AI I described previously, and I’d be content.
As you can see, my approach uses two machines: a small one with an Intel Atom running HASS, which works very well with Speech-to-Phrase and local voice, is always on (even in the event of a power failure), and consumes very little energy; the other is my powerful PC, which runs some local AI that I use whenever I keep it on.
The Qwen3 4B I chose would also perform well as a tool assistant; I find it smart despite being so small. My problem is the STT.
This can already be implemented.
Thank you.
stt-fallback seems to be what I was looking for.
I’m testing it and it seems to work: with an AI assistant selected and local commands enabled, I set Speech-to-Phrase as the primary STT and cloud Whisper as the secondary STT, and it behaves exactly as I want (I still need to do some more in-depth testing).
I wonder why it’s a somewhat hidden add-on?
I used Gemini 2.0 as a workaround, but yesterday I received an email from Google saying they will deprecate Gemini 2.0, at least on Google AI Studio. So this issue is critical right now, since we don’t know how much time we have.
And it’s strange that I got the error even when I’m not using Gemini but a locally processed command instead: “turn off AC after 2 hours” (in Portuguese). The error appears after 2 hours, when the AC should be turned off. If I use the Home Assistant agent instead of Gemini, which only does local processing, it works and the AC is turned off correctly.
Delayed actions are a type of timer. After it expires, the system attempts to execute the command you spoke. So, when working with the agent, you’re simply postponing the error, if one exists.
There’s another problem: currently, delayed commands don’t work if the LLM conversational agent is selected with the “no control” option.
Hi,
I switched from gemini-2.5-flash to gemini-3-flash-preview. Since then I get this error too, with one important difference from what I have read so far:
All actions I ask for are executed, and only afterwards does it fail.
So for me it was clear that this is some kind of handshake issue between HA and the Gemini API.
Just today, I even had a smart response really “proving” this:
The prompt was pretty simple, “timer for 10 minutes”, and the final result is the following:
"message": "Please ensure that function response turn comes immediately after a function call turn.",
(almost) full process:
- type: intent-start
data:
engine: conversation.google_ai_conversation
language: de-DE
intent_input: Timer für 10 Minuten.
conversation_id: XX
device_id: YY
satellite_id: assist_satellite.home_assistant_voice_0a9425_assist_satellit
prefer_local_intents: false
timestamp: "2026-01-27T11:50:26.791211+00:00"
- type: intent-progress
data:
chat_log_delta:
role: assistant
timestamp: "2026-01-27T11:50:29.031844+00:00"
- type: intent-progress
data:
chat_log_delta:
tool_calls:
- tool_name: HassStartTimer
tool_args:
minutes: 10
id: XX
external: false
timestamp: "2026-01-27T11:50:29.032342+00:00"
- type: intent-progress
data:
chat_log_delta:
role: tool_result
agent_id: conversation.google_ai_conversation
tool_call_id: XX
tool_name: HassStartTimer
tool_result:
speech: {}
response_type: action_done
data:
targets: []
success: []
failed: []
created: "2026-01-27T11:50:29.033868+00:00"
timestamp: "2026-01-27T11:50:29.033958+00:00"
- type: intent-end
data:
processed_locally: false
intent_output:
response:
speech:
plain:
speech: >
Sorry, I had a problem getting a response from Google Generative
AI.: {
"error": {
"code": 400,
"message": "Please ensure that function response turn comes immediately after a function call turn.",
"status": "INVALID_ARGUMENT"
}
}
extra_data: null
card: {}
language: de-DE
response_type: error
data:
code: unknown
conversation_id: XX
continue_conversation: false
And this pattern is always the same: I ask for an action like executing a script or turning on the lights, and it always executes the action yet almost always returns an error.
Looking forward to your opinions on this.