I noticed that at some point the LLM (OpenAI) started answering with much higher waiting times on my Voice PE.
Finally I did more tests:
Spoken question (in Romanian): “Daca la sah mut regele o patratica inainte si apoi il mut inapoi, se mai poate efectua rocada mare sau rocada mica?”
In English: “If in chess I move the king one square forward and then move it back, can I still perform the queenside or kingside castling?”
Natural Language Processing times:
“Prefer handling commands locally” ENABLED:
53.71 s
59.36 s
42.88 s
“Prefer handling commands locally” DISABLED:
2.89 s
1.48 s
3.07 s
Why is “Prefer handling commands locally” consuming SO MUCH time?
I was expecting a small time increase, but going from ~2.5 seconds to ~52 seconds makes me want to disable the option forever, although I use it a lot …
Is it something in my installation only? Any idea how to debug this differently or improve those times?
When using that setting your text is first run through the local LLM, and only when that fails is it forwarded to the cloud LLM.
Since a local LLM is generally slow (unless you have thrown A LOT of money at it), this will increase processing time significantly.
Actually, the fallback to local processing goes to the local built-in-sentences recognition, which is not an LLM.
I have a Home Assistant Yellow, and I’m using Home Assistant Cloud for STT, TTS, and OpenAI. Disabling “prefer handling commands locally” means that I use my full cloud support, and it works fast. When I enable the local sentence processing (which is not LLM), things become slow … so it seems not to be related to my OpenAI LLM (which works fine without the fallback).
Can someone please test this question in Assist (written or spoken) and tell me the overall time? (With “Prefer handling commands locally” enabled, because without it it’s fast.)
Thanks @mchk, you motivated me to investigate deeper (I thought I was having the same issue in English too, but that was wrong; it seems in English it works extremely fast).
With Gemini next to me (as an assistant) I started to do debugs.
Relevant info:
I have ~185 entities exposed to Assist.
When I ask a long question, the system FREEZES for ~1 min (actually I was getting a “Connection lost” message in the web UI, but since I was working with voice, I was not seeing it initially).
Eventually, the fallback kicks in and the LLM answers.
Log:
I enabled the “debug” logger level for Assist. And … there’s a gap of ~1 min in the log exactly at that moment.
The Evidence - Profiler
I ran the profiler during the freeze. The output confirms that the Main Loop is blocked for 51 seconds exclusively by re.Pattern.match. Gemini told me: “It seems the local intent matching is brute-forcing the regex matching on the main thread, choking the CPU.”
Hardware: Home Assistant Yellow (CM4) Software: 2025.12.3
I will, of course, clean up my entities. But … I think Gemini has a valid question here: is there a way to limit the timeout for local matching, or to offload it to a non-blocking thread?
Thanks for the hint. Actually I had a custom_sentences file; I temporarily removed it and I see no change.
After so many tests with the same sentence … here is what I also learnt.
Status:
I reduced my exposed entities from 188 to 108. No change.
I had a custom_sentences local file. I temporarily removed it. No change.
I tested more carefully (initially I was doing it wrong) and it seems English actually works very fast! I couldn’t replicate this issue at all in English! So it looks to me that maybe it’s something with the Romanian sentences.
Anyone with a Romanian Assist + LLM (+ Prefer handling commands locally) :
Can you please test by asking Assist “Daca la sah mut regele o patratica inainte si apoi il mut inapoi, se mai poate efectua rocada mare sau rocada mica?” and then confirm the time it took?
PS.:
Here you can see “Connection lost” while waiting for the answer …
Just making sure: you did restart HA between tests, right? Never mind, I was able to sort of reproduce the issue, but my system is far more powerful than a Yellow.
Could you please tell me if you get the same behavior in French? If my assumption is correct, it should take ~5x longer in French than in Romanian. In English it should work ~10x faster than Romanian.
As I said before, you need to try to figure out which intent is triggered by this phrase. The problem is that some intent patterns are too general, which leads to a lot of checks being performed. For example, your English sample matches the “scene_HassTurnOn.yaml” intent: [activate] <area> <name> [scene].
Apparently, the naming of your entities gives you an additional delay, which I’m not getting on my server. But for Romanian, I get the same “Connection lost” error, which indicates that the operation is taking too long. I still recommend starting a discussion about the problem in the repository. It would be better to adjust the templates a bit to reliably avoid this problem.
I’ve done a bit more research on this matter and I can’t say I’ve been able to reproduce the issue reliably. Here are some possible causes.
The Prefer handling commands locally option means trying to run a sentence matching using the built-in conversation agent. If that fails (i.e. if NONE of the built-in sentences match), then hand the query to a LLM-based conversation agent.
At the time of this writing, Romanian has ~160M sentences built into the default conversation agent. As a reference, English has ~14M and French has ~795M.
Sequential matching has historically been slow, but it has been optimized. If sequential matching is truly the bottleneck in this case, then a non-standard query in an LLM-powered English pipeline with Prefer handling commands locally turned on should be ~10x faster than in an equivalent Romanian pipeline, which in turn should be ~5x faster than in a French one.
However, that’s not the case in my tests. Even though I am running this on hardware that’s much more powerful than a CM4-powered HA Yellow (i.e. on a HAOS VM running on 2 cores of an Intel Core Ultra 5 245K), the response times I’m getting from conversation agents with and without Prefer handling commands locally on English, Romanian and French are comparable not only within the same language, but also amongst languages (ranging from ~2.5s at best on French to ~7s at worst on Romanian). The bottleneck in my case seems to be the response time of OpenAI, and even that is hardly noticeable.
I’ve brought the issue to @synesthesiam’s attention, but so far we are simply sharing a feeling of amazement regarding the very long response time you’re getting @adynis
Wow!!! @synesthesiam This decreased the processing time from ~60 seconds to ~6 seconds!!!
And currently I see that local handling still works … also “falling back to LLM” seems to work way faster (at least for this notorious sentence from the beginning of this topic)!!
But … since this was not enabled officially … I suppose there are some disadvantages? Negative implications? Should I also test something else in parallel?
It’s fine to leave the regex filtering disabled. It’s meant to be a pre-processing step to speed up matching, but it assumes certain things about the sentence templates and input text.
I’m looking into a replacement that should work better for more cases.