Why "Prefer handling commands locally" increases the time from ~2.5s to ~52s?

I observed that somehow LLM (openai) started to answer with higher waiting times on my Voice PE.

Finally, I did more tests:

  • Spoken question: (in Romanian): “Daca la sah mut regele o patratica inainte si apoi il mut inapoi, se mai poate efectua rocada mare sau rocada mica?”
    • in English would be: “If in chess I move the king one square forward and then move it back, can I still perform the queenside or kingside castling?”
  • Natural Language Processing times:
    • “Prefer handling commands locally” ENABLED:
      • 53.71 s
      • 59.36 s
      • 42.88 s
    • “Prefer handling commands locally” DISABLED:
      • 2.89 s
      • 1.48 s
      • 3.07 s

Why is “Prefer handling commands locally” consuming SO MUCH time?
I was expecting a small time increase, but going from ~2.5 seconds to ~52 seconds makes me think about disabling the option forever, although I use it a lot … :roll_eyes:

Is it something in my installation only? Any idea how to debug this differently, or how to improve those times?



When using that setting, your text is first run through the local LLM, and only when that fails is it forwarded to the cloud LLM.
Since a local LLM is generally slow unless you have thrown A LOT of money at it, it will increase processing time significantly.

Most likely. First, check the system’s performance in another language, or try another LLM integration for your language (e.g., Ollama).

But the fallback to local processing actually goes to the local built-in sentence recognition, which is not an LLM.
I have a Home Assistant Yellow, and I’m using Home Assistant Cloud for STT, TTS, and OpenAI. Disabling “Prefer handling commands locally” means I use my full cloud setup, and it works fast. When I enable the local sentence processing (which is not LLM), it becomes slow … so it seems not to be related to my OpenAI LLM (which works fine without the fallback).

Can someone please test this question in Assist (written or spoken) and tell me the overall time? (With “Prefer handling commands locally” enabled, because without it it’s fast.)

Thanks

1.5 s with a cloud LLM provider [English]

It seems there’s a problem with parsing the Romanian version of the sentence. Go to Developer Tools - Assist and check it.

If a request is not processed within a reasonable time, it is worth opening a discussion about the issue in the intent repository.

Thanks @mchk, you motivated me to investigate deeper (my earlier claim that I am having the same issue in English was wrong; it seems in English it works extremely fast).
With Gemini next to me (as an assistant), I started debugging.

Relevant info:

  • I have ~185 entities exposed to Assist.
  • When I ask a long question, the system FREEZES for ~1 min (actually I was getting a “Connection lost” message in the web UI, but working with voice I was not seeing it initially).
  • Eventually, the fallback kicks in and the LLM answers.

Log:
I enabled the debug logger level for assist. And … there’s a gap of ~1 min exactly at that moment in the log.

The Evidence - Profiler
I ran the profiler during the freeze. The output confirms that the Main Loop is blocked for 51 seconds exclusively by re.Pattern.match. Gemini told me: “It seems the local intent matching is brute-forcing the regex matching on the main thread, choking the CPU.”

Here is the pstats output (sorted by tottime):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     6652   51.132    0.008   51.132    0.008 {method 'match' of 're.Pattern' objects}
 7756/132   16.857    0.002    0.410    0.003 {method 'poll' of 'select.epoll' objects}
    172/2    1.285    0.007    0.001    0.000 {built-in method select.select}
15405/541    0.933    0.000   22.494    0.042 _parser.py:511(_parse)
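For anyone who wants to reproduce this kind of measurement, here is a minimal, self-contained sketch (the patterns and the query are made up for illustration, not Home Assistant’s actual templates) that profiles a burst of failing `re.Pattern.match` calls and prints a pstats table like the one above:

```python
import cProfile
import io
import pstats
import re

# Hypothetical workload: many compiled templates tested against a query
# that matches none of them, mimicking brute-force intent matching.
patterns = [re.compile(rf"(turn on|activate) (the )?device {i}\b") for i in range(500)]
query = "if in chess i move the king one square forward and then back"

profiler = cProfile.Profile()
profiler.enable()
for _ in range(20):
    for p in patterns:
        p.match(query)  # every call fails, but each one costs CPU time
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(5)
print(out.getvalue())
```

With hundreds of thousands of template/entity combinations on a slow CPU, the single `re.Pattern.match` row dominates `tottime`, just as in the capture above.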

Hardware: Home Assistant Yellow (CM4)
Software: 2025.12.3

I will, of course, clean up my entities. But … I think Gemini raises a valid question here:
Is there a way to limit the timeout for local matching, or to offload it to a non-blocking thread?

PS. I can attach the cprof file if needed.
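To illustrate what “offload this to a non-blocking thread” would mean in practice, here is a hedged sketch (the `match_intents` function and its patterns are hypothetical stand-ins, not Home Assistant code) that moves blocking regex work onto asyncio’s default thread pool so the event loop stays responsive:

```python
import asyncio
import re

# Hypothetical stand-in for the local matcher: a pile of compiled patterns.
PATTERNS = [re.compile(rf"pattern {i} .*") for i in range(1000)]

def match_intents(text: str) -> bool:
    """Blocking, CPU-bound matching — the part that freezes the main loop."""
    return any(p.match(text) for p in PATTERNS)

async def handle_query(text: str) -> bool:
    loop = asyncio.get_running_loop()
    # The event loop keeps serving other tasks while the worker thread matches.
    return await loop.run_in_executor(None, match_intents, text)

async def main() -> bool:
    return await handle_query("does not match anything")

result = asyncio.run(main())
print(result)
```

Note that this only keeps the loop responsive; because of the GIL, pure-Python regex loops still occupy a core, so it would hide the freeze rather than shorten it.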

Do you have any custom sentences?

No, you can’t. :slight_smile:


Thanks for the hint. Actually, I did have a custom_sentences file. I temporarily removed it and I see no change :frowning:

After so many tests with the same sentence … I also learnt that :smiley:

Status:

  • I reduced my exposed entities from 188 to 108. No change :roll_eyes:
  • I had a local custom_sentences file. I temporarily removed it. No change :roll_eyes:
  • I tested more carefully (initially I was doing it wrong), and it seems English actually works very fast! I couldn’t replicate the issue at all! So it looks to me like it may be something with Romanian sentences.

Anyone with a Romanian Assist + LLM (+ Prefer handling commands locally):
Can you please ask Assist “Daca la sah mut regele o patratica inainte si apoi il mut inapoi, se mai poate efectua rocada mare sau rocada mica?” and confirm how long it took?

PS.:
Here you can see “Connection lost” while waiting for the answer …

Hahaha. So you did learn something!


Just making sure: you did restart HA between tests, right? Never mind, I was able to sort of reproduce the issue, but my system is far more powerful than a Yellow.

Could you please tell me if you get the same behavior in French? If my assumption is correct, it should take ~5x longer in French than in Romanian. In English it should work ~10x faster than Romanian.

As I said before, you need to figure out which intent is triggered by this phrase. The problem is that some intent patterns are too general, which leads to a lot of checks being performed. For example, your English sample matches the “scene_HassTurnOn.yaml” intent: [activate] <area> <name> [scene].

Apparently, the naming of your entities gives you an additional delay, which I’m not getting on my server. But for Romanian, I get the same “Connection lost” error, which indicates that the operation is taking too long. I still recommend starting a discussion about the problem in the repository. It would be better to adjust the templates a bit to reliably avoid this problem.

I’ve done a bit more research on this matter and I can’t say I’ve been able to reproduce the issue reliably. Here are some possible causes.

The Prefer handling commands locally option means first trying sentence matching with the built-in conversation agent. If that fails (i.e. if NONE of the built-in sentences match), the query is handed to an LLM-based conversation agent.

At the time of this writing, Romanian has ~160M sentences built into the default conversation agent. As a reference, English has ~14M and French has ~795M.

@tetele ➜ /workspaces/intents (main) $ python3 -m script.intentfest count_sentences --language en --summary
en: 14,198,324
...
@tetele ➜ /workspaces/intents (main) $ python3 -m script.intentfest count_sentences --language ro --summary
ro: 160,010,542
...
@tetele ➜ /workspaces/intents (main) $ python3 -m script.intentfest count_sentences --language fr --summary
fr: 795,509,605

(source)
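The counts get this large because every template expands multiplicatively: each optional word doubles the count, and each synonym or slot list multiplies it. A toy sketch (the slot sizes below are invented for illustration, not taken from the real templates):

```python
from math import prod

# A template like "[activate] <area> <name> [scene]" expands into the
# product of the sizes of its parts: each optional word contributes a
# factor of 2, each slot contributes one alternative per value.
def expansions(slot_sizes: list[int]) -> int:
    return prod(slot_sizes)

# Made-up example: 1 optional word, 30 areas, 200 names, 1 optional word.
print(expansions([2, 30, 200, 2]))  # → 24000 sentences from one template
```

A language whose templates need more optional particles or more inflected synonyms per slot therefore balloons far faster than English, even with the same number of templates.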

Sequential matching has historically been slow, but it has been optimized. If sequential matching is truly the bottleneck here, then a non-standard query in an LLM-powered English pipeline with Prefer handling commands locally turned on should be ~10x faster than an equivalent Romanian pipeline, which in turn should be ~5x faster than a French one.
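The ~10x and ~5x figures follow directly from the sentence counts reported above:

```python
# Sentence counts from `python3 -m script.intentfest count_sentences`.
en, ro, fr = 14_198_324, 160_010_542, 795_509_605

print(round(ro / en, 1))  # Romanian vs English  → 11.3
print(round(fr / ro, 1))  # French vs Romanian   → 5.0
```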

However, that’s not the case in my tests. Even though I am running this on hardware that’s much more powerful than a CM4-powered HA Yellow (i.e. on a HAOS VM running on 2 cores of an Intel Core Ultra 5 245K), the response times I’m getting from conversation agents with and without Prefer handling commands locally on English, Romanian and French are comparable not only within the same language, but also across languages (ranging from ~2.5s at best in French to ~7s at worst in Romanian). The bottleneck in my case seems to be the response time of OpenAI, and even that is hardly noticeable.

I’ve brought the issue to @synesthesiam’s attention, but so far we are simply sharing a feeling of amazement regarding the very long response time you’re getting @adynis

@adynis Can you try adding this to a custom sentences file?

language: ro
settings:
  filter_with_regex: false

This will disable regex filtering (after a restart, or if you run the conversation reload action).

Wow!!! @synesthesiam This decreased the processing time from ~60 seconds to ~6 seconds!!!
And local handling still works … the fallback to the LLM also seems to work (at least for the notorious sentence from the beginning of this topic), way faster!! :slight_smile:
But … since this was not enabled officially … I suppose there are some disadvantages? Negative implications? Should I also test something else in parallel?

It’s fine to leave regex filtering disabled. It’s meant to be a pre-processing step to speed up matching, but it makes certain assumptions about the sentence templates and input text.
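To illustrate the idea behind such a pre-filter (a hedged sketch with invented templates, not the actual matcher implementation): a cheap regex first discards templates whose literal keywords cannot appear in the input, and only the survivors get the expensive full match. The assumption is that the keywords appear literally in the text, which may not hold for every language or input:

```python
import re

# Invented templates: (full template name, cheap keyword pre-filter).
templates = {
    "HassTurnOn": re.compile(r"\bturn on\b"),
    "HassTurnOff": re.compile(r"\bturn off\b"),
}

def candidates(text: str, use_filter: bool = True) -> list[str]:
    if not use_filter:
        return list(templates)  # every template gets the full (slow) match
    return [name for name, pre in templates.items() if pre.search(text)]

print(candidates("please turn on the lamp"))  # → ['HassTurnOn']
```

When the pre-filter’s assumptions fail, it either skips templates that would have matched or (as apparently happened here) costs more time than it saves.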

I’m looking into a replacement that should work better for more cases :+1:
