Ollama really slow with "assist"

I’m running HA on a thin client and Ollama on my laptop. Setting up Ollama in HA works: the IP is found, and in HA I can see the models that are downloaded on the laptop. "Expose Ollama to the network" is activated on the laptop.

I set up a new Assist with Ollama. I have only two entities (a light in the area Living Room and a light in the area Bedroom). Without the "assist" option activated the chat is slow, but OK: it takes 2–4 seconds to answer, and text appears a bit slower than I can read — but oh well.

Now I activate the "assist" option to enable controlling the entities and ask "turn on the light in the living room". After 4 minutes the light either turns on, or it takes another minute and the chat ends with a timeout.

If I chat directly with the model in Ollama, I can make it analyze my input quite well. It takes a couple of commands at the beginning to stop it chatting and limit the answer to the minimum: splitting the prompt into the area, the entities, and the action; telling it which areas exist; and asking it to let me know if something wasn’t found. I know that the commands from Assist are internally turned into tool calls, something like

  • tool_name: HassTurnOff
    tool_args:
      area: living room
      domain: light
    id: 01KGDHXDG0YSAT966KAMT9PGDW
    external: true

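For context, a tool call like the one above implies that HA sends the model a request carrying tool definitions, not just the chat text. Here is a minimal sketch of what such a request to Ollama’s `/api/chat` endpoint could look like — the tool schema below is an assumption for illustration, not the exact payload HA generates — and a comparison of its size against a plain chat request:

```python
import json

# Hypothetical sketch of a tool-calling request to Ollama's POST /api/chat.
# The tool schema here is an illustration; HA's real prompt also includes
# a system prompt and the list of exposed entities.
tool_schema = {
    "type": "function",
    "function": {
        "name": "HassTurnOff",
        "description": "Turn off a device or entity in an area.",
        "parameters": {
            "type": "object",
            "properties": {
                "area": {"type": "string"},
                "domain": {"type": "string"},
            },
            "required": ["area", "domain"],
        },
    },
}

plain_request = {
    "model": "Kineticflow/HomeAI",
    "messages": [
        {"role": "user", "content": "turn on the light in the living room"}
    ],
}

# Same chat, but with the tools block attached.
tool_request = dict(plain_request, tools=[tool_schema])

# The tools block is extra prompt text the model must process on every
# turn; with several tools plus HA's system prompt it grows quickly.
print(len(json.dumps(plain_request)), len(json.dumps(tool_request)))
```

Every byte of that extra structure ends up as prompt tokens the model has to evaluate before it can answer.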
Why can I make the Ollama model split the prompt into the keywords within seconds, but when HA directs the prompt to Ollama it’s maybe a 20/80 chance between executing the command and timing out after 5 minutes? Sure, with much more GPU power whatever happens in the background could be handled, but on a thin client it’s a bit limited.

Still: directly chatting with Ollama generates the tool call within 5 seconds; passed through HA it takes 5 minutes. Any ideas?

What GPU and memory are in your laptop?
What LLM are you running?
If the GPU is not something like a 40xx (or better), and/or if your model is too big for the memory on your card, a crawl is all you get.
How’s the response when you talk to the LLM directly in a command-prompt window?
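One quick way to answer the memory question is Ollama’s `GET /api/ps` endpoint (also shown by `ollama ps`), which lists loaded models with their total size and how much sits in GPU memory. A small sketch of interpreting such a response — the sample byte counts below are made-up illustration values:

```python
def vram_fraction(model: dict) -> float:
    """Fraction of the loaded model held in GPU memory (0.0 = pure CPU)."""
    size = model.get("size", 0)
    return model.get("size_vram", 0) / size if size else 0.0

# Sample shaped like Ollama's GET /api/ps output (illustrative numbers):
sample = {
    "models": [
        {"name": "Kineticflow/HomeAI", "size": 5_000_000_000, "size_vram": 0}
    ]
}

for m in sample["models"]:
    # 0.0 means the model runs entirely on CPU, which explains a crawl.
    print(m["name"], vram_fraction(m))
```

If the fraction is 0 (or well below 1), the model is spilling out of GPU memory and every token is paid for in CPU time.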

The model is Kineticflow/HomeAI and the GPU is the integrated i5 GPU. It’s not fast, but like I said, when chatting with the model directly I can make it interpret my commands quite well. Things only get very slow when HA passes my command to HomeAI.

My guess: you are maxed out.