Ollama LLM assistant not working

I have set up the Ollama integration to use an LLM as an assistant.
Everything seems to work fine, except when I actually want to do something, let's say turn the TV on or off. And yes, I have the "Assist" option enabled.
I use two LLMs: llama3.2 and mistral.

llama3.2 tells me it cannot connect to a server when the request gets specific, for example:
Sorry, I had a problem talking to the Ollama server: POST predict: Post "http://127.0.0.1:37149/completion": EOF
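For what it's worth, the Ollama API itself can be checked directly while this happens (a minimal sketch, assuming Ollama listens on the default 127.0.0.1:11434; the 37149 port in the error looks like the internal runner process Ollama spawns, so an EOF there would point at that runner dying rather than the server being unreachable):

```python
import requests

# Minimal check against the standard Ollama HTTP API (default port 11434).
# If this works while the assistant still fails, the problem is on the
# runner/model side rather than the server being down.
resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={"model": "llama3.2", "prompt": "Say hello", "stream": False},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])
```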

mistral:
To turn on the lamp naast de bank, you can use the following command in Home Assistant:

HassTurnOn(area="Woonkamer", name="Lamp naast de bank")

The action is not executed. What am I missing here?
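As far as I understand, Home Assistant exposes intents like HassTurnOn to the model as tools through Ollama's tool-calling API, and the action only runs if the model returns a structured tool call instead of writing it out as text. A rough sketch of what that looks like against the API directly (the HassTurnOn schema below is my own guess, not the definition Home Assistant actually registers):

```python
import requests

# My own rough approximation of a HassTurnOn tool schema, purely for testing.
tools = [{
    "type": "function",
    "function": {
        "name": "HassTurnOn",
        "description": "Turn on a device or entity",
        "parameters": {
            "type": "object",
            "properties": {
                "area": {"type": "string"},
                "name": {"type": "string"},
            },
            "required": ["name"],
        },
    },
}]

resp = requests.post(
    "http://127.0.0.1:11434/api/chat",
    json={
        "model": "mistral",
        "messages": [{"role": "user", "content": "Turn on the Lamp naast de bank"}],
        "tools": tools,
        "stream": False,
    },
    timeout=120,
)
message = resp.json()["message"]

# A model with working tool support returns a structured tool_calls list;
# if the call only shows up as plain text in "content", nothing gets executed.
print(message.get("tool_calls"))
print(message.get("content"))
```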

I asked the question again, and now I get this response:
It seems like you have asked me to format an answer based on a tool call response. However, I don’t see the tool call response provided. Could you please provide the output of the tool call so that I can assist you in formatting an answer to the original user question?
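That answer suggests the model is being asked to phrase the result of a tool call it never made (or never got back). The second half of the loop would look roughly like this, with the tool result appended as a tool message; the messages here are made up for illustration, not what Home Assistant really sends:

```python
import requests

# Illustrative only: a made-up conversation showing the tool result being
# fed back as a "tool" message so the model can phrase the final answer.
messages = [
    {"role": "user", "content": "Turn on the Lamp naast de bank"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [{
            "function": {
                "name": "HassTurnOn",
                "arguments": {"area": "Woonkamer", "name": "Lamp naast de bank"},
            },
        }],
    },
    # This is the tool call response the model says it is missing:
    {"role": "tool", "content": '{"success": true}'},
]

resp = requests.post(
    "http://127.0.0.1:11434/api/chat",
    json={"model": "mistral", "messages": messages, "stream": False},
    timeout=120,
)
print(resp.json()["message"]["content"])
```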

I am now using llama3.1:8b, which seems to work a little better. I can now turn devices on and off, but not the ones I asked for.

I have now downgraded to Ollama version 0.51. This seems to work better as well, but every now and then I still get "Sorry could not talk to ollama server".

OK, I found this message in the Ollama logs: "gpu VRAM usage didn't recover within timeout" seconds=5.250980419

So I think this is a performance issue: the GPU is still producing an answer, but Home Assistant has already timed out, I think. Is there a way to change this timeout?
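To check whether it really is latency, the same model can be timed directly against the Ollama API (a quick sketch; keep_alive is Ollama's own parameter for keeping the model loaded in VRAM between requests, and the durations come back in the response in nanoseconds; I don't know whether the Home Assistant side timeout itself is configurable):

```python
import time
import requests

# Time one request to see whether model load + generation simply takes
# longer than Home Assistant is willing to wait.
start = time.monotonic()
resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={
        "model": "llama3.1:8b",
        "prompt": "Turn on the Lamp naast de bank",
        "stream": False,
        # Ask Ollama to keep the model in VRAM so follow-up requests
        # skip the load time (negative value = keep loaded).
        "keep_alive": -1,
    },
    timeout=300,
)
elapsed = time.monotonic() - start
data = resp.json()

print(f"wall time: {elapsed:.1f}s")
# Durations reported by Ollama are in nanoseconds.
print(f"load: {data.get('load_duration', 0) / 1e9:.1f}s, "
      f"total: {data.get('total_duration', 0) / 1e9:.1f}s")
```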