Anyone else's Voice Pipeline w/ Ollama broken since 0.9 release?

My pipeline went from working to broken with Gemma 3 12B w/ tools via Ollama. Ollama recently shipped an update that adds streaming responses with tool calls, so maybe that broke it?

It works fine for me using qwen2.5. Since Gemma doesn’t natively support tool calls, maybe it’s the model?


Yeah, it’s a tune that supports tools, but it might still be that.

Tool calling.

You must ensure the model supports it. I’m only using Gemma for summary jobs rn. It does that VERY WELL. Gemma herself will tell you she can’t do tool calls and that it’s experimental.

I’m using the PetrosStav tools tune of Gemma, but since it’s not official, maybe the Ollama org doesn’t flag it as a tool model? I’m running Mistral Small 3.1 in the meantime and it’s working better; it still gets some stuff wrong, but at least it’s functional.
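For what it’s worth, you can check whether your local tag is actually flagged for tool calling; I believe newer Ollama builds print a Capabilities section (including “tools”) when you show a model. Rough sketch, with the tag names just as examples:

# Ask Ollama what the local model advertises; recent builds list a
# "Capabilities" section that should include "tools" for tool-call models.
ollama show PetrosStav/gemma3-tools:12b
ollama show qwen2.5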

Maybe try the following Gemma variant. I’ve had it in use for a month now and it’s working as intended. I’m using it right now with Ollama 0.9 and the latest stable HA version.


Thanks, I tried it and it still fails the “Which devices are on in the bedroom” test. It says stuff is on when it isn’t, or off when it’s on, etc.

Maybe check all exposed devices. Check aliases, and check your context size. Maybe your context is set too small and not all device data is sent to the LLM. My context size is set to 45,000; the actual sent context is around 29,000.
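If you want to bump that on the Ollama side rather than per request, one way is to bake num_ctx into a variant of the model with a small Modelfile (the HA Ollama integration also has a context window size option in its settings, if I remember right). The base tag and the 45,000 value below are just examples:

# Create a copy of the model with a larger context window (num_ctx),
# so HA can send the full exposed-device list without it being truncated.
cat > Modelfile <<'EOF'
FROM PetrosStav/gemma3-tools:12b
PARAMETER num_ctx 45000
EOF
ollama create gemma3-tools-45k -f Modelfile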


29K? jeebus.

Heh, context is king. I was blowing 250K-token contexts for Friday before I started summarizing the context. (You know this is happening when you ask the model to do something it should know cold, like HassTurnOn(entity_id), and it says it did it but nothing happens… The base intents got pushed right out, so the model is tool calling to intents that aren’t in the context anymore.)

Is there an easy way to see the prompt size HA is sending? Thanks

I am having some luck with mistral-small3.1

but with a 30K context it adds up to 33 GB of RAM/VRAM, and Ollama for some reason decides to put a third of that on the CPU even though my GPU has 32 GB of VRAM. Still reasonably fast, but odd nonetheless.
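You can at least see the split it decided on with ollama ps; it lists each loaded model’s size and a CPU/GPU processor percentage:

# Show loaded models, their memory footprint, and how they are split
# between CPU and GPU (something like "33%/67% CPU/GPU").
ollama ps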

So after a few days of living with it, Mistral Small 3.1 seems to have fixed all my issues. The model is not small, and from what I’ve learned from you guys you need a crap ton of context, so it’ll definitely reside partly in system RAM…


But is it working? RAM is cheap… ish :joy:


I think I also saw your Discord post on this, and I’m chiming in to say I’m having the exact same issue as described, running the same model (gemma3-tools-PetrosStav), which was working incredibly well with a 16k context size on my RTX 4070 Ti Super. Pretty sure it has to do with the recent Ollama update and the streaming tool responses. I downgraded my Ollama back to 0.7.1 and everything is working again.

Unfortunately, the model linked by @maglat and mistral-small3.1 work in PowerShell (I’m running the Ollama server on a Windows desktop) but time out when I try to reach them through the Home Assistant pipeline, while the CPU runs at 70% and the GPU idles :frowning: No idea what is going on there, but downgrading Ollama and sticking with PetrosStav @ 16k helped me as an interim solution.

I’m no expert at any of this, just tinkering around. I’d still appreciate any guidance on how to get tool calls working again with Ollama 0.9, or on why the other models are not responding when reached through Home Assistant.
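In case it helps anyone else on Linux who wants to stay pinned while this shakes out, the official install script takes a version override (that’s from the Ollama docs; on Windows I think you’d just grab the older installer instead):

# Reinstall a specific Ollama release (here the 0.7.1 that still worked for me)
# instead of whatever is currently latest.
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.7.1 sh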

This just dropped, it’s smaller, going to give it a try:

ollama run magistral:24b


24b she phat.

I absolutely plan on checking it out. It should basically be an Ollama-compatible, reasoner-capable Mistral (read: reasoner/tool user) that specializes in European languages and worldview. Not a bad combo, especially if you speak French.

What I’ll be looking for: Mistral (the spiritual ancestor here) has a nasty habit of running on… and on and on and on and… Omg Mistral, SHUT UP?! (See that timeout in my tok/sec on Friday’s part: it wasn’t because it wasn’t barfing tokens at nearly 20/sec… it just wouldn’t stop, and it hit the timeout every time… :rofl:)

I’ll be looking to see if they can choke that off well now that they’ve moved to test-time compute / a reasoner.
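If Magistral rambles the same way, one blunt knob on the Ollama side is capping how many tokens it may generate per reply via num_predict; the tag and the 1024 cap below are just placeholders:

# Build a variant that hard-caps tokens generated per response, so a
# run-on answer can't keep going until it hits the HA pipeline timeout.
cat > Modelfile <<'EOF'
FROM magistral:24b
PARAMETER num_predict 1024
EOF
ollama create magistral-capped -f Modelfile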