Leaving this here for anyone who is curious… Ollama 0.8.0 (released May 28th 2025) now supports streaming text responses for models with tool calling!
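For anyone wondering what this looks like in practice, here's a minimal sketch of a request body for Ollama's `/api/chat` endpoint with both `stream` and `tools` set. The endpoint and field names come from Ollama's REST API; the `get_current_weather` tool and the model name are just made-up examples for illustration:

```python
import json

# Build a /api/chat request body that enables both streaming and tools.
# Before 0.8.0, Ollama would fall back to a non-streamed response when
# "tools" was present; as of 0.8.0 it can stream both text and tool calls.
def build_chat_request(model, user_message, tools):
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": tools,
        "stream": True,  # now compatible with tools as of 0.8.0
    }

# Hypothetical example tool in the OpenAI-style function schema Ollama accepts.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

payload = build_chat_request("qwen2.5", "What's the weather in Paris?", [weather_tool])
print(json.dumps(payload, indent=2))
```

POSTing that payload to `http://localhost:11434/api/chat` should give you a stream of newline-delimited JSON chunks instead of one blocking response.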
(I’m on Intel IPEX for my local box, so I’ll have to wait for this to make it into the IPEX build.) I’ll definitely start watching for this one to flow through their Git.
Confirmed, it’s finally working in text chat. I’m really looking forward to our great HA team implementing response streaming for voice as well. That would improve the voice experience by A LOT!
I upgraded and it works fine. Now I wish Whisper did streaming. It would also be cool if Piper did it too. Shaving off those extra two or three seconds between responses is a great user-experience enhancer.
Glad it worked for you. I have two different rigs, both running Ollama 0.9, and both give me the response I screenshotted above… I’ll try to troubleshoot more, but the Ollama upgrade is the only new thing. Which model do you run? Gemma w/ tools or Qwen? Something else? Thanks!
Edit: Now voice is giving me proper responses, but prompting with text gives me the tool-call nonsense pictured above… I’ll keep working on it.