Has anyone been successful in using tool calling with Qwen 3.5?
Are you using:
Ollama?
vLLM?
llama.cpp?
What inference settings are you using, and what version of Ollama/vLLM/llama.cpp? Are you setting the thinking boolean in your config? Are you using /no-think in your prompt context window?
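For anyone comparing setups, here's a minimal sketch of what an Ollama `/api/chat` request with the thinking boolean off and one tool declared might look like. The model tag `qwen3.5` and the `light_turn_on` tool schema are assumptions for illustration; use whatever tag `ollama list` shows and your own tool definitions.

```python
import json

# Sketch of an Ollama /api/chat request body: thinking disabled via the
# "think" boolean, with one declared tool. Model tag and tool name are
# placeholders, not confirmed values.
payload = {
    "model": "qwen3.5",   # assumption: substitute your actual model tag
    "think": False,        # Ollama's thinking toggle (recent versions)
    "stream": False,
    "messages": [
        {"role": "user", "content": "Turn on the kitchen light."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "light_turn_on",  # hypothetical tool name
                "description": "Turn on a light entity",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "entity_id": {"type": "string"}
                    },
                    "required": ["entity_id"],
                },
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```

You'd then POST that to your local instance, e.g. `curl http://localhost:11434/api/chat -d @payload.json`, and check whether the response contains a `tool_calls` entry rather than the model narrating the call in plain text.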
I had to flip back to GLM-4.7 Flash to get tool calling to work seamlessly again.
Crazynick started looking at it, but both Nick and I believe you should use the instruct model, not the thinking model, or you won't get it to shut up. Do a search and you'll find the thread.
GPT-OSS 20b is better at following instructions right now imho (end of Feb 2026), or at least with current prompting techniques. I'll see what the quantized models do in a month or so…
Experimentally, GLM-4.7 Flash is amazing at agentic tool calling, but I really envy Qwen 3.5's Artificial Analysis intelligence index score. Also, the Qwen 3.5 vision model is fantastic; it's definitely the best vision model I've been able to run. My goal is to have one model for all tasks.
If Qwen 3.5 could replicate GLM-4.7's tool-use proficiency, it would be perfect. The Tau agentic tool-use benchmark score for Qwen 3.5 doesn't seem so far off, so part of me wonders if it's an implementation problem on Ollama's side. Others running llama.cpp don't appear to hit the same tool-calling difficulty. Perhaps it needs to be fine-tuned a little in HA's direction.
I agree GPT-OSS 20b with high reasoning is great. I didn't find it as good as GLM-4.7 Flash at tool calling, though.
By instruct, I'm assuming you mean running the same model with thinking disabled. As far as I'm aware, they're the same model with two modes. I haven't gotten very good results so far. I plan to give llama.cpp a shot.