Qwen3 LLM no_think

Qwen3 just came out, and the small models seem promising for quick LLM-based interactions.

However, I can’t turn off the thinking mode.
I tried placing /no_think almost everywhere in the prompt, but it doesn’t work.
I tested whether it works in a normal chat session with the model, and there it’s fine.
I assume it’s because the instruction is neither at the beginning nor the end of the final constructed prompt.

Does anyone have an idea?

3 Likes

Changing the component is the only option.
You need to add a small code snippet that appends the required tag to each message from the user; the condition can be the model name.
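
A minimal sketch of what that snippet could look like, assuming the usual {"role", "content"} message shape (the function name and hook point are illustrative, not the integration's real API):

```python
def add_no_think(messages: list[dict], model: str) -> list[dict]:
    """Append /no_think to every user message when a Qwen3 model is in use."""
    if not model.startswith("qwen3"):
        return messages
    return [
        {**m, "content": m["content"] + " /no_think"} if m["role"] == "user" else m
        for m in messages
    ]
```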

1 Like

I am interested to see how this goes. With the rise of thinking models, having the capacity to request this in the integration, plus a separate option to suppress the thought output, would be great as well.

Yesterday I did something similar for cloud integration.

I checked again: if I place this at the end of the system prompt, the model stops thinking, but for some reason the empty tags remain before the start of the response.
(screenshot: the response preceded by empty think tags)

I think I’ll go with creating a Modelfile in Ollama with /no_think in the system prompt.
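
Something like this (model tag assumed; a sketch of what I mean):

```
FROM qwen3:latest
SYSTEM """/no_think"""
```

built with `ollama create qwen3-nothink -f Modelfile`.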

That obviously didn’t work, though, as the system prompt is overwritten by Home Assistant.

And yes, the empty thinking tags would also still be there.

I’m not an expert, but it would seem to me that the Ollama integration in HA needs to be updated to understand what the thinking part means, so that it doesn’t put it in the output.

It’s not just Qwen3 I see this with, but other thinking models as well. And all of them behave the same way, with the thinking part tagged, so it doesn’t seem to me like it’d be too difficult to update (hopefully).
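
For a complete (non-streamed) response, the stripping itself could be as small as this sketch, assuming the <think>...</think> tag format these models emit:

```python
import re

def strip_think(text: str) -> str:
    # Remove a leading <think>...</think> block (empty or not) from a
    # complete, non-streamed response.
    return re.sub(r"^\s*<think>.*?</think>\s*", "", text, flags=re.DOTALL)
```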

1 Like

Yes, there is an issue open on GitHub already, so it might get implemented at some point.

1 Like

This is so annoying. /no_think does disable thinking, but, well, the tags are still present. And a regex-based solution would break streaming too.
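
A streaming-safe alternative needs a little state, since a tag can arrive split across chunks. A sketch, assuming only that the model wraps its reasoning in <think>...</think> as Qwen3 does:

```python
class ThinkFilter:
    """Drop <think>...</think> spans from a streamed response,
    even when a tag is split across chunk boundaries."""

    OPEN, CLOSE = "<think>", "</think>"

    def __init__(self) -> None:
        self.buffer = ""
        self.in_think = False

    def feed(self, chunk: str) -> str:
        self.buffer += chunk
        out = []
        while True:
            if self.in_think:
                end = self.buffer.find(self.CLOSE)
                if end == -1:
                    # Discard thinking content, but keep a tail that
                    # could be the start of a split closing tag.
                    self.buffer = self.buffer[-(len(self.CLOSE) - 1):]
                    return "".join(out)
                self.buffer = self.buffer[end + len(self.CLOSE):]
                self.in_think = False
            else:
                start = self.buffer.find(self.OPEN)
                if start == -1:
                    # Emit all but a tail that could be the start of
                    # a split opening tag.
                    safe = max(0, len(self.buffer) - (len(self.OPEN) - 1))
                    out.append(self.buffer[:safe])
                    self.buffer = self.buffer[safe:]
                    return "".join(out)
                out.append(self.buffer[:start])
                self.buffer = self.buffer[start + len(self.OPEN):]
                self.in_think = True

    def flush(self) -> str:
        # Call once the stream ends to release any held-back tail.
        tail = "" if self.in_think else self.buffer
        self.buffer = ""
        return tail
```

(A real implementation would only hold back a tail when it is actually a prefix of a tag, but this shows the shape of the problem.)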

On top of that, Ollama itself apparently does not support structured responses without the thinking part; there are many tickets open about that.

I guess I’m stuck with Qwen2.5, as no other recent Ollama model supports tool calling without thinking.

I will have to play around with it some more, but I put /no_think as the first words in the prompt and then again as the last words in the prompt, and the thinking stopped. I figured it can’t hurt being the first and last thing it reads.

Also, I am using the Local LLM Conversation integration, not Ollama. It’s got more tweaks and adjustments, which I didn’t have to use in this case, but the Ollama integration doesn’t let you easily tweak the temperature (the LLM temperature, not my thermostat), top_k, etc. I used the ChatML prompt format.
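
(For what it’s worth, the raw Ollama API does accept those sampling options; the HA integration just doesn’t expose them. A minimal sketch against a local server, model tag assumed:)

```python
import requests

# Direct call to a local Ollama server; temperature/top_k go in
# "options", which the HA Ollama integration does not expose.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3:latest",
        "messages": [{"role": "user", "content": "Turn on the lights. /no_think"}],
        "options": {"temperature": 0.7, "top_k": 40},
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```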

I have tested both typed input and STT, and it no longer says or prints the think tags. I am not going to touch it right now to see if I can recreate the issue, since I want to test it now that it’s not annoying me with the thinking.

So far, I have to say, it is performing better than anything I have tested right out of the box. It’s as good as a well-prompted and mucked-with Mistral:7b. It’s pretty fast on a 4060 Ti 16 GB and very usable even on a 3050 8 GB.

Give it a try and if it’s repeatable we can write it up as a solution.

Okay, I have tested and have an update.

It works with the following setup.

  • Local LLM Conversation Integration v0.3.8 (not the Ollama integration)
  • Use ChatML Prompt Format
  • Qwen3:latest model
  • Put /no_think as the first command.
  • Put /no_think as the last command (see the skeleton below).
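
The resulting prompt skeleton looks roughly like this (ChatML, content abbreviated):

```
<|im_start|>system
/no_think You are 'Yoda', a helpful AI Assistant ...
... tools, devices, examples ...
... be polite. /no_think<|im_end|>
<|im_start|>user
Turn on the kitchen lights.<|im_end|>
<|im_start|>assistant
```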

The /no_think at the end seems to carry the most weight, which seems odd.
With /no_think just at the top it doesn’t work at all.
With /no_think just at the bottom it works 80% of the time (25 tests, with the think tags added in 5 of them).
With it in both places, the think tags haven’t appeared in over 100 attempts.

Output without /no_think: he rambles until he runs out of tokens.

With /no_think at both top and bottom: a thorough answer, like I asked for in the prompt.

As I said, putting it in only one place was inconsistent at best, and I am not sure why, but that’s a fail in my book.

For now, I call this a solution. Now to get him to stop saying “asterisk” when he outputs markdown. I had that solved with Mistral, but the same fix isn’t working for Qwen; that’s for a different post.

-Cheers

1 Like

Amazing, thank you!

That’s really good. Although I’d like to have this with the official integration, I think it’s a feasible workaround for now.

I don’t know what others are finding, but Qwen3 is by far the best LLM for Home Assistant I have seen, and I have tested all the major ones and a lot of the ones trained specifically for HA.

I have been able to remove a few automations and scripts because it just figured out how to work with the integration on its own.

For example: I just added the integration for Google Tasks. I didn’t even leave the Google Tasks website and simply said “Hey Yoda (that’s my wake word), add mow the lawn to my personal list in Google Tasks,” and 5 seconds later it was there. Of course, then I tried to delete it, and poof, it was gone. I added nothing else: no custom sentences, no pyscript, nothing. That is how it is supposed to be, but I hadn’t seen it until today.

I was able to remove some custom Music Assistant automations as well and it works on its own.

There’s probably a better place to post this, but I figured anyone who reads this will at least see that Qwen3 is worth a good look as a voice assistant in HA.

1 Like

I definitely agree. It is the first model I’ve used that was able to correct an incorrect tool call within a single request.

I asked it to turn on the TV and the PlayStation; it made two tool calls, where the one for the TV was correct and the one for the PlayStation returned an error.
Within the same Assist process it automatically created a follow-up tool call to turn on the PlayStation, which was successful.

2 Likes

With good grounding and a good reasoner, this is what you should come to expect.

Spend time on your tool descriptions and you will be rewarded. (I plan on using Qwen3 for Friday’s local version myself.)
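
To make that concrete, a tool description with explicit argument semantics might look like this (names and shape are illustrative, not any integration’s exact schema):

```json
{
  "name": "media_player_power",
  "description": "Turn a media player on or off. Use the entity_id exactly as listed in the Devices section; never guess an entity_id that is not listed. Call once per device.",
  "parameters": {
    "entity_id": "The media player entity, e.g. media_player.living_room_tv",
    "state": "Either 'on' or 'off'"
  }
}
```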

Tried Qwen3, and it seems to rival ChatGPT in terms of handling multiple commands at a time; the 14b version responds almost as quickly. It really seems pretty good.

Would you be able to share your full setup/prompt? This is the solution I’ve been looking for. I’ve been using Qwen3 through Ollama since it came out, and it works flawlessly and much faster than 2.5. The only issue is the think tags in the outputs of the assistant. I tried adding the Local LLM integration and used the standard prompt with the /no_think additions. Qwen3 now just replies that it had an error calling the tool, which never happened before. Any ideas?

Sure. Here is a sample I am playing with, along with a Piper ONNX voice of Yoda that I trained; kinda fun.

This has only the standard tool prompting. I’d love for someone to start sharing their tool ideas; I’ve tried to start a few threads, and for some reason we are all keeping our tool prompting a secret, or we aren’t changing the default.

**Caveat** – This only works in the Local LLM Conversation integration. If you are using the Ollama integration, this will not work, as it is not (yet) paying attention to /no_think.

This prompt should give you no think tags and no markdown. I prompt him twice for both of these things, as for whatever reason it increases the success rate.

/no_think. You are 'Yoda', a helpful AI Assistant that controls the devices in a house via Home Assistant. Use only plain, speakable text. Format your responses for TTS. Complete the following task with the information provided. If you do not have enough information to execute a task, stop and ask the user to repeat the request with more detail.
The current time and date is {{ (as_timestamp(now()) | timestamp_custom("%I:%M %p on %A %B %d, %Y", True, "")) }}
Tools: {{ tools | to_json }}
Devices:
{% for device in devices | selectattr('area_id', 'none') %}
{{ device.entity_id }} '{{ device.name }}' = {{ device.state }}{{ ([""] + device.attributes) | join(";") }}
{% endfor %}
{% for area in devices | rejectattr('area_id', 'none') | groupby('area_name') %}
## Area: {{ area.grouper }}
{% for device in area.list %}
{{ device.entity_id }} '{{ device.name }}' = {{ device.state }};{{ device.attributes | join(";") }}
{% endfor %}
{% endfor %}
{% for item in response_examples %}
{{ item.request }}
{{ item.response }}
<functioncall> {{ item.tool | to_json }}
{% endfor %}
User instruction:
Do not read off a numbered list of devices and their state. Use what you know about Home Assistant to execute tasks. Format your responses for TTS. Have fun and remember to be polite. /no_think

Are you using any of the “in context examples”? Any other relevant pieces of the configuration screen in Home Assistant?