Is there a way to programmatically toggle "Think before responding" for a voice assistant? I'm running gemma4 e4b on some older hardware and for most responses like "turned off the lights" thinking mode isn't necessary and just slows down the response.
However, I scripted a "morning briefing" for my kids that runs every morning and saves the text response for later TTS processing whenever we ask for it. It works way better when thinking mode is turned on, but the only way I see to do that is by clicking the gear and checking the box.
Thanks for the response, I use local ollama running on an old Quadro P4000 with 8 GB VRAM. Unfortunately, I can't load two LLMs - faster-whisper:gpu large-v3-turbo takes up about 3.5 GB and gemma4 E4B takes up another 4 GB. If I unload either to enable thinking it takes quite a bit of time to reload upon first usage. Therefore, I leave both loaded all the time.
The ideal solution is the ollama integration allows per-interaction selection of thinking. So for example, if conversation.process has a data field with a selectable "think" boolean option. It wouldn't hurt my feelings if it also had an option to change the system prompt on every call either.
Doesn't the ollama integration allow you to create two conversation entities for a single server? One with thinking, one without. When creating a briefing, use the thinking version, and for the satellite, use the non-thinking version.