I just figured out this morning why a bunch of my LLM-integration scripts stopped working, and it’s because of the “Enable Web Search” functionality that was added in 2025.4 (or maybe 2025.3). When that is enabled, it seems to add something to the generated prompt that overrides what’s being supplied via script descriptions. So, for example, asking about weather goes straight to GPT-4o-mini rather than to the script that is supposed to handle weather. On top of that, the responses to the web searches don’t seem to be processed using the LLM prompt – they’re essentially the exact response you’d get from sending the request to the GPT directly. So instructions to not use Markdown formatting or emojis get ignored.
Turning web search off fixed everything, but it is nice to have for the random things that scripts aren’t handling. There’s not really any documentation about how that support was actually implemented under the hood, but I can’t imagine I’m the only one who has run into this problem. Has anyone figured out any work-arounds that leave script intents working and still support web search?
Since OpenAI/Google use internal logic for calling search tools, you cannot control it. You can only enable or disable the tool.
The best scenario, in my opinion, would be to use a primary agent without search. Then create custom sentences that will call conversation.process with a second agent, which has search enabled, and pass your query to it.
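Roughly, that pattern looks like the sketch below – a sentence-triggered automation that hands the query off to a second, search-enabled agent via conversation.process. The agent_id, the trigger sentence, and the shape of the response are placeholders/assumptions for whatever you’ve actually set up:

```yaml
# Sketch: hand "search ..." requests to a second, search-enabled agent.
# conversation.openai_search_agent is a placeholder entity id.
automation:
  - alias: "Web search fallback"
    triggers:
      - trigger: conversation
        command:
          - "search [the web] for {query}"
    actions:
      - action: conversation.process
        data:
          text: "{{ trigger.slots.query }}"
          agent_id: conversation.openai_search_agent
        response_variable: result
      # Speak back whatever the second agent answered
      - set_conversation_response: "{{ result.response.speech.plain.speech }}"
```

The main agent (the one wired to your voice pipeline) keeps search disabled, so script intents keep working; only sentences that match the trigger ever touch the search-enabled agent.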
Interesting. Experimenting with it in OpenAI’s playground, it seems broken on their end, at least. Any query with the web search module enabled seems to be bypassing the system prompt entirely. Putting the exact same commands in the request, however, works sporadically. It’s making me think they’re doing something to sanitize the results of a web search, and it is aggressively filtering system and prompt commands.
Annoying. Especially since it seems arbitrarily up to the LLM to decide if a request should trigger a search or an exposed script tool.
IMO, NC should’ve better documented that enabling it actually breaks (or at least can seriously change the behavior of) script integration. And given that responses almost always have Markdown formatting, the conversation agent (or at least the TTS engine) really should be de-formatting the input. (Particularly as long as the TTS engines aren’t supporting SSML.)
Actually, digging into the code, I think this does stem from a bug in the implementation. It appears to be putting the Web Search tool last in the list of tools on the request, which – because of how LLMs process the input token stream – basically means whatever implicit instructions come along with that tool take precedence over everything before them. That may be why, if you have a weather script and ask for weather, the LLM decides not to use it. I suspect the implicit prompt on the web search tool mentions weather…
I don’t know how much I want to deal with cracking open a core component and running a custom build to see, but if I find some spare cycles I may make sure the web search tool is first, not last. I suspect that may “fix” the problem.
Also, I’m still going to keep it separate. In the long game I think we’re all landing on some kind of multi-agent model – pass the convo around between the experts. Search being one of them.
Yeah, fundamentally it’s very similar. A script calling a separate conversation is essentially creating a new tool in the list, just like using the OpenAI one, just with another round-trip to HA. That’s slower and adds a ton of extra tokens, but I think most people wouldn’t care. I think the bigger downside is the loss of context of the results in the top-level conversation, but that’s probably a corner case.
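For comparison, the round-trip version is just an ordinary script exposed to Assist whose description the primary agent sees as a tool – something like this sketch (the entity ids, names, and the exact structure of the conversation.process response are assumptions on my part):

```yaml
# Sketch: a script exposed to Assist that forwards a query to a
# search-enabled agent and returns the answer as the tool result.
script:
  web_search:
    alias: "Search the web"
    description: >-
      Search the internet for current information that is not available
      from other tools. Pass the exact question in 'query'.
    fields:
      query:
        description: "The question to search the web for"
        required: true
        selector:
          text:
    sequence:
      - action: conversation.process
        data:
          text: "{{ query }}"
          agent_id: conversation.openai_search_agent  # placeholder id
        response_variable: result
      - variables:
          answer:
            # Assumed response structure from conversation.process
            text: "{{ result.response.speech.plain.speech }}"
      - stop: "Returning search result"
        response_variable: answer
```

That’s the extra round trip and extra tokens I mentioned, but at least the primary agent decides for itself when to call it, based on the description.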
That does, however, make me wonder if there’s a separate potential issue in HA – the order of tool definitions coming from scripts exposed to the LLM would actually matter if any of their descriptions could be interpreted as overlapping, which is going to happen more often the smaller or more quantized the LLM is. I can’t find where the list of tools is generated to see whether it’s deterministic or ordered (alphabetical or something). Depending on the order of the generated list, a local script for web searching could have the same problem if the prompt is generic enough. And it limits things if you have to start being explicit about the kinds of things to use web search for…
One big upside, though, as I think about it – it’d be pretty easy to write a jinja2 function to use in the web search template to de-format the responses.
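Something along these lines ought to do it – a minimal sketch assuming HA’s regex_replace filter, covering only the most common Markdown patterns (the macro name and the pattern list are mine):

```jinja2
{#- strip_markdown: hypothetical helper to de-format a search/LLM response
    before it gets handed to TTS. Strips headings, bold/italics, inline
    code, and links; anything fancier needs more patterns. -#}
{% macro strip_markdown(text) -%}
{{ text
   | regex_replace('(?m)^#{1,6}\\s*', '')
   | regex_replace('\\*\\*([^*]+)\\*\\*', '\\1')
   | regex_replace('\\*([^*]+)\\*', '\\1')
   | regex_replace('`([^`]+)`', '\\1')
   | regex_replace('\\[([^\\]]+)\\]\\([^)]*\\)', '\\1') }}
{%- endmacro %}

{# e.g. in the template that renders the web search result: #}
{{ strip_markdown(search_response) }}
```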