Checking for today's rain returns wrong results

So my wife just asked Nabu "is it going to rain today?" and the response was "No precipitation is expected", while ALL weather services in HA show rain. I am using OpenAI GPT-4.1-mini as the agent. How can I find out which information was provided to produce the wrong answer? I have been running debug on the OpenAI service but didn't find any clue in there. All the entities published to Assist seem to be correct… :-/

Got to love the humor in this :laughing:
I saw that the national weather service here had a forecast a few days ago saying there was a 10% risk of rain, and that if it did rain, it believed it would be between 0.1 mm and 9.8 mm in one hour.

So why would Home Assistant be better?
I'm just speculating here, but maybe you need to specify where? Perhaps that information isn't sent to the LLM, and the LLM just hallucinates.

I think there is still no default intent in Home Assistant to enable Assist to call weather.get_forecast.

In other words: The LLM is great in generating text, but needs help with other things. :wink:
The included tools are fairly limited so far.
You can also easily create your own tools by creating scripts or intent_scripts to give your assistant more capabilities.
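If you go the script route, here is a minimal sketch of such a tool (the entity ID `weather.home` and the script name are assumptions, adjust to your setup):

```yaml
# scripts.yaml — hypothetical sketch of a forecast tool exposed to Assist
get_rain_forecast:
  alias: Get rain forecast
  description: >-
    Returns today's precipitation probability and expected amount in mm.
    Use this whenever the user asks whether it will rain.
  sequence:
    # Ask the weather integration for its daily forecast
    - action: weather.get_forecasts
      target:
        entity_id: weather.home
      data:
        type: daily
      response_variable: forecast
    # Return the raw forecast data to the LLM instead of speaking it
    - stop: ""
      response_variable: forecast
```

Expose the script to Assist and the LLM can call it like any other tool; the `description` is what tells the model when to use it.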

For weather you can already grab a ready to use script here:

1 Like

Yup. Weather is one area where you really have to write your own intent scripts, because there are so many ambiguities. What percentage probability is “going to rain”, for a start.

I have one that draws on the local weather forecast, but even then it will often say it's raining when you can see it isn't just by looking out of the window. My script checks the rain gauge in the garden before it commits itself.
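A minimal sketch of that kind of guard as an intent script (the entity IDs `sensor.garden_rain_gauge` and `weather.home` are made up for the example):

```yaml
# intent_script.yaml — hypothetical rain check that trusts the gauge first
IsItRaining:
  speech:
    text: >-
      {% if states('sensor.garden_rain_gauge') | float(0) > 0 %}
        Yes, the garden rain gauge reads
        {{ states('sensor.garden_rain_gauge') }} mm.
      {% elif is_state('weather.home', 'rainy') %}
        The forecast says rain, but the gauge in the garden is dry.
      {% else %}
        No, it does not appear to be raining.
      {% endif %}
```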

Edit: If you turn on the Prefer handling commands locally toggle in the Assist setup page, you can combine local intent scripts with OpenAI. It works pretty well.

2 Likes

Correct… OP, unless you have intentionally both told your LLM "thou shalt not lie" (yes, this is a thing you should do) AND provided a tool that can supply a correct answer (yes, both), you should not blindly trust what it says. Even then, blind trust is bad.

The weather script Fes made should be good enough for most cases when combined with a truth directive VERY HIGH up in the prompt. Like step one or two.

By default it will tell a story - you have to guard against it.

1 Like

Try asking about distances. OpenAI is very, very bad at that. The distance to the next town will be different each time.

You can build safeguards into your prompt along the lines of “all answers must be based on actual data or external sources”, and remind it “not to fabricate data and instead state when it cannot confidently answer”. The most likely outcome then is that it simply refuses to commit itself.

No, you don't just build safeguards… you build safeguards WITH A TOOL (you must have both).

If you only give the safeguard and it can't get the answer, it still hallucinates. If you build a tool and don't give instructions… it hallucinates. YOU MUST have BOTH the truth directive and a good tool that it knows how to use, else the response is suspect.

Tools. All of these are tools; anything that needs an accurate calculation should be looked up.

I don't have that issue because I have a tool called "raw distance to" that accepts GPS coords or a zone (which resolves to GPS) for the to and from points. Do the math, report. Distance problem solved.

Ground travel directions. A tool for Waze.

Air travel. I don't have a routing tool, but I can tell you what aircraft are in (or have been in) local airspace and everything about them, including current GPS coords, which can be fed into either of the previous tools.

Calculator. Yep, a full scientific calculator with sin, cos, tan.

News. A tool to look up the newsfeeds I care about.

Internet search. A dedicated web-search tool LLM.
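A rough sketch of what a "raw distance to" style tool could look like as a script, using Home Assistant's built-in `distance()` template function (the script and field names here are my guesses, not the poster's actual implementation):

```yaml
# scripts.yaml — hypothetical distance tool for the LLM
raw_distance_to:
  alias: Raw distance between two points
  description: >-
    Returns the straight-line distance in kilometers between two
    latitude/longitude pairs. Use for any distance question.
  fields:
    from_lat: {description: Origin latitude, required: true}
    from_lon: {description: Origin longitude, required: true}
    to_lat: {description: Destination latitude, required: true}
    to_lon: {description: Destination longitude, required: true}
  sequence:
    # distance() accepts coordinate pairs and returns kilometers
    - variables:
        result:
          kilometers: "{{ distance(from_lat, from_lon, to_lat, to_lon) }}"
    - stop: ""
      response_variable: result
```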

When you do that, you can make your truth directives reference the toolbox and state that if it cannot get the answer from its stored data or tools, the answer is "I do not know", and it is free to advise the user on what it would need to answer the question accurately.
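As an illustration (the wording is mine, not a quoted prompt), that kind of directive near the top of the system prompt could read:

```yaml
# Excerpt from a conversation agent's prompt configuration (illustrative)
prompt: |
  1. Never fabricate facts. Base every answer on the state data below
     or on the result of a tool call.
  2. If neither your stored data nor your tools can answer the question,
     reply "I do not know" and tell the user what information you would
     need in order to answer accurately.
```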

(Tools are that important, hence my soapbox about always making multi-tools, since we only have 128 tool slots.)

1 Like

Good point. I did look up the OpenAI web search option, but it seemed a bit expensive for casual use. Do you get to charge these things to the company? :grin:

1 Like

Not nearly as much as I want. And a very good point. Actually, the web search/scrape tool now uses my local OpenWebUI instance, which has web search (Google) and a scraper, because of this.

In fact I'm on a tear to move as MUCH processing as possible to the local instance. So there's a tool to summarize a camera. That summary is driven by a local instance of llama3.2:vision, and I cache the response (no need to call it again if it's fresh).

If I left it on OAI at its normal burn rate, that would mean more than $200/mo just summarizing cameras. (Uh, no…)

But if the cameras get summarized locally, I can still interact with a big frontline model and give it a tool to quickly access the last summary of a camera and pull a new one if necessary.

So the balance we're hunting is making the prompt dense with the right info to answer the 80% (this is the hard part), because avoiding hallucinations is about giving the LLM the info it needs to answer. But we all know your info changes through the day, and the stuff that's relevant at 0800 on a Monday is probably not the same data that's relevant at 2100 on a Friday. So your prompt should change accordingly.

On Monday a.m. I preload the response from the daily planner into the LLM frontline prompt… on Friday evening it's the menu and the weekend calendar.

The net result is: if you ask about your ToDo list on Friday evening, she'll have a cache miss and then, per her instructions, look up the info using the ToDo tool… On Monday morning she'll answer immediately and only need a tool to mark things done. The art is predicting what info goes in when.
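Since the conversation agent's prompt is a Jinja template, a crude version of that time-based preloading (the sensor names are invented for the example) might look like:

```yaml
# Conversation agent prompt — hypothetical time-dependent preloading
prompt: |
  {% if now().weekday() == 0 and now().hour < 12 %}
  Today's plan: {{ states('sensor.daily_planner_summary') }}
  {% elif now().weekday() == 4 and now().hour >= 17 %}
  Dinner menu: {{ states('sensor.weekend_menu') }}
  Weekend calendar: {{ states('sensor.weekend_calendar_summary') }}
  {% endif %}
```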

1 Like

While I use OpenAI as my default LLM for Assist, I also didn't like the increase in billing it caused.

I use the free Google Generative AI approach with a script, so the web search can be triggered by the OpenAI assist:

2 Likes

Great suggestion, thanks. I’ve kept Home Assistant as the default conversation agent and I now have two “AI” custom sentences, one using OpenAI (faster) and the other Google Generative AI with Google searches (more up-to-date).

Open AI:

language: "en"
intents:
  CustomOpenAI:
    data:
      - sentences:
          - "(how [is] | how's | how are | how does | how do | how did | how will | how long ) {question}"
          - "(why [is] | why's | why are | why does | why do | why did | why will) {question}"
          - "(when [is] | when's | when are | when does | when do | when did | when will) {question}"
          - "(what [is] | what's | what are | what does | what do | what did | what will) {question}"
          - "(which [is] | which are | which does | which do | which did | which will) {question}"
          - "(where [is] | where's | where are | where does | where do | where did | where will) {question}"
          - "(who [is] | who's | who are | who does | who do | who did | who will) {question}"
          - "tell me [about] {question}"
lists:
  question:
    wildcard: true

Google:

language: "en"
intents:
  CustomGoogle:
    data:
      - sentences:
          - "look up {question}" 
          - "find [out] [about] {question}"
          - "google {question}"
          - "research {question}"
lists:
  question:
    wildcard: true

The intent scripts are the same except for the agent_id:

CustomOpenAI:
  action:
    - choose:
        - conditions: "{{ is_state('binary_sensor.online', 'off') }}"
          sequence:
            - action: script.willow_tts_response
              data:
                tts_sentence: "Sorry. There's no internet at the moment."
        - conditions: "{{ is_state('binary_sensor.online', 'on') }}"
          sequence:
            - action: script.willow_tts_response
              data:
                tts_sentence: "{{ states('sensor.look_it_up_phrase') }}"  # Random "Hang on a moment" phrase
            - action: conversation.process
              data:
                agent_id: conversation.openai_conversation   # Change to conversation.google_ai_conversation for Google
                text: "{{ question }}"
              response_variable: api_response
            - variables:
                answer: "{{ api_response.response.speech.plain.speech }}"
            - action: script.willow_tts_response
              data:
                tts_sentence: "{{ answer }}"

The point of all this is to allow responses to be directed to my Sonos speakers (I don’t have an ESPHome voice assistant).

1 Like