About making inexpensive models smarter by providing tools and context. (local models, gpt-5-mini, gpt-4.1-mini, gpt-4o-mini ...)

Another prompting hack, which finally solved the hard-to-understand responses to requests about calendar events or the weather.
At least the Nabu Casa cloud TTS adds a lot of weird pauses otherwise.

About date and time responses:
--------------------
Most of the time we communicate with you through a voice interface
(speech-to-text -> you -> text-to-speech).

Because of that, the full stops (".") in a date string in your answers, like 
"10. and 11. September", will confuse the TTS service.
Always transform dates to written-out versions like 
"tenth and eleventh of September" (in the language the user asked in).

Timespans should be transformed from something like 
"03:00 - 04:00" to "03:00 until 04:00" (also in the user's language), 
as the "-" sign will usually result in a speech pause in the text-to-speech output 
instead of the timespan being read correctly.
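For TTS pipelines that don't go through an LLM, the same dash fix can be applied with a plain Jinja filter; a minimal sketch (the literal timespan string is just an example):

```yaml
# Assumed sketch: applying the same "-" -> "until" replacement directly
# in a template, e.g. inside a tts.speak message.
message: "{{ '03:00 - 04:00' | replace(' - ', ' until ') }}"
# renders as "03:00 until 04:00"
```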

New Tool:

Text announcements over TTS

The kids can be anywhere in the house, and they always listen to loud music over Music Assistant / HA media players.

Running around the house and yelling felt like it was 1990 again, so I had to write a tool to solve it. :wink:

It’s just a wrapper around tts.speak that hides the detail of which TTS entity to use from the LLM.
You have to set the TTS entity in the variables section of the script.

After that you can simply ask your voice assistant to tell the kids that lunch is ready, at least for everyone willing to behave like a human being for the next 30 minutes.

Script code:

alias: Text to Speech Announcement
description: |-
  Purpose: Tool to play an announcement on a chosen media player (LLM-Tool)
  Required parameters:
    - media_player (entity_id in domain `media_player`) 
    - message (string) — the text to speak

  Optional parameters:
    - none

  Expected output on success (returned via stop/response_variable):
    - { ok: true, media_player: string, announced: string }

  Expected output on error (returned via stop/response_variable):
    - { error: string, (optional) details: object }

  Allowed values:
    - media_player must be an existing entity in the `media_player` domain
    - message must be a non-empty string

  Hints:
    - If you don't know which media players are available or which one belongs to a room that the user requested, 
      use the "Entity Index" tool with the tag 'MediaPlayer' to find them.
icon: mdi:bullhorn
mode: single
variables:
  tts_entity_id: tts.home_assistant_cloud
fields:
  media_player:
    name: Media player
    description: Media player that should play the announcement
    example: media_player.living_room
    required: true
    selector:
      entity:
        domain: media_player
  message:
    name: Message
    description: Text to speak
    example: Hide and seek is starting!
    required: true
    selector:
      text: null
sequence:
  - action: logbook.log
    data:
      name: "LLM TTS ANNOUNCE:"
      message: "{{ media_player }}, {{ message }}"
      entity_id: "{{ this.entity_id }}"
  - choose:
      - conditions:
          - condition: template
            value_template: "{{ message is string and (message | trim) != '' }}"
        sequence: []
    default:
      - variables:
          error_result: >-
            {{ { 'error': 'Missing or invalid parameter: message (non-empty
            string required).',
                 'details': { 'received': message | default('undefined') } } }}
      - stop: Invalid message
        response_variable: error_result
  - choose:
      - conditions:
          - condition: template
            value_template: >-
              {{ media_player is string and (media_player |
              regex_search('^media_player\.[A-Za-z0-9_]+')) and
              (states(media_player) != 'unknown') }}
        sequence: []
    default:
      - variables:
          error_result: >-
            {{ { 'error': 'Missing or invalid parameter: media_player (must be
            an existing media_player.* entity).',
                 'details': { 'received': media_player | default('undefined') } } }}
      - stop: Invalid media player
        response_variable: error_result
  - choose:
      - conditions:
          - condition: template
            value_template: >-
              {{ tts_entity_id is string and (tts_entity_id |
              regex_search('^tts\.[A-Za-z0-9_]+')) and (states(tts_entity_id) !=
              'unknown') }}
        sequence: []
    default:
      - variables:
          error_result: >-
            {{ { 'error': 'TTS entity not configured or invalid. Tell the user
            to edit variables.tts_entity_id in the tool script to a valid tts.*
            entity.',
                 'details': { 'current_value': tts_entity_id | default('undefined') } } }}
      - stop: TTS entity misconfigured
        response_variable: error_result
  - action: tts.speak
    target:
      entity_id: "{{ tts_entity_id }}"
    data:
      media_player_entity_id: "{{ media_player }}"
      message: "{{ message }}"
  - variables:
      result: "{{ { 'ok': true, 'media_player': media_player, 'announced': message } }}"
  - stop: ""
    response_variable: result
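
If everything is set up, the tool can also be tested without the LLM; a minimal sketch, assuming the script was saved under the entity_id script.text_to_speech_announcement (derived from the alias) and that media_player.kids_room exists:

```yaml
# Assumed sketch: calling the tool manually, e.g. from another script
# or automation. Entity ids are examples.
- action: script.text_to_speech_announcement
  data:
    media_player: media_player.kids_room
    message: Lunch is ready!
  response_variable: announce_result
# announce_result then holds the { ok: true, ... } or { error: ... } dict.
```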

I also added this to the LLM prompt:

Messages to other users in the house:

When we ask you to announce something in a specific room or on a specific media player, or to tell the kids something in a specific room:
Use the tool “Text to Speech Announcement”.
To find a matching media player for a room, use the Entity Index tool with the tag “MediaPlayer”.
If we don’t specify where the kids are, announce in both of their rooms sequentially.


Thanks for the YAML! The timers are working; nevertheless, when I ask my VPE “are there any timers?” it is processed locally and returns “There are no timers active”. However, if I ask with more text to avoid the local processing, it recognizes the created “pizza-timer” and answers correctly. Is this the same for you? How do you mitigate that issue with local timers?

I think the local processing is the problem here.
You can’t “override” a core intent with your own script in this case.
Try to ask in a weird way that won’t get caught by the local processing, and I bet it will work.

For the LLM you can use prompting tricks like the ones used here to tell it to avoid the core intent (e.g. by lying that it is broken and would result in errors).

We can just hope that it will be possible in the future to disable core intents that we don’t want to use.

HA will most likely add more and more core intents that might start to get in the way of our own scripts (that we might still prefer sometimes).

I disabled “process locally” on my voice assistant because it caused too many problems or weird behavior.

This is the problem, and you can… but you have to delete it entirely or replace it, neither of which I really recommend. Once local Assist’s deterministic matching hits, you’re not getting anything else, period. End of story. To avoid it you have to turn off local processing first.

And I did the exact same thing.


Yeah, it’s kind of a two-pronged problem, since they don’t expose anything with timers.

With the weather local intent handler I was able to override it and send it to the LLM.
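
One common way to do such an override is via intent_script in configuration.yaml, since a handler registered under the same name as a built-in intent replaces it; a minimal sketch (the speech text is a placeholder, and actually forwarding the request to the LLM would need additional wiring):

```yaml
# Assumed sketch: shadowing the built-in weather intent with an
# intent_script of the same name so the local handler no longer answers.
intent_script:
  HassGetWeather:
    speech:
      text: "Sorry, the local weather intent is disabled."
```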

Would be nice if they improved in this area.
