Two questions involving Assist and Voice

Okay, two quick questions I can’t seem to figure out:

  1. How can I tell what the command is that Assist is running? I have an automation to pause my music when Assist is listening so the music doesn’t interfere, but that means that “Pause my music” doesn’t work, so I’m trying to figure out a way to say “Start playing again unless the command is to pause”.
  2. For custom_sentences, how would I get a response from a LLM into a variable to pass to the response? I’d like to get the LLM to generate an appropriate response based on what it’s doing.
  1. Settings > Voice assistants > your assistant > three-dot menu > Debug

  2. Look up sentence triggers:

Adding a custom sentence to trigger an automation - Home Assistant
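A minimal sentence-trigger automation looks roughly like this (the sentence and entity ID below are placeholders, not from this thread):

```yaml
# Hypothetical example: a custom sentence that triggers an automation.
automation:
  - alias: "Turn on the desk lamp by voice"
    triggers:
      - trigger: conversation
        command:
          - "turn on the desk lamp"   # placeholder sentence
    actions:
      - action: light.turn_on
        target:
          entity_id: light.desk_lamp  # placeholder entity
      - set_conversation_response: "Desk lamp is on."
```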

I worded that first one badly; what I meant was, I have an automation that watches for the Assistant to go to Listening, pauses my music if it’s playing, waits for the Assistant to shift back to Idle, and then unpauses it. The problem is, if the command is itself “Pause”, then the unpause action actually reverses the command, so I want to be able to say “Unpause my music, unless the command is itself to pause my music”.

There are complex methods for obtaining the command text from the satellite (e.g. adding a new sensor in your ESPHome config).
Consider an automation with volume changes or muting if you want a simpler solution.

But volume changes aren’t really easier to manage.
In fact there are even more weird conditions. :wink:

Even if you duck the volume by a factor and restore it after the assistant has finished:

Let’s say you duck to 1/3 and then tell the assistant to increase the volume by 10%.
Afterwards you unduck by multiplying by 3.

This might result in a volume not thaaaat far away from what you hoped for.

But what if you tell the assistant to set the volume to a fixed volume level like 25?
If you then unduck by multiplying by 3, you are way off.

What I did at the end:

  • On “listening” save the volume somewhere (You might create a helper for each of these players to store the volume. I switched to Node-RED for this automation, so I could avoid that.)
  • Duck the volume
  • Now you can speak
  • Restore to unducked state on “processing”/“thinking”
  • Now the assistant will execute volume changes based on the unducked state ( ← This is the important part: The changes happen in the default state, so you can save the correct new state before ducking again)
  • Save the volume and duck again on “responding”
  • Now the LLM/TTS will talk to you
  • Unduck again on “idle”

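The steps above could be sketched as a single automation. This is only a sketch: the satellite and player entity IDs are placeholders, and an `input_number.saved_volume` helper stands in for the Node-RED storage mentioned above.

```yaml
# Sketch only: assumes a helper input_number.saved_volume exists
# and that the entity IDs below are replaced with your own.
alias: Duck music around voice assistant activity
triggers:
  - trigger: state
    entity_id: assist_satellite.my_satellite   # placeholder
    to: listening
actions:
  # 1. Save the current volume, then duck to 1/3
  - action: input_number.set_value
    target:
      entity_id: input_number.saved_volume
    data:
      value: "{{ state_attr('media_player.my_player', 'volume_level') }}"
  - action: media_player.volume_set
    target:
      entity_id: media_player.my_player
    data:
      volume_level: "{{ state_attr('media_player.my_player', 'volume_level') / 3 }}"
  # 2. Unduck while the assistant processes, so volume commands
  #    are applied to the real, unducked volume
  - wait_for_trigger:
      - trigger: state
        entity_id: assist_satellite.my_satellite
        to: processing
  - action: media_player.volume_set
    target:
      entity_id: media_player.my_player
    data:
      volume_level: "{{ states('input_number.saved_volume') | float }}"
  # 3. Save the (possibly changed) volume and duck again for the response
  - wait_for_trigger:
      - trigger: state
        entity_id: assist_satellite.my_satellite
        to: responding
  - action: input_number.set_value
    target:
      entity_id: input_number.saved_volume
    data:
      value: "{{ state_attr('media_player.my_player', 'volume_level') }}"
  - action: media_player.volume_set
    target:
      entity_id: media_player.my_player
    data:
      volume_level: "{{ state_attr('media_player.my_player', 'volume_level') / 3 }}"
  # 4. Unduck once the satellite is idle again
  - wait_for_trigger:
      - trigger: state
        entity_id: assist_satellite.my_satellite
        to: idle
  - action: media_player.volume_set
    target:
      entity_id: media_player.my_player
    data:
      volume_level: "{{ states('input_number.saved_volume') | float }}"
```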
This would also be possible with pausing.
You would have to save the previous play/pause state somewhere instead, so you can restore to the correct one.

But pausing / playing twice in a very short timespan might sound a little annoying. It works way better with decent muting.

I don’t think this causes any difficulties.

  • We write the current volume value to an internal variable.
  • We duck the media player volume. For example, to 0.04.
  • When the satellite returns to idle (wait_for_trigger), we check the current volume of the media player: if it’s still 0.04, we restore the value from the variable. Otherwise, we do nothing.

We will lose the ability to set the media player volume by voice to 4%, but this is not fatal :upside_down_face:.
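A sketch of that idea as an automation (the entity IDs are placeholders; 0.04 is the marker duck level from above):

```yaml
# Sketch only: 0.04 doubles as a marker to detect whether the
# volume was changed while the satellite was busy.
alias: Duck to a marker volume and restore
triggers:
  - trigger: state
    entity_id: assist_satellite.my_satellite   # placeholder
    to: listening
variables:
  # Evaluated once when the automation triggers
  saved_volume: "{{ state_attr('media_player.my_player', 'volume_level') }}"
actions:
  - action: media_player.volume_set
    target:
      entity_id: media_player.my_player        # placeholder
    data:
      volume_level: 0.04
  - wait_for_trigger:
      - trigger: state
        entity_id: assist_satellite.my_satellite
        to: idle
  # Only restore if nothing else changed the volume in the meantime
  - if:
      - condition: template
        value_template: >-
          {{ state_attr('media_player.my_player', 'volume_level') == 0.04 }}
    then:
      - action: media_player.volume_set
        target:
          entity_id: media_player.my_player
        data:
          volume_level: "{{ saved_volume }}"
```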

Have a read through Creating scenes on the fly. You can use that to snapshot the current state of your media players so you’ll be able to reapply it whenever you want.
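For example, a snapshot/restore pair could look roughly like this (action fragment only; the entity ID and scene name are placeholders):

```yaml
# Snapshot the player's current state, then reapply it later.
- action: scene.create
  data:
    scene_id: before_assist          # becomes scene.before_assist
    snapshot_entities:
      - media_player.my_player       # placeholder
# ... pause or duck while the assistant is active ...
- action: scene.turn_on
  target:
    entity_id: scene.before_assist
```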

Here is how I have done it:

alias: Mute music 
description: ""
triggers:
  - entity_id:
      - assist_satellite.home_assistant_voice_0a2cb3_assist_satellite
    to:
      - listening
    trigger: state
conditions:
  - condition: device
    device_id: e1c55ff967e4fe6ef86e882e2ecdc8bf
    domain: media_player
    entity_id: f71b04572231d02b5f412672c686abeb
    type: is_playing
actions:
  - action: media_player.media_pause
    metadata: {}
    data: {}
    target:
      device_id: e1c55ff967e4fe6ef86e882e2ecdc8bf
  - wait_for_trigger:
      - entity_id:
          - assist_satellite.home_assistant_voice_0a2cb3_assist_satellite
        to:
          - idle
        trigger: state
  - choose:
      - conditions:
          - condition: state
            entity_id: input_boolean.stop_music
            state: "off"
        sequence:
          - sequence:
              - action: media_player.media_play
                metadata: {}
                data: {}
                target:
                  entity_id:
                    - media_player.sonos_roam_2
              - action: input_boolean.turn_off
                metadata: {}
                data: {}
                target:
                  entity_id: input_boolean.stop_music
mode: parallel

You also lose the ability to use relative volume commands, which are more often used by my family than absolute target volumes.
The reason is quite simple: We often don’t know how loud a speaker is in absolute volume currently.
But we do know whether we want to increase or decrease the volume.

Example:
When someone asks to increase the volume, it would increase the already ducked volume. Let’s say start volume was 30%, you duck to 10%.
Now the relative volume command will maybe change the volume by 10%, so the new value will be 20%.

As your check will now detect a change, it won’t unduck afterwards and your volume has decreased instead of increased as requested.

That’s one of the reasons why I switched to the intermediate unducking when the voice assistant is “working” on our requests,
as the family members use a mix of relative and absolute volume changes.

edit: I edited the example as it was wrong.


Of course, I know about scenes and I often use them for lights, but in my example I only need one value, so a single variable is enough.

That’s a fair point. I forgot about the relative volume adjustment feature being added a couple of months ago, as I don’t use it.
As task complexity increases, it inevitably demands more complex execution logic and some trade-offs.

I roughly imagined how your version could be improved, additionally using a short-term mute at the processing and intent stages. But then we’d have to sacrifice HassMediaPlayerMute.


For the second one, I am actually defining my own custom sentences and custom intents to execute a list of actions; I was just wondering whether, instead of filling in the speech section in the intent, I could send a request to the LLM with a specific prompt to generate a response, and then use that as the speech reply, so it’s more personal and less robotic and repetitive.

  ...
  - action: conversation.process
    metadata: {}
    data:
      agent_id: conversation.agent
      text: "/.. your prompt ../"
    response_variable: llm
  - set_conversation_response: "{{ llm.response.speech.plain.speech }}"

You can use this combination in the GUI. Since this simply passes data through a variable, it should work similarly in intent_script.
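Put together in a sentence-trigger automation, it could look roughly like this (the sentence, agent ID, entity, and prompt are placeholders, not confirmed working config):

```yaml
# Sketch only: a custom sentence runs an action, then asks an LLM
# agent to phrase the spoken confirmation.
alias: Water the plants by voice
triggers:
  - trigger: conversation
    command:
      - "water the plants"              # placeholder sentence
actions:
  - action: switch.turn_on
    target:
      entity_id: switch.irrigation      # placeholder entity
  - action: conversation.process
    data:
      agent_id: conversation.agent      # your LLM-backed agent
      text: >-
        Confirm in one short, friendly sentence that the plants
        are now being watered.
    response_variable: llm
  - set_conversation_response: "{{ llm.response.speech.plain.speech }}"
```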