Questions about intent scripts

I noticed intent_scripts that have both a speech block and an action block execute the action block immediately, independently of the speech block (i.e. without waiting for the speech to finish.)

My first question is: Is there a way to make the action block wait until the speech is finished before executing?

If not, my second question is: Is there a reliable way to determine which speaker/assistant received a given voice command, so that a script or automation can return an answer to the assistant/speaker that received the command? (This is done automatically in an intent script, but my problem is that the action block in an intent script runs before the speech finishes, which causes issues when the speech is meant to report the status of entities that the action block is acting upon…)

I’m not sure that’s clear… so to give a concrete example of what I mean, let’s say I have a number guessing game, the goal of which is to guess a two-digit number, one digit at a time. The voice command for a guess would be something like “my guess is 4”.

The intent script would then have a speech block where the guess received is checked against a stored value in order to provide an appropriate answer such as:
“wrong. sorry. better luck next time” if the guess is wrong (then game is reset)
“correct. now guess the next digit” if this was the first guess
“correct. You win!” if this was the second guess

The action block could, for instance, call a script to act upon the result of the guess such as: reset the game when a wrong guess is made, move to the next digit when a correct guess is made the first time or play a sound effect when a correct guess is made the second time (game won)

The problem is that the actions execute before the speech, and therefore before the speech section reads the game variables. For instance, by the time the speech block runs for the first guess, the game has already moved on to the next digit, and by the time it handles the second-digit guess, the game is already over.

It’s easy to handle the speech inside a script rather than in the intent script but then I don’t know how to return the speech to the actual speaker the guess was issued to…

Check out this discussion

Thanks. I did take a look, but I’m still confused. I tried to use Developer Tools to watch intent_recognized events as they happen and get info from their data payload; however, nothing shows up in the listener when I use my custom intents. Is there another way to listen to voice command events as they happen?

When you activate your game with your voice, you receive your satellite’s ID. By manipulating the names as shown in the mentioned topic, you can obtain the name of the media player and then use it in your actions.
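If you’d rather handle the whole thing in an automation, a sentence trigger is one way to get at the device that heard the command. A rough sketch only, assuming your satellite’s device also exposes a media_player entity (not all of them do). The sentence, the entity lookup and the sound file path are illustrative and are not taken from the linked topic:

```yaml
- alias: "Number game guess (sketch)"
  triggers:
    - trigger: conversation
      command: "my guess is {guess}"
  actions:
    - variables:
        # First media_player entity on the device that heard the command, if any.
        player: >-
          {{ device_entities(trigger.device_id)
             | select('match', 'media_player')
             | list | first | default('') }}
    # The reply goes back to whichever assistant received the sentence.
    - set_conversation_response: "You guessed {{ trigger.slots.guess }}."
    - if: "{{ player != '' }}"
      then:
        - action: media_player.play_media
          target:
            entity_id: "{{ player }}"
          data:
            media_content_id: "media-source://media_source/local/win.mp3"  # hypothetical file
            media_content_type: "music"
```

With this approach the spoken reply doesn’t need the media player at all; the lookup only matters for extras such as sound effects.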

I’m pretty sure there’s no way to detect when a voice assistant has stopped speaking. A speech block counts as finished as soon as its last action has completed, meaning the speech command has been sent, not that the voice assistant has finished saying the sentence.

I’ve always put in delays to make sure there was time for speech to finish:

# Play radio...

CustomRadioPlay:
  action:
    # Speak the acknowledgement first
    - action: script.tts_response
      data:
        tts_sentence: "OK. {{ states('sensor.wait_phrase') }}"
    # Fixed pause so the spoken response has time to finish
    - delay:
        seconds: 2
    # Then start playback on the current speaker
    - action: media_player.play_media
      data:
        media_content_id: "{{ station }}"
        media_content_type: favorite_item_id
      target:
        entity_id: "{{ states('sensor.speaker') }}"

However, if you have an ESPHome voice assistant, there may be something in its configuration you can use.
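For the ordering problem in the number game, one approach that doesn’t depend on timing is to work out the reply inside the action block, before anything changes the game state, and hand it back to the speech block. As far as I know, intent_script makes whatever the action returns available as `action_response` in the speech template. A minimal sketch, assuming a `guess` slot and made-up names (`GuessDigit`, `input_number.game_target_digit`, `input_number.game_round`, `script.game_handle_guess`), none of which come from this thread:

```yaml
intent_script:
  GuessDigit:                      # hypothetical intent name
    action:
      - variables:
          # Evaluate the guess before any game state changes.
          correct: "{{ guess | int(-1) == states('input_number.game_target_digit') | int(-2) }}"
          game_round: "{{ states('input_number.game_round') | int(1) }}"
      - variables:
          # Build the reply from the values captured above.
          result:
            speech: >-
              {% if not correct %}Wrong, sorry. Better luck next time.
              {% elif game_round == 1 %}Correct. Now guess the next digit.
              {% else %}Correct. You win!{% endif %}
      # Hypothetical script that resets or advances the game.
      - action: script.game_handle_guess
        data:
          correct: "{{ correct }}"
      # Return the prepared reply to the speech block.
      - stop: ""
        response_variable: result
    speech:
      text: "{{ action_response.speech }}"
```

Because the reply text is fixed before `script.game_handle_guess` runs, it no longer matters that the action completes before the assistant starts speaking.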

As for returning the response in your game… My voice assistants don’t have any location awareness, so I’ve come at it from the other end by using the Bermuda integration to track which room I’m in, and then responses are sent to the speakers in that room.

I can actually ask a question in one room, move to another room and hear the answer there, which is quite cool (though it wasn’t planned that way).