Wait for end of TTS before continuing with automation - Sonos

I have some variable TTS commands that execute throughout my day. Right now I have a delay set after each TTS command, but because the messages vary in length (I use ChatGPT to generate them), the automation tends either to cut the message off just before the end of the sentence or to wait a long time before executing the next step.

Is there a way to check when a TTS message has finished? Other solutions have said to implement a wait trigger for the media player to go idle. However, my Sonos speaker never goes idle when the TTS is spoken. Either this has something to do with how Sonos handles TTS or how my TTS software handles it. I am currently using tts.elevenlabs (GitHub - carleeno/elevenlabs_tts: Custom TTS Integration using ElevenLabs API).

I typically run ambient music throughout my day, so this could also be the reason why it doesn’t go idle when a TTS has finished speaking. Ultimately, in an automation I would like some TTS spoken and then, as soon as it has finished speaking, have the next command execute, like changing the lights.

If there is a possible solution this would make my TTS much more seamless and natural instead of having inconsistent delays between automation steps.
Thank you!

Have you tried using the announce option with the media_player.play_media service?

If you set it to “true” whatever the speaker is playing will resume after the announcement, no need for a delay. Haven’t tried it with TTS, but it works on my Sonos with pre-recorded announcements.
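
A minimal sketch of such a call, assuming a pre-recorded file in the local media folder. The entity ID and file name are placeholders, not taken from this thread:

```yaml
# Sketch only: media_player.living_room and doorbell.mp3 are hypothetical.
- service: media_player.play_media
  target:
    entity_id: media_player.living_room
  data:
    media_content_id: media-source://media_source/local/doorbell.mp3
    media_content_type: music
    # With announce enabled, whatever was playing resumes afterwards.
    announce: true
```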

The other option is to wait for the media player to start playing, and then for it to become idle, using a wait_template. That’s how I handle it when I need that kind of thing:

  - service: tts.google_cloud_say
    data:
      entity_id: media_player.master_bedroom
      message: "{{ states('input_text.train_message') }}"
  - wait_template: "{{ is_state('media_player.master_bedroom','playing') }}"
  - wait_template: "{{ is_state('media_player.master_bedroom','idle') }}"
  - service: media_player.turn_off
    data:
      entity_id: media_player.master_bedroom

FWIW, I have Sonos speakers and created a script to perform the following steps:

  • snapshot
  • optionally join new group of players
  • optionally set volume
  • optionally play preamble (a very short tune to grab attention)
  • play TTS announcement
  • restore

To wait for the preamble and/or announcement to finish, I use the same technique described by Tinkerer.

Originally, I used this wait_template:

    - wait_template: "{{ states(new_group_master) == 'idle' }}"

However, about two versions ago, I encountered problems because, after playing the preamble or announcement, the player’s state changed to paused as opposed to idle. It was unclear to me when, after playing something, it returned to idle or paused so I changed the wait_template to this:

    - wait_template: "{{ states(new_group_master) != 'playing' }}"

This should work the same way:

    - wait_template: "{{ states(new_group_master) in ['idle', 'paused'] }}"
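
As a rough sketch, the core of the steps listed above (snapshot, set volume, speak, wait, restore) might look like this. The script name, entity ID, volume level and timeouts are all assumptions, the optional join and preamble steps are left out, and it only works if the player actually reports leaving the playing state:

```yaml
# Sketch only: names and values below are hypothetical.
announce_with_restore:
  fields:
    message:
      description: Text to speak
  variables:
    player: media_player.kitchen   # assumed Sonos player
  sequence:
    - service: sonos.snapshot
      data:
        entity_id: "{{ player }}"
        with_group: true
    - service: media_player.volume_set
      target:
        entity_id: "{{ player }}"
      data:
        volume_level: 0.4
    - service: tts.google_cloud_say
      data:
        entity_id: "{{ player }}"
        message: "{{ message }}"
    # Wait for playback to start, then for the player to leave 'playing'.
    - wait_template: "{{ is_state(player, 'playing') }}"
      timeout: "00:00:10"
    - wait_template: "{{ states(player) != 'playing' }}"
      timeout: "00:02:00"
    - service: sonos.restore
      data:
        entity_id: "{{ player }}"
        with_group: true
```

The timeouts are there so the script moves on to the restore step even if the state never changes.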

NOTE

I also tried using the enqueue option, which lets you add tracks (the preamble and announcement) to the queue and then play the queue. That worked well until I learned that the queue is not included in the snapshot (it’s documented but I discovered it empirically). The result is a mess.

Here is another way to tackle the delay issue. Granted, it’s not the simplest, BUT IT WORKS 100% of the time for me and it can be adjusted to meet your exact needs. The TTS service, voice and other variables can affect message length and this code can be modified to fit those needs.
Pass in the variable {{message}} you want spoken and adjust as necessary. All my math is based on Google Cloud Say with the default voice. Google Translate Say differs slightly and will require some experimenting with the math. I can’t speak for other TTS services. YMMV!

    - service: input_number.set_value
      target:
        entity_id: input_number.media_duration_chars
      data:
        value: >-
          {% set chr_count = (message|length) %}
          {% if chr_count > 1500 %}
            {% set chr_count = 1500 %}
          {% endif %}
          {{ chr_count }}
    - service: input_number.set_value
      target:
        entity_id: input_number.media_duration_seconds
      data:
        value: >-
          {% set chr_count = (states('input_number.media_duration_chars')|int) %}
          {% if chr_count < 100 %}
            {% set duration = ((chr_count/10)|round(0,'ceil'))+2 %}
          {% elif chr_count < 200 %}
            {% set duration = ((chr_count)* 0.066)|round(0,'ceil')+3 %}
          {% elif chr_count < 500 %}
            {% set duration = ((chr_count)* 0.068)|round(0,'ceil')+3 %}
          {% elif chr_count < 1000 %}
            {% set duration = ((chr_count)* 0.071)|round(0,'ceil')+3 %}
          {% elif chr_count >= 1000 %}
            {% set duration = ((chr_count)* 0.074)|round(0,'ceil')+3 %}
          {% endif %}
          {% if duration < 5 %}
            {% set duration = 5 %}
          {% endif %}        
          {% if duration > 120 %}
            {% set duration = 120 %}
          {% endif %}
          {{ (duration) | int }}
    - delay:
        seconds: "{{ states('input_number.media_duration_seconds') }}"
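
For comparison, here is a sketch of the same arithmetic done in a single script variable instead of two input_number helpers. This is a condensed variant, not the code above; the variable name tts_seconds is made up, and the rates and limits are simply copied from the original. As a worked case, a 150-character message gives 150 * 0.066 = 9.9, rounded up to 10, plus 3, for a 13 second delay.

```yaml
# Sketch only: tts_seconds is a hypothetical variable name.
- variables:
    tts_seconds: >-
      {# Cap the character count at 1500, as in the original. #}
      {% set n = [message | length, 1500] | min %}
      {% if n < 100 %}
        {% set secs = (n / 10) | round(0, 'ceil') + 2 %}
      {% else %}
        {% set rate = 0.066 if n < 200 else 0.068 if n < 500 else 0.071 if n < 1000 else 0.074 %}
        {% set secs = (n * rate) | round(0, 'ceil') + 3 %}
      {% endif %}
      {# Clamp the result to between 5 and 120 seconds. #}
      {{ [[secs | int, 5] | max, 120] | min }}
- delay:
    seconds: "{{ tts_seconds }}"
```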

I’ve tried using wait_template but when my tts is executed on my Sonos speaker, the state of the speaker does not change. It lowers the music playing and says the tts but it doesn’t actually change the state from “playing” to “idle” so having the wait_template doesn’t seem to work for my case. Unless I’m missing something?

I tried copying the template you used, but this also doesn’t work for my case. The state of the Sonos speaker never changes to paused or idle while something is playing.

I tested it by playing some music then executing this script:

alias: tts test 2
sequence:
  - service: sonos.snapshot
    data:
      with_group: true
      entity_id: media_player.adam_s_room_one
  - service: media_player.volume_set
    data:
      volume_level: 0.49
    target:
      entity_id: media_player.adam_s_room_one
  - service: tts.speak
    data:
      cache: true
      message: hello this is a test message
      media_player_entity_id: media_player.adam_s_room_one
    target:
      entity_id: tts.elevenlabs_tts
  - wait_template: >
      {{ is_state('media_player.adam_s_room_one', 'paused') or
      is_state('media_player.adam_s_room_one', 'idle') }}
    continue_on_timeout: true
  - service: sonos.restore
    data:
      with_group: true
      entity_id: media_player.adam_s_room_one
mode: single

But the wait_template is stuck waiting for the media player to pause or go idle after the TTS. The reason I want to figure out when the TTS has finished is so I can lower the volume back down to an ambient level. The TTS is too quiet when it is played at that level. So what I want is to raise the volume, say the TTS and then go back to the ambient volume, but I just can’t figure out how to tell when the TTS has finished speaking.

I think I could probably use this but it wouldn’t work 100% of the time. Because I am using ElevenLabs TTS, there is a small buffer that depends on how long it takes to generate the audio file. Also, I think this would work better with google_translate or a similar alternative than with AI-generated TTS, because those are more consistent; ElevenLabs has stability and clarity + similarity enhancement settings. If the same sentence is generated twice with ElevenLabs, it isn’t necessarily spoken the same way.

I think this might solve my problem but it wouldn’t be as consistent for my use case. I will definitely try this solution if there is absolutely no way I can figure out when the tts finishes speaking.

I’m unsure how to play the local tts files through media_player.play_media. I can see the tts files are stored in the root/config/tts path but how do I play them? Also I think it would be difficult trying to figure out what the tts filename is when trying to play it this way.

I tried this but it didn’t work:

service: media_player.play_media
data:
  media_content_id: '/root/config/tts/example.mp3'
  media_content_type: 'music'
target:
  entity_id: media_player.adam_s_room_one

Then you’re using the announce feature, and there’s no way to know.

The media_player’s state should change to idle or paused after the media_player has finished playing the content. While it’s playing the content, its state is typically playing.

The exception to this rule is if the content is played using the media_player’s announce feature (as mentioned by Tinkerer). Currently, it doesn’t report a change from playing to idle, or paused, thereby making it difficult to know when the content is/isn’t actually playing.

I learned from community member TheFes that when you use tts.google_translate_say it actually makes a call to media_player.play_media and automatically sets the announce option to true.

I’ll hazard a guess that this also happens to tts.speak.

Perhaps you can try something like this (untested). The tricky part might be to get the right TTS key word. For Google Cloud it’s google_cloud and I don’t know what it is for Elevenlabs (it’s a custom integration) so I simply used elevenlabs in the example below.

service: media_player.play_media
data:
  media_content_id: 'media-source://tts/elevenlabs?message=This+is+a+test'
  media_content_type: 'provider'
  announce: false
target:
  entity_id: media_player.adam_s_room_one

FWIW, if you want to play media that resides on your Home Assistant server, you should review the documentation for the Media Source integration.


EDIT

Changed value of media_content_type from music to provider.


Is there a way to use tts.speak with announce off or is there some other way I can turn off the announce setting on the sonos speaker by default?

Or would I need to do what @123 said which is to call the media_player.play_media service to turn off announce?

Here’s what tts.speak supports and there’s no indication it lets you disable announce.


I tried using the media_player.play_media service call that you provided, but I get an error saying provider not found. Here is what I have:

service: media_player.play_media
data:
  media_content_id: 'media-source://tts/elevenlabs_tts?message=This+is+a+test'
  media_content_type: 'music'
  announce: false
target:
  entity_id: media_player.adam_s_room_one

I even tried it with google_cloud but it also shows provider google_cloud not found so I’m not sure if I’m calling the service correctly.

I had a look through the elevenlabs source code and it states that the domain is elevenlabs_tts, but I’m not sure if this is the correct TTS key word. I tried your example and this one but the same error occurs, so I’m unsure if this is because it’s the wrong key word or because there is something wrong with the play_media example.

Do you have the Google Cloud integration installed?

It’s different from the Google Translate integration.


EDIT

Change this:

media_content_type: 'music'

To this:

media_content_type: 'provider'

Okay didn’t know how to listen to a service using call_service but now I figured it out. Here is the play_media service that worked:

service: media_player.play_media
data_template:
  media_content_id: >
    media-source://tts/tts.elevenlabs_tts?message={{
    state_attr('sensor.hassio_openai_response', 'response_text') | urlencode
    }}&cache=true
  media_content_type: music
  announce: false
target:
  entity_id: media_player.adam_s_room_one
enabled: true

The key word was actually tts.elevenlabs_tts. I can now use the wait_templates and it works. Thank you.
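
Putting the pieces from this thread together, a sketch of the full sequence (snapshot, raise volume, speak with announce off, wait, restore) could look like the following. The timeout values and the test message are assumptions; the entity IDs are the ones used earlier in the thread:

```yaml
# Sketch only: timeouts and the test message are assumed values.
sequence:
  - service: sonos.snapshot
    data:
      entity_id: media_player.adam_s_room_one
      with_group: true
  - service: media_player.volume_set
    target:
      entity_id: media_player.adam_s_room_one
    data:
      volume_level: 0.49
  - service: media_player.play_media
    target:
      entity_id: media_player.adam_s_room_one
    data:
      media_content_id: >-
        media-source://tts/tts.elevenlabs_tts?message={{
        'hello this is a test message' | urlencode }}&cache=true
      media_content_type: music
      # announce must be off for the state change to be reported.
      announce: false
  # Wait for the TTS to start, then for the player to leave 'playing'.
  - wait_template: "{{ is_state('media_player.adam_s_room_one', 'playing') }}"
    timeout: "00:00:15"
  - wait_template: "{{ not is_state('media_player.adam_s_room_one', 'playing') }}"
    timeout: "00:02:00"
  # Restoring the snapshot brings back the earlier music and volume.
  - service: sonos.restore
    data:
      entity_id: media_player.adam_s_room_one
      with_group: true
```

If music is already playing when the script starts, the first wait may pass immediately, so the exact waits may need adjusting for your setup.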

You’re the second person in the past week who has marked their own post as the Solution despite using a solution that was provided to them. That’s not the community’s custom for the use of the Solution tag. It makes it appear that everyone ultimately solves their own problem (regardless if the answer came from someone else).

FAQ Guideline 11

Is that better?

Would this work for a Google Home device?
I use one for my voice control speaker and I need to “wait” until it stops for the “listening” to start.

Here’s what I have:

  on_wake_word_detected:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Slow Pulse"
    - switch.turn_off: use_wake_word
    - delay: !lambda "if (id(use_wake_word).state) return 200; else return 0;"
    - homeassistant.service:
        service: media_player.play_media
        data:
          entity_id: media_player.google_home_mini
          media_content_id: media-source://media_source/local/alfred_bell.wav
          media_content_type: audio/x-wav
    - wait_until:
        not:
          speaker.is_playing:
    - delay: 100ms
    - voice_assistant.start_continuous

The ‘speaker.is_playing’ doesn’t work, obviously, but I don’t know how to determine the Google Home stopped playing.

For anyone else stuck on this. Here is the format I use for tts.speak to a Sonos media player in order for it to change state to ‘playing’. You can also use this to specify the language and voice for tts.speak in variables defined up front as I have. Note that tts has a limit to the length of message it can work with (not sure exactly what that is). To get around this limitation, I split my message up into multiple pieces up front, pass them individually into the below action, and wait for the media_player to leave state ‘playing’ before starting the next piece.

data:
  entity_id: media_player.home_theatre
  media_content_id: >-
    media-source://tts/tts.home_assistant_cloud?message={{ announcement | urlencode
    }}&cache=false&language={{ tts_language }}&voice={{ tts_voice }}
  media_content_type: provider
  announce: false
alias: Speak announcement on Sonos (special version for 'playing' state)
action: media_player.play_media
enabled: true
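
A sketch of the split-and-wait approach described above, assuming the pieces are already available as a list in a variable. The name announcement_parts and the timeouts are hypothetical, and the language and voice parameters are omitted for brevity:

```yaml
# Sketch only: announcement_parts is a hypothetical list of message pieces.
- repeat:
    for_each: "{{ announcement_parts }}"
    sequence:
      - action: media_player.play_media
        target:
          entity_id: media_player.home_theatre
        data:
          media_content_id: >-
            media-source://tts/tts.home_assistant_cloud?message={{
            repeat.item | urlencode }}&cache=false
          media_content_type: provider
          announce: false
      # Wait for this piece to start, then to finish, before the next one.
      - wait_template: "{{ is_state('media_player.home_theatre', 'playing') }}"
        timeout: "00:00:10"
      - wait_template: "{{ not is_state('media_player.home_theatre', 'playing') }}"
        timeout: "00:02:00"
```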