Simple script to handle multiple TTS lines as a batch

Recte · July 9, 2023, 10:08am

Quite often I have a TTS line that is too long for my Pi 4 to process using Piper. Since I am not aware of an option to set a timeout, I wanted to split the text and have the TTS run multiple times. One challenge is detecting when the TTS has finished. As far as I know, there is no option for that, so I used the delay solution I found here: "I finally solved my issue of multiple Alexa notifications interrupting one another".

So for the sake of sharing, here’s what I’ve done. I’m always open to suggestions of course.

alias: Multi-line TTS
sequence:
  - variables:
      text: "{{ states('input_text.multi_line_tts') }}"
  - repeat:
      count: "{{ text.strip('.').split('.') | count }}"
      sequence:
        - variables:
            text_current: "{{ text.split('.')[repeat.index - 1] | trim }}."
            text_word_length: >-
              {{ ((text_current | count - text_current.split(' ') | count) /
              text_current.split(' ') | count) | round(1) }}
        - service: tts.speak
          data:
            cache: true
            media_player_entity_id: "{{ speaker }}"
            message: "{{ text_current }}"
          target:
            entity_id: tts.piper
        - service: logbook.log
          continue_on_error: true
          data:
            name: >-
              Debug Multi-line TTS to speaker {{
              state_attr(speaker,"friendly_name") }}
            entity_id: script.multi_line_tts
            message: >-
              The current text '{{ text_current }}' has {{ text_current.split('
              ') | count }} words with an average of {{ text_word_length }}
              characters a word. The delay is {{ (text_current.split(' ') |
              count * 0.5 ) | round(0, 'floor', default=0) }} sec.
          enabled: false
        - delay:
            seconds: >-
              {{ (text_current.split(' ') | count * 0.5 ) | round(0, 'floor',
              default=0) }}
          alias: Delay 0,5s per word
mode: queued
icon: mdi:microphone-message

In short, it splits the text into single lines using the period as the separator (so a period inside a number or abbreviation also starts a new line). That made sense to me; you just have to keep the lines short. Since it’s a script, you can of course pass variables to it from an automation. For example:

service: script.multi_line_tts
data:
  speaker: media_player.<your speaker>

or

service: script.turn_on
target:
  entity_id: script.multi_line_tts
data:
  variables:
    speaker: media_player.<your speaker>

Known issues:

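1. If the text is not yet in the cache, playback gets messed up on the first run, presumably because generating the audio takes longer than the delay allows. The next time, it’s fine.
2. A very short line with long words could result in an overlap, but I haven’t run into it other than while testing.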
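
For the second issue, one possible tweak is to base the delay on the number of characters instead of the number of words, with a minimum of one second. With the original formula a one-word line gets 0.5 rounded down to 0 seconds, so it has no delay at all. The sketch below is a hypothetical replacement for the delay step; the 0.08 seconds per character and the one-second minimum are guessed values that would need tuning for your voice and speaker.

```yaml
# Hypothetical replacement for the "Delay 0,5s per word" step.
# 0.08 s per character and the 1 s minimum are assumed values, not measured.
- delay:
    seconds: >-
      {{ [1, (text_current | count * 0.08) | round(0, 'ceil')] | max }}
  alias: Delay based on character count
```

Counting characters makes a short line with long words wait longer than a short line with short words, which is the case the word count misses.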

Edit 1: Small code improvement and a (disabled) debug line that could be useful in solving issue 2. One could for example decide to make the delay factor dynamic based on the average length of words. I’ve also put the speaker into a variable. If you always want to use the same speaker, you can set the variable speaker as the first action in the script.
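
Setting the speaker inside the script, as mentioned in Edit 1, might look like the sketch below when placed as the first step of the sequence. The entity ID is a placeholder. Using the default filter rather than a fixed value is a variation not in the original: it is meant to keep a speaker passed in by the caller and only fall back to the fixed one when none is given.

```yaml
# Sketch: first step in the script's sequence.
# Falls back to a fixed speaker when the caller passes none (assumed variation).
- variables:
    speaker: "{{ speaker | default('media_player.<your speaker>') }}"
```
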
Edit 2: I ended up using an input_text helper since I bumped into issues with longer lines not being sent over to the script. If you use the repeat in the same automation where you define the variable text, then there is no need for a helper.
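
A minimal sketch of the helper approach from Edit 2, as two actions in an automation: write the text to the helper, then call the script. It assumes the helper is input_text.multi_line_tts (the entity the script reads) and that the script's entity ID is script.multi_line_tts (as used in the logbook step); the message text is made up. Keep in mind that an input_text state cannot hold more than 255 characters, and the helper's configured maximum may be lower, so very long texts would still need another route.

```yaml
# Sketch of automation actions; message text and speaker are placeholders.
- service: input_text.set_value
  target:
    entity_id: input_text.multi_line_tts
  data:
    value: "The washing machine is done. Please hang up the laundry."
- service: script.multi_line_tts
  data:
    speaker: media_player.<your speaker>
```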

jeroenterheerdt · March 17, 2025, 4:56pm

Thanks for sharing this; it is still very helpful today. Amazed that we have to hack around TTS systems like Piper for longer texts to avoid timeouts.