Randomized TTS messages and avoid "script already running" errors


I am sharing with you a “service” I created to simplify randomized TTS messages preceded by an optional chime sound. The goal of my service is to consolidate all of the common setup for playing an optional chime, then an optional random (or static, or templated) TTS message so that I can use it throughout my normal automations easily and consistently. A nice advantage to having this centralized is that I won’t have variations of the chime + TTS pattern and “tts.microsoft_say” littered through my other automations. This has been tested on Hass.io version 0.111.4.

I have the word “service” in quotes because, while getting this working, I learned about a workaround for a limitation of scripts: they currently can’t run more than once at a time, which results in a “script already running” error in the logs. The workaround is to use a custom event to trigger an automation. I have heard there is a pull request in the works to fix parallel script handling, but for now this workaround is operating well.


First, let’s look at how this is called from an automation:

#in an automation:
  action:
    - event: chime_tts
      event_data:
        entity_id: media_player.bmod_keith_tablet
        sound: airplane-cabin-ding.mp3
        delay: 2000
        random: [
          "Keith is awesome!",
          "Let's get started!",
          "You're here!",
          "I've been waiting for you to arrive!"
        ]

The event value is our custom event chime_tts.

The event_data has the values needed by our chime_tts automation. If you want to use templating to define the values, use event_data_template. You could use templating to select a specific sound conditionally, or to adapt phrases based on what triggered your automation, among other things.

The required entity_id can be any valid media_player entity, or “all”.

The optional sound value would be a sound file you place in /config/www/sounds, which incidentally is available at http://your-homeassistant-url:8123/local/sounds.

The optional delay value is how many milliseconds to wait after starting the sound before playing the TTS, so the TTS doesn’t start before the sound has finished; default 2000.

The optional random value is an array of phrases to say.

If you just want to play a single TTS phrase, use the optional message property instead of random. If you provide both, random takes precedence over message.

Here is an example calling with just a single message using templating:

  action:
    - event: chime_tts
      event_data_template:
        entity_id: all
        sound: airplane-cabin-ding-dong.mp3
        message: "We don't live in a barn. Close the {{ trigger.to_state.name }} door." 

Here is an example with random phrases using templating:

  action:
    - event: chime_tts
      event_data_template:
        entity_id: media_player.bmod_keith_tablet
        random: [
          "I'll get the lights for you. {{ ['this time', 'spoiled sap', 'lazy bum'] | random }}",
          "I guess you aren't coming back. I'll get the lights.",
          "Hello? I guess everyone left me alone. I might as well make it dark too."
        ]

Now, let’s look at the “service” automation:

- id: chime_tts
  alias: Chime + TTS
  trigger:
    - platform: event
      event_type: chime_tts
  action:
    - service: media_player.play_media
      data_template:
        entity_id: "{{ trigger.event.data.entity_id }}"
        media_content_id: "{{ base_url }}/local/sounds/{{ trigger.event.data.sound }}"
        media_content_type: music
    - delay: 
        milliseconds: "{{ trigger.event.data.delay | default(2000 if trigger.event.data.sound else 0) }}"
    - service: tts.microsoft_say
      data_template:
        entity_id: "{{ trigger.event.data.entity_id }}"
        message: "{{ trigger.event.data.random|random if trigger.event.data.random else trigger.event.data.message }}"

  • the trigger is watching for our custom event chime_tts
  • first action plays the sound, if given
  • second action waits for the sound to finish playing, or waits 0 time if sound wasn’t given
  • third action plays a random TTS phrase from random if given, or the single message if given
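
The flow of those steps can be sketched in plain Python (this is just an illustration, not Home Assistant code; `play_media` and `speak` are hypothetical stand-ins for the `media_player.play_media` and `tts.microsoft_say` service calls):

```python
import random
import time

def play_media(entity_id, url):
    # placeholder for the media_player.play_media service call
    print(f"play {url} on {entity_id}")

def speak(entity_id, text):
    # placeholder for the tts.microsoft_say service call
    print(f"say '{text}' on {entity_id}")

def chime_tts(entity_id, sound=None, delay=None, phrases=None, message=None):
    """Mirror of the automation's three actions; returns the spoken text."""
    # action 1: play the chime, if one was given
    if sound:
        play_media(entity_id, f"/local/sounds/{sound}")
    # action 2: wait for the chime; default 2000 ms when a sound was
    # given, otherwise don't wait at all
    wait_ms = delay if delay is not None else (2000 if sound else 0)
    time.sleep(wait_ms / 1000)
    # action 3: a `random` list takes precedence over a single `message`
    text = random.choice(phrases) if phrases else message
    speak(entity_id, text)
    return text
```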

Notable behavior:

  • If this event is called again while it’s already active you will not hear everything played. The last call to play aborts the first call and plays. This is expected due to how there isn’t any queueing, but I would love to have smart queue/clobber handling if I can figure out a way to make it work. For example, if a door is opened and closed again before the first initiated sequence is finished playing then I’d like to hear the initial chime and the 2nd TTS can clobber the first TTS, but if two different doors are opened at the same time I’d like to hear the first chime, then the first TTS, then the second TTS without the second chime (since the chime is to get attention, it doesn’t need to be played again in that scenario).

Future improvements:

  • Add handling for scheduled quiet hours and temporary muting.

  • The first and third actions really need to be conditionally executed rather than always executed. They effectively don’t do anything if the sound or message aren’t given, but they are still called and left to their own handling of the missing values. I haven’t figured out how to conditionally skip an action yet.

  • The delay you need to specify needs to be based on the duration of the sound, which may need to be tuned per sound effect. I would love to figure out how to automatically wait until the sound finishes playing or detect the duration from the file.

  • Add optional push notifications via SMS and/or mobile app notifications so this can be the single notification service wrapper used by all my automations. I already have these working separately, so this should be an easy thing to add.

  • Find a way to implement a smart queue/clobber handler, as mentioned above in the “notable behavior” section.
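
On the sound-duration point: if a chime were stored as WAV instead of MP3, its length could be read with Python’s standard wave module (MP3 would need a third-party library such as mutagen). A minimal sketch, not wired into Home Assistant:

```python
import wave

def wav_duration_ms(path):
    """Return the duration of a WAV file in milliseconds."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return int(frames / rate * 1000)
```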


Yep:

Queuing is one of the new modes that will be available.

There’s also this very useful enhancement that permits you to enqueue TTS requests. That means you can submit several TTS requests in rapid succession and they will be neatly executed one after the other in the order they were received (i.e. no collision or overlap).

However, even though it leverages something that already exists in media_player, it was deemed to change the entity model and had to be submitted to the Architecture repo for discussion.

The author complied and now, unfortunately, it just languishes there because no one has found the time to provide commentary (this definitely falls into the category of “Well that sucks”).


Nice! Thanks for the link. I was trying to find that PR a few days ago and overlooked that one. I’m excited about the mode feature. It makes sense and really improves the utility of scripts by making behaviors both flexible and predictable. I don’t know if I’ll be able to use the queue mode for my hybrid queue/clobber behavior, but it will certainly be useful for other automation cases.

Very interesting on the TTS queue feature.

I just submitted it yesterday. :wink: Although it relies on a few PRs that came before that actually implement the “guts” of the scripting language.

Each “call” can pass in its own unique set of parameters/variables, so although it will queue up “runs” of the same script, what each “run” does will depend on its own unique set of input variables. And, you can always call script.turn_off on the script which would stop the current “run” and cancel all queued up “runs.”

Interesting tip on script.turn_off to cancel currently queued runs of the script. That could be useful. I may still need to create a custom queue mechanism since I want to conditionally clobber only the TTS messages from the same original triggering entity, but enqueue TTS from differing sources. I also want the chime to only play at the start of the series if there are any queued.

The general idea is that a particular source trigger clobbers its own TTS regardless of queue order, but a different source trigger must wait its turn. It’s also desirable to skip redundant chime sounds.

I can probably get by with just queuing all the TTS in one long sequence and let each one play the specified chime. That would certainly be an improvement over the current last-one-wins clobbering.

I’m probably overcomplicating it, but here is a complex example:

  • Door1 opens
  • Door2 opens while Door1 open chime+TTS is playing
  • Door1 closes while Door1 open chime+TTS is still playing
  • Result:
    • Door1 open chime plays fully, but Door1 open TTS aborts and plays Door1 close TTS, then Door2 open TTS plays after all the Door1 notifications play
  • Notes:
    • Door1 close TTS was played ahead of Door2 open TTS because Door1 was already playing a TTS that got superseded by closing the door
    • Initial chime from Door1 opening plays fully even if Door1 closes before the chime finishes playing
    • the chime only played at the very beginning even though each queued chime+TTS defined a chime sound, although if the chime sound was going to be different it should still play at the start of that particular TTS; it may make sense to sort the sets together by which chime they specify
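
A rough sketch of those queue/clobber rules in plain Python (class and method names are invented for illustration; this only models queued items, not the currently playing one):

```python
class ChimeTtsQueue:
    """Queue chime+TTS requests. A new request from the same source
    clobbers its queued one in place; repeated chime sounds are
    skipped when draining."""

    def __init__(self):
        self.pending = []  # list of (source, chime, text)

    def submit(self, source, chime, text):
        for i, (src, _, _) in enumerate(self.pending):
            if src == source:
                # same source: clobber in place, keeping queue position
                self.pending[i] = (source, chime, text)
                return
        self.pending.append((source, chime, text))

    def drain(self):
        """Return the playback plan, skipping back-to-back repeat chimes."""
        plan, last_chime = [], None
        for source, chime, text in self.pending:
            if chime and chime != last_chime:
                plan.append(("chime", chime))
                last_chime = chime
            plan.append(("tts", text))
        self.pending = []
        return plan
```

With the door scenario above, Door1’s close message replaces its queued open message, Door2 keeps its place in line, and the shared chime plays only once.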

I assume this is all done using built-in functionality? Because I don’t see any link to any custom component/code to use.

I’ve never done anything with custom events so I don’t really know how they are implemented.

Yes, everything I used is built-in.

Custom events are easy. You simply use whatever custom name you want. There isn’t any special preliminary config for it.

# In your first automation you raise the event:
action: 
  - event: custom_name
    event_data:
      data1: value1

# In your receiving automation you listen for the event:   
trigger:
  - platform: event
    event_type: custom_name
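
In the receiving automation’s actions, the payload is available under trigger.event.data, so you could, for example, surface it with a persistent notification (the service choice here is just for illustration):

```yaml
action:
  - service: persistent_notification.create
    data_template:
      message: "Received data1 = {{ trigger.event.data.data1 }}"
```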

Very interesting, thanks! Seems we all come up with similar ideas (and wishes). Love your idea of using events—I also ran into the “script already running” trap quite a few times.

I’m running TTS extensively, with multiple Raspberry Pi Zero-operated “talk/alert boxes”, all synchronized via Logitech Media Server. Using PicoTTS, it was easy to add “play” commands in my talk scripts that include several different alert sounds at the start of the message (and then play it all as one WAV file to the media player). This turned out to be much more reliable than sequencing media player calls and relying on delays or checking media player states (very slow in HA).

I also have a language selection in lovelace (de-DE + en-GB currently), automatic day/night volume level automation, and I use a mixture of pre-recorded MP3s (like for my 6/10/12 minute kitchen timers or when servers become unreachable) and TTS messages (like a morning briefing with weather conditions, important birthdays and calendar events for the day, or a [nerdy, I know] random fortune cookie reader).

The scripts look quite weird by now, relying on a bunch of variables (I use the var component), inputs and lots of sensor data.

I really need to look into this media player enqueue thingy, I guess. Currently, if a message comes in (and there are lots of them!) and another is playing already, it will simply override the playing message, which is sometimes ok (for time-critical alerts like “xy is calling” or “someone is at the door”) and sometimes not good (for informational messages like “phone is now fully charged”).

I’d love to see the TTS enqueue feature become reality!

If anyone’s interested (apart from my quite custom setup), here’s my audio_alert and say script (these two are the least complicated of the bunch):

# Audio Alert
# Plays audio file given by "audiofile" dummy variable to media player.
# If "audiofile" is empty/undefined, plays "cantdothat.mp3".
# If no "entity_id" is given, plays to "media_player.signalpi1"
audio_alert:
  alias: 'Audio Alert'
  description: 'Send an audio alert to media player'
  fields:
    entity_id:
      description: 'Entity Id of the media player to use'
      example: 'media_player.signalpi1'
    audiofile:
      description: 'Name of the audio file to play, including extension'
      example: 'warning.mp3'
    language:
      description: 'Language for output (subfolder, can be either like "de" or "de-DE", only first 2 chars used)'
      example: 'de-DE'
  sequence:
    # only alert if "Audio Alerts" is on (or it's the "audio off" message)
    - condition: or
      conditions:
        - condition: state
          entity_id: 'input_boolean.audio_alerts'
          state: 'on'
        - condition: template
          value_template: "{{ audiofile == 'audio-alerts-off.mp3' }}"
    # we might be playing another message already, so wait until finished (or max. 30s)
#    - wait_template: "{{ states(entity_id|default('media_player.signalpi1',true)) in ['idle','paused','off'] }}"
#    - wait_template: "{{ not is_state(entity_id|default('media_player.signalpi1',true), 'playing') }}"
#      timeout: '00:00:30'
    # play message to player (or "signalpi1") (which is synchronised with others in LMS)
    - service_template: media_player.play_media
      data_template:
        entity_id: "{{ entity_id|default(states('var.audio_alerts_media_player'),true) }}"
        media_content_type: music
        media_content_id: "{{ states('var.audio_alerts_base_url') }}/{{ (language|default(states('input_select.audio_language'),true))[0:2] }}/{{ audiofile|default(['cantdothat.mp3','cantdothat-2.mp3']|random, true) }}"
# Say
# Plays audio file given by "audiofile" variable to media player given by "entity_id"
# Then plays TTS message to same player
say:
  alias: 'Say'
  description: 'Speak a message via TTS'
  fields:
    entity_id:
      description: 'Entity Id of the media player to use'
      example: 'media_player.signalpi1'
    audiofile:
      description: 'Name of introductory audio file to play. MUST be WAV, pcm-s16le, mono, 16000 Hz.'
      example: 'picotts-beep.wav'
    message:
      description: 'The text message to speak (can be a template)'
      example: "It's {{ temperature }} degrees outside."
    language:
      description: 'The language to speak in (defaults to tts: setting)'
      example: 'en-GB'
  sequence:
    # only alert if "Audio Alerts" is on
    - condition: state
      entity_id: 'input_boolean.audio_alerts'
      state: 'on'
    # play text message
    - service: tts.picotts_say
      data_template:
        entity_id: "{{ entity_id|default(states('var.audio_alerts_media_player'),true) }}"
        language: "{{ language | default(states('input_select.audio_language'), true) }}"
        message: >
          {% set language = language | default(states('input_select.audio_language'), true) %}
          {% set lang = language[0:2] %}
          {% if audiofile == '' %}
          {% elif audiofile %}
            <play file="{{ states('var.audio_alerts_base_path') }}/{{ lang }}/{{ audiofile }}"/>
          {% else %}
            <play file="{{ states('var.audio_alerts_base_path') }}/{{ lang }}/picotts-beep.wav"/>
          {% endif %}
          <volume level="60">{{ message }}</volume>

And here are two variations on how an incoming phone call could be announced using the above:

  # Audible alert on incoming phone calls
  - alias: Incoming call
    trigger:
      platform: state
      entity_id: sensor.phone
      to: 'ringing'
    action:
    - service: script.audio_alert
      data:
        audiofile: "phone-incoming.mp3"
  # Audible alert on incoming phone calls
  - alias: Incoming call
    trigger:
      platform: state
      entity_id: sensor.phone
      to: 'ringing'
    action:
    - service: script.say
      data_template:
        language: 'de-DE'
        audiofile: "picotts-call-{{ (language|default(states('input_select.audio_language'),true))[0:2] }}.wav"
        message: "{{ state_attr('sensor.phone', 'from_name') }}"

(The reason I’ve explicitly set language to de-DE here is that most of my contacts here in Germany have German names, and it sounds horrible if a TTS engine tries to speak them in English. picotts-call-en.wav contains the ring sound plus the text “Incoming call from:”, and picotts-call-de.wav the ring sound and the text “Anruf von:”. The caller’s name is then added as spoken by the German TTS voice.)

My folders for the alert audio files are separated by language, and inside the config/www folder, because these must be reachable by an external media player:

www/
  alerts/
    de/
      cantdothat.mp3
      cantdothat-2.mp3
      phone-incoming.mp3
      picotts-beep.wav
      picotts-call-de.wav
      picotts-call-en.wav
    en/
      cantdothat.mp3
      cantdothat-2.mp3
      phone-incoming.mp3
      picotts-beep.wav
      picotts-call-de.wav
      picotts-call-en.wav

Your setup is very interesting! Thanks for sharing it :slight_smile:

I’m going to go look at picotts to see what that is about. Having the TTS say command concatenate the audio file and the message together is a nice feature.

We all learn together, and from each other! :slight_smile:

Yep, PicoTTS is unfortunately really underrated. It’s based on the original SVOX we had in Android 2 or so. Unfortunately, it’s almost impossible to get voices nowadays. :frowning: As always, the good technology gets bought up and vanishes into dark drawers. Sigh.

I mainly chose it because it’s totally cloud-independent. Can’t compare with Alexa’s voice, but I don’t need internet/cloud to have it work. Even on a Raspi Zero.

For more of its great features, see the SVOX manual. Great read.

N.B.: For using “play”, the WAV files have to be the same format as those PicoTTS generates, i.e. RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz. I just built a few different “start sounds” (like a general beep, positive and negative acknowledge, phone ring, doorbell and the like), named all of these “picotts-something.wav” and put them in the alerts/language folders. Kind of systematic approach because I want to be able to distinguish what’s what just by the first sound. :wink:
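
Assuming you have ffmpeg available, converting an arbitrary sound file into that exact format is a one-liner (the function name here is made up; the filenames are placeholders):

```shell
# Convert a sound file to the WAV format PicoTTS expects for <play/>:
# 16-bit little-endian PCM, mono, 16000 Hz. Requires ffmpeg.
convert_for_picotts() {
  ffmpeg -loglevel error -y -i "$1" -acodec pcm_s16le -ac 1 -ar 16000 "$2"
}

# usage: convert_for_picotts doorbell.mp3 picotts-doorbell.wav
```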

Because WAVs can’t easily be replaygain’ed, I usually set the volume to 60%, otherwise PicoTTS tends to generate sound files that can clip.


Talking about random: I just realized I already use events to avoid blocking an automation with long delays while still letting it re-trigger itself during the day.

So want another one? Here you go! (Warning: This runs on Home Assistant Core; it might not work with the fancy Docker or Hass-Whatever [now was it “OS” or “All-In-One”???] installs. It requires that you have some fortunes installed. Sorry, Windows users. [Are there any?])

Define a sensor for each language (English and German, here):

  # Fortune cookie, only short ones (max. 200 chars)
  # requires fortune-mod, fortunes to be installed on the HA machine
  - platform: command_line
    name: 'Fortune EN'
    command: '/usr/games/fortune -s -a -e drugs food fortunes linuxcookie linux magic medicine paradoxum people science startrek wisdom zippy'
    # scan interval 365 days so it doesn't interfere
    scan_interval: 31536000
    value_template: "{{ value }}"

  # Selection of German fortune cookies, only short ones (max. 200 chars)
  # requires fortune-mod, fortunes-de to be installed on the HA machine
  - platform: command_line
    name: 'Fortune DE'
    command: '/usr/games/fortune -s -a -e computer elefanten letzteworte lieberals mathematiker ms murphy quiz regeln sprueche tips unfug wusstensie'
    # scan interval 365 days so it doesn't interfere
    scan_interval: 31536000
    value_template: "{{ value }}"

Go automate it using above script.say:

  # Random speaker
  - alias: Random speaker
    trigger:
      # kickoff trigger in the morning
      - platform: template
        value_template: "{{ states('sensor.time') == (state_attr('input_datetime.aufstehen', 'timestamp') | int | timestamp_custom('%H:%M', False)) }}"
      # event trigger so we can re-trigger ourself
      - platform: event
        event_type: event_random_speaker
    condition:
      # only talk between wakeup and bed times
      - condition: template
        value_template: "{{ states('sensor.time') >= (state_attr('input_datetime.aufstehen', 'timestamp') | int | timestamp_custom('%H:%M', False)) }}"
      - condition: template
        value_template: "{{ states('sensor.time') < (state_attr('input_datetime.bettzeit', 'timestamp') | int | timestamp_custom('%H:%M', False)) }}"
    action:
      # get a new fortune
      - service: homeassistant.update_entity
        data_template:
          entity_id: "sensor.fortune_{{states('input_select.audio_language')[0:2]}}"
      # speak the wisdom
      - service: script.say
        data_template:
          language: "{{ states('input_select.audio_language') }}"
          audiofile: 'picotts-gong.wav'
          message: >
            {% set language = language | default(states('input_select.audio_language'), true) %}
            {% set lang = language[0:2] %}
            {% if lang == 'de' %}
              <speed level="90">
              <pitch level="100">Der Kaun-seler meint:<break time="666ms"/>
              {{ states('sensor.fortune_de') | replace('F:','Frage:') | replace('A:','Antwort:') | regex_replace(find='\.\.\.\s+\.\.\.', replace=',', ignorecase=False) | regex_replace(find='([\s]*--.*$)+', replace='', ignorecase=False) | replace('--','<break time="500ms"/>') }}
              </pitch></speed>
            {% else %}
              <speed level="90">
              <pitch level="100">The Counselor sayeth:<break time="666ms"/>
              {{ states('sensor.fortune_en') | replace('Q:','Question:') | replace('A:','Answer:') | regex_replace(find='\.\.\.\s+\.\.\.', replace=',', ignorecase=False) | regex_replace(find='([\s]*--.*$)+', replace='', ignorecase=False) | replace('--','<break time="500ms"/>') }}
              </pitch></speed>
            {% endif %}
      # delay between 5 and 119 minutes
      - delay:
          minutes: "{{ range(5,120) | random }}"
      # fire event to re-trigger this automation
      - event: event_random_speaker

Be warned: If you actually go so far as to try this out, you might be as nerdy as I am! :grin:
Well, at least I have a switch to turn the automation off (in case I have business clients over, hee hee).

I like it. I don’t think I can install the fortune app in hassio unless it can install to the config folder, but maybe this whole thing can be done as an addon? I haven’t delved into making addons yet, but I plan to someday when I have more time.

A friend of mine opted for the Hass.io install, because it seemed much easier for a newbie (it is). I didn’t even succeed getting PicoTTS installed on his system. He will now also revert to a “naked” Raspberry Pi OS + HA Core install. Just because it’s so nice to be able to do what you want, and not be so restricted.

Please let me know if either you succeed running system binaries from somewhere else, or find out how to (easily) create addons!

When I upgraded to a NUC last week from my Pi3b+ I contemplated moving away from the fully managed Hass.io, but I didn’t want to give up the snapshots feature. It’s a self-contained device, so I think Hass.io still makes sense. It also forces me to find ways to implement things that work for anyone using Hass.io :slight_smile:

I agree it’s great for ease-of-use, snapshots and all the nice add-ons, at the same time easier to support. Whenever you want to break out of the box, it starts getting over-complicated. Then again, they promised never to take away “Core”, i.e. Python venv, so maybe we can have best of both worlds. :slight_smile:

Anyway, I’m happy to see someone trying to find out about writing add-ons! Would be nice if there was a “demo” add-on which would show how to approach things, because—like you—I assume the majority of people will eventually be using what’s easiest to install.

Getting an official PicoTTS add-on for Hass.io users that wish to stay “cloud-free” would be a good first step, I think.

I believe Hass.io uses Core as well. It’s the supervisor and OS that make it Hass.io. It’s all Docker containers with some handy management. The read-only filesystem is the primary impediment to installing random stuff on the host, and anything installed in the homeassistant container (Core) probably gets lost when you upgrade versions. That’s where addons come in handy: separate container, and fully backed up in the snapshots. Plus the UI to find addons. I’m also liking HACS for managing custom integrations, custom Lovelace features and custom scripts.