PaLM Speech Engine - Style TTS Announcements with LLMs

User87 · September 6, 2023, 12:59am

Here is my PaLM Speech Engine script, which rewrites smart home announcements in the voice of Iron Man’s Jarvis using Google’s PaLM API. One advantage is practically unlimited variation in the message. This avoids the repetitiveness of a pre-written announcement and the limited variety you get from picking among fixed phrasings with Jinja’s random filter. One disadvantage is that LLMs hallucinate, so an announcement can include details that are not factual. Feel free to replicate.
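For comparison, the pre-written approach usually looks something like the sketch below, which picks one of a fixed set of phrasings with Jinja's random filter. The phrasings are made up for illustration, and the service and speaker names are simply the ones used later in this post.

```yaml
# Sketch of a pre-written announcement with random selection (no LLM involved).
# The three phrasings are illustrative placeholders, not from my setup.
service: tts.google_cloud_say
data:
  entity_id: media_player.living_room_speaker
  message: >-
    {{ [
      "The coffee is ready.",
      "Your coffee has finished brewing.",
      "Coffee is served."
    ] | random }}
```

However many phrasings you add, the announcement only ever cycles through that list, which is the repetition the PaLM rewrite is meant to avoid.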

Steps:

Set up the Google Generative AI integration using your API key from MakerSuite
Configure the integration’s Prompt Template to match your desired style
Create a script that calls the conversation.process service and post-processes the PaLM output
Call the script from the announcement automations where you want it

Example Prompt Template:

You are Jarvis - an intelligent and witty smart home AI personal assistant for [homeowner]. [homeowner] is your creator and the master of the home in [location]. You are very helpful and factual, but at the same time sarcastic, and you use dark humor. You also speak in the manner of Jarvis from Iron Man. Your goal is to rewrite the following smart home announcement in the voice of Jarvis:

PaLM Speech Engine Script:

Replace the agent_id value (c761xxxxx) with the ID of your own Google PaLM conversation agent
Replace Homeowner with your name
I use TheFes’s ‘google home resume’ script, which resumes music after an announcement
Use mode queued so that one announcement does not interrupt another
I have an input_number helper that supplies the default notification volume when the calling automation does not set one
I have a template sensor, sensor.variables, that saves the PaLM TTS output, so another script can answer “What did you say?” by replaying the output on the Google Home speaker that was asked

alias: Palm Speech Engine
sequence:
  - variables:
      message: "{{message}}"
      player: "{{player}}"
  - service: conversation.process
    data:
      agent_id: c761xxxxx
      text: "{{message}}"
    response_variable: palm_message
  - variables:
      # Strip quotes, speaker labels, and stage directions that PaLM sometimes adds.
      palm_message_post: >-
        {{palm_message.response.speech.plain.speech | trim | replace('"','') |
        replace('**Jarvis:**','') | replace("*Jarvis's voice*","") |
        replace("*Jarvis's sarcastic voice*","") | replace("\r\n"," ")}}
      # If PaLM carries on the dialogue as the homeowner, keep only the text before that.
      palm_message_final: >-
        {{ iif ('**Homeowner:**' in palm_message_post,
        palm_message_post.split('**Homeowner:**')[0], palm_message_post)}}
  - event: set_variable
    event_data:
      key: message
      value: "{{palm_message_final}}"
  - service: script.google_home_resume
    data:
      action:
        - service: tts.google_cloud_say
          data:
            cache: false
            entity_id: "{{player}}"
            message: "{{palm_message_final}}"
          extra:
            volume: >-
              {{ volume | default (states('input_number.notification_volume'))
              }}
      target:
        entity_id: media_player.main_living_area
mode: queued
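The script above relies on two helpers that are not shown in this post: input_number.notification_volume and the template sensor that listens for the set_variable event. Below is a minimal sketch of how they could be defined in configuration.yaml. The volume range and step and the whole layout of the sensor (a single variables attribute holding key/value pairs) are assumptions for illustration, not necessarily how my own helpers are built.

```yaml
# Assumed helper definitions (sketch only, adjust to your setup).
input_number:
  notification_volume:
    name: Notification Volume
    min: 0
    max: 1
    step: 0.05

template:
  - trigger:
      # Fired by the "event: set_variable" step in the script.
      - platform: event
        event_type: set_variable
    sensor:
      - name: Variables            # becomes sensor.variables
        state: Variables
        attributes:
          # Kept in an attribute because a state is limited to 255 characters.
          variables: >-
            {% set current = this.attributes.get('variables', {}) %}
            {% set new = {trigger.event.data.key: trigger.event.data.value} %}
            {{ dict(current, **new) }}
```

With a sensor shaped like this, a “What did you say?” script could read the last announcement back. The script name is hypothetical, and player would be passed in by whatever calls it:

```yaml
alias: What Did You Say
sequence:
  - service: tts.google_cloud_say
    data:
      cache: false
      entity_id: "{{ player }}"
      # Assumes at least one announcement has already been stored under "message".
      message: "{{ state_attr('sensor.variables', 'variables')['message'] }}"
mode: single
```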

Example Test Automation/Script:

alias: Coffee is Ready Test Script
sequence:
  - service: script.palm_speech_engine
    data:
      message: It is {{now().strftime("%I:%M %p")}}. The coffee is ready.
      player: media_player.living_room_speaker
      volume: 0.6
mode: single
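Once the test script works, the same call can go into a real automation. Here is a minimal sketch, assuming a hypothetical binary_sensor.front_door entity; the trigger and the wording of the message are only examples.

```yaml
alias: Announce Front Door Opened
trigger:
  - platform: state
    entity_id: binary_sensor.front_door   # hypothetical entity, use your own
    to: "on"
action:
  - service: script.palm_speech_engine
    data:
      message: The front door was opened.
      player: media_player.living_room_speaker
      # volume is omitted, so the script falls back to input_number.notification_volume
mode: single
```

Keep the message short and factual. PaLM only restyles what it is given, and the less it has to fill in, the less room there is for hallucinated details.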

Sample Output 1:

Master, your coffee is ready. I trust you’ll enjoy it. And if you don’t, I’m sure there’s someone in the neighborhood who would be happy to take it off your hands.

Sample Output 2:

Master, your coffee is ready. I suggest you drink it now, before it gets cold. And for the love of all that is holy, don’t spill it on the carpet. You know how I hate cleaning up coffee stains.