Text-to-Speech Voice Notifications: AWS Lambda & Polly

A few days ago, Amazon released “Polly”, a decent-sounding and very cheap text-to-speech service. (For those of you not familiar with AWS, it is a collection of computing services where you pay only for what you use.) After a day of fiddling and tinkering, I have put together an application for Lambda (which lets you run code without managing a server and pay only for the resources used) that does the following:

  1. Receives a request via AWS API Gateway
  2. Reads the text to be spoken
  3. Determines if the text has already been converted to audio
  4. If so:
    4.1. Locates the audio in the S3 bucket you specify, where it was saved on an earlier request
    4.2. Uses the HASS API to play the audio on a media player you specify
  5. If not:
    5.1. Has Polly synthesize the text, and saves the audio to the S3 bucket
    5.2. Uses the HASS API to play the audio on a media player you specify
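
The steps above can be sketched roughly as follows. This is not the Node.js Lambda code itself, just a minimal Python sketch under some assumptions: `s3` and `polly` are boto3-style clients passed in by the caller, `play` is a hypothetical callback standing in for the HASS API call, and the cache key is a SHA-256 hash of the voice and text (the real code may decide "already converted" differently).

```python
import hashlib


def audio_key(text, voice="Joanna"):
    """Derive a stable S3 key from the voice and text (assumed naming scheme)."""
    digest = hashlib.sha256(f"{voice}:{text}".encode("utf-8")).hexdigest()
    return f"{digest}.mp3"


def speak(text, bucket, s3, polly, play, voice="Joanna"):
    """Play `text`, synthesizing with Polly only on a cache miss.

    `s3` and `polly` are boto3-style clients; `play` is a hypothetical
    callback that hands the S3 key to the HASS media player.
    Returns True if the audio was already cached.
    """
    key = audio_key(text, voice)

    # Step 3: has this text already been converted to audio?
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    cached = listing.get("KeyCount", 0) > 0

    if not cached:
        # Step 5.1: have Polly synthesize the text and save it to the bucket.
        result = polly.synthesize_speech(
            Text=text, OutputFormat="mp3", VoiceId=voice
        )
        s3.put_object(
            Bucket=bucket,
            Key=key,
            Body=result["AudioStream"].read(),
            ContentType="audio/mpeg",
        )

    # Steps 4.2 / 5.2: play the stored audio on the media player.
    play(key)
    return cached
```

Hashing the text means the same notification is only ever synthesized once, which is what keeps repeat announcements cheap and fast.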

Here’s the code. Note that it bundles the aws-sdk Node.js module, because the one pre-installed in Lambda has not yet been updated to include the Polly API. Once it has been, the bundled copy can be dropped in favor of the stock one, which should save a little run time.

As for the HASS configs:

notify:
  - name: "AWS Polly"
    platform: command_line
    command: "xargs --replace=@ curl -X GET -H 'Content-Type: application/json' -H 'x-api-key: YOUR_API_KEY' -G https://YOUR_API_ENDPOINT.us-east-1.amazonaws.com/prod --data-urlencode 'text=@'"

automation:
- alias: 'Welcome home'
  hide_entity: true
  trigger:
    platform: state
    entity_id: group.all_devices
    state: 'home'
  action:
    service: notify.aws_polly
    data:
      message: "Hi, this is your house speaking. Welcome home!"
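
For testing the endpoint outside of HASS, the request made by the `command` in the notify entry above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the endpoint and API key are the same placeholders as in the config, and `build_request` and `send` are names made up here.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://YOUR_API_ENDPOINT.us-east-1.amazonaws.com/prod"  # placeholder
API_KEY = "YOUR_API_KEY"  # placeholder


def build_request(message, endpoint=ENDPOINT, api_key=API_KEY):
    """Build the GET request: text in the query string, API key in a header."""
    query = urlencode({"text": message})  # URL-encodes the message text
    return Request(
        f"{endpoint}?{query}",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="GET",
    )


def send(message):
    """Send the notification and return the HTTP status code."""
    with urlopen(build_request(message), timeout=10) as resp:
        return resp.status


# Usage, once the placeholders are filled in:
#   send("Hi, this is your house speaking. Welcome home!")
```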

I know the details are sparse so far, but I’ll work on putting together something with a little more documentation. I use this to have notifications spoken over my Chromecast Audio devices and my Google Home. I have them all in a “home group”, so all speakers play the notifications. It takes about 4 seconds from the trigger firing to the audio starting.

It is worth noting that I also have a site-to-site VPN between my house and AWS, and the Lambda code runs inside my VPC, so this all stays “internal” to the house, without having to expose the HASS API publicly.

Finally, if you don’t want your whole-house music or the movie you’re watching to stop because of a notification, you can always add in some conditions to the automation rule:

  condition:
    condition: or
    conditions:
      - condition: state
        entity_id: media_player.home_group
        state: 'idle'
      - condition: state
        entity_id: media_player.home_group
        state: 'off'

This is excellent!

Right now, setup and configuration are rather complicated. Are you planning to make it easier to set up and to integrate it as an official Home Assistant notify component?

Yes, setup is definitely pretty complicated. This could probably be ported to a Python script that uses the boto3 library, so people could run it more locally: Polly just for the voice, and the local system for storage and for streaming to a media player.
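
A rough idea of what the core of such a port might look like, assuming boto3 is installed and AWS credentials are configured. The cache directory, function name, and voice are illustrative choices, not taken from the existing code; the Polly client is passed in so the caching logic can be tried without AWS access.

```python
import hashlib
from pathlib import Path


def synthesize_to_file(text, polly, cache_dir="tts_cache", voice="Joanna"):
    """Return the path of an MP3 for `text`, calling Polly only if not cached locally."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)

    # Name the file after a hash of voice and text (assumed scheme).
    name = hashlib.sha256(f"{voice}:{text}".encode("utf-8")).hexdigest() + ".mp3"
    path = cache / name

    if not path.exists():
        result = polly.synthesize_speech(
            Text=text, OutputFormat="mp3", VoiceId=voice
        )
        path.write_bytes(result["AudioStream"].read())
    return path


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   polly = boto3.client("polly", region_name="us-east-1")
#   mp3 = synthesize_to_file("Welcome home!", polly)
```

The resulting file would still need to be served over HTTP somewhere the media player can reach, which is not covered here.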

No promises on if or when I can get to that, but the code is on GitHub if anybody wants to give it a shot. :)

Well done @dcnoren! Any ideas on whether an Amazon Echo, Dot, etc. could be a media player target for TTS playback?

Unfortunately, I do not have an Echo or similar device to try it with. If you have one, you could try creating an automation rule that plays a publicly available MP3 file from the internet on the Echo or Dot. If that works, then this would work too.
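
The same test can also be run without writing an automation, by calling the `media_player.play_media` service through the HASS REST API. A minimal Python sketch: the base URL, token, entity id, and MP3 URL are all placeholders, and the bearer-token header assumes a HASS version that uses long-lived access tokens (older versions used an API password instead).

```python
import json
from urllib.request import Request, urlopen


def build_play_request(base_url, token, entity_id, mp3_url):
    """Build a POST to the media_player.play_media service."""
    payload = {
        "entity_id": entity_id,
        "media_content_id": mp3_url,
        "media_content_type": "music",
    }
    return Request(
        f"{base_url}/api/services/media_player/play_media",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def play(base_url, token, entity_id, mp3_url):
    """Send the request and return the HTTP status code."""
    req = build_play_request(base_url, token, entity_id, mp3_url)
    with urlopen(req, timeout=10) as resp:
        return resp.status


# Usage (all values are placeholders):
#   play("http://localhost:8123", "YOUR_TOKEN",
#        "media_player.YOUR_ECHO", "https://example.com/test.mp3")
```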

Another option to investigate is creating an Alexa skill and using that.