Snips sound output using Amazon Polly TTS

So this is bare bones at this point, but I really disliked the Snips Pico TTS voice. I wanted to use Amazon Polly since it has the broadest range of voices. There are a number of different ways to do this, of course, but this is mine. You need an AWS profile set up for the aws command line tool; I created a new one using IAM and gave it its own key and permissions for Polly only.

Setup:
PC with Ubuntu and HA
Raspberry Pi 3 with Snips installed via apt and a Jabra 410 USB speaker.

shell_commands/jarvis_says.yaml

jarvis_says: bash /home/homeassistant/.homeassistant/shell_command/jarvis_says.sh ""{{ speech }}""

shell_commands/jarvis_says.sh

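# $1 is the text to speak; it is also used as the cache file name.
mkdir -pv /tmp/sounds/
# Only call Polly if this text has not been converted already.
if [ ! -f "/tmp/sounds/$1.wav" ]; then
  aws --profile jarvis polly synthesize-speech --output-format mp3 --voice-id Geraint --text "$1" "/tmp/sounds/$1.mp3"
  # The audio server is sent WAV data, so convert the MP3.
  mpg123 -w "/tmp/sounds/$1.wav" "/tmp/sounds/$1.mp3"
fi
# Publish the WAV bytes to the Snips audio server for the "default" site.
mosquitto_pub -h <MY_MQTT_IP> -t 'hermes/audioServer/default/playBytes/0049a91e-8449-4398-9752-07c0e1858234' -f "/tmp/sounds/$1.wav"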

Note that the string after playBytes is required but mostly arbitrary. You could use it to wait for the matching finished message, but that isn't necessary for us.
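If you do want a fresh id per request, something like this minimal Python sketch could build the topic and recognise the matching reply. The function names are made up for illustration, and the "id" field in the finished message (which I believe arrives on hermes/audioServer/<siteId>/playFinished) is an assumption about the Hermes protocol, so check it against what your broker actually shows.

```python
import json
import uuid


def play_bytes_topic(site_id="default", request_id=None):
    """Return (topic, request_id) for a playBytes publish.

    A fresh UUID is generated when no request_id is given, since the
    id only has to be unique enough to match the reply.
    """
    request_id = request_id or str(uuid.uuid4())
    return f"hermes/audioServer/{site_id}/playBytes/{request_id}", request_id


def is_finished_for(payload, request_id):
    """True if a playFinished JSON payload refers to our request id.

    Assumption: the payload is a JSON object carrying the id in "id".
    """
    try:
        return json.loads(payload).get("id") == request_id
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False
```

The topic returned here can be passed to mosquitto_pub with -t in place of the hard-coded one in the script.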

It is then called from scripts and such like this:

- service: shell_command.jarvis_says
  data:
    speech: '"OK, the garage door is opening"'

You can change the voice in the aws command, of course. I am caching the sounds under /tmp/sounds so the service is not called again for text that has already been converted. You can also play any arbitrary WAV file this way through the Snips MQTT broker.
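One thing to watch with the cache: the script uses the spoken text itself as the file name, so a phrase containing a "/" would not map to a valid file, and changing the voice would keep serving the old recordings. A possible variation, sketched in Python, is to hash the voice and text into the file name. The helper names and the "Amy" voice in the usage note are only illustrative; the aws arguments are the same ones the shell script uses.

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("/tmp/sounds")


def cache_path(text, voice="Geraint", suffix=".wav"):
    """Map a phrase and voice to a stable, filesystem-safe cache file."""
    digest = hashlib.sha1(f"{voice}:{text}".encode("utf-8")).hexdigest()
    return CACHE_DIR / f"{digest}{suffix}"


def polly_command(text, out_file, voice="Geraint", profile="jarvis"):
    """Build the same aws CLI call as the shell script, as an argv list."""
    return [
        "aws", "--profile", profile, "polly", "synthesize-speech",
        "--output-format", "mp3", "--voice-id", voice,
        "--text", text, str(out_file),
    ]
```

For example, polly_command("hello", cache_path("hello", voice="Amy", suffix=".mp3"), voice="Amy") gives a list that can be handed to subprocess.run, and the same phrase in a different voice ends up in a different cache file.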

Took a fair bit to get snips configured and working properly so let me know if you have any questions.

Note, I also had to tweak my /etc/asound.conf to get everything to play properly.

pcm.!default {
  type asym
  playback.pcm {
    type plug
    slave {
      pcm "hw:1,0"
      rate 48000
      format "S16_LE"
      channels 2
    }
  }
  capture.pcm {
    type plug
    slave.pcm "hw:1,0"
  }
}

ctl.!default {
  type hw
  card 1
}

So I did some more playing around with dialogues and getting TTS to work with them. I basically ended up with an MQTT client that listens for hermes/tts/say messages and processes them.

I have Snips on its own Pi, and to avoid excess network traffic I have this in my mosquitto.conf to bridge only the topics I need between the two brokers:

topic hermes/nlu/intentParsed out
topic hermes/tts/say out
topic hermes/audioServer/# in
topic hermes/tts/sayFinished in

The last two are just so I can publish to my HA MQTT broker and have it picked up by Snips. You have to stop and disable the snips-tts service so it doesn't try to handle the MQTT messages itself.
Then I have a Python script here
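The script itself is not reproduced in this post, so what follows is only a rough sketch of how such a listener might look, not the original. It assumes the say message is single-line JSON with "text", "id" and "sessionId" fields, that sayFinished echoes the id and sessionId back, and it reuses jarvis_says.sh from above for the actual speech. handle_say and listen are hypothetical names.

```python
import json
import subprocess

# Path taken from the shell_command definition earlier in this post.
SCRIPT = "/home/homeassistant/.homeassistant/shell_command/jarvis_says.sh"


def handle_say(payload, speak, publish):
    """Speak one hermes/tts/say message, then report that it is done.

    speak(text) plays the phrase; publish(topic, message) sends MQTT.
    Field names ("text", "id", "sessionId") are assumed, not confirmed.
    """
    request = json.loads(payload)
    speak(request["text"])
    done = {"id": request.get("id"), "sessionId": request.get("sessionId")}
    publish("hermes/tts/sayFinished", json.dumps(done))


def listen(host):
    """Block forever, handling each message that mosquitto_sub prints."""
    def speak(text):
        subprocess.run(["bash", SCRIPT, text], check=True)

    def publish(topic, message):
        subprocess.run(
            ["mosquitto_pub", "-h", host, "-t", topic, "-m", message],
            check=True,
        )

    sub = subprocess.Popen(
        ["mosquitto_sub", "-h", host, "-t", "hermes/tts/say"],
        stdout=subprocess.PIPE, text=True,
    )
    # mosquitto_sub prints one payload per line by default.
    for line in sub.stdout:
        if line.strip():
            handle_say(line, speak, publish)
```

Calling listen("<MY_MQTT_IP>") would start the loop. Note that in this sketch sayFinished goes out as soon as the shell script returns, which is before the audio has actually finished playing, and everything is played on the "default" site; waiting for the audio server's finished message would be more accurate.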

And now any conversations I have with Snips use the Amazon Polly TTS I have configured. Still a work in progress, but neat to see how capable it really is.

A short video here with a back and forth. No response at the end because I haven’t set up intents in HA for this yet.