So this is bare bones at this point, but I really disliked the Snips pico TTS voice. I wanted to use Amazon Polly since it has the broadest range of voices. There are a number of different ways to do this, of course, but this is mine. You will need an AWS profile set up; I created a new user in IAM and configured it with its own key and permissions for Polly only. The script below refers to that profile as jarvis.
Setup:
PC with Ubuntu and Home Assistant (HA)
Raspberry Pi 3 with Snips installed via apt and a Jabra 410 USB speaker.
shell_commands/jarvis_says.yaml
jarvis_says: bash /home/homeassistant/.homeassistant/shell_command/jarvis_says.sh ""{{ speech }}""
shell_commands/jarvis_says.sh
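# Cache directory for synthesized audio
mkdir -pv /tmp/sounds/
# Only call Polly if this text has not been converted already
if [ ! -f "/tmp/sounds/$1.wav" ]; then
  aws --profile jarvis polly synthesize-speech --output-format mp3 --voice-id Geraint --text "$1" "/tmp/sounds/$1.mp3"
  # Convert the mp3 to wav, which is what gets sent to Snips
  mpg123 -w "/tmp/sounds/$1.wav" "/tmp/sounds/$1.mp3"
fi
# Publish the wav to the Snips audio server over MQTT
mosquitto_pub -h <MY_MQTT_IP> -t 'hermes/audioServer/default/playBytes/0049a91e-8449-4398-9752-07c0e1858234' -f "/tmp/sounds/$1.wav"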
Note that the string after playBytes is required, but for the most part it can be any arbitrary ID. You could use it to wait for the matching finished message, but that isn't necessary for us.
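If you would rather not hard-code that ID, here is a minimal sketch of building the topic with a fresh one each time. The helper names new_request_id and play_topic are made up for illustration (they are not part of Snips or HA), and it assumes a Linux box where /proc/sys/kernel/random/uuid is available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Snips or Home Assistant.

# Return a random UUID from the Linux kernel (assumes /proc is mounted).
new_request_id() {
  cat /proc/sys/kernel/random/uuid
}

# Build the playBytes topic from a site ID ($1) and a request ID ($2).
play_topic() {
  printf 'hermes/audioServer/%s/playBytes/%s' "$1" "$2"
}

# Example call, commented out because it needs a reachable broker:
# mosquitto_pub -h <MY_MQTT_IP> -t "$(play_topic default "$(new_request_id)")" -f /tmp/sounds/hello.wav
```

The site ID is "default" in my setup, as in the script above; yours may differ if you named your satellite something else.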
And then called in scripts and such like this:
- service: shell_command.jarvis_says
  data:
    speech: '"OK, the garage door is opening"'
You can change the voice with the --voice-id option of the aws command, of course. I am caching the sounds under /tmp/sounds so that Polly is not called again for text that has already been converted. You can also play any arbitrary wav this way by publishing it to the MQTT broker that Snips uses.
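One caveat with using the text itself as the file name: a phrase containing a slash, or a very long one, won't make a valid file name. A possible workaround, sketched below, is to hash the text and use the hash as the cache key. cache_path is a hypothetical helper, and it assumes sha1sum from coreutils is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the spoken text ($1) to a cache file under /tmp/sounds.
cache_path() {
  local key
  # Hash the text so slashes, quotes or long phrases still give a safe file name.
  key=$(printf '%s' "$1" | sha1sum | cut -d' ' -f1)
  printf '/tmp/sounds/%s.wav' "$key"
}

# The same text always maps to the same path, so the cache check still works.
wav=$(cache_path "OK, the garage door is opening")
echo "$wav"
```

In the script you would then test for "$wav" instead of "/tmp/sounds/$1.wav". The downside is that the cached files are no longer human readable.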
It took a fair bit to get Snips configured and working properly, so let me know if you have any questions.