Configuration to create a more "human sounding" Home Assistant

TL;DR – By combining a few HA scripts and automations with a little command-line work, I’ve created a way for Home Assistant to give more human-like verbal responses to inputs/triggers (i.e. HA sounds “spontaneous” in its audio replies, rather than giving the usual “statically scripted” responses).

Warning, this is a long post.

Introduction

I first went down the home automation rabbit hole after watching “A Town Called Eureka” and thinking “I want a house like S.A.R.A.H” :slight_smile:

Unfortunately, one of the challenges in creating that kind of “AI” is that unless you’ve either got the processing power of Amazon/Google, or are willing to allow them to listen to you all the time, any “challenge/response” type of interaction will always be scripted and somewhat “static” (i.e. it doesn’t change from one trigger to the next). This is of course by design, but it doesn’t make for a very “intelligent” (or human) sounding solution when you want HA to provide an audio response/output.

To take a real-world example: my wife likes to be brought a cup of tea in the morning. When asking me to make the tea, she could obviously phrase the request in a number of different ways.

Can I have a cup of tea please.
Please can I have a cup of tea.
Could I have a cup of tea please.
Can you make me a cup of tea please.
Please can you make me a cup of tea.

etc, etc.

If using HA on the other hand, you would normally set this up to ask for the cup of tea in just one way. The purpose of this particular configuration is to change that behaviour to allow HA to make the request in a different way each time a trigger occurs.

I’ve detailed all the steps below, but at a high level this is achieved by picking a random entry from a list of pre-defined choices within a text file, then assigning the chosen text to a sensor so HA can output the contents of the sensor using TTS.

The end result is that HA can then give a different audio response to an automation each and every time it is triggered.

Prerequisites

  • SSH enabled and configured within HA
  • Ability to access the command line (via the SSH terminal, PuTTY etc)

Setup Instructions

Step 1 - Create a text file with the variety of situational responses you wish to pick from

Create a text file with your editor of choice, containing all the audio prompt options you want HA to be able to choose from, then save to an appropriate location within your HA install.

I happen to put the file in the same directory as the TTS script it will be called from, just to make it easy to track, but that’s personal preference and dependent on how you organise your HA folder structure.

Note: The list can be as long or as short as you like, and its length will largely be dictated by the sentence structure. Longer answers offer the opportunity for more variations in the wording than shorter ones, so will have more entries in the file.
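As a rough sketch of what such a file might look like, the snippet below writes the five tea requests from the introduction to a file, one response per line (the one-per-line layout is what lets step #3 pick a single line). The INPUT_DIR variable and its temp-directory fallback are my own placeholders so the sketch runs anywhere; point it at wherever you keep your HA files.

```shell
#!/bin/sh
# INPUT_DIR is a hypothetical placeholder; set it to a folder in your HA install.
# If unset, a temp directory is used so the sketch can be tried safely.
INPUT_DIR="${INPUT_DIR:-$(mktemp -d)}"
INPUT_FILE="$INPUT_DIR/input_choices.txt"

# One candidate announcement per line.
cat > "$INPUT_FILE" <<'EOF'
Can I have a cup of tea please.
Please can I have a cup of tea.
Could I have a cup of tea please.
Can you make me a cup of tea please.
Please can you make me a cup of tea.
EOF

# Sanity check: print how many candidate lines the file contains.
wc -l < "$INPUT_FILE"
```

Adding a new variation later is then just a case of appending another line to the file.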

Step 2 - Create a target directory for the responses, and add to the “allowed external directories” in the config

# Configuration.yaml
homeassistant:
  allowlist_external_dirs:
    - "/output_path"

This step is required because the sensor can only read files from directories that are listed in the allowed external directories; without it, the output won’t display in the target sensor.

Note: Ensure you correctly identify and use the full path (use pwd if needed), and not just the subdirectory within the homeassistant folder.
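If you’re unsure of the full path, something like the following can confirm it. This is only a sketch: BASE_DIR and the tts_output folder name are hypothetical, and the temp-directory fallback is just there so the snippet runs without touching a real install.

```shell
#!/bin/sh
# BASE_DIR is a hypothetical stand-in for your HA config folder.
# If unset, a temp directory is used so nothing real is modified.
BASE_DIR="${BASE_DIR:-$(mktemp -d)}"

# Create the (hypothetically named) output folder if it doesn't exist yet.
mkdir -p "$BASE_DIR/tts_output"

# cd into the folder and let pwd report its absolute path.
FULL_PATH="$(cd "$BASE_DIR/tts_output" && pwd)"
echo "$FULL_PATH"
```

The printed value is the one to use in place of /output_path.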

Step 3 - Create a Bash shell script to pick a random entry from the text file in step #1, and save the output to a separate text file located in the folder from step #2

#!/bin/sh
cat /input_path/input_choices.txt | shuf -n1 > /output_path/output_choice.txt

Again, create the file with your editor of choice, then save to a suitable location. In my case, I save my bash scripts in a subdirectory under my HA scripts path but again it’s down to you as to what works best for your installation.

Note: You’ll need to remember the filename and location for step #4

This code makes use of the cat and shuf commands: cat reads the entire file, and shuf -n1 picks a single random line from it, which is then written to the target file. Ensure you change the paths and filenames in the script to suit your input and output locations (from steps 1 and 2).

Note: Make sure you chmod 755 the shell script, and check that it can be run from the command line before proceeding. Running it once also populates the output file, so the sensor has a value when HA restarts after completing the entire setup procedure.
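For anyone who wants a slightly more defensive version of the script, here is a minimal sketch. It is not what I run myself; the pick_random function name and the write-then-rename step are assumptions added for illustration. It refuses to run if the input file is missing or empty, and writes to a temporary file first so the sensor is less likely to read a half-written file.

```shell
#!/bin/sh
# pick_random INPUT OUTPUT: write one random line of INPUT to OUTPUT.
# (pick_random is a hypothetical helper name, not part of HA.)
pick_random() {
    input="$1"
    output="$2"

    # Bail out if the input file is missing or empty.
    if [ ! -s "$input" ]; then
        echo "input file missing or empty: $input" >&2
        return 1
    fi

    # Write to a temporary file next to the target, then rename it into place.
    tmp="$output.tmp"
    shuf -n 1 "$input" > "$tmp" && mv "$tmp" "$output"
}

# Example call using the placeholder paths from this post:
# pick_random /input_path/input_choices.txt /output_path/output_choice.txt
```

The commented-out call at the bottom is where your own paths from steps 1 and 2 would go.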

Step 4 - Create a shell_command to execute the bash script

# Configuration.yaml
shell_command:
  update_audio_scripts: '/bash_script_pathname/bash_script_name.sh'

Filename and location are from step #3

Step 5 - Create an automation to execute the shell_command

- id: 'shellupdate_001'
  alias: Update Audio scripts
  description: Update TTS audio files once per hour
  trigger:
  - platform: time_pattern
    hours: /1
    minutes: '00'
  action:
  - service: shell_command.update_audio_scripts
  mode: single

This step determines how often HA shuffles the responses. The example above executes the script once per hour, on the hour (because my wife is unlikely to ask for a cup of tea more frequently than that :slight_smile: )

Adjust as needed to increase or decrease the random nature of the responses (particular thanks must go to @CO_4X4 for solving this one for me)

Step 6 - Use the File platform to create the sensor entity used to hold the announcement

This creates the sensor that will hold the final announcement text. Note that it uses the same output path as the one chosen in step #3

- platform: file
  name: audio_announcement
  file_path: /output_path/output_choice.txt
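As far as I’m aware, the File sensor reports the last line of the file it watches, so a quick way to preview what the sensor should show is to read that line from the command line. The preview_sensor name below is made up for this sketch.

```shell
#!/bin/sh
# preview_sensor FILE: print the last line of FILE, which is assumed here
# to be what the File sensor will report as its state.
# (preview_sensor is a hypothetical helper name.)
preview_sensor() {
    tail -n 1 "$1"
}

# Example call using the placeholder path from this post:
# preview_sensor /output_path/output_choice.txt
```

Because the script in step #3 only ever writes one line, the last line and the whole file should be the same thing.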

Step 7 - Create a script to read the announcement from the sensor and send to the relevant media player

The code below is pulled from the “cup of tea” setup for my wife. Most of the code is off the shelf from the Sonos TTS script elsewhere on the forums, with a minor change so HA recognises what time of day it is when making the announcement (I only have one Sonos device, hence why the media player is just called “sonos”)

The important part of the code is from tts.google_say onwards, and note that the sensor name is the same as assigned in step #6


cup_of_tea:
  alias: 'Cup Of Tea please'
  sequence:
  - service: sonos.snapshot
    data:
      entity_id: media_player.sonos
  - service: sonos.unjoin
    data:
      entity_id: media_player.sonos
  - service: media_player.volume_set
    target:
      entity_id: media_player.sonos
    data:
      volume_level: 0.2
  - service: tts.google_say
    data_template:
      entity_id: media_player.sonos
      message: >
        {% if now().strftime("%H")|int < 12 %}
        Good morning.
        {% elif now().strftime("%H")|int < 18 %}
        Good afternoon.
        {% else %}
        Good evening.
        {% endif %}
        {{ states('sensor.audio_announcement') }}
  - delay:
      seconds: 2
  - wait_template: "{{ states('media_player.sonos') in ['paused', 'idle'] }}"
    timeout: "00:00:05"
  - service: sonos.restore
    data:
      entity_id: media_player.sonos

Step 8 - Create the appropriate automation to trigger the audio announcement

This final part is entirely dependent on how you want the script to trigger; you just need to add a service call to execute the script.

Using the example above, the automation just needs to call the following

- service: script.cup_of_tea

to issue the request. My wife does that using a button, but any trigger will work. Ultimately, depending on how frequently you’ve set the shell_command to randomise the text, you’ll get a different audio response.

Once you’ve done all of the above, restart HA and you’re all set.

Getting creative

Once the basics are set up, it’s then possible to get quite creative with the announcements by chaining them together. For example, I’ve also created a sensor called “appreciation_suffix” which contains different ways of saying “thank you” (thanks, thank you, cheers, ta, thanks a lot, etc)

I set that up using the same process as above, then just added that sensor to the end of the previous scripted sentence. Using the example in step #7, this changes the template to look as follows

  - service: tts.google_say
    data_template:
      entity_id: media_player.sonos
      message: >
        {% if now().strftime("%H")|int < 12 %}
        Good morning.
        {% elif now().strftime("%H")|int < 18 %}
        Good afternoon.
        {% else %}
        Good evening.
        {% endif %}
        {{ states('sensor.audio_announcement') }}
        {{ states('sensor.appreciation_suffix') }}
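With more than one sensor in play, one possible arrangement (a sketch, not how my own setup is wired) is a single script that refreshes every list in one go, so a single shell_command covers them all. The refresh_all name and the one-file-per-sensor naming (audio_announcement.txt, appreciation_suffix.txt) are assumptions for illustration; each File sensor would then point at its own output file.

```shell
#!/bin/sh
# refresh_all IN_DIR OUT_DIR: for every *.txt list in IN_DIR, write one
# random line to a file of the same name in OUT_DIR.
# (refresh_all is a hypothetical helper name.)
refresh_all() {
    in_dir="$1"
    out_dir="$2"

    # The two folders must differ, otherwise the lists would be overwritten.
    if [ "$in_dir" = "$out_dir" ]; then
        echo "input and output directories must be different" >&2
        return 1
    fi

    for list in "$in_dir"/*.txt; do
        # Skip the literal pattern if no list files were found.
        [ -f "$list" ] || continue
        shuf -n 1 "$list" > "$out_dir/$(basename "$list")"
    done
}

# Example call using the placeholder paths from this post:
# refresh_all /input_path /output_path
```

Adding a third or fourth sensor is then just a new text file in the input folder plus a matching File sensor.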

One Final Note

I am aware that I could possibly have achieved the above by the use of Templates. However, having investigated this option, it became apparent that it would probably have required a lot more yaml, and could very quickly have got out of control with the input variables (especially for the sections with larger option sets). By going down the route above, once the setup is done, I can adjust the input text easily with no adjustment required for any other parts of the setup. It’s also very easy to expand both inputs and outputs across the board.

Comments and feedback welcome.


An alternative to the first six steps of your instructions is to create a Trigger-based Template Sensor (a recently introduced concept).

template:
  - trigger:
      - platform: time_pattern
        hours: '/1'
      - platform: homeassistant
        event: start
      - platform: event
        event_type: event_template_reloaded
    sensor:
      - name: 'Audio Announcement'
        state: >
          {{ [
            'Can I have a cup of tea please.', 
            'Please can I have a cup of tea.',
            'Could I have a cup of tea please.', 
            'Can you make me a cup of tea please.',
            'Please can you make me a cup of tea.'
             ] | random }}

The resulting sensor is sensor.audio_announcement and can be used in the script you posted in step 7 (in the same way it’s currently used).

The sensor randomly selects a phrase every hour, when Home Assistant starts, and when Template Entities are reloaded.

To add another phrase:

  1. Add it to the list within the template (each phrase is delimited by single-quotes and separated by a comma; the last phrase shouldn’t have a terminating comma).
  2. Execute: Configuration > Server Controls > Reload Template Entities

You can also append more sensors to the same set of triggers. For example, here is how to create sensor.audio_announcement and sensor.appreciation_suffix.

template:
  - trigger:
      - platform: time_pattern
        hours: '/1'
      - platform: homeassistant
        event: start
      - platform: event
        event_type: event_template_reloaded
    sensor:
      - name: 'Audio Announcement'
        state: >
          {{ [
            'Can I have a cup of tea please.', 
            'Please can I have a cup of tea.',
            'Could I have a cup of tea please.', 
            'Can you make me a cup of tea please.',
            'Please can you make me a cup of tea.'
             ] | random }}
      - name: 'Appreciation Suffix'
        state: >
          {{ [
            'thanks', 
            'thank you',
            'cheers', 
            'ta',
            'thanks a lot'
             ] | random }}