Reusing Google Nest speakers for TTS responses, but with Assist microphones

Hi all, so on my quest to find ways to use Assist more and more to replace Google Assistant at home, I’ve been playing with a couple of ESP32 satellites and a conference speaker. I can mostly get these solutions to work, but I can’t help thinking about reusing the existing Google Nests I already have around the house (I have a number of them that I currently use for TTS, and general other stuff like playing music, etc.). I’m mostly thinking about this for two reasons:

  • E-waste - I don’t like the idea of eventually making the Google Nests redundant, and would like to reuse them
  • The sound quality of the Nest is actually really decent for the money

So I’ve been investigating building simple ESP32 satellites with just a mic, which would accept the requests I give Assist, but then use the Google Nests to relay the response.

As a test, I got it working on a custom response, as shown below.

# Solar & EV
intent_script:
  Battery:
    speech:
      text: "The house battery has a charge of {{states('sensor.battery_state_of_capacity') | round(0)}} percent"
    action:
      service: "tts.google_translate_say"
      data:
        message: "The house battery has a charge of {{ states('sensor.battery_state_of_capacity') | round(0) }} percent"
      target:
        entity_id: >
          {% if is_state('binary_sensor.living_room_sensor_assist_in_progress', 'on') %}
            media_player.kitchen_speaker
          {% else %}
            media_player.garage_nest_mini
          {% endif %}

The above is set up to work with two ESP32s with mics (one in the living room, and another in the garage). And even though this works, it’s of course limited to just those two Nests. I want to expand it to the other Nests too, and then somehow extend it to all the other responses that can come from Assist (both the default ones, and the custom ones).

But is this even realistic? Is there an easier way I could achieve this? Between the roughly 30 custom responses I’ve created so far, all the default ones, and adding all the other Nests to the code, it’s a lot of work! So I can’t help but think that there’s an easier way.

If anyone has any input or thoughts, it would be greatly appreciated if you’d drop it below :slight_smile:

I have had something working with Rhasspy for over 6 months, using the Google Mini speakers as the primary output for responses and Pi-based satellites for input (keeping everything local). The current issue with a basic satellite is being able to identify which satellite a request came from; this seems to be possible with your ESP setup, I’m guessing thanks to ESPHome. If you are using logic like this to make it work, I would suggest moving the logic out of the intent script and into a sensor, and having that sensor return the speaker to use as the entity ID. This makes it reusable, and if you add, remove, or change speakers later, you only have to change it in the sensor template.

While it’s not a well-formatted example, and I should really rewrite it since I wrote it when I was figuring out HA, it should give an idea of one type of logic. Note I was using MQTT, but the same approach will apply to what you are trying to do:

  - platform: template
    sensors:
      choose_speaker_cmd:
        friendly_name: "Choose Notification Speaker based on Command"
        
        value_template: >-
          {% set room = states('sensor.ai_last_msg_master_mqtt') %}
          {% if room == 'office' %}
            media_player.nest_lab
          {% elif room == 'kitchen' %}
            media_player.nest_lab
          {% elif room == 'living_room' %}
            media_player.nest_lr
          {% elif room == 'back_office' %}
            media_player.back_office_speaker
          {% elif room == 'dinning_room' %}
            media_player.back_office_speaker
          {% elif room == 'master_bedroom' %}
            media_player.nest_mbr
          {% else %}
            media_player.mass_nest_un
          {% endif %}

I should also add: a very nice feature for them to build into the satellite would be the ability to redirect output to a specific HA media entity.


Many thanks for the info, great to see I’m not the only one thinking this way (it just seems a shame not to use existing media players if one already has them).

And yes, it makes perfect sense to move the logic outside of the intent scripts, as I want to keep those as simple as possible, and also to ensure that both the default and the custom sentences can utilize the media players.

I’m trying to figure out how to connect the logic to the intents, and I’m not sure I understand how to use your example code. Could you explain in a bit more detail how I could incorporate this? How can I use the speaker-selection logic outside of the intent scripts (for both default and custom sentences)?

And yes, would be great to have an option for redirecting output to another/specific media player :slight_smile:

So, my sensor in this situation would be sensor.choose_speaker_cmd:


# Solar & EV
intent_script:
  Battery:
    speech:
      text: "The house battery has a charge of {{states('sensor.battery_state_of_capacity') | round(0)}} percent"
    action:
      service: "tts.google_translate_say"
      data:
        message: "The house battery has a charge of {{ states('sensor.battery_state_of_capacity') | round(0) }} percent"
      target:
        entity_id: >
          {{ states('sensor.choose_speaker_cmd') }}

This would let you use the sensor’s state as the speaker to respond on by default, leaving the logic in the sensor.


OK great, thank you. I’ll give it a try.

But how does this get incorporated into the default, built-in intents (HassTurnOn, HassTurnOff, etc.)?

No idea. I have only ever written my own scripts, as the built-in voice is so new.


Haha OK fair enough, you’re right they are brand new :slight_smile:

But have you got any ideas of how you might get the default responses to come through your Nest speakers? Would love to hear them if so :slight_smile:

I had hoped, as a workaround, that the mechanism they provide for extending sentences might have worked. But looking at it further, it only lets you modify the intents’ sentences, e.g.:

# Example on_off.yaml entry
language: "en"
intents:
  HassTurnOn:
    data:
      - sentences:
          - "engage [the] {name}"
      - sentences:
          - "engage [all] lights in [the] {area}"
        slots:
          name: "all"
          domain: "light"
  HassTurnOff:
    data:
      - sentences:
          - "disengage [the] {name}"
      - sentences:
          - "disengage [all] lights in [the] {area}"
        slots:
          name: "all"
          domain: "light"

So, unfortunately, there’s no way to extend the action that I can see with this. My wishful thinking was along the lines of intent_script, with:

# Example on_off.yaml entry
language: "en"
intents:
  HassTurnOn:
    data:
      - sentences:
          - "engage [the] {name}"
      - sentences:
          - "engage [all] lights in [the] {area}"
        slots:
          name: "all"
          domain: "light"
    action:
      - service: script.ai_security_minimal
        data:
          who: "{{ states('sensor.choose_speaker') }}"
          message: >-
            light on etc

But it was almost immediately obvious that it would not work, as this is intents, not intent_script. If there is documentation anyone can point me to on how the intent is passed through, I would be curious, particularly whether it can trigger additional automations, and what information is made available along with the intent itself.


Just wanted to say that so far, this seems to work great!

For my sensor.choose_speaker_cmd, I made a template sensor that swaps between media players based on the mic that’s active. Here’s my code, if anyone is interested:

# Choose media player
  - platform: template
    sensors:
      choose_speaker_cmd:
        friendly_name: "Choose Notification Speaker based on Command"
        value_template: >-
          {% if is_state('binary_sensor.living_room_sensor_assist_in_progress', 'on') %}
            media_player.living_room_speaker
          {% elif is_state('binary_sensor.bedroom_sensor_assist_in_progress', 'on') %}
            media_player.bedroom_speaker
          {% else %}
            off
          {% endif %}

Thanks for your help @noinformationavailab :slight_smile:


I’m trying to get this to work as well. Did you get it working? I’m not entirely following the discussion here. How do you detect that Assist is done and that there is a response to pipe through TTS and play on the chosen speaker?

Hey Pontus, the code I posted above works great… but only for custom sentences. Right now, I’m trying to figure out how I can create a TTS response for all the default sentences, plus the error message ‘Sorry, I couldn’t understand that’.

Currently I have two ESP32 satellites, and a USB conference speaker plugged directly into HA. All are working pretty well (the USB conference speaker is the easiest to use, as it runs through the Assist Microphone add-on, which means it gives responses for everything).

If anyone has a way of getting the error message as an event (or similar) out of the ESP32 satellites, so that an automation can be triggered, please let me know, as I’d really love to just set up mic satellites and reuse my Google Nests.

Just to recap, here’s an example of a custom response:

# Solar & EV
intent_script:
  Battery:
    speech:
      text: "The house battery has a charge of {{states('sensor.battery_state_of_capacity') | round(0)}} percent"
    action:
      service: "tts.google_translate_say"
      data:
        message: "The house battery has a charge of {{ states('sensor.battery_state_of_capacity') | round(0) }} percent"
      target:
        entity_id: >
          {{ states('sensor.choose_speaker_cmd') }}

This template is in my config file:

# Choose media player
  - platform: template
    sensors:
      choose_speaker_cmd:
        friendly_name: "Choose Notification Speaker based on Command"
        value_template: >-
          {% if is_state('binary_sensor.living_room_sensor_assist_in_progress', 'on') %}
            media_player.living_room_speaker
          {% else %}
            media_player.garage_nest_mini
          {% endif %}

The template sensor ensures the correct Nest speaker is selected for the corresponding satellite.

For anyone interested, I’ve finally found a much easier way (thanks to the folks over on Discord) to use your existing media players for the response!

Firstly, you don’t need to add any of the code I’ve posted above; all that’s needed is the correct code on your ESP device.

Then, add the following to the ESPHome config for your device, changing the media player entity to the one you want that ESP device to use:

  on_tts_end:
  - homeassistant.service:
      service: media_player.play_media
      data:
        entity_id: media_player.living_room_speaker #add your media player here
        media_content_id: !lambda 'return x;'
        media_content_type: music
        announce: "true"

It needs to be added under the voice_assistant: section. As a rough sketch of the placement (the surrounding lines below are just illustrative; your existing voice_assistant settings will differ):
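
voice_assistant:
  # ...your existing voice_assistant settings stay as they are...
  on_tts_end:
    - homeassistant.service:
        service: media_player.play_media
        data:
          entity_id: media_player.living_room_speaker  # your media player here
          media_content_id: !lambda 'return x;'  # x is the URL of the generated TTS audio
          media_content_type: music
          announce: "true"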

Save and install the code.

Then go to Settings > Devices & Services > ESPHome and look for the ESP device you just updated. Click Configure next to it, and enable ‘Allow the device to make Home Assistant service calls’.

And that’s it! All responses (yes, all default, errors and custom responses) should now come through the media player added to your ESP config :slight_smile:


I wonder if there is anything similar for non-ESP satellites?

Thanks! I’m getting close!

I’m getting an error “Media:Load failed with code 104(MEDIA_SRC_NOT_SUPPORTED) for item 1”. The file it’s trying to play ends in tts.piper.raw, which doesn’t seem to work. Any ideas?

This is my first look into the ESP side of things, and my Atoms finally arrived.
I’m looking at where the snippet you provided should go; however, the YAML I have is significantly different, and with limited experience on the ESP side of things I’m probably missing something obvious about where the code gets imported.

substitutions:
  name: m5stack-atom-echo-a0562c
  friendly_name: Development Echo
packages:
  m5stack.atom-echo-voice-assistant: github://esphome/firmware/voice-assistant/m5stack-atom-echo.yaml@main


esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
api:
  encryption:
    key: xyz


wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

The YAML you have pulls in the code from GitHub as a package (read more about remote/git packages in the ESPHome docs), so a quick-and-dirty way would be to copy that code directly into your config and make your edits there. Mind you, any future updates to the remote package won’t be reflected.
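
For illustration, a minimal sketch of that swap, assuming you save the copied firmware YAML as m5stack-atom-echo.yaml next to your device config (the filename is just an example):

packages:
  # Before: pulled remotely from GitHub on every compile
  # m5stack.atom-echo-voice-assistant: github://esphome/firmware/voice-assistant/m5stack-atom-echo.yaml@main
  # After: a local copy you can edit freely
  m5stack.atom-echo-voice-assistant: !include m5stack-atom-echo.yaml

You can then add the on_tts_end block from earlier in this thread to the voice_assistant: section of the local copy.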


Hello @celodnb ,

Thank you very much for this. In particular, triggering the media player service in the on_tts_end section helped me get this working (an example elsewhere used on_tts_start).

I’ve edited this post from my original question into a solution for stopping the speaker on my M5Stack Atom Echo device itself from playing. My Echo speaks slightly before the media makes it to the media player, and although its speaker isn’t great, it’s still distracting, since it’s close enough to the media player device.

I found this post about disabling the echo’s speaker, which accomplishes this by changing the GPIO pin in the speaker specification from 22 to 21.

I implemented this by changing the packages section in my ESPHome configuration file for my Atom Echo device from m5stack.atom-echo-voice-assistant: github://esphome/firmware/voice-assistant/m5stack-atom-echo.yaml@main to instead be m5stack.atom-echo-voice-assistant: !include firmware/voiceassistant-echo-nospeaker.yaml. Note the firmware sub-directory is one that I created to hold local copies of files when I want to test adjustments like these; the only change inside the local copy is the speaker pin, sketched below.
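
Purely as a sketch of what that edit looks like, here is roughly how the speaker block in my local copy differs (the surrounding keys come from the upstream m5stack-atom-echo.yaml, so treat the exact option names as an assumption rather than a verbatim copy):

speaker:
  - platform: i2s_audio
    id: echo_speaker
    dac_type: external
    mode: mono
    # was GPIO22; GPIO21 isn't wired to the Echo's speaker amp,
    # so TTS playback on the device itself is effectively muted
    i2s_dout_pin: GPIO21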

I likely could have overridden the speaker GPIO pin while still sourcing the upstream file from GitHub, but for now I used the “local include” method to know exactly what my final configuration would look like, without potentially making mistakes in my attempts to override configurations.

Thanks again!

Ivan
