ESPHome Voice Assistant speech output to Home Assistant Media Player

I believe Alexa devices work differently because the audio has to be sent to Amazon before being sent back down to the Echo (don’t get me started on Echo devices and Amazon’s data hoovering; suffice it to say don’t send anything you don’t want Amazon to aggressively use against you).

Config for the Atom Echo with TTS sent to a separate media player.

Uses tts.cloud_say, so it requires Home Assistant Cloud.

Automatically sets the volume to 20% for voice responses and returns to the previous volume state after the response.

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: pulse
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
    - homeassistant.service:
        service: media_player.volume_set
        data:
          entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
          volume_level: 0.2  # Set volume to 20%
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.living_room_speaker
        data_template:
          message: "{{ tts_message }}"
        variables:
          tts_message: return x;
  on_end:
    - light.turn_off: led
    - homeassistant.service:
        service: media_player.volume_set
        data_template:
          entity_id: media_player.living_room_speaker
          volume_level: "{{ state_attr('media_player.living_room_speaker', 'volume_level') }}"
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        blue: 0%
        red: 100%
        green: 0%
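
One caveat with the on_end block above: the template is rendered when on_end runs, by which point the player has already been set to 20%, so it may not bring back the earlier volume. Below is a minimal sketch of one way to remember the old level on the device itself, assuming the media player exposes a volume_level attribute. The ids speaker_volume and saved_volume are made up for this example, and only the relevant keys are shown; they would be merged into the full voice_assistant block.

```yaml
sensor:
  # Mirrors the media player's volume_level attribute onto the device
  - platform: homeassistant
    id: speaker_volume            # hypothetical id
    entity_id: media_player.living_room_speaker
    attribute: volume_level
    internal: true

globals:
  - id: saved_volume              # hypothetical id
    type: float
    restore_value: no
    initial_value: '0.5'          # fallback if no volume has been received yet

voice_assistant:
  on_tts_start:
    # Remember the current volume before lowering it
    - lambda: |-
        if (!std::isnan(id(speaker_volume).state)) {
          id(saved_volume) = id(speaker_volume).state;
        }
    - homeassistant.service:
        service: media_player.volume_set
        data:
          entity_id: media_player.living_room_speaker
          volume_level: "0.2"
  on_end:
    # Put the remembered volume back
    - homeassistant.service:
        service: media_player.volume_set
        data:
          entity_id: media_player.living_room_speaker
        data_template:
          volume_level: "{{ previous_volume }}"
        variables:
          previous_volume: return id(saved_volume);
```

Note that on_end may fire before the external player has finished speaking, so a delay in front of the restore call might be needed.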

Here’s a config that uses Piper instead of tts.cloud_say for local TTS.
I run Piper and faster-whisper via Docker on a separate server. You can simply use the Piper add-on if your hardware is decent, or run it as a container for very fast responses. As of 2024.12 my response times are a second or less.

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: pulse
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
    - homeassistant.service:
        service: media_player.volume_set
        data:
          entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
          volume_level: 0.2  # Lower volume to 20%
    - homeassistant.service:
        service: tts.piper_say
        data:
          entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
        data_template:
          message: "{{ tts_message }}"  # rendered by Home Assistant from the variable below
        variables:
          tts_message: return x;
  on_end:
    - light.turn_off: led
    - homeassistant.service:
        service: media_player.volume_set
        data_template:
          entity_id: media_player.living_room_speaker
          volume_level: "{{ state_attr('media_player.living_room_speaker', 'volume_level') }}"
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        blue: 0%
        red: 100%
        green: 0%
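
For the Piper and faster-whisper containers mentioned above, a Compose file along these lines is one possible starting point. This is a sketch, not the poster's actual setup: it assumes the rhasspy/wyoming-piper and rhasspy/wyoming-whisper images, the voice and model names are example choices, and the host data directories are placeholders.

```yaml
services:
  piper:
    image: rhasspy/wyoming-piper
    command: --voice en_US-lessac-medium   # example voice, pick your own
    volumes:
      - ./piper-data:/data                 # placeholder host path
    ports:
      - "10200:10200"
    restart: unless-stopped

  whisper:
    image: rhasspy/wyoming-whisper
    command: --model tiny-int8 --language en   # example model and language
    volumes:
      - ./whisper-data:/data                   # placeholder host path
    ports:
      - "10300:10300"
    restart: unless-stopped
```

Each service would then be added in Home Assistant through the Wyoming Protocol integration, using the host's IP address and the matching port.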

For me, using this config the wake word works, but then it just gets stuck in “Assist satellite → Processing”, with no errors in the logs. Before adding this to the config, the ATOM Echo was working fine; its only problem was the bad speaker sound.

I got: “homeassistant.exceptions.ServiceNotFound: Action tts.piper_say not found”
And: ‘M5Stack Atom Echo 23a170’ - No such effect ‘pulse’

I tried using:

    - homeassistant.action:
        service: tts.speak
        data:
          entity_id: tts.piper
          media_player_entity_id: media_player.vlc_telnet
          message: "{{ tts_message }}"
        variables:
          tts_message: return x;

The speaker works, but it just reads out “tts_message” literally, not the answer from the LLM. Any idea how to fix it?

Edit:
Seems like using it like this works perfectly for me:

    - homeassistant.action:
        service: tts.speak
        data:
          entity_id: tts.piper
          media_player_entity_id: media_player.vlc_telnet
          message: !lambda 'return x;'
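
The likely reason for the difference: values under data: are passed to Home Assistant as plain strings, so a "{{ ... }}" placeholder there is never filled in, whereas data_template: values are rendered by Home Assistant using the entries under variables:. A sketch of both working forms, reusing the entity names from the post above (the service: key mirrors the post; newer ESPHome documentation names it action:):

```yaml
on_tts_start:
  # Form 1: pass the response text directly with a lambda
  - homeassistant.action:
      service: tts.speak
      data:
        entity_id: tts.piper
        media_player_entity_id: media_player.vlc_telnet
        message: !lambda 'return x;'

  # Form 2: let Home Assistant render a template from a variable.
  # Use one form or the other, not both, or the response is spoken twice.
  # - homeassistant.action:
  #     service: tts.speak
  #     data:
  #       entity_id: tts.piper
  #       media_player_entity_id: media_player.vlc_telnet
  #     data_template:
  #       message: "{{ tts_message }}"
  #     variables:
  #       tts_message: return x;
```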

Thanks for posting. Where does this go, though? In configuration.yaml or somewhere else? And the id, microphone and speaker names: where are these defined?
Thanks a lot in advance!

In the ESP device config file (ESPhome)
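
As for the names: they are ids declared elsewhere in the same device file. A trimmed sketch based on the Atom Echo config posted further down in this thread shows how the pieces reference each other (pins as in that config; treat them as an assumption for any other board):

```yaml
i2s_audio:
  - id: i2s_audio_bus
    i2s_lrclk_pin: GPIO33
    i2s_bclk_pin: GPIO19

microphone:
  - platform: i2s_audio
    id: echo_microphone        # referenced by voice_assistant below
    i2s_din_pin: GPIO23
    adc_type: external
    pdm: true

speaker:
  - platform: i2s_audio
    id: echo_speaker           # referenced by voice_assistant below
    i2s_dout_pin: GPIO22
    dac_type: external
    channel: mono

voice_assistant:
  id: va                       # lets lambdas refer to it as id(va)
  microphone: echo_microphone
  speaker: echo_speaker
```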

Thanks. I did the following -

  1. Installed ESPHome in Docker (my HA instance also runs in Docker, so no add-ons are available)
  2. Configured the Atom Echo with the HA firmware
  3. Tried tinkering with the config file in ESPHome as shown above:

substitutions:
  name: m5stack-atom-echo-0ab278
  friendly_name: M5Stack Atom Echo 0ab278
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
api:
  encryption:
    key: xxx


voice_assistant:
  # Adjust Mic parameters for better understanding depending on room environment
  noise_suppression_level: 2   # 1-4
  volume_multiplier: 5.0   #1.0-6.0 (higher than 6 will result in major distortion)
  # Don't use Atom's speaker at all
  speaker: !remove     
  # Output response as a TTS to a chosen speaker
  on_tts_start:     
    - homeassistant.service:
        service: lms_tts_notify #this is what I otherwise use, works fine via HA
        data:
          entity_id: media_player.black_2
          message: !lambda 'return x;'
  # I want to know when an Atom loses connection to HA, so blink the light fast red
  on_client_disconnected:     
    then:
    - voice_assistant.stop: {}
    - micro_wake_word.stop: {}
    - light.turn_on:
        id: led
        red: 1.0
        green: 0.0
        blue: 0.0
        brightness: 1.0
        effect: Fast Pulse
        state: true

# Expose a restart button to HA so the Atom can be remotely rebooted 
# (can fix a stuck pipeline or unstable wifi connection in multi-AP/mesh environments)  
button:     
- platform: restart
  id: restart_btn
  name: Reboot
  disabled_by_default: false
  icon: mdi:restart-alert
  entity_category: config
  device_class: restart  

# Expose a new switch to HA to indicate timer_ringing AND ability to 
# toggle it back to an off state (acknowledges the timer, same as pressing 
# Atom's front button); automate using this switch
switch:     
- platform: template
  name: Timer Ringing
  optimistic: true
  lambda: |-
    return id(timer_ringing).state;
  turn_off_action:
      - switch.turn_off: timer_ringing


wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

The problem: I see the following logs in ESPHome:

INFO ESPHome 2024.12.4
INFO Reading configuration /config/m5stack-atom-echo-0ab278.yaml...
Failed config

esphome: [source /config/m5stack-atom-echo-0ab278.yaml:5]
  
  Platform missing. You must include one of the available platform keys: bk72xx, esp32, esp8266, host, libretiny, rp2040, rtl87xx.
  name: m5stack-atom-echo-0ab278
  name_add_mac_suffix: False
  friendly_name: M5Stack Atom Echo 0ab278

So, am I on the correct path? Or what does this error actually mean?

It means you need something like this in your config:

esp32:
  board: m5stack-atom
  framework:
    type: arduino

Thanks for the input. I got it working… almost.
Only the “!remove” is not accepted by the validate / compile function. I also have an ATOM. Do you know what the issue might be?
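
A possible explanation, as far as I understand the ESPHome packages feature: !remove only has something to act on when the key it removes was inherited from a package, so it tends to fail validation in a standalone config that has no packages: section. A minimal sketch, where atom-echo-base.yaml is a hypothetical local file holding the stock Atom Echo config:

```yaml
packages:
  # Hypothetical local file containing the stock config,
  # including a voice_assistant block with "speaker: echo_speaker"
  base: !include atom-echo-base.yaml

voice_assistant:
  # Drops the speaker key inherited from the package above
  speaker: !remove
```

In a standalone file with no package underneath, commenting out or deleting the speaker: line has the same effect.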

Thank you all. I got it working. Here is a complete “tutorial” for someone starting from scratch like I did.

Aim: to switch off the internal (“bad”) speaker of the M5Stack Atom Echo and route the voice output to a media player (in my case an old Logitech Media Player radio).

My Configuration

  • Homeassistant Docker 2025.1.4
  • ESPhome installed also via docker (see configuration below)
  • M5Stack Atom Echo

Home Assistant should be up and running. DO NOT follow the default guide for configuring the M5Stack Atom on the HA website, because it uses the internal speaker, and this cannot currently be changed from Home Assistant. You therefore need your own ESPHome (server) instance.

== Install ESPHome via Docker ==
ESPHome can be easily installed through Docker:

version: '3'
services:
  esphome:
    container_name: esphome
    image: esphome/esphome
    volumes:
      - /volume1/docker/esphome/config:/config
      - /etc/localtime:/etc/localtime:ro
    restart: always
    privileged: true
    network_mode: host

If everything went well, you can access the dashboard at port 6052 of your host:
http://192.168.1.XX:6052/

Connect your M5Stack Atom to a computer and open the above link in Chrome to add the device to your dashboard.

== Firmware for M5Stack Atom ==
Here is a “correctly formatted” file which we can later compile to generate the required firmware.

The internal speaker has been commented out (line 65), and a homeassistant.service call to tts.google_say has been added (to keep things simple for the demonstration). Change it if you are using another TTS service.
The media player to be used for the voice output is defined in entity_id.

substitutions:
  name: m5stack-atom-echo
  friendly_name: M5Stack Atom Echo
  micro_wake_word_model: okay_nabu  # alexa, hey_jarvis, hey_mycroft are also supported

esphome:
  name: ${name}
  name_add_mac_suffix: true
  friendly_name: ${friendly_name}
  min_version: 2024.9.0

esp32:
  board: m5stack-atom
  framework:
    type: esp-idf
    version: 4.4.8
    platform_version: 5.4.0

logger:
api:
  encryption:
    key: xxxxxxxxxxxx




ota:
  - platform: esphome
    id: ota_esphome


wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

captive_portal:

button:
  - platform: factory_reset
    id: factory_reset_btn
    name: Factory reset

i2s_audio:
  - id: i2s_audio_bus
    i2s_lrclk_pin: GPIO33
    i2s_bclk_pin: GPIO19

microphone:
  - platform: i2s_audio
    id: echo_microphone
    i2s_din_pin: GPIO23
    adc_type: external
    pdm: true

speaker:
  - platform: i2s_audio
    id: echo_speaker
    i2s_dout_pin: GPIO22
    dac_type: external
    channel: mono

voice_assistant:
  id: va
  microphone: echo_microphone
# speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Slow Pulse"
  on_stt_vad_end:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Fast Pulse"
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: none
    - homeassistant.service:
        service: tts.google_say
        data:
          entity_id: media_player.black  # Replace with your media player entity_id
        data_template:
          message: "{{ tts_message }}"
        variables:
          tts_message: return x;
  on_end:
    - delay: 100ms
    - voice_assistant.stop:
    - wait_until:
        not:
          voice_assistant.is_running:
    - wait_until:
        not:
          switch.is_on: timer_ringing
    - if:
        condition:
          lambda: return id(wake_word_engine_location).state == "On device";
        then:
          - micro_wake_word.start:
          - script.execute: reset_led
        else:
          - voice_assistant.start_continuous:
          - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        red: 100%
        green: 0%
        blue: 0%
        brightness: 100%
        effect: none
    - delay: 2s
    - script.execute: reset_led
  on_client_connected:
    - delay: 2s  # Give the api server time to settle
    - if:
        condition:
          lambda: return id(wake_word_engine_location).state == "On device";
        then:
          - micro_wake_word.start:
        else:
          - voice_assistant.start_continuous:
          - script.execute: reset_led
  on_client_disconnected:
    - voice_assistant.stop:
    - micro_wake_word.stop:
  on_timer_finished:
    - voice_assistant.stop:
    - micro_wake_word.stop:
    - switch.turn_on: timer_ringing
    - wait_until:
        not:
          microphone.is_capturing:
    - light.turn_on:
        id: led
        red: 0%
        green: 100%
        blue: 0%
        brightness: 100%
        effect: "Fast Pulse"
    - while:
        condition:
          switch.is_on: timer_ringing
        then:
          - lambda: id(echo_speaker).play(id(timer_finished_wave_file), sizeof(id(timer_finished_wave_file)));
          - delay: 1s
    - wait_until:
        not:
           speaker.is_playing:
    - light.turn_off: led
    - switch.turn_off: timer_ringing
    - if:
        condition:
          lambda: return id(wake_word_engine_location).state == "On device";
        then:
          - micro_wake_word.start:
          - script.execute: reset_led
        else:
          - voice_assistant.start_continuous:
          - script.execute: reset_led



binary_sensor:
  # button does the following:
  # short click - stop a timer
  # if no timer then restart either microwakeword or voice assistant continuous
  - platform: gpio
    pin:
      number: GPIO39
      inverted: true
    name: Button
    disabled_by_default: true
    entity_category: diagnostic
    id: echo_button
    on_multi_click:
      - timing:
          - ON for at least 50ms
          - OFF for at least 50ms
        then:
          - if:
              condition:
                switch.is_on: timer_ringing
              then:
                - switch.turn_off: timer_ringing
              else:
                - if:
                    condition:
                      lambda: return id(wake_word_engine_location).state == "On device";
                    then:
                      - voice_assistant.stop
                      - micro_wake_word.stop:
                      - delay: 1s
                      - script.execute: reset_led
                      - script.wait: reset_led
                      - micro_wake_word.start:
                    else:
                      - if:
                          condition: voice_assistant.is_running
                          then:
                            - voice_assistant.stop:
                            - script.execute: reset_led
                      - voice_assistant.start_continuous:
      - timing:
          - ON for at least 10s
        then:
          - button.press: factory_reset_btn

light:
  - platform: esp32_rmt_led_strip
    id: led
    name: None
    disabled_by_default: true
    entity_category: config
    pin: GPIO27
    default_transition_length: 0s
    chipset: SK6812
    num_leds: 1
    rgb_order: grb
    rmt_channel: 0
    effects:
      - pulse:
          name: "Slow Pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "Fast Pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%

script:
  - id: reset_led
    then:
      - if:
          condition:
            - lambda: return id(wake_word_engine_location).state == "On device";
            - switch.is_on: use_listen_light
          then:
            - light.turn_on:
                id: led
                red: 100%
                green: 89%
                blue: 71%
                brightness: 60%
                effect: none
          else:
            - if:
                condition:
                  - lambda: return id(wake_word_engine_location).state != "On device";
                  - switch.is_on: use_listen_light
                then:
                  - light.turn_on:
                      id: led
                      red: 0%
                      green: 100%
                      blue: 100%
                      brightness: 60%
                      effect: none
                else:
                  - light.turn_off: led

switch:
  - platform: template
    name: Use listen light
    id: use_listen_light
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - script.execute: reset_led
    on_turn_off:
      - script.execute: reset_led
  - platform: template
    id: timer_ringing
    optimistic: true
    internal: true
    restore_mode: ALWAYS_OFF
    on_turn_on:
      - delay: 15min
      - switch.turn_off: timer_ringing

select:
  - platform: template
    entity_category: config
    name: Wake word engine location
    id: wake_word_engine_location
    optimistic: true
    restore_value: true
    options:
      - In Home Assistant
      - On device
    initial_option: On device
    on_value:
      - if:
          condition:
            lambda: return x == "In Home Assistant";
          then:
            - micro_wake_word.stop
            - delay: 500ms
            - lambda: id(va).set_use_wake_word(true);
            - voice_assistant.start_continuous:
      - if:
          condition:
            lambda: return x == "On device";
          then:
            - lambda: id(va).set_use_wake_word(false);
            - voice_assistant.stop
            - delay: 500ms
            - micro_wake_word.start

external_components:
  - source: github://pr#5230
    components:
      - esp_adf
    refresh: 0s
  - source: github://jesserockz/esphome-components
    components: [file]
    refresh: 0s

esp_adf:

file:
  - id: timer_finished_wave_file
    file: https://github.com/esphome/wake-word-voice-assistants/raw/main/sounds/timer_finished.wav

micro_wake_word:
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
  vad:
  models:
    - model: ${micro_wake_word_model}

Copy this file to the config directory of your ESPHome instance (in this case /volume1/docker/esphome/config).

The new device should then show up on the ESPHome dashboard.
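
The config above reads the Wi-Fi credentials with !secret, so a secrets.yaml file is expected next to it in the same config directory. A minimal sketch with placeholder values:

```yaml
# /volume1/docker/esphome/config/secrets.yaml
wifi_ssid: "YourNetworkName"       # placeholder
wifi_password: "YourWifiPassword"  # placeholder
```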

== Validate, compile and download ==

Click on the “three dots” and check if the configuration is correct (“validate”). It works on my setup as pasted.


Now click the “three dots” > Install and select Manual Download. This starts the compilation, which can take anywhere from 30 seconds to 5 minutes depending on your host.

After successful compilation, select the download file format (I chose the older bin format) and download the file to your computer.

== Flash the M5Stack Atom Echo ==
Start the flashing tool on this website (in Chrome):
https://web.esphome.io/?dashboard_install
and connect to your device. Select the correct serial port. You should see an install link if the connection went through.

Select the previously generated bin file and install it. Wait for the process to end.

== Home Assistant ==

Add the ESPHome integration in Home Assistant. It will autodetect the device flashed above. After about 2-3 minutes the M5Stack Atom will show up and work.

There are a few updates needed.
First, to get rid of the “No such effect ‘pulse’” warning, use effect: Slow Pulse instead.
Second, the error regarding “voice_assistant > speaker” appears because they moved from ‘speaker’ to ‘media_player’. So just use media_player: !remove
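
Applied to the earlier examples, the two fixes described above would look roughly like this. It is a sketch showing only the affected keys, and it assumes the light defines an effect named "Slow Pulse" and that the voice_assistant block comes from a package that sets media_player:

```yaml
voice_assistant:
  # Newer firmware references a media_player instead of a speaker
  media_player: !remove
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: "Slow Pulse"   # must match an effect name defined on the light
```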

Hi all! Sorry I’ve been away a while. Yes, there have been a handful of changes, like speaker: being moved into media_player:…and now the new continuing conversations support, which is requiring some rework because TTS audio will often still be in the middle of playing on an HA media player and the ATOM will start listening again. This has resulted in some rather interesting “AI talking to itself” situations since I deployed 2025.5.0 today. (My local llama3 has a sarcastic personality, so it literally starts yelling and getting major attitude…with itself! :rofl:)

I’ve also noticed that additional changes have resulted in media_player: !remove no longer removing the code for the speaker, even though it still doesn’t play (or maybe I just can’t hear that tiny speaker from my desk)?

Anyway, I’m getting a handle on the most recent changes and will rework my original examples soon. Since this is a cat/mouse game as ESPHome voice support evolves, I’m just going to host a copy of my latest yaml via git so these outdated examples stop tripping people up.

Well, media_player: !remove is actually screwing with the new conditionals for continuing conversation, as it is looking for the media_player to finish playing to continue listening. So, instead of using a !remove, I’m currently using

on_tts_start:
    - media_player.volume_set: 0%

…followed by the - homeassistant.action: (homeassistant.service is being deprecated)

I haven’t found a way to base the conditional on the homeassistant.action completing because there’s no direct feedback via the API as to when the service call has completed…at least none that I’ve found yet without additional coding on HA’s side via a script or the like. So, technically, both my Atom and the tts.cloud_say run at the same time and the Atom uses the completion of the local playback to begin listening again, if necessary.
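
Put together, the workaround described above might look like the sketch below. The media player id echo_media_player and the entity media_player.living_room_speaker are assumptions for this example, and the service: key mirrors earlier posts in this thread (newer ESPHome documentation names it action:):

```yaml
voice_assistant:
  on_tts_start:
    # Mute the Atom's own player instead of removing it, so the
    # continuing-conversation logic can still see local playback finish
    - media_player.volume_set:
        id: echo_media_player          # assumed id of the on-device player
        volume: 0%
    # Speak the response on the Home Assistant media player in parallel
    - homeassistant.action:
        service: tts.cloud_say
        data:
          entity_id: media_player.living_room_speaker   # assumed entity
          message: !lambda 'return x;'
```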

So, one problem squashed-ish. Not entirely ideal, as I’m still hearing a momentary crackle from the Atom speaker as the i2s pipeline kicks in, but it will have to do until another way of handling the start of the next listening round can be sorted out.

Yeah, so when you comment out
media_player: echo_media_player

the follow-up question or continuing conversation doesn’t work anymore… That’s unfortunate, because I thought of a way to send the status of the media player to the Atom with a helper. If we could make the pipeline wait for this event, that would be awesome. However, I have tried everything and can’t get it to work…

Hi everyone, thanks for your help, it works great. My issue is that if I play music on the speaker and also use it for the voice assistant’s answers, the music stops playing and doesn’t restart. Do you have the same issue? Did you find a solution for it?

For anyone working with the ESP32-S3 AMOLED boards, I’ve put together a complete ESPHome YAML for the ESP32-S3 Touch AMOLED 1.75", including display, audio, and a local Home Assistant voice assistant setup.

GitHub repo here:
https://github.com/abramosphere/Home-Assistant-Voice-for-ESP32-S3-Touch-AMOLED-1.75

It’s hardware-specific but fully working, and might be useful as a reference (or starting point) for similar Waveshare / AMOLED S3 boards.

Does anyone have an up-to-date configuration using an original Atom Echo with either local or docker-based wakeword and an external HA media player with the speaker disabled on the echo itself?

Have been tinkering a fair amount but my echo currently stops working if I disable the speaker and otherwise just stalls for ages (presumably buffer issues) and then won’t recognise another wake word even after it eventually partially plays the response that my HA media player played a minute or two prior!