Support for Responding to ESP32-S3-Box-3 Queries Through External Speakers (e.g., HomePod)

The current setup of the ESP32-S3-Box-3 is limited to responding to voice queries through its built-in speaker. This limitation restricts the flexibility and integration of the device within a smart home ecosystem where users may prefer to use higher-quality or centralized speakers, such as the HomePod, for voice responses.

Feature Request:

I propose adding a feature that allows the ESP32-S3-Box-3 to redirect its voice response output to external speakers, like the Apple HomePod, or any other AirPlay or Bluetooth-enabled speaker. This feature would enhance user experience by leveraging existing home audio systems, offering better sound quality, and ensuring a more cohesive smart home environment.

Reasoning:

  1. Improved Audio Quality: External speakers, such as the HomePod, generally offer superior sound quality compared to the built-in speaker of the ESP32-S3-Box-3.

  2. Centralized Responses: Users can receive all voice responses in one place, avoiding the need to be near the ESP32-S3-Box-3 to hear responses.

  3. Integration with Existing Systems: Many users already have high-quality speakers integrated into their smart homes. This feature would maximize the utility of those systems.

Implementation Suggestion:

  • Introduce a configuration option in the ESP32-S3-Box-3’s settings to select an external speaker as the output device for voice responses.

  • Support for AirPlay, Bluetooth, or other wireless protocols could be integrated to enable communication between the ESP32-S3-Box-3 and the external speaker.

There are projects that do this. Try searching.

Actually, I think you just need to use the media_player option described in the ESPHome Voice Assistant documentation.

Thank you for the response. I’m aware that there are various projects and that ESPHome allows for customization through the esp32-s3-box-3-abc123.yaml file. However, my request is aimed at providing a more straightforward solution.

I would like an option where I can easily select a Media Player as the speaker without needing to dive into YAML configurations. My goal is to have a user-friendly way to set this up, ideally through the UI, rather than having to manually adjust YAML files. This would make it much more accessible for users who aren’t comfortable with advanced configurations.


Without going into the merits of your Feature Request, I think your second reason (not needing to be near the ESP32-S3-Box-3 to hear responses) is a bit flimsy:

Surely you have to be near the ESP32-S3-Box-3 in the first place for it to be able to hear you.

Add the snippet below to the voice_assistant block of the ESPHome config, right before on_tts_stream_end. You have to allow the S3 box to make HA action (previously service) calls. This still only works with a speaker that works with tts_say. I think there is a card to send audio to any media player, but it's not currently possible. A dropdown box would be insanely hard to code when 7 lines of YAML accomplish the exact same thing.

  on_tts_end:
     - homeassistant.service:
         service: media_player.play_media
         data:
           entity_id: media_player.vlc_telnet  
           media_content_id: !lambda 'return x;'
           media_content_type: music
           announce: "true"
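
On newer ESPHome releases the same automation can be written with the renamed action. This is only a sketch: it assumes an ESPHome version (2024.8 or later, as far as I know) where homeassistant.service became homeassistant.action and the service key became action. media_player.vlc_telnet is just the example entity from the snippet above, so replace it with your own speaker.

```yaml
voice_assistant:
  # When on_tts_end fires, x holds the URL of the generated TTS audio
  on_tts_end:
    - homeassistant.action:
        action: media_player.play_media
        data:
          entity_id: media_player.vlc_telnet  # example entity; use your own speaker
          media_content_id: !lambda 'return x;'
          media_content_type: music
          announce: "true"
```

The older homeassistant.service form shown above should keep working, so this variant is optional.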

Thank you for the explanation! However, I’m not entirely sure where I should add the code. Could you clarify where exactly I need to place it within the Home Assistant?

In ESPHome, in the YAML configuration for the S3 box, it would look like the config below. There may be some differences, but you only need to add the on_tts_end part.

The first thing to do is go to Developer Tools, then Actions, and search for tts (text to speech). Use the GUI to verify that the speaker shows up and that you can send sample text to it without any issues using Piper (local) or tts_cloud (Nabu Casa cloud). I don't own any Apple products, but Chromecast devices can be hit or miss. Sonos seems to work perfectly, so the Apple smart speaker should work. If it does, change entity_id below to whatever the Apple speaker's entity name is. This will not stop audio output from the S3 box. There is a way to do that, but I don't remember exactly how; it involved commenting some stuff out.

For it to work, you have to go to Settings > Devices and services, then click on the ESPHome logo (NOT the devices link underneath). Click Configure next to the S3 box and check the checkbox. If you don't, it won't work even if your configuration is 100 percent correct.

voice_assistant:
  id: va
  microphone: m5cores3_mic
  speaker: m5cores3_spk
  use_wake_word: true
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  #vad_threshold: 3
  on_listening:
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: draw_display
  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: draw_display
  on_tts_stream_start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: draw_display
  on_tts_end:
    - homeassistant.service:
        service: media_player.play_media
        data:
          entity_id: media_player.sound_bar 
          media_content_id: !lambda 'return x;'
          media_content_type: music
          announce: "true"
  on_tts_stream_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: draw_display
  on_error:
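
The Developer Tools check described above can also be run in YAML mode instead of the GUI. This is a rough sketch, not a verified recipe: tts.piper is an assumed entity name for the TTS engine, and media_player.sound_bar is the example speaker from the config above, so substitute the entities that exist in your installation. On older Home Assistant versions the first key is service: rather than action:.

```yaml
# Paste into Developer Tools > Actions (YAML mode) in Home Assistant
action: tts.speak
target:
  entity_id: tts.piper  # assumed TTS entity; pick the one you actually have
data:
  media_player_entity_id: media_player.sound_bar  # the external speaker under test
  message: "Testing the external speaker."
  cache: false
```

If the speaker plays the message here, the on_tts_end automation should be able to reach it as well.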

Thank you for the detailed response. Unfortunately, the proposed solution did not work for me. I followed the steps you outlined, including enabling the “Allow device to perform Home Assistant actions” option and adding the code to the S3 box’s YAML configuration:

substitutions:
  name: esp32-s3-box-3-04dc14
  friendly_name: ESP32 S3 Box 3 04dc14
packages:
  esphome.voice-assistant: github://esphome/firmware/wake-word-voice-assistant/esp32-s3-box-3.yaml@main
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
api:
  encryption:
    key: !secret api_key
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
voice_assistant:
  on_tts_end:
     - homeassistant.service:
         service: media_player.play_media
         data:
           entity_id: media_player.homepods  
           media_content_id: !lambda 'return x;'
           media_content_type: music
           announce: "true"

I tried both approaches – using voice_assistant and just on_tts_end. When I used only on_tts_end, I received the error: “Component not found: on_tts_end.” when saving and attempting to install the configuration onto the S3 box. With voice_assistant, I encountered this error:

Traceback (most recent call last):
  File "/usr/local/bin/esphome", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/esphome/esphome/__main__.py", line 1014, in main
    return run_esphome(sys.argv)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 1001, in run_esphome
    rc = POST_CONFIG_ACTIONS[args.command](args, config)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 485, in command_run
    exit_code = write_cpp(config)
                ^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 195, in write_cpp
    return write_cpp_file()
           ^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 213, in write_cpp_file
    writer.write_cpp(code_s)
  File "/esphome/esphome/writer.py", line 352, in write_cpp
    copy_src_tree()
  File "/esphome/esphome/writer.py", line 305, in copy_src_tree
    copy_files()
  File "/esphome/esphome/components/esp32/__init__.py", line 709, in copy_files
    shutil.copytree(
  File "/usr/lib/python3.11/shutil.py", line 561, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/shutil.py", line 515, in _copytree
    raise Error(errors)
shutil.Error: [('/data/idf_components/b0ad5728/components/esp-sr/docs/myapp/lib64', '/data/build/esp32-s3-box-3-04dc14/components/esp-sr/docs/myapp/lib64', "[Errno 17] File exists: 'lib' -> '/data/build/esp32-s3-box-3-04dc14/components/esp-sr/docs/myapp/lib64'"), ('/data/idf_components/b0ad5728/components/esp-sr/docs/myapp/bin/python', '/data/build/esp32-s3-box-3-04dc14/components/esp-sr/docs/myapp/bin/python', "[Errno 17] File exists: 'python3' -> '/data/build/esp32-s3-box-3-04dc14/components/esp-sr/docs/myapp/bin/python'"), ('/data/idf_components/b0ad5728/components/esp-sr/docs/myapp/bin/python3', '/data/build/esp32-s3-box-3-04dc14/components/esp-sr/docs/myapp/bin/python3', "[Errno 17] File exists: '/usr/bin/python3' -> '/data/build/esp32-s3-box-3-04dc14/components/esp-sr/docs/myapp/bin/python3'")]

GitHub - BigBobbas/ESP32-S3-Box3-Custom-ESPHome at dev

That firmware does everything you want to do and then some. It allows any speaker that can be controlled via media_player to be used as the external speaker, and uses only that speaker for the response when you speak. Alternatively, you can still choose to use the box's own speaker.