Best hardware setup for ESPHome media player and notifier

If you’re only using this ESP32 device to play audio (announcements or maybe music), then maybe consider flashing it with the Squeezelite firmware for the ESP32 (squeezelite-esp32). I found that this works better, especially when driven by Music Assistant, with Music Assistant exposing the media players to Home Assistant for TTS announcements, etc.

Also, to repeat what was mentioned much earlier in the thread: the quality of both the speaker and the enclosure it’s mounted in IS VERY IMPORTANT. I’ve taken to buying cheap little Bluetooth speakers, ripping out the guts and shoving an ESP32 inside them. If you’re clever, you can salvage enough of the components to keep the USB-C connector that was previously used to charge the (now removed) battery, and you can wire the speaker’s buttons to ESP32 GPIO inputs. You can have ESPHome or the Squeezelite firmware act on these buttons for volume up/down, play/pause, etc. This took some real DIY hacking with a soldering iron to pull off, but the result sounds good because the speaker and enclosure were engineered to sound good.
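For the ESPHome side of that button wiring, a minimal sketch might look like the following. It assumes each salvaged button shorts its GPIO to ground when pressed and that your media_player entry has been given id: my_player. The pin numbers and the my_player id are hypothetical, so adapt them to your own wiring.

```yaml
# Sketch only: assumes the media_player has "id: my_player" (hypothetical id)
# and that each button connects its GPIO to GND when pressed.
binary_sensor:
  - platform: gpio
    name: "Volume Up"
    pin:
      number: GPIO4          # hypothetical pin, use any free GPIO
      mode:
        input: true
        pullup: true         # internal pull-up; the button pulls the pin low
      inverted: true         # report "pressed" while the pin is low
    filters:
      - delayed_on: 20ms     # crude debounce
    on_press:
      - media_player.volume_up: my_player
  - platform: gpio
    name: "Volume Down"
    pin:
      number: GPIO5          # hypothetical pin
      mode:
        input: true
        pullup: true
      inverted: true
    filters:
      - delayed_on: 20ms
    on_press:
      - media_player.volume_down: my_player
  - platform: gpio
    name: "Play/Pause"
    pin:
      number: GPIO6          # hypothetical pin
      mode:
        input: true
        pullup: true
      inverted: true
    filters:
      - delayed_on: 20ms
    on_press:
      - media_player.toggle: my_player   # toggles between play and pause
```

The delayed_on filter is a simple debounce; the buttons also show up in Home Assistant as binary sensors, which is handy for checking the wiring before relying on the actions.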


It sounds interesting… do you know where I can find an easy, complete, and detailed step-by-step guide on this?

Yes, I had the same issue in one of the countless iterations of my RFID jukebox. I haven’t debugged this problem enough yet, but I think ESPHome has some performance bottlenecks that make it difficult to use for a connected speaker.

Been there, done that, bought a T-shirt :smile:

I ended up using squeezelite-esp32 on a Muse Luxe speaker.


A starting point is sle118/squeezelite-esp32 on GitHub (“ESP32 Music streaming based on Squeezelite, with support for multi-room sync, AirPlay, Bluetooth, Hardware buttons, display and more”), which is the repo for this software. It’s not quite a simple step-by-step process, but once you’ve wired up the pins to the ESP32, you’re over one major hurdle. Where it can get tricky is if you want to connect switches and have squeezelite-esp32 act on them for play/pause, skip forward/back, etc. You have to compose a JSON blob and upload it to the running squeezelite-esp32 to define those pins and functions. But you can do that after you get the basic music-playing capability working.

You should decide at the outset whether you plan to have Music Assistant (MA) installed and part of your world. There are many good reasons to do this; going down this path means that MA automatically pauses music that’s playing, plays the TTS announcement and then resumes. You can also configure MA to play a little “ding” before the announcement. If you go down this path, you can expose the MA devices to Home Assistant as media players and use those as the interface.

Alternatively, you can add the squeezelite-esp32 devices to Home Assistant directly as “Squeezebox” media players, but then you don’t get those other nice features.

Note that if you go down this path, then the device is not going to be a Voice Assistant as it has no microphone, etc. Just a music/media player.

If you start with that GitHub link and then do some searches, chances are you’ll turn up something more like a tutorial. The documentation on the GitHub page is pretty complete, though it can be a little confusing at first until you get a feel for how it wants to be installed, configured and used. Like most things…

EDIT: To be fair, it’s been more than a few months since I went down this path, so the music-playing experience with ESPHome (compared to the Squeezelite approach) may well have improved. The voice assistant and media-playing code has been under active development, so what I experienced 6 months ago is not what you’ll see today.

On the other hand, if you have shitty speaker acoustics, it really doesn’t matter what software is driving it. So what you invest in the speaker and enclosure will pay dividends regardless of what software stack you choose. Heck, try them both once you get the hardware working and see how each feels!


As of today (2025-07-19), my ESP32 + MAX98357 project has good sound quality.

Interesting! What kind of ESP32 board are you using? And what did you use to program it? (ESPHome? Anything else?)

It’s a board that can be found everywhere,
ESP32-S3 + ESPHome.
At first I used flying wires and the sound had obvious noise;
after I soldered the connections, the noise disappeared.

Thanks for sharing. Are you using the Seeed Studio variant of the ESP32? What wiring or pinout did you use to connect the MAX board?

Thanks! I upgraded to ESP32-S3 + MAX98357 + ESPHome, and now the sound is fine. I just have a couple of seconds of delay before playback starts. Does anyone know how to minimize that?

I don’t have this problem;
even if there is a delay, it’s 1 second at most and barely noticeable.
Is PSRAM enabled?

It is the esp32s3r16n8 development board.
As long as you avoid GPIO0 and GPIO2,
you can choose whatever pinout you like.


I think I got the same board, with double USB-C ports.
https://nl.aliexpress.com/item/1005005051294262.html - ESP32-S3 N16R8

This is my ESPHome YAML (relevant part):

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

i2s_audio:
  i2s_lrclk_pin: GPIO7
  i2s_bclk_pin: GPIO15

media_player:
  - platform: i2s_audio
    name: "${friendly_name} player"
    dac_type: external
    i2s_dout_pin: GPIO16
    mode: mono

How do I enable psram?

esphome:
  name: esp32s3r16n8
  friendly_name: esp32s3r16n8

esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  cpu_frequency: 240MHZ
  framework:
    type: esp-idf

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "qg+NsW2U1mTeyYUo3KLRMHF7MT0gB2qLnTprjdo7S+w="

ota:
  - platform: esphome
    password: "92d45a56ad920dc4f5a5d35aad80d90f"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esp32S3R16N8 Fallback Hotspot"
    password: "qn7JJ5RmDGw1"

captive_portal:

psram:
  mode: octal
  speed: 80MHz

i2s_audio:
  i2s_lrclk_pin: GPIO7
  i2s_bclk_pin: GPIO15

speaker:
  - platform: i2s_audio
    i2s_dout_pin: 16
    id: out_sp
    channel: mono
    sample_rate: 48000
    bits_per_sample: 32bit
#    i2s_mode: primary
    use_apll: false
    bits_per_channel: default
#    mclk_multiple: 256
    buffer_duration: 500ms
    timeout: 500ms
#    i2s_comm_fmt: stand_i2s
    dac_type: external
    num_channels: 1

media_player:
  - platform: speaker
    name: "${friendly_name} player"
    announcement_pipeline:
      speaker: out_sp
      format: FLAC
      sample_rate: 48000
      num_channels: 1  # single MAX98357 mono amp, so one output channel



Thanks! I used your YAML, but I still experience a delay of about 3 seconds.
I use my media player for notifications, like my doorbell.
When I press the doorbell (or the test button in Node-RED via Home Assistant), I see this in the ESPHome node’s log within the same second (so without delay):

> [18:27:05][D][media_player:074]: 'ESP32 Mediaplayer 2 player' - Setting
> [18:27:05][D][media_player:081]:   Media URL: https://hass.xyz.xyz/local/audio/dingding32khz.mp3?authSig=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiI4YjI4NzE2NDUxMGY0NTFkOWFiN2FlZTY5ZTZmNWIxYSIsInBhdGgiOiIvbG9jYWwvYXVkaW8vZGluZ2RpbmczMmtoei5tcDMiLCJwYXJhbXMiOltdLCJpYXQiOjE3NTM2MzM2MjUsImV4cCI6MTc1MzcyMDAyNX0.R0sv7x6EjvhjttJrcfR51mpBirYKNdUoVs7OfPSqHSc
> [18:27:05][D][media_player:087]:  Announcement: yes

It takes about 3.5 seconds before the MP3 starts playing.
My file is a simple doorbell tune: dingding32khz.mp3 (LimeWire download, expires Aug 3).
Do you have any idea why it takes 3 seconds to start playing?

Probably because the MP3 has to be converted to a stream playable by the media player each time. You could try caching it after the first use.
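One way to “cache” the sound in ESPHome, offered as a sketch rather than a confirmed fix for this delay: the speaker media player can embed audio files in the firmware via files: and play them with the media_player.speaker.play_on_device_media_file action, so nothing is fetched over the network when the doorbell rings. The file path sounds/dingding.flac, the ids my_player and doorbell_sound, and the “Play doorbell” button are hypothetical; the pipeline options are copied from the config above. Because the device decodes the embedded file itself, this sketch does not set codec_support_enabled: false.

```yaml
# Sketch only: ids, button name and file path are hypothetical.
media_player:
  - platform: speaker
    id: my_player
    name: "${friendly_name} player"
    announcement_pipeline:
      speaker: out_sp
      format: FLAC
      sample_rate: 48000
      num_channels: 1
    files:
      - id: doorbell_sound
        file: sounds/dingding.flac   # embedded in the firmware at compile time

# A button entity Home Assistant / Node-RED can press to ring the bell.
button:
  - platform: template
    name: "Play doorbell"
    on_press:
      - media_player.speaker.play_on_device_media_file:
          media_file: doorbell_sound
          announcement: true
```

The trade-off is that changing the sound means re-flashing the device, and every embedded file takes up flash space.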

That is a great suggestion. Can you explain how to do that in ESPHome?

Try converting your MP3 to FLAC and then playing it.

I haven’t encountered this situation.
I see you used https://hass.xyz.xyz;
maybe try a local address like http://192.168.1.1 instead.
Playing an MP3 isn’t that hard for an ESP32-S3.
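To try the local-address suggestion from Home Assistant itself, a sketch of the play_media action (for a script or automation) could look like this. The entity id and the IP address are placeholders, not values from this thread; files in Home Assistant’s www folder are served under /local/.

```yaml
# Sketch only: replace the entity id and IP with your own.
action: media_player.play_media
target:
  entity_id: media_player.esp32_mediaplayer_2_player
data:
  # Plain-HTTP LAN address instead of the external HTTPS hostname
  media_content_id: http://192.168.1.10:8123/local/audio/dingding32khz.mp3
  media_content_type: music
  announce: true
```

Comparing the start-up delay with this URL against the external HTTPS one should show whether the TLS handshake and remote hostname are contributing to the delay.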

esphome:
  name: esp32s3r16n8
  friendly_name: esp32s3r16n8

esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  cpu_frequency: 240MHZ
  framework:
    type: esp-idf

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "qg+NsW2U1mTeyYUo3KLRMHF7MT0gB2qLnTprjdo7S+w="

ota:
  - platform: esphome
    password: "92d45a56ad920dc4f5a5d35aad80d90f"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esp32S3R16N8 Fallback Hotspot"
    password: "qn7JJ5RmDGw1"

captive_portal:

psram:
  mode: octal
  speed: 80MHz

i2s_audio:
  i2s_lrclk_pin: GPIO7
  i2s_bclk_pin: GPIO15

speaker:
  - platform: i2s_audio
    i2s_dout_pin: 16
    id: out_sp
    channel: mono
    sample_rate: 48000
    bits_per_sample: 32bit
#    i2s_mode: primary
    use_apll: false
    bits_per_channel: default
#    mclk_multiple: 256
    buffer_duration: 500ms
    timeout: 500ms
#    i2s_comm_fmt: stand_i2s
    dac_type: external
    num_channels: 1

media_player:
  - platform: speaker
    name: "${friendly_name} player"
    codec_support_enabled: false
    announcement_pipeline:
      speaker: out_sp
      format: WAV
      sample_rate: 48000
      num_channels: 1  # single MAX98357 mono amp, so one output channel

I added codec_support_enabled: false (together with format: WAV in the announcement pipeline).
This lets Home Assistant decode your MP3 and then pass it to the ESP32 for playback,
which can greatly reduce the load on the ESP32.

Thanks! I added that line to my YAML, but I get errors without even playing a file:

> [17:27:58][E][speaker_media_player:326]: The announcement pipeline's file reader encountered an error.
> [17:27:58][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_ERR_NOT_SUPPORTED

I also tried converting my MP3 to FLAC, but the media player did not play it at all.
I haven’t yet tried serving my MP3 from a local (IP-based) server; I’ll do that soon.