Best hardware setup for ESPHome media player and notifier

After some fiddling I got it working: an ESP32-S2 with a MAX98357.
Programming the ESP succeeded with esptool.py. I had to trigger programming mode with the buttons (hold the 0 button, push and release RST, then release the 0 button). After the first flash, Wi-Fi OTA works fine.
The first sound was distorted, but after resampling the MP3 files to 32 kHz (I used Audacity) the sound is good! TTS also works fine, provided you use a decent speaker, of course.

Any chance you could share your YAML? I’m trying to get set up now, but I can’t get Home Assistant to recognize it.

No problem! I ended up with this:

substitutions:
  name: "esp32mediaplayer1"
  friendly_name: esp32mediaplayer1

packages:
  common: !include common.yaml

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
 
esp32:
  board: lolin_s2_mini
  variant: ESP32S2
  framework:
    type: arduino

wifi:
  power_save_mode: none
  output_power: 10

status_led:
  pin: 15

i2s_audio:
  i2s_lrclk_pin: GPIO33
  i2s_bclk_pin: GPIO18

media_player:
  - platform: i2s_audio
    name: "${friendly_name} player"
    dac_type: external
    i2s_dout_pin: GPIO16
    mode: mono

with this in common.yaml:

esphome:
  name: "${name}"
  comment: "${friendly_name}"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  domain: .fritz.box
  ap:
    ssid: "${friendly_name} Hotspot"
    password: !secret wifi_password

captive_portal:

# Enable logging
logger:

# Enable Home Assistant API
api:

# Enable OTA updates
ota:

sensor:
  - platform: wifi_signal
    name: "${friendly_name} WiFi Signal Strength"
    update_interval: 60s
  - platform: uptime
    name: "${friendly_name} Uptime"
    
binary_sensor:   
  - platform: status
    name: "${friendly_name} Status"

switch:
  - platform: restart
    name: "${friendly_name} Restart"
    
text_sensor:
  - platform: version
    name: "${friendly_name} ESPHome Version"

Hope this helps you get it working!
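
A note for anyone copying this onto a newer ESPHome release: as far as I know, since ESPHome 2024.6 the bare `ota:` key is no longer accepted and a platform has to be named. The config above dates from an older release, so treat the fragment below as an adjustment to try if validation complains, not as part of the original setup.

```yaml
# In common.yaml, replacing the bare "ota:" key.
# Assumes ESPHome 2024.6 or newer; older releases use plain "ota:".
ota:
  - platform: esphome
```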

Nice, Gerben. I am looking for a relatively simple solution to build a speaker for my alarm, and maybe use it as a media player when the alarm is not needed.
It would be nice if you could share in more detail which components you used. For instance, the MAX98357 is available in several versions, such as the DFRobot board or the ones on AliExpress. And hopefully a Wemos ESP module will also work!

Hi!
I used simple, cheap AliExpress products:

https://a.aliexpress.com/_EyLJu5B

https://a.aliexpress.com/_Exbi9ad

https://a.aliexpress.com/_Ewvmo1J

Nothing special, but good enough for some notifications :slight_smile:
Good luck!

I get excellent sound quality with these components, on the level of hi-fi systems.
The MAX98357A is a high-quality Class D amplifier with a digital input. Pay attention to the circuitry and you will get great results with it.

This is a sample playback recorded with a smartphone: [audio clip not preserved]

Of course, the smartphone microphone noticeably degrades the quality; it sounds better live.
It was played on this small desktop speaker: [photo not preserved]

and inside: [photo not preserved]

I have built two speakers with a voice assistant function using this manual.

Use wires that are as short as possible, shielded or twisted pair if you can, to avoid crackling and interference. Don’t forget that this is a high-frequency digital data bus.
Old computer speakers, or the speakers from a 5.1 system, make good enclosures.

Thank you. The components have arrived. How do I connect them? Do you have a schematic for this setup? That is the only thing I still need to get it working.

Sorry, no schematic available, but you can use the pins named in my ESPHome YAML file to connect the boards. Those three I2S wires between them (besides power and the speaker) are all you need.
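
For anyone wiring this up, the pin mapping from the config earlier in the thread works out roughly as below. The GPIO numbers are taken from that YAML; the pin labels (LRC, BCLK, DIN, Vin, GND) are the ones usually printed on MAX98357A breakout boards and are an assumption about your particular board, as is the choice of supply pin.

```yaml
# Wiring sketch: ESP32-S2 mini -> MAX98357A breakout.
# GPIO numbers come from the config earlier in this thread;
# breakout pin labels are assumed and may differ on your board.
i2s_audio:
  i2s_lrclk_pin: GPIO33   # -> LRC  (word select / left-right clock)
  i2s_bclk_pin: GPIO18    # -> BCLK (bit clock)

media_player:
  - platform: i2s_audio
    name: "esp32mediaplayer1 player"
    dac_type: external
    i2s_dout_pin: GPIO16  # -> DIN  (serial audio data)
    mode: mono

# Not part of the ESPHome config, but also needed:
#   board 5V (VBUS) or 3V3 -> Vin  (assumed; check your amp board's supply range)
#   board GND              -> GND
#   speaker                -> the amp's + / - output terminals
```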

I bought the components @gdschut mentioned, then assembled and tested them last week. I also used the YAML setup mentioned above. TTS works really well, but streaming media or playing a simple .wav or .mp3 file gives problems: the ESP makes some weird sounds and stops working. After a reset I can play TTS again without problems.
I tried several speakers, but no luck.

error message:

[11:52:35][W][component:214]: Component i2s_audio.media_player took a long time for an operation (0.54 s).
[11:52:35][W][component:215]: Components should block for at most 20-30ms.

Any suggestions on how to get normal sound when streaming media?

ESPHome version 2024.2.2

I use this for playing MP3s only, not for streaming. At first I had the same results, but after resampling the MP3s to a 32 kHz sample rate it sounds good enough for me. I was told this is caused by the limitations of the ESP32 module. A more powerful version of the ESP32 should give better results, but I did not try that.

Good suggestion to try another ESP32 module. For my alarm speaker I changed the (alarm) sound file to a sample rate of 32 kHz. And it works! Thanks for the tip!
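
For reference, playing such a resampled file from Home Assistant could look roughly like the action below. The entity ID, host name and file name are made up for illustration; the `/local/` path relies on Home Assistant serving files from its `config/www` folder under that prefix.

```yaml
# Hypothetical example: entity ID, host name and file name are placeholders.
service: media_player.play_media
target:
  entity_id: media_player.esp32mediaplayer1_player
data:
  media_content_id: "http://homeassistant.local:8123/local/alarm_32khz.mp3"
  media_content_type: music
```

This can be pasted into Developer Tools or used as the action of an alarm automation; the file itself still has to be resampled beforehand, for example in Audacity as described above.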

Hey folks. I stumbled upon this thread, and I’m actually running a crowdfunding campaign for devices that are aimed (among other things) at ESPHome-based Home Assistant integration. I just think some of you may find this interesting.

Also, as the developer of these devices, I’m really curious to hear community feedback, such as features you’re missing or ideas you’d like to propose.

Very interesting, @anabolyc! The Louder ESParagus Media Center would be a really good solution for me, but the shipping costs and taxes are too high for me. Sorry.

Sure, I understand. I also did a few lightweight solutions in the past; essentially they were prototypes for the Esparagus boards. The ESP Audio Dock Solo and Duo are MAX98357-based solutions that work with both the ESP8266 and the ESP32/S2/C3, and Louder-ESP32 is a TAS5805M-based one. I’m also about to release a Louder-ESP32 based on the S3.
Point is, this is kind of my passion and I did more than one approach to the topic, as you can see :slight_smile:

Hi Andriy, I like the look of your products; great work!

My issue is that my primary use is a Voice Assistant, and thinking that my voice assistant includes a speaker and is already in the room … it should also be a music player.

My days of owning a hi-fi component system are long gone, as I simply don’t take the time to sit and just listen to my music library (now digitised). I can of course run VLC on my computer, but it would be nice to have something playing in the background in the living room or kitchen. I do see the appeal of Spotify and other streaming music providers, but I prefer to listen to all the tracks I already have.

I currently use Raspberry Pis with reSpeaker HATs, but that is overkill in both price and CPU power. So at the moment I am waiting for Nabu Casa to release a decent ESP32-based voice assistant device, and hoping it includes a music player.

Hey @donburch888, thanks for the kind feedback.

I’m currently working on the upgraded version of the ESP32 Solo dock. My idea is to add a couple of mics and a few LEDs to the mono speaker, making it a full kit for a voice assistant while still keeping the speaker as the main feature.

I think it may be quite close to what you are looking for.

I agree that a Raspberry Pi may be overkill for a single purpose; I’m using one myself. Mine serves multiple purposes though: it acts as a remote audio card for my PC and as a speaker connected to HA/MA and Mopidy. It is also part of a Snapcast network that allows multi-room playback. So there is a place for Pi power in home audio as well :)

I think there is a market for

  • cheap voice assistant device with mono sound, and
  • as above, but decent quality stereo sound
  • and of course there are those who demand top quality audio

Personally I see myself in the middle category, connecting unpowered bookshelf speakers (why add an extra power adaptor?). The problem I am experiencing is that Voice Assist and music playing currently seem to be mutually exclusive :frowning_face:

As you have probably seen at HA Voice Assist, they are looking at the ESP32-S3 chip with PSRAM to do the wake word detection locally.

I am using Seeed reSpeaker HAT boards, but Seeed stopped supporting the drivers 5 years ago, and the drivers never supported any of the digital signal processing magic needed to make use of the extra microphones. My understanding is that everyone who has developed these DSP algorithms to a useful level has chosen to make them proprietary (and I can’t entirely blame them). I would not bother with multiple mics until the software is available for them (and by then there will probably be newer ESP chips to handle the extra load).

I will be looking forward to the release of your upgraded ESP32 Solo dock.

I agree. I want to start from a basic mono model and move forward from there as long as there is demand. I know that the Voice Assistant setup on the ESP32 is currently cumbersome and full of pitfalls, but I believe in the power of the community and trust it to get better soon, including the use of 2 mics. And yes, the new Solo is based on the S3 with PSRAM, so it should be supported right away.

Also afaik you can train your own models for custom wake words, I didn’t try it myself though. Should be fun to play with anyway.

Hey folks. Following the discussion above, I have the first samples of the Esparagus Echo devices, which in a few lines are:

  • ESP32-S3 with 16 MB of flash and 8 MB of PSRAM
  • MAX98357 DAC
  • ICS43434 mic
  • A few RGB LEDs
  • Optional Ethernet

All packed in an aluminium case.

I did some basic testing myself, but I am also looking for beta testers in the community. If you are keen to try it out and provide honest, usage-based feedback, please reach out for a free sample.

More details are in the GitHub repo.