Best hardware setup for ESPHome media player and notifier

I’ve been searching for hardware to build an ESPHome-based media player that can serve as a decent but simple notification system.
It must support the TTS service, play WAV and/or MP3 files, and run ESPHome.
I tried building one with an ESP32 and an external DAC (MAX98357A), but the audio quality was poor. I also tried an ESP32-S2, but have not managed to program it yet.

Any tips for making a TTS notifier with ESPHome?

A Raspberry Pi Zero W with a HiFiBerry DAC+ Zero might be a better solution.
You can run a simple Linux on it and have all the options of a complete OS, including Snapcast and the like.

Thanks for the idea!
Although I understand the extra possibilities, I still prefer a solution based on ESPHome (ESP8266 or ESP32). Why? Simplicity, power consumption, no failing SD cards after a power outage, no OS to maintain, etc.

You would need a decent audio board, like a PicoAudio.
But that requires a TinyPICO ESP32 development board too, because ESP boards do not have a standard for pinouts.
Then you would have to do some coding to make libraries that link to ESPHome before you can start tinkering with the ESPHome setup.

A solid base could be a Muse Proto.

Or a Muse Luxe, if you can get your hands on one.

There is nothing special about the Muse Proto other than a screw terminal in place of bare pins, plus an onboard speaker.
The rest of the board is just a traditional ESP32 dev board with no extra DAC or anything else.
The output here is pretty bad and only mono. Nowhere near the typical 24-bit / 96 kHz the PicoAudio provides, and even further from the 24-bit / 192 kHz the HiFiBerry cards provide for a Raspberry Pi.

I have played with the combination of an ESP32-WROOM-32 and a MAX98357A as well and discovered the following:

It seems to depend heavily on the speaker you use.
The first one I tried was some 1€ speaker from China (3 W / 4 Ohm) and the result was awful most of the time. I had massive crackling and sometimes complete silence in the middle of a song.

I then tested it with a Philips speaker I had lying around. No idea what wattage it needs, but it worked out of the box with decent sound.

Conclusion: Try another speaker if you have one at hand.

Thanks! I will continue experimenting this way to see if I can reach my goal.

Go ahead and experiment away. Would absolutely love to hear what your result is :blush:

The only thing I haven’t gotten around to yet is the response volume. I get yelled at quite loudly when TTS pushes the response. Maybe I need some sort of potentiometer in between? Hopefully I will get some time next weekend to solve this.
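
A software-only approach may be worth trying before adding a potentiometer: lower the player volume from Home Assistant, then send the TTS message. The following is only a sketch. The script name, both entity IDs, and the message are assumptions, and the TTS entity depends on which TTS integration is installed.

```yaml
# Hypothetical Home Assistant script; all names and entity IDs are assumptions.
script:
  announce_quietly:
    alias: "Announce at reduced volume"
    sequence:
      # Lower the player volume first (range 0.0 to 1.0).
      - service: media_player.volume_set
        target:
          entity_id: media_player.esp32_speaker   # assumed entity ID
        data:
          volume_level: 0.3
      # Then send the spoken message through a TTS entity.
      - service: tts.speak
        target:
          entity_id: tts.google_translate_en_com  # assumed TTS entity
        data:
          media_player_entity_id: media_player.esp32_speaker
          message: "The front door is open."
```

If 0.3 is still too loud or too quiet for a given speaker, the `volume_level` value is the only thing that should need adjusting.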

Same mileage here. I used some random tiny speaker which was only acceptable on a very low sound level or the audio was distorted.

Then I salvaged a speaker from an old pair of computer speakers, and now it’s even good for playing music at a louder level.

The Raspberry Pi with a dedicated DAC HAT that I used previously does not sound any better. I highly suggest an ESP with a good speaker, as it is much more versatile!

I’m using Google Nests.
They look nice, have good audio quality and there are plenty available second hand for cheap.

I open them up and physically remove the microphones.

Works perfectly!

The Nest is also an interesting option! Thanks for that idea.
I wonder if this works purely locally, or whether it always goes through Google’s cloud to play sound, even if the sound originates from a local resource?

For now I got it working with an ESP32-WROOM devkit board, and indeed with a bigger speaker. I connected the GAIN pin of the MAX98357 to GND, balancing quality against volume.
The board is quite big, so I’m looking for a smaller board that can do the job. I tried the ESP32-S2, but could not get it programmed at all, and I also saw the warning that these are still experimental…

Somewhere I should have some ESP32-CAM boards lying around; I will see if these can do the job.

After some fiddling I got it working: ESP32-S2 with MAX98357.
Programming the ESP succeeded with esptool.py. I had to trigger programming mode with the buttons (hold O, then push and release RST, then release the O button). After the first programming, Wi-Fi OTA works fine.
The first sound was distorted, but after resampling the MP3 files to 32 kHz (with Audacity) the sound is good! TTS also works fine, provided you use a decent speaker, of course.
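
For anyone wanting to test a resampled file, a Home Assistant service call along these lines might do. This is a sketch only: the entity ID, host name, and file name are assumptions, and the file is assumed to have been copied into Home Assistant's `config/www` folder.

```yaml
# Hypothetical service call; the entity ID, host, and file name are assumptions.
service: media_player.play_media
target:
  entity_id: media_player.esp32_speaker
data:
  media_content_type: music
  # Files placed in Home Assistant's config/www folder are served under /local/.
  media_content_id: "http://homeassistant.local:8123/local/doorbell_32k.mp3"
```

If the ESP cannot resolve the host name, using the Home Assistant IP address in the URL is a reasonable thing to try.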

Any chance you could share your YAML? I’m trying to get set up now, but I can’t get Home Assistant to recognize it.

No problem! I ended up with this:

substitutions:
  name: "esp32mediaplayer1"
  friendly_name: esp32mediaplayer1

packages:
  common: !include common.yaml

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
 
esp32:
  board: lolin_s2_mini
  variant: ESP32S2
  framework:
    type: arduino

wifi:
  power_save_mode: none
  output_power: 10

status_led:
  pin: 15

i2s_audio:
  i2s_lrclk_pin: GPIO33
  i2s_bclk_pin: GPIO18

media_player:
  - platform: i2s_audio
    name: "${friendly_name} player"
    dac_type: external
    i2s_dout_pin: GPIO16
    mode: mono

with this in common.yaml:

esphome:
  name: "${name}"
  comment: "${friendly_name}"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  domain: .fritz.box
  ap:
    ssid: "${friendly_name} Hotspot"
    password: !secret wifi_password

captive_portal:

# Enable logging
logger:

# Enable Home Assistant API
api:

# Enable OTA (ESPHome 2024.6 and later require an explicit platform here, e.g. "- platform: esphome")
ota:

sensor:
  - platform: wifi_signal
    name: "${friendly_name} WiFi Signal Strength"
    update_interval: 60s
  - platform: uptime
    name: "${friendly_name} Uptime"
    
binary_sensor:   
  - platform: status
    name: "${friendly_name} Status"

switch:
  - platform: restart
    name: "${friendly_name} Restart"
    
text_sensor:
  - platform: version
    name: "${friendly_name} ESPHome Version"

Hope this helps you get it working!
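
For readers wondering how the parts connect, the pin assignments in the config above imply roughly the wiring annotated below. Treat it as a sketch: the MAX98357A pin names are those printed on common breakout boards, and the power connection is an assumption rather than something stated in the thread.

```yaml
# Wiring implied by the pin assignments in the config above.
i2s_audio:
  i2s_lrclk_pin: GPIO33   # -> MAX98357A LRC
  i2s_bclk_pin: GPIO18    # -> MAX98357A BCLK

media_player:
  - platform: i2s_audio
    name: "${friendly_name} player"
    dac_type: external
    i2s_dout_pin: GPIO16  # -> MAX98357A DIN
    mode: mono

# Not part of the ESPHome config, listed for completeness:
#   MAX98357A VIN  -> 5V or 3V3 on the ESP32 board (assumption)
#   MAX98357A GND  -> GND
#   MAX98357A GAIN -> GND (as described earlier in the thread)
#   Speaker        -> the amplifier's + and - output terminals
```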

Nice, Gerben. I am looking for a relatively simple way to build a speaker for my alarm, and maybe use it as a media player when the alarm is not needed.
It would be nice if you could share in more detail which components you used. For instance, the MAX98357 is available in several versions, such as the one from DFRobot or the ones on AliExpress. And hopefully a Wemos ESP module will also work!

Hi!
I used simple, cheap AliExpress products:

https://a.aliexpress.com/_EyLJu5B

https://a.aliexpress.com/_Exbi9ad

https://a.aliexpress.com/_Ewvmo1J

Nothing special, but good enough for some notifications :slight_smile:
Good luck!

I get excellent sound quality with these components, on the level of hi-fi systems.
The MAX98357A is a high-quality digital Class D amplifier. Pay attention to the circuitry and you will get a great result with it.

This is a sample playback recorded with a smartphone:

Of course, the smartphone microphone noticeably degrades the quality; it sounds better live.
It was played on this small desktop speaker:

and inside:

I have built two speakers with voice assistant function using this manual.

Use wires that are as short as possible, and shielded or twisted pair if you can, to avoid crackling and interference. Don’t forget that this is a high-frequency digital data bus.
Old computer speakers, or speakers from a 5.1 system, make good enclosures.

Thank you. The components have arrived. How do I connect them? Do you have a schematic of this solution? It is the only thing I still need to get it working.