Ditched Alexa Media Player for a DIY media player for fully local voice announcements

This project was triggered by all the recent drama with the third-party Alexa Media Player integration.

I ended up building this and completely removing the integration. The voice is generated by Piper running in a container.
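For anyone wanting to reproduce the Piper side, here is a minimal sketch of a Docker Compose service. It assumes the `rhasspy/wyoming-piper` image; the voice name, data path, and port mapping are illustrative assumptions, not necessarily the setup used in this build.

```yaml
# Sketch only: Piper exposed over the Wyoming protocol.
# Voice, volume path, and port are assumptions; adjust to your setup.
services:
  piper:
    image: rhasspy/wyoming-piper
    command: --voice en_US-lessac-medium   # assumed voice
    volumes:
      - ./piper-data:/data                 # downloaded voices are stored here
    ports:
      - "10200:10200"                      # port the Wyoming integration connects to
    restart: unless-stopped
```

Home Assistant can then be pointed at this host and port through the Wyoming Protocol integration, which should create a TTS entity for Piper.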

The final result can be seen here, to avoid cross-posting.

esphome:
  name: speaker_a
  platform: ESP32
  board: esp32dev

# Network configuration omitted

output:
  - platform: gpio
    pin: GPIO27
    id: PAM8302_mute

i2s_audio:
  i2s_lrclk_pin: GPIO12  # WS/LRCK pin
  i2s_bclk_pin: GPIO14   # Bit Clock pin

# Media Player setup
media_player:
  - platform: i2s_audio
    name: "SpeakerA"
    dac_type: external
    i2s_dout_pin: GPIO13   # Data Output pin
    on_announcement:
      then:
      - output.turn_on: PAM8302_mute
    on_play:
      then:
      - output.turn_on: PAM8302_mute
    on_idle:
      then:
      - output.turn_off: PAM8302_mute
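On the Home Assistant side, an announcement could be sent with the `tts.speak` action. The sketch below is a hypothetical script: the entity IDs `tts.piper` and `media_player.speakera` and the script name are assumptions, so check the actual IDs in your installation. Older Home Assistant versions use `service:` instead of `action:`.

```yaml
# Sketch only: entity IDs and the script name are assumptions.
script:
  announce_on_speaker_a:
    alias: "Announce on SpeakerA"
    fields:
      message:
        description: "Text to speak"
        example: "The washing machine has finished."
    sequence:
      - action: tts.speak
        target:
          entity_id: tts.piper                            # assumed Piper TTS entity
        data:
          media_player_entity_id: media_player.speakera   # assumed ESPHome media player
          message: "{{ message }}"
```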


Nice one, I’d like something similar too, to get rid of my Google Nests.

I see you’ve used PAM8302. What is the bottom-right board?
Not sure I completely understand how you wired the two devices to the ESP.

Yeah, my explanation was definitely messy.

It uses an ESP32, a UDA1334A, and a PAM8302A.

The UDA is an I2S DAC that connects to the ESP using 3 pins (data, bit clock, and word/channel select).

The PAM is an analog mono amplifier. It is connected to the UDA’s analog ground and left channel (the right channel would work too). In addition, it has an SD (shutdown) pin that the ESP32 drives low (via PAM8302_mute) when idle, to reduce static hiss and power consumption.

In theory, both the UDA and the PAM are optional. The ESP32’s internal DAC could be connected directly to an off-the-shelf amplified PC speaker set. But I had difficulty finding the necessary YAML configuration, and those who got it working commented that the audio quality is mediocre at best (it’s just 8-bit after all).
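For anyone who wants to try the internal-DAC route anyway, an untested sketch might look like the following. It is based on my reading of the ESPHome I2S Audio Media Player options (`dac_type: internal` with `mode` set to `left`, `right`, or `stereo`). The `i2s_lrclk_pin` value is an arbitrary placeholder, on the assumption that the `i2s_audio` component requires it even when no external DAC is attached. The ESP32's internal DAC outputs on GPIO25 and GPIO26.

```yaml
# Untested sketch: ESP32 built-in 8-bit DAC instead of the UDA1334A.
i2s_audio:
  i2s_lrclk_pin: GPIO33   # placeholder pin (assumption); not wired to anything

media_player:
  - platform: i2s_audio
    name: "SpeakerA"
    dac_type: internal
    mode: left            # left, right, or stereo
```

The analog output would still need an amplified speaker, and the quality caveat above applies.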

I could also have connected the speaker directly to the UDA, but in my case it wasn’t loud enough. If you have a higher-impedance speaker it could work, but again, maybe not for listening to music.

All clear now!
I’m curious how it would compare to Alexa, sound-quality wise.

What speaker have you used?

My build compares really badly. The speaker is salvaged from a no-name speaker set I found in an electronics waste bin. But for my robotic-voice-only use case, it is enough.

IMHO, if you want to build something you can listen to music on, the best option is to connect the UDA’s 3.5mm jack to an existing speaker set. I’ve seen people open up the speaker and hide the boards and connections inside too.

I’ve received the parts and am now trying to assemble it all.
My results are not good at all, sound-wise. There are crackles and hiss, and I can’t figure out why. Maybe my expectations were too high? :sweat_smile:

I’m using it with a WT32-ETH01 for its Ethernet capabilities.
My schematic is as follows. Since sound is coming through, I assume the wiring is correct:

And this is my ESPHome code:

ethernet:
  type: LAN8720
  mdc_pin: GPIO23
  mdio_pin: GPIO18
  clk_mode: GPIO0_IN
  phy_addr: 1
  power_pin: GPIO16

output:
  - platform: gpio
    pin: GPIO2 # SD pin on PAM8302
    id: PAM8302_mute

i2s_audio:
  - id: i2s_out
    i2s_lrclk_pin: GPIO5  # WSEL pin on UDA1334A
    i2s_bclk_pin: GPIO33  # BCLK pin on UDA1334A

media_player:
  - platform: i2s_audio
    name: "SpeakerA"
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO17   # DIN pin on UDA1334A
    i2s_comm_fmt: lsb
    on_announcement:
      then:
      - output.turn_on: PAM8302_mute
    on_play:
      then:
      - output.turn_on: PAM8302_mute
    on_idle:
      then:
      - output.turn_off: PAM8302_mute