I have a simple Voice Assistant device: an ESP32 with an I2S mic and amp.
It sort of works, but the only way I can also use it as a media player is with the Arduino framework. I don’t know enough to know why; I spent hours trying to find workarounds using IDF (no media player, just speaker).
Now that I read about this Arduino as IDF Component (Major Architectural Change) I worry that it will be more “like” IDF, maybe with an Arduino “surface” and the media player will disappear.
I’d really like to hear from somebody that knows, rather than spend who knows how long trying to find out.
Also, what are people’s experience with this Major Change?
Thank youse all.
Have you searched the forum? It is stuffed full of examples.
start here
@Arh Of course I searched the forum.
Did you even read my title? It’s all about ESPHome >= 2025.10.
There’s nothing about media_player for ESPHome >= 2025.10.
So what is your issue?
Media player exists in IDF.
The Arduino framework has not been used for media player and voice assistant setups for the best part of a year, because they simply work better in IDF.
My problem is that if I use the Arduino Framework I can have “media_player:” or “speaker:” but if I choose esp-idf the yaml checker will say there’s no “media_player” component.
The yaml below has both versions, just the “speaker” commented out because I want the media_player. If I change framework to esp-idf it won’t build with media_player, only if I switch the relevant lines to “speaker”.
It’s possible I’m missing something; if so, please tell me what.
Here’s the content:
esphome:
  name: "espvoice2"
  friendly_name: ESPvoice2
esp32:
  board: esp32dev
  framework:
    type: arduino
    # type: esp-idf
    # version: recommended
logger: # Enable logging
  baud_rate: 115200
  level: DEBUG
  # level: VERY_VERBOSE
api: # Enable Home Assistant API
  encryption:
    key: "somekey"
ota:
  - platform: esphome
    password: "somepwd"
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  fast_connect: on
  ap: # Enable fallback hotspot (captive portal) in case wifi connection fails
    ssid: "Espvoice2 Fallback Hotspot"
    password: "somepwd"
  manual_ip:
    static_ip: 192.168.0.149
    gateway: 192.168.0.1
    subnet: 255.255.255.0
    dns1: 1.1.1.1
    dns2: 8.8.8.8
i2s_audio:
  i2s_lrclk_pin: GPIO15 # MIC-WS SPK- LRC
  i2s_bclk_pin: GPIO17 # MIC-SCK SPK- BCLK
microphone:
  - platform: i2s_audio
    i2s_din_pin: GPIO27 #sd
    id: inmp441_mic
    adc_type: external
    pdm: false
    sample_rate: 16000
media_player:
#speaker:
  - platform: i2s_audio
    id: big_speaker
    dac_type: external
    i2s_dout_pin: #DIN
      number: GPIO16
      allow_other_uses: true
    # Next for Speaker only
    # channel: mono
    # Next for MediaPlayer only
    name: ESPvoice2 Media Player
    mode: mono
    on_pause:
      then:
        - media_player.stop
voice_assistant:
  microphone: inmp441_mic
  media_player: big_speaker
  # speaker: big_speaker
  # use_wake_word: false
  noise_suppression_level: 2
  auto_gain: 31dBFS
  # volume_multiplier: 15.0
  on_listening:
    - light.turn_on: response_light
  on_end:
    - light.turn_off: response_light
binary_sensor:
  - platform: gpio
    id: voice_switch
    name: Voice Switch
    pin:
      number: GPIO14
      inverted: true
      mode:
        input: true
        pullup: true
    on_press:
      - media_player.stop
      - voice_assistant.start:
          silence_detection: false #Jan'24
    on_release:
      # - delay: 7s # Add a small delay to ensure audio processing is complete
      - wait_until:
          not:
            voice_assistant.is_running:
      - voice_assistant.stop
The code above builds fine.
If I change the platform lines like this:
esp32:
  board: esp32dev
  framework:
    # type: arduino
    type: esp-idf
    version: recommended
I get this:
Well?
If you read the link I sent above you will see you need this code for it to work.
i2s_audio:
  i2s_lrclk_pin: GPIOXX
  i2s_bclk_pin: GPIOXX
  sample_rate: 48000
speaker:
  - platform: i2s_audio
    id: speaker_id
    dac_type: external
    i2s_dout_pin: GPIOXX
    sample_rate: 48000
  - platform: mixer
    id: mixer_speaker_id
    output_speaker: speaker_id
    source_speakers:
      - id: announcement_spk_mixer_input
      - id: media_spk_mixer_input
  - platform: resampler
    id: media_spk_resampling_input
    output_speaker: media_spk_mixer_input
  - platform: resampler
    id: announcement_spk_resampling_input
    output_speaker: announcement_spk_mixer_input
media_player:
  - platform: speaker
    name: "Speaker Media Player"
    id: speaker_media_player_id
    media_pipeline:
      speaker: media_spk_resampling_input
      num_channels: 2
    announcement_pipeline:
      speaker: announcement_spk_resampling_input
      num_channels: 1
    files:
      - id: alarm_sound
        file: alarm.flac # Placed in the yaml directory. Should be encoded with a 48000 Hz sample rate, mono or stereo audio, and 16 bits per sample.
That said, if you want a nicely working voice setup, follow the examples I posted above.
Also, if you are not using a board with PSRAM, you are going to have issues.
You can always just not update your device.
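For the device in the first post, hooking the voice assistant up to this media player might look roughly like the fragment below. It is an untested sketch: the pins and the `inmp441_mic` ID come from the first post, the speaker and media player IDs come from the example above, and it assumes `voice_assistant` still accepts a `media_player:` ID under esp-idf.

```yaml
# Untested sketch: only the parts that differ from the example above.
i2s_audio:
  i2s_lrclk_pin: GPIO15   # pin from the first post
  i2s_bclk_pin: GPIO17    # pin from the first post

speaker:
  - platform: i2s_audio
    id: speaker_id
    dac_type: external
    i2s_dout_pin: GPIO16  # pin from the first post
    sample_rate: 48000
  # ... mixer and resampler speakers exactly as in the example above ...

# media_player: block (platform: speaker) as in the example above

voice_assistant:
  microphone: inmp441_mic                 # mic ID from the first post
  media_player: speaker_media_player_id   # assumption: replaces the old big_speaker ID
```

Whether the mic and speaker can share one I2S bus under esp-idf the way they did under Arduino, and whether this fits in RAM without PSRAM, would need to be checked on the actual device.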
Thank you, I did try to build that way, and it does.
Didn’t install it yet for the following reasons:
- The example on that page (and your code) shows playing of local files, which is not what I want. It’s not 100% clear to me from anywhere that it can be used as a “normal” media player.
- my device has no PSRAM, so I might try it after I’m sure I have an easy way to restore it if it’s not good
- it doesn’t feel like progress when a few clear lines have to be replaced by many more lines, some of them confusing
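On the first point, a quick way to check whether the speaker-based media player handles something other than a bundled file is to trigger playback of a URL from the device itself. This is only a sketch: the button and the URL are hypothetical placeholders, `speaker_media_player_id` is the ID from the example earlier in the thread, and it assumes the generic `media_player.play_media` action works with the `speaker` platform.

```yaml
# Hypothetical test button; not part of any config posted in this thread.
button:
  - platform: template
    name: Play Test Stream
    on_press:
      - media_player.play_media:
          id: speaker_media_player_id                  # ID from the example in this thread
          media_url: "http://example.com/stream.mp3"   # placeholder URL, replace with a real one
```

If that plays, the same entity should also accept media sent from Home Assistant, which is the "normal" media player use.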
First off, the great thing about ESPHome is that a device does not need to be updated if it is working.
The examples I showed are not limited to playing local files. The ESPHome page for the speaker media player describes a media player for HA.
The voice assist thread has numerous examples of a voice assistant setup, each of which contains a media player element.
But you will not be able to run the voice setup without PSRAM, preferably on an N16R8 board.
Voice and media have progressed a lot over the last year or two. This is the great thing about HA and ESPHome. Replacing a few lines of code with twice as many is no real issue, and it has made room for more features that are coming soon, like multi-room audio.
