Worried about ESPhome 2025.10: Arduino as IDF Component, media player in Voice assistant device

I have a simple voice assistant device: an ESP32 with an I2S mic and amp.
It sort of works, but the only way I can also use it as a media player is with the Arduino framework. I don’t know enough to know why; I spent hours trying to find workarounds using IDF (no media player, just speaker).
Now that I have read about the Arduino as IDF Component change (Major Architectural Change), I worry that Arduino will become more “like” IDF, maybe with an Arduino “surface”, and that the media player will disappear.
I’d really like to hear from somebody that knows, rather than spend who knows how long trying to find out.
Also, what are people’s experiences with this Major Change?
Thank youse all.

Have you searched the forum? It is stuffed full of examples.

start here

@Arh Of course I searched the forum.
Did you even read my title? It’s all about ESPhome >= 2025.10
There’s nothing about media_player for ESPhome >= 2025.10

So what is your issue?

Media player exists in IDF.

Arduino has not been the usual choice for a media player and voice assistant for the best part of a year. This is because it just works better in IDF.

My problem is that if I use the Arduino Framework I can have “media_player:” or “speaker:” but if I choose esp-idf the yaml checker will say there’s no “media_player” component.
The yaml below has both versions, with the “speaker” lines commented out because I want the media_player. If I change the framework to esp-idf it won’t build with media_player, only if I switch the relevant lines to “speaker”.
It’s possible I’m missing something; if so, please tell me what.
Here’s the content:

esphome:
  name: "espvoice2"
  friendly_name: ESPvoice2

esp32:
  board: esp32dev
  framework:
    type: arduino
#    type: esp-idf
#    version: recommended

logger:  # Enable logging
  baud_rate: 115200
  level: DEBUG
#  level: VERY_VERBOSE

api:  # Enable Home Assistant API
  encryption:
    key: "somekey"

ota:
  - platform: esphome
    password: "somepwd"
    
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  fast_connect: on

  ap:  # Enable fallback hotspot (captive portal) in case wifi connection fails
    ssid: "Espvoice2 Fallback Hotspot"
    password: "somepwd"
  manual_ip:
    static_ip: 192.168.0.149
    gateway: 192.168.0.1
    subnet: 255.255.255.0
    dns1: 1.1.1.1
    dns2: 8.8.8.8

i2s_audio:
    i2s_lrclk_pin: GPIO15 # MIC-WS  SPK- LRC
    i2s_bclk_pin: GPIO17   # MIC-SCK SPK- BCLK

microphone:
  - platform: i2s_audio
    i2s_din_pin: GPIO27 #sd
    id: inmp441_mic
    adc_type: external
    pdm: false
    sample_rate: 16000

media_player:
#speaker:
  - platform: i2s_audio
    id: big_speaker
    dac_type: external
    i2s_dout_pin:  #DIN
      number: GPIO16
      allow_other_uses: true
# Next for Speaker only
#    channel: mono
# Next for MediaPlayer only
    name: ESPvoice2 Media Player
    mode: mono
    on_pause:
      then:
      - media_player.stop
    
voice_assistant:
  microphone: inmp441_mic
  media_player: big_speaker
#  speaker: big_speaker
#  use_wake_word: false
  noise_suppression_level: 2
  auto_gain: 31dBFS
#  volume_multiplier: 15.0
  on_listening:
    - light.turn_on: response_light
  on_end:
    - light.turn_off: response_light

binary_sensor:
  - platform: gpio
    id: voice_switch
    name: Voice Switch
    pin:
      number: GPIO14
      inverted: true
      mode:
        input: true
        pullup: true
    on_press:
      - media_player.stop
      - voice_assistant.start:
          silence_detection: false  #Jan'24
    on_release:
#      - delay: 7s  # Add a small delay to ensure audio processing is complete
      - wait_until:
          not:
            voice_assistant.is_running:
      - voice_assistant.stop

The code above builds fine.
If I change the platform lines like this:

esp32:
  board: esp32dev
  framework:
#    type: arduino
    type: esp-idf
    version: recommended

I get this:


Well?

If you read the link I sent above you will see you need this code for it to work.

i2s_audio:
    i2s_lrclk_pin: GPIOXX
    i2s_bclk_pin: GPIOXX
    sample_rate: 48000
speaker:
  - platform: i2s_audio
    id: speaker_id
    dac_type: external
    i2s_dout_pin: GPIOXX
    sample_rate: 48000
  - platform: mixer
    id: mixer_speaker_id
    output_speaker: speaker_id
    source_speakers:
      - id: announcement_spk_mixer_input
      - id: media_spk_mixer_input
  - platform: resampler
    id: media_spk_resampling_input
    output_speaker: media_spk_mixer_input
  - platform: resampler
    id: announcement_spk_resampling_input
    output_speaker: announcement_spk_mixer_input
media_player:
  - platform: speaker
    name: "Speaker Media Player"
    id: speaker_media_player_id
    media_pipeline:
        speaker: media_spk_resampling_input
        num_channels: 2
    announcement_pipeline:
        speaker: announcement_spk_resampling_input
        num_channels: 1
    files:
      - id: alarm_sound
        file: alarm.flac # Placed in the yaml directory. Should be encoded with a 48000 Hz sample rate, mono or stereo audio, and 16 bits per sample.

Although if you want a nicely working voice setup, follow the examples I posted above.

Also, if you are not using a board with PSRAM you are going to have issues.
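
For reference, here is a minimal sketch of how the voice assistant from the config posted earlier in the thread might point at this new media player instead of the old i2s_audio one. It assumes the IDs `inmp441_mic` (from the earlier config) and `speaker_media_player_id` (from the block above); I have not tested it on this exact hardware.

```yaml
voice_assistant:
  microphone: inmp441_mic
  # Assumption: responses are routed through the speaker-platform
  # media player defined in the block above
  media_player: speaker_media_player_id
  noise_suppression_level: 2
  auto_gain: 31dBFS
```

The `on_listening` / `on_end` automations and the push-button `binary_sensor` should carry over unchanged, since they only refer to the voice assistant and to the media player actions.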

You can always just not update your device.

Thank you, I did try to build it that way, and it builds.
Didn’t install it yet for the following reasons:

  • The example on that page (and your code) shows playing of local files, which is not what I want. It’s not 100% clear to me from anywhere that it can be used as a “normal” media player.
  • my device has no PSRAM, so I might try it after I’m sure I have an easy way to restore it if it’s not good
  • it doesn’t feel like progress when a few clear lines have to be replaced by many more lines, some of them confusing

First off, the great thing about ESPHome is that it does not need to be updated if something is working.

The examples I showed are not limited to playing local files. The ESPHome page for the speaker media player describes a media player for HA.

The voice assist thread has numerous examples of a voice assistant setup, each of which contains a media player element.
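
As a rough sketch of what that means in practice: the `files:` section is optional, and the same media player can be handed a URL. The example below assumes the speaker IDs from the config I posted above; the template button and the URL are hypothetical placeholders, only there to show the `media_player.play_media` action.

```yaml
media_player:
  - platform: speaker
    name: "Speaker Media Player"
    id: speaker_media_player_id
    media_pipeline:
      speaker: media_spk_resampling_input
      num_channels: 2
    announcement_pipeline:
      speaker: announcement_spk_resampling_input
      num_channels: 1
    # No "files:" section here: it is only needed for audio
    # that should be compiled into the firmware

button:
  - platform: template
    name: "Play test stream"  # hypothetical test button
    on_press:
      - media_player.play_media:
          id: speaker_media_player_id
          media_url: "http://example.com/test.flac"  # placeholder URL
```

From Home Assistant the entity should show up like any other media player, so the usual play-media action or the media browser can be pointed at it.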

But you will not be able to run the voice setup without PSRAM; preferably use an N16R8 board.
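
A minimal sketch of the board section for such a device, assuming a generic ESP32-S3 dev board with 16 MB flash and 8 MB octal PSRAM (the board name and speed are assumptions, so check them against your actual module):

```yaml
esp32:
  board: esp32-s3-devkitc-1  # assumption: generic ESP32-S3 dev board
  flash_size: 16MB           # the "N16" in N16R8
  framework:
    type: esp-idf

psram:
  mode: octal                # the "R8" modules use octal PSRAM
  speed: 80MHz
```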

Voice and media have progressed a lot over the last year or two. This is the great thing about HA and ESPHome. Replacing a few lines of code with twice as many is not really an issue, and it has made room for more features that will be coming soon, like multi-room audio.