Can't play mp3s using media player since updating to 2024.5.0

I’ve got the basics of a smart speaker set up on an ESP32-S3-Zero, which worked fine before updating to 2024.5.0.
It plays a few different .mp3s that are in my Home Assistant www folder.

The audio and media player are set up like this:

i2s_audio:                              # I2s audio pins
  - id: i2s_in
    i2s_lrclk_pin: $i2s_lrclk_in_pin    # Mic WS
    i2s_bclk_pin: $i2s_bclk_in_pin      # Mic CK
  - id: i2s_out
    i2s_lrclk_pin: $i2s_lrclk_out_pin   # Spk LRC
    i2s_bclk_pin: $i2s_bclk_out_pin     # Spk BCLK

adf_pipeline:                           # ADF pipeline pins
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO6                 # Spk DIN

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_in
    i2s_din_pin: $i2s_din_pin           # Mic SD/DA
    channel: left
    sample_rate: 16000
    bits_per_sample: 16bit
    pdm: false

media_player:                           # Media player
  - platform: adf_pipeline
    id: jarvis_media_player
    name: Media Player
    internal: false
    icon: mdi:speaker-wireless
    pipeline:
      - self
      - adf_i2s_out

microphone:                             # ADF Mic
  - platform: adf_pipeline
    id: mic
    pipeline:
      - adf_i2s_in
      - self

I’ve got a button set to do:

button:
  - platform: template
    name: Test sound
    id: test_sound
    icon: "mdi:speaker-play"
    disabled_by_default: true           # Shows entity in HA, but disabled by default
    on_press:
      - media_player.play_media: "http://192.168.5.5:8123/local/Voice_assistant/im_listening.mp3"

This worked reliably before updating and hasn’t since. It has played the .mp3 just once, across multiple restarts of everything involved.

I see this in the ESP’s logs:

[23:01:32][D][button:010]: 'Test sound' Pressed.
[23:01:32][D][media_player:061]: 'Media Player' - Setting
[23:01:32][D][media_player:068]:   Media URL: http://192.168.5.5:8123/local/Voice_assistant/im_listening.mp3
[23:01:32][D][adf_media_player:030]: Got control call in state 1
[23:01:32][D][esp_adf_pipeline:050]: Starting request, current state UNINITIALIZED
[23:01:32][D][esp-idf:000]: I (132522) MP3_DECODER: MP3 init

[23:01:32][D][esp-idf:000]: I (132528) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=4

[23:01:32][D][i2s_audio:072]: Installing driver : yes
[23:01:32][D][esp_adf_pipeline:358]: pipeline tag 0, http
[23:01:32][D][esp_adf_pipeline:358]: pipeline tag 1, decoder
[23:01:32][D][esp_adf_pipeline:358]: pipeline tag 2, i2s_out
[23:01:32][D][esp-idf:000]: I (132540) AUDIO_PIPELINE: link el->rb, el:0x3de20d8c, tag:http, rb:0x3de21290

[23:01:32][D][esp-idf:000]: I (132543) AUDIO_PIPELINE: link el->rb, el:0x3de20f4c, tag:decoder, rb:0x3de222d0

[23:01:32][D][esp_adf_pipeline:370]: Setting up event listener.
[23:01:32][D][esp_adf_pipeline:302]: State changed from UNINITIALIZED to PREPARING
[23:01:32][I][adf_media_player:135]: got new pipeline state: 1
[23:01:32][D][adf_i2s_out:127]: Set final i2s settings: 22050
[23:01:32][D][esp-idf:000]: I (132575) AUDIO_THREAD: The http task allocate stack on external memory

[23:01:32][D][esp-idf:000]: I (132577) AUDIO_ELEMENT: [http-0x3de20d8c] Element task created

[23:01:32][D][esp-idf:000]: I (132580) AUDIO_THREAD: The decoder task allocate stack on external memory

[23:01:32][D][esp-idf:000]: I (132582) AUDIO_ELEMENT: [decoder-0x3de20f4c] Element task created

[23:01:32][D][esp-idf:000][http]: I (132585) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[23:01:32][D][esp-idf:000][decoder]: I (132588) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[23:01:32][D][esp_audio_sources:097]: Streamer status: 2
[23:01:32][D][esp_audio_sources:098]: decoder status: 2
[23:01:32][D][esp-idf:000][http]: I (132618) HTTP_CLIENT: Body received in fetch header state, 0x3fcc3819, 1703

[23:01:32][D][esp-idf:000][http]: I (132624) HTTP_STREAM: total_bytes=9541

[23:01:32][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=22050, bits=16, ch=1
[23:01:32][D][adf_i2s_out:127]: Set final i2s settings: 22050
[23:01:32][D][esp-idf:000][decoder]: W (132706) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[23:01:32][D][esp-idf:000][decoder]: W (132711) MP3_DECODER: output aborted -3

[23:01:32][D][esp-idf:000][decoder]: I (132717) MP3_DECODER: Closed

[23:01:32][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[23:01:32][I][adf_media_player:135]: got new pipeline state: 2
[23:01:32][D][adf_i2s_out:127]: Set final i2s settings: 22050
[23:01:32][D][esp-idf:000]: I (132743) AUDIO_ELEMENT: [i2s_out-0x3de21118] Element task created

[23:01:32][D][esp-idf:000]: I (132745) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:2103415 Bytes, Inter:163960 Bytes, Dram:163960 Bytes



[23:01:32][D][esp-idf:000][http]: I (132747) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[23:01:32][D][esp-idf:000][decoder]: I (132751) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[23:01:32][D][esp-idf:000][i2s_out]: I (132754) AUDIO_ELEMENT: [i2s_out] AEL_MSG_CMD_RESUME,state:1

[23:01:32][D][esp-idf:000]: I (132757) AUDIO_PIPELINE: Pipeline started

[23:01:32][I][esp_adf_pipeline:214]: [ decoder ] status: 14
[23:01:32][D][esp-idf:000][http]: I (132833) HTTP_CLIENT: Body received in fetch header state, 0x3fcbf4c1, 1703

[23:01:32][D][esp-idf:000][http]: I (132840) HTTP_STREAM: total_bytes=9541

[23:01:32][I][esp_adf_pipeline:214]: [ http ] status: 14
[23:01:32][I][esp_adf_pipeline:214]: [ i2s_out ] status: 12
[23:01:32][D][esp_adf_pipeline:131]: Check element [http] status, 3
[23:01:32][D][esp_adf_pipeline:131]: Check element [decoder] status, 3
[23:01:32][D][esp_adf_pipeline:131]: Check element [i2s_out] status, 3
[23:01:32][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[23:01:32][I][adf_media_player:135]: got new pipeline state: 3
[23:01:32][D][adf_i2s_out:127]: Set final i2s settings: 22050
[23:01:32][I][esp_adf_pipeline:214]: [ http ] status: 12
[23:01:32][D][esp-idf:000][http]: W (132976) HTTP_STREAM: No more data,errno:0, total_bytes:9541, rlen = 0

[23:01:32][D][esp-idf:000][http]: I (132981) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0

[23:01:32][I][esp_adf_pipeline:214]: [ decoder ] status: 12
[23:01:32][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=22050, bits=16, ch=1
[23:01:32][D][adf_i2s_out:127]: Set final i2s settings: 22050
[23:01:32][I][esp_adf_pipeline:214]: [ http ] status: 15
[23:01:32][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[23:01:32][I][adf_media_player:135]: got new pipeline state: 4
[23:01:33][D][esp-idf:000][decoder]: I (133371) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2

[23:01:33][D][esp-idf:000][decoder]: I (133761) MP3_DECODER: Closed

[23:01:33][D][esp-idf:000][i2s_out]: I (133876) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_DONE,-2

[23:01:33][D][esp_adf_pipeline:400]: Called deinit_all
[23:01:33][D][esp-idf:000]: I (134074) AUDIO_PIPELINE: audio_pipeline_unlinked

[23:01:33][D][esp-idf:000]: W (134077) AUDIO_ELEMENT: [http] Element has not create when AUDIO_ELEMENT_TERMINATE

[23:01:33][D][esp-idf:000]: W (134079) AUDIO_ELEMENT: [decoder] Element has not create when AUDIO_ELEMENT_TERMINATE

[23:01:33][D][esp-idf:000]: W (134081) AUDIO_ELEMENT: [i2s_out] Element has not create when AUDIO_ELEMENT_TERMINATE

[23:01:34][D][esp-idf:000]: I (134085) I2S: DMA queue destroyed

[23:01:34][D][esp_adf_pipeline:302]: State changed from STOPPING to UNINITIALIZED
[23:01:34][I][adf_media_player:135]: got new pipeline state: 0

Nothing plays whether I click the button in Home Assistant or the button in the ESP’s web server. The best I’ve managed since updating is a very short mp3 playing once and not playing a second time. Sometimes the ESP restarts when I try, sometimes it doesn’t.
Playing alerts on the ESP media player using TTS has stopped working too.

If I roll back to the previous version of ESPHome within HA, it works reliably again.
I’m not sure how to find what’s going wrong.

Has anyone got any ideas that could help figure out what’s wrong and get this going again please?

I’ve found that switching the esp-idf framework from version: recommended to version: 4.4.6 gets things going again.
The recommended version was changed from 4.4.6 to 4.4.7 in 2024.5.0.
I’m not sure yet why 4.4.7 might be causing the problem.
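
For anyone wanting to try the same workaround, this is roughly what the pinned framework section looks like. It's a minimal sketch, not my full config: the board name is an assumption, so keep whatever board your own config already uses.

```yaml
esp32:
  board: esp32-s3-devkitc-1   # assumed board; keep your existing value
  framework:
    type: esp-idf
    version: 4.4.6            # pinned instead of "recommended" (4.4.7 as of 2024.5.0)
```

Changing the framework version is a big enough change that a clean build is probably worth doing before flashing.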

My assistant stopped working completely after another update to either Home Assistant or ESPHome. It had me stumped for ages. In the meantime I’ve been using my ESP32-S3-based DIY assistant as just a speaker to announce stuff.
I’ve been back on the case over the last couple of days though, and got it to the point where it’s usable again!

I started from scratch, and still couldn’t get it going using type: esp-idf. The best I could manage was it listening and replying once, then falling over and needing a restart.
Using type: arduino I managed to get it a lot closer, but it plays any TTS (or .mp3, but I don’t use it for that) reeeeeaaaaly slowly while the voice assistant is enabled/listening.
So I created a script to turn off the assistant, say whatever’s needed using TTS and then re-enable the assistant.
microWakeWord can’t be used with the Arduino framework, so it’s streaming audio to my HA server the whole time. I haven’t seen that cause any problems yet though.
It’s working pretty well.

ESPHome YAML:

# ESP32S3-Zero board https://www.aliexpress.com/item/1005006524672028.html
# MAX98357 I2S 3W amp https://www.aliexpress.com/item/1005006382608935.html
# INMP441 I2S mic

substitutions:
  friendly_name: Jarvis
  host_name: jarvis

  log_level:      DEBUG         # NONE, ERROR, WARN, INFO, DEBUG (Default), VERBOSE, VERY_VERBOSE
  start_volume:   75%

  mic_ws_pin:     GPIO07        # Mic WS
  mic_ck_pin:     GPIO08        # Mic CK
  mic_din_pin:    GPIO09        # Mic SD/DA

  amp_lrc_pin:    GPIO05        # Amp LRC
  amp_bclk_pin:   GPIO04        # Amp BCLK
  amp_din_pin:    GPIO06        # Amp DIN

  rgb_led_pin:    GPIO11        # RGB LED pin, GPIO21 for onboard LED
  no_leds:        "16"          # Number of RGB LEDs
  led_brightness: 25%           # RGB LED brightness for simple on/off
  min_led_brightness: 11%       # Min LED brightness for effects
  max_led_brightness: 40%       # Max LED brightness for effects

  assist_mic_gain: 31dBFS       # Assistant mic gain
  assist_noise_supression: "3"  # Assistant noise suppression
  assist_vol_multiplier: "15"   # Assistant volume multiplier

  # Sounds
  pip_sound:               "http://192.168.5.11:8123/local/Voice_assistant/pip.mp3"
  test_sound:              "http://192.168.5.11:8123/local/Voice_assistant/testing_1_2_3_testing.mp3"

esphome:
  name: $host_name
  friendly_name: $friendly_name
  platformio_options:
    board_build.flash_mode: dio
  on_boot:
    # then:
    - logger.log:
        level: ERROR
        format: "****Booted"
    - media_player.volume_set: $start_volume
    - media_player.play_media: $pip_sound
    - light.turn_on:
        id: rgb_led
        brightness: $led_brightness
        blue: 0%
        red: 100%
        green: 100%
        effect: slow pulse

esp32:
  board: esp32-s3-devkitc-1
  flash_size: 4MB
  variant: esp32s3
  framework:
    type: arduino

psram:
  mode: quad
  speed: 80MHz

packages:
  logger: !include common/logger.yaml
  api_ota: !include common/api_ota.yaml
  wifi: !include common/wifi.yaml
  # web_server: !include common/web_server.yaml
  sensor: !include common/sensor.yaml
  binary_sensor: !include common/binary_sensor.yaml
  button: !include common/button.yaml
  text_sensors: !include common/text_sensor.yaml
  time: !include common/sync_time.yaml
  # switch: !include common/switch.yaml

logger:                                 # Enable logging. https://esphome.io/components/logger.html
  level: $log_level 

light:
  - platform: esp32_rmt_led_strip
    rgb_order: GRB
    pin: $rgb_led_pin
    num_leds: $no_leds
    rmt_channel: 0
    chipset: ws2812
    name: "RGB LED"
    id: "rgb_led"
    default_transition_length: 0ms
    effects:
      - pulse:
          min_brightness: $min_led_brightness
          max_brightness: $max_led_brightness
      - pulse:
          name: "Fast Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
          min_brightness: $min_led_brightness
          max_brightness: $max_led_brightness
      - pulse:
          name: "Slow Pulse"
          # transition_length: 1s       # defaults to 1s
          update_interval: 2s
          min_brightness: $min_led_brightness
          max_brightness: $max_led_brightness
      - pulse:
          name: "Breathe"
          transition_length:
            on_length: 1s
            off_length: 500ms
          update_interval: 1.5s
          min_brightness: $min_led_brightness
          max_brightness: $led_brightness
      - random:
          name: Random
      - random:
          name: Random, custom timing
          transition_length: 500ms
          update_interval: 2s
      - strobe:
      - strobe:
          name: Strobe Effect, RGBW
          colors:
            - state: true
              brightness: $led_brightness
              red: 100%
              green: 0%
              blue: 0%
              duration: 1000ms
            - state: true
              brightness: $led_brightness
              red: 0%
              green: 100%
              blue: 0%
              duration: 1000ms
            - state: true
              brightness: $led_brightness
              red: 0%
              green: 0%
              blue: 100%
              duration: 1000ms
            - state: true
              brightness: $led_brightness
              red: 100%
              green: 100%
              blue: 100%
              duration: 1000ms
      - flicker:
      - flicker:
          name: Flicker Effect With Custom Values
          alpha: 95%
          intensity: 1.5%
      - lambda:
          name: Lambda
          update_interval: 1s
          lambda: |-
            static int state = 0;
            auto call = id(rgb_led).turn_on();
            // Transition of 1000ms = 1s
            call.set_transition_length(1000);
            if (state == 0) {
              call.set_rgb(1.0, 1.0, 1.0);
            } else if (state == 1) {
              call.set_rgb(1.0, 0.0, 1.0);
            } else if (state == 2) {
              call.set_rgb(0.0, 0.0, 1.0);
            } else {
              call.set_rgb(1.0, 0.0, 0.0);
            }
            call.perform();
            state += 1;
            if (state == 4)
              state = 0;              
      - addressable_rainbow:
      - addressable_color_wipe:
      - addressable_twinkle:
      - addressable_random_twinkle:

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: $mic_ws_pin  # Mic WS
    i2s_bclk_pin: $mic_ck_pin   # Mic CK
  - id: i2s_out
    i2s_lrclk_pin: $amp_lrc_pin # Amp LRC
    i2s_bclk_pin: $amp_bclk_pin # Amp BCLK

microphone:
  - platform: i2s_audio
    id: mic_id
    i2s_audio_id: i2s_in
    adc_type: external
    sample_rate: 16000
    bits_per_sample: 16bit
    channel: left
    pdm: false
    i2s_din_pin: $mic_din_pin   # Mic SD/DA

media_player:
  - platform: i2s_audio
    name: Media Player
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: $amp_din_pin  # Amp DIN
    mode: mono
    id: i2s_media
    icon: mdi:speaker-wireless

voice_assistant:
  microphone: mic_id
  media_player: i2s_media
  use_wake_word: true
  noise_suppression_level: $assist_noise_supression
  auto_gain: $assist_mic_gain
  volume_multiplier: $assist_vol_multiplier
  on_client_connected:
    - logger.log:
        level: ERROR
        format: "****Client connected"
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - if:
              condition:
                not:
                  - voice_assistant.is_running
              then:
                - logger.log:
                    level: ERROR
                    format: "****Assistant turned on"
                - voice_assistant.stop
                - voice_assistant.start_continuous
                - light.turn_off:
                    id: rgb_led
        else:
          - logger.log:
              level: ERROR
              format: "****Assistant turned off"
          - light.turn_on:
              id: rgb_led
              brightness: $led_brightness
              blue: 0%
              red: 100%
              green: 100%
              effect: slow pulse
          - voice_assistant.stop
  on_client_disconnected:
    - logger.log:
        level: ERROR
        format: "****Client disconnected"
    - voice_assistant.stop
    - light.turn_on:
        id: rgb_led
        brightness: $led_brightness
        blue: 0%
        red: 100%
        green: 0%
        effect: none
  on_listening:
    - logger.log:
        level: ERROR
        format: "****Listening"
    - light.turn_on:
        id: rgb_led
        brightness: $led_brightness
        blue: 0%
        red: 0%
        green: 100%
        effect: none
  on_stt_end:
    - logger.log:
        level: ERROR
        format: "****STT end"
    - light.turn_on:
        id: rgb_led
        brightness: $led_brightness
        blue: 100%
        red: 0%
        green: 0%
        effect: none
  on_idle:
    - logger.log:
        level: ERROR
        format: "****Assistant idle"
    - light.turn_off:
        id: rgb_led
  on_wake_word_detected:
    - media_player.play_media: $pip_sound

switch:                                 # Switches
  - platform: template
    name: Enable Assistant
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    icon: mdi:microphone-message
    on_turn_on:
      - logger.log:
          level: ERROR
          format: "****Assist turned on"
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
            # - micro_wake_word.start
      - light.turn_off:
          id: rgb_led
    on_turn_off:
      - logger.log:
          level: ERROR
          format: "****Assist turned off"
      - voice_assistant.stop
      - light.turn_on:
          id: rgb_led
          brightness: $led_brightness
          blue: 0%
          red: 100%
          green: 100%
          effect: none

button:                                 # Useful buttons in HA
  - platform: template
    name: Test sound
    id: test_sound
    icon: "mdi:speaker-play"
    disabled_by_default: true           # Shows entity in HA, but disabled by default
    on_press:
      - media_player.play_media: $test_sound

The packages: section includes other .yaml files that hold the basics that I use for almost all ESPHome projects. They can be replaced with your own basic sections like api: and wifi:.
The logger.log actions log things as errors (in red) so they can easily be seen, and could all be removed.
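
If you don't have your own common files, something like this could stand in for the packages: section. It's only a sketch of the bare minimum: the secret names are assumptions, and the sensor, binary_sensor, text_sensor and time packages are simply left out because nothing in the config above depends on them.

```yaml
# Assumed stand-ins for the common/*.yaml packages; adjust to your own setup.
api:                                # Native API connection to Home Assistant

ota:
  - platform: esphome               # List-style OTA config used by newer ESPHome releases

wifi:
  ssid: !secret wifi_ssid           # assumed secret names
  password: !secret wifi_password
```

The logger: and button: sections are already defined in the main config, so they don't need replacing.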

The script I now use to announce things using TTS:

sequence:
  - type: turn_off
    device_id: d9909511c12525c6dd64cc53d6a5959e
    entity_id: dd1612bb1988f210453075960ade40b6
    domain: switch
  - wait_for_trigger:
      - type: turned_off
        device_id: d9909511c12525c6dd64cc53d6a5959e
        entity_id: dd1612bb1988f210453075960ade40b6
        domain: switch
        trigger: device
    enabled: true
  - delay:
      hours: 0
      minutes: 0
      seconds: 0
      milliseconds: 200
    enabled: true
  - action: tts.speak
    metadata: {}
    data:
      cache: true
      media_player_entity_id: media_player.jarvis_media_player
      message: "{{ message }}"
    target:
      entity_id: tts.piper
  - wait_for_trigger:
      - device_id: d9909511c12525c6dd64cc53d6a5959e
        domain: media_player
        entity_id: e1462bcef826d950e42d5812bcdefcd3
        type: idle
        trigger: device
  - type: turn_on
    device_id: d9909511c12525c6dd64cc53d6a5959e
    entity_id: dd1612bb1988f210453075960ade40b6
    domain: switch
fields:
  message:
    selector:
      text:
        multiline: true
    name: Message
    description: Message to be announced
    default: message
    required: true
alias: Jarvis TTS
description: ""

The device and entity IDs are for the ‘Enable Assistant’ switch and the media player that show up in HA. I recommend copy/pasting the YAML into a script and then re-selecting the device and entities using the UI.
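
Because those device IDs are unique to my install, here is a rough sketch of the same sequence written with entity IDs instead, which may be easier to adapt. The switch entity ID is an assumption (check what yours is actually called), the 30-second timeout is an arbitrary value I picked for the sketch, and the fields:, alias: and description: parts stay the same as in the script above.

```yaml
sequence:
  # Turn the assistant off so TTS playback isn't slowed down
  - action: switch.turn_off
    target:
      entity_id: switch.jarvis_enable_assistant   # assumed entity ID
  - delay:
      milliseconds: 200
  - action: tts.speak
    target:
      entity_id: tts.piper
    data:
      cache: true
      media_player_entity_id: media_player.jarvis_media_player
      message: "{{ message }}"
  # Wait for playback to finish, then re-enable the assistant
  - wait_for_trigger:
      - trigger: state
        entity_id: media_player.jarvis_media_player
        to: idle
    timeout:
      seconds: 30                                  # arbitrary safety timeout
  - action: switch.turn_on
    target:
      entity_id: switch.jarvis_enable_assistant   # assumed entity ID
```

The timeout is there so the assistant still gets switched back on if the media player never reports idle.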

It can be called from automations like this:

action: script.jarvis_tts
metadata: {}
data:
  message: Message to say using TTS

Hope it helps someone get theirs going again.