I2S audio won't play all raw (PCM) data

I am trying to get the i2s speaker component to work and play local raw audio. The ESPHome docs don’t say anything about what the required encoding should be for the raw audio.

From what I’ve gathered, it needs to be raw PCM. The data type required for input to the speaker.play method needs to be an 8bit unsigned char (uint8_t), but I can’t tell for certain what the audio format needs to be. An example doc page has some convoluted process of converting to unsigned 8bit 16000Hz, then to signed 8 bit, but doesn’t actually say what the final encoding should be. To make it more confusing, ESPHome sets its config to 16bit 16000Hz (i2s_audio_speaker.cpp) but Espressif mentions that 8bits of the 16 are ignored by the (internal) DAC (I2S - ESP32).

Regarding the DAC… I’m not using the internal one; I have a MAX98357 hooked up to my ESP32 and when I play a signed 8bit 16000Hz raw PCM, it seems to play fast or gets cut off before it finishes playing. I think it might be due to ESPHome configuring the i2s speaker to 16bits. I only just discovered that ESPHome sets the bits per sample to 16, and I’ve tried a bunch of different audio encodings and I think one of them was a 16bit signed little endian format which didn’t play correctly.

I’ve been using variations of this to convert an mp3 to raw PCM data:
ffmpeg -i sound.mp3 -acodec pcm_s8 -f s8 -ac 1 -ar 16000 sound.pcm

I also have the i2s media_player integration and that’s working just fine (streaming audio to it), so the issue is with just the speaker.

Can anyone tell me what the correct format should be for the raw PCM data? Does it need to be different if using an external DAC?

For extra info - the MAX98357A supports 8kHz to 96kHz and 16/24/32bit data.

I would assume that should mean that converting an MP3 like so:
ffmpeg -i sound.mp3 -acodec pcm_s16le -f s16le -ac 1 -ar 16000 sound.pcm
should function properly (16kHz only because that’s what ESPHome has forced on the i2s config), however when I tried to play audio encoded that way, it absolutely didn’t play correctly.

Update: It seems I only tried unsigned 16bit which didn’t play correctly, however signed 16bit 16kHz audio does appear to play at the correct speed, but it is still cutting out early. From what I can tell, it’s playing less than half my sample audio, but at the correct speed/frequency.
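A small sketch of why unsigned 16-bit data would sound wrong, assuming the driver reads every two bytes as a signed 16-bit little-endian sample (an assumption, based on signed 16-bit being the format that plays at the right speed here). The helper name is made up for illustration:

```python
import struct


def reinterpret_u16le_as_s16le(value):
    """Pack an unsigned 16-bit sample, then read the same two bytes as signed."""
    return struct.unpack("<h", struct.pack("<H", value))[0]


# In unsigned 16-bit audio, silence sits at the midpoint (32768).
# Read back as signed, that midpoint becomes the most negative value,
# so a quiet signal turns into full-scale jumps.
silence_as_signed = reinterpret_u16le_as_s16le(32768)   # -32768
just_below_mid = reinterpret_u16le_as_s16le(32767)      # 32767
```

So a waveform hovering around the unsigned midpoint would flip between the two extremes when read as signed, which would fit with it not playing correctly.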

Does anyone know why the i2s speaker wouldn’t play the full audio?

I tried different GPIO pins just in case the ones I was using were causing issues, but same result; audio doesn’t finish playing all the way.

Here’s my config currently:

esphome:
  name: $name
  name_add_mac_suffix: true
  friendly_name: $display_name
  project:
    name: $project_name
    version: $project_version

  includes:
    - audio/test_sound.h

  on_boot:
    priority: -100
    then:
      - delay: 3s
      - lambda: |-
          id(i2s_speaker).play(test_sound_pcm, test_sound_pcm_len);

esp32:
  board: esp32dev
  framework:
    type: arduino
    version: recommended

i2s_audio:
  i2s_lrclk_pin: GPIO22
  i2s_bclk_pin: GPIO23

speaker:
  - platform: i2s_audio
    id: i2s_speaker
    dac_type: external
    i2s_dout_pin:
      number: GPIO21
    mode: mono

I have a test mp3 file that I convert to raw PCM using:

ffmpeg -i test_sound.mp3 -acodec pcm_s16le -f s16le -ac 1 -ar 16k test_sound.pcm

I convert the test_sound.pcm to test_sound.h using xxd:

xxd -i test_sound.pcm test_sound.h

test_sound.h contains the raw PCM audio:

const unsigned char test_sound_pcm[] PROGMEM = {
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  ...
  0x31, 0x00, 0xfd, 0xff, 0xc5, 0xff
};
unsigned int test_sound_pcm_len = 52242;
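For reference, the length above gives a yardstick for how long the clip should last. A quick sketch of the arithmetic, assuming the data really is headerless signed 16-bit mono at 16kHz as produced by the ffmpeg command (the function name is just for illustration):

```python
def pcm_duration_seconds(num_bytes, sample_rate, bits_per_sample, channels=1):
    """Duration of headerless PCM data from its byte count."""
    bytes_per_frame = (bits_per_sample // 8) * channels
    return num_bytes / (bytes_per_frame * sample_rate)


# 52242 bytes / (2 bytes per sample * 16000 samples per second)
expected = pcm_duration_seconds(52242, 16000, 16)  # about 1.63 seconds
```

If playback stops well before roughly 1.6 seconds, the data is being cut short rather than played at the wrong rate.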

I’ve tried all sorts of formats, and the only ones I’ve got to play and sound ok are signed 8bit and signed 16bit audio. Unsigned 8bit, unsigned 16bit and signed 32bit all don’t work. Also, it doesn’t want any header info: it plays, but you can hear a brief static/pop when it starts up, I assume because it tries to play the header as audio data.

The only two ideas I have left are that I’m not converting the raw data to C properly (I’d have to use something else instead of xxd), or there’s some issue with the i2s speaker module.
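One way to rule out xxd would be to generate the header with something else and compare the two files. A minimal sketch, assuming the same array name and PROGMEM layout as the test_sound.h shown above (the function name and the 12-bytes-per-line layout are my own choices):

```python
def pcm_to_c_header(data, name, per_line=12):
    """Render raw bytes as a C array plus a length variable."""
    lines = [f"const unsigned char {name}[] PROGMEM = {{"]
    for i in range(0, len(data), per_line):
        row = ", ".join(f"0x{b:02x}" for b in data[i:i + per_line])
        # Every row except the last ends with a comma.
        trailing = "," if i + per_line < len(data) else ""
        lines.append(f"  {row}{trailing}")
    lines.append("};")
    lines.append(f"unsigned int {name}_len = {len(data)};")
    return "\n".join(lines) + "\n"
```

To use it, read the file with pathlib.Path("test_sound.pcm").read_bytes(), pass the bytes and "test_sound_pcm" to the function, and write the returned string to test_sound.h. If the byte values and length match what xxd produced, the conversion step is not the problem.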

Have you found a solution to this?

I am having a similar issue.

  • 8 bits unsigned, 16kHz - Plays twice as fast
  • 8 bits unsigned, 32kHz - Plays at correct speed, but only plays half the file
  • 16 bits signed, 16kHz - Plays at correct speed, but only plays half the file
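The speed part of these results lines up with the driver always consuming 16-bit samples at 16kHz, as mentioned earlier in the thread. A rough sketch of that arithmetic, treating the fixed 16-bit/16kHz mono driver setting as an assumption (the function name is hypothetical):

```python
def playback_speed_factor(file_rate, file_bits, driver_rate=16000, driver_bits=16):
    """How much faster than intended a mono file plays when the driver
    consumes bytes at its own fixed rate and sample width."""
    file_bytes_per_second = file_rate * file_bits // 8
    driver_bytes_per_second = driver_rate * driver_bits // 8
    return driver_bytes_per_second / file_bytes_per_second


eight_bit_16k = playback_speed_factor(16000, 8)     # 2.0, twice as fast
eight_bit_32k = playback_speed_factor(32000, 8)     # 1.0, correct speed
sixteen_bit_16k = playback_speed_factor(16000, 16)  # 1.0, correct speed
```

This only accounts for the speed. It says nothing about why playback stops partway through the file.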

I am using Audacity → SOX → XXD

Have you tried formatting it in Stereo? I’d give it a shot, but I am having the same bootlooping problems you were having.

I have tried stereo and it doesn’t play properly either. In all my tests 8bit seems to work (but you may be right that it plays fast/higher pitch). 16bit signed seems to play at the right speed/pitch, but it doesn’t get as far as half. I was able to find roughly where in my audio it was stopping, and it was before the halfway point.

I think xxd isn’t producing the output needed, or there’s a bit that gets converted that i2s interprets as a stop or end of stream or something, or it’s with how ESPHome configures i2s for the speaker component.

I haven’t tinkered with it for a few days, but my next idea is to try and copy the c++ code for the i2s stuff and tweak it and use that instead. I’ve found stuff online where people are using the same DAC I have and seemingly get audio to work, so I’m still hopeful.

FYI, from what I’ve found (and I resorted to installing Audacity, but not SOX), using ffmpeg as I mentioned produces the same raw PCM data. I don’t think Audacity and SOX are necessary.

I’ve logged an issue with ESPHome about this. From what I’ve gathered so far, it seems the buffer that’s used to hold the audio data fills up and doesn’t get emptied (in time?), so the buffer becomes completely full and the play() function exits before all the audio is played.
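To illustrate the failure mode described here, this is a toy model of a fixed-size buffer. It is not ESPHome code and not the fix from the issue; the class, its capacity, and the drain step are all hypothetical. It only shows how a single call can drop data when the buffer is full, while repeatedly offering the remainder gets everything through:

```python
class BoundedSpeakerBuffer:
    """Toy model of a fixed-size audio buffer (hypothetical, not ESPHome)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queued = bytearray()
        self.played = bytearray()

    def play(self, data):
        """Queue as much of data as fits; return the number of bytes accepted."""
        free = self.capacity - len(self.queued)
        accepted = data[:free]
        self.queued.extend(accepted)
        return len(accepted)

    def drain(self, max_bytes):
        """Simulate the output side consuming up to max_bytes from the queue."""
        chunk = self.queued[:max_bytes]
        del self.queued[:max_bytes]
        self.played.extend(chunk)
        return len(chunk)


def play_all(speaker, data, drain_step):
    """Keep offering the remainder, letting the buffer drain between calls.

    drain_step must be greater than zero, otherwise this never finishes.
    """
    offset = 0
    while offset < len(data):
        offset += speaker.play(data[offset:])
        speaker.drain(drain_step)
    while speaker.queued:
        speaker.drain(drain_step)
    return offset
```

With a 1024-byte buffer, one play() call on a larger clip accepts only the first 1024 bytes, whereas play_all() ends up with every byte in played.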

i2s_audio_speaker not playing all audio data · Issue #6222 · esphome/issues (github.com)

Just curious, did you land on a working solution? Care to elaborate?

Yes, I did. Follow my GitHub issue linked in the comment just above. Someone contributed a tweak to the i2s code that fixed the buffer issue, and combined with a fix that stops the watchdog from killing the audio if it plays too long, it works great now.
Note, however, that the fix worked as of when that comment was posted, which is a few revisions back from where the code is now. The suggested patch may not work on the current code, but I also haven’t tested the latest release to see if it’s even needed (they’ve made a bunch of changes to i2s).

I’d also mention that the instructions on how to format your audio that were in the ESPHome docs were needlessly complex. There’s nothing overly fancy required, just as I mentioned in this comment: I2S audio won't play all raw (PCM) data - #3 by esand.

Once the changes to i2s stabilize in ESPHome and no customizations are needed to get the audio to play properly, I’m likely (unless someone beats me to it) going to send a PR to update the guide page with these instructions to make it much clearer for everyone.