I am trying to get the i2s speaker component to work and play local raw audio. The ESPHome docs don’t say anything about what the required encoding should be for the raw audio.
From what I’ve gathered, it needs to be raw PCM. The input to the speaker.play method has to be 8-bit unsigned chars (uint8_t), but I can’t tell for certain what the audio format itself needs to be. An example doc page walks through a convoluted process of converting to unsigned 8-bit 16000 Hz, then to signed 8-bit, but doesn’t actually say what the final encoding should be. To make it more confusing, ESPHome sets its config to 16-bit 16000 Hz (i2s_audio_speaker.cpp), while Espressif mentions that 8 of the 16 bits are ignored by the (internal) DAC (I2S - ESP32).
Regarding the DAC… I’m not using the internal one; I have a MAX98357 hooked up to my ESP32, and when I play signed 8-bit 16000 Hz raw PCM, it seems to play fast or gets cut off before it finishes. I think this might be because ESPHome configures the I2S speaker for 16 bits per sample, which I only just discovered. I’ve tried a bunch of different audio encodings, and I think one of them was 16-bit signed little-endian, which didn’t play correctly either.
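To sanity-check the "plays fast" symptom: if the speaker really consumes 16-bit samples at 16000 Hz, a file of 8-bit samples would be read two bytes at a time and finish in half the expected time. Here is a rough sketch of that arithmetic, assuming mono audio and a hypothetical 32000-byte file (the function name and the file size are made up for illustration, not taken from ESPHome):

```python
def duration_seconds(num_bytes, sample_rate_hz, bits_per_sample, channels=1):
    """Playback time of raw PCM data under a given interpretation."""
    bytes_per_frame = (bits_per_sample // 8) * channels
    return num_bytes / (bytes_per_frame * sample_rate_hz)


file_size = 32000  # hypothetical size of sound.pcm in bytes

# How long the clip is when read as what ffmpeg wrote (8-bit, 16000 Hz, mono).
as_encoded = duration_seconds(file_size, 16000, 8)

# How long it lasts if the same bytes are consumed as 16-bit samples.
as_played = duration_seconds(file_size, 16000, 16)

print(f"encoded as 8-bit: {as_encoded:.2f} s")
print(f"read as 16-bit:   {as_played:.2f} s")
```

If that is what is happening, the audio would not just be short but also garbled, since each pair of unrelated 8-bit samples gets fused into one 16-bit value.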
I’ve been using variations of this to convert an mp3 to raw PCM data:
ffmpeg -i sound.mp3 -acodec pcm_s8 -f s8 -ac 1 -ar 16000 sound.pcm
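One way to test the 16-bit theory without re-encoding from the mp3 is to widen the existing signed 8-bit samples to 16-bit signed little-endian. This is only a sketch under the assumption that the speaker wants 16-bit little-endian mono samples, which I have not confirmed; the function names are hypothetical helpers, not part of ESPHome or ffmpeg:

```python
import struct


def widen_s8_to_s16le(data: bytes) -> bytes:
    """Convert signed 8-bit PCM to signed 16-bit little-endian PCM."""
    # Read every byte as a signed 8-bit sample (-128..127).
    samples = struct.unpack(f"{len(data)}b", data)
    # Scale by 256 so each sample keeps its relative loudness in 16 bits.
    return struct.pack(f"<{len(samples)}h", *(s * 256 for s in samples))


def convert_file(src_path: str, dst_path: str) -> None:
    """Read an s8 raw file and write the widened s16le version."""
    with open(src_path, "rb") as src:
        widened = widen_s8_to_s16le(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(widened)


# Example use (file names are placeholders):
# convert_file("sound.pcm", "sound_s16le.pcm")
```

The output is exactly twice the size of the input, so if the 16-bit assumption holds it should play at the original length.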
I also have the i2s media_player integration and that’s working just fine (streaming audio to it), so the issue is with just the speaker.
Can anyone tell me what the correct format should be for the raw PCM data? Does it need to be different if using an external DAC?