Storing i2s binary audio properly?

I have an ESP-WROOM-32 (4 MB flash, 520 KB SRAM, 448 KB ROM) and I am using it as both an I2S media player and an I2S speaker. The problem I’m encountering is that it boot loops when I include more than a certain amount of raw audio for the I2S speaker (using the ESPHome guide "Create audio clip files for use with I²S Speakers" as a reference).

This is my config:

esphome:
  name: $name
  name_add_mac_suffix: true
  friendly_name: $display_name
  project:
    name: $project_name
    version: $project_version
  includes:
    # https://esphome.io/guides/audio_clips_for_i2s
    # ffmpeg -i doorbell_1.mp3 -f s8 -acodec pcm_s8 -ac 1 -ar 16000 doorbell1.raw
    - audio/doorbell_sounds.h

esp32:
  board: esp32dev
  framework:
    type: arduino
    version: recommended

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

mdns:
  disabled: false

api:
  encryption:
    key: $api_key
  services:
    - service: doorbell
      then:
        - speaker.play: !lambda 'return doorbell_sounds.at(id(doorbell).active_index().value());'

logger:

binary_sensor:
  - platform: status
    name: "Status"
    entity_category: "diagnostic"

sensor:
  - platform: uptime
    name: "Uptime"
    type: seconds

  - platform: wifi_signal
    id: wifi_signal_dbm
    name: "WiFi signal strength"
    update_interval: 30s
    unit_of_measurement: dBm
    entity_category: "diagnostic"
    filters:
      - sliding_window_moving_average:
          window_size: 10
          send_every: 10

i2s_audio:
  i2s_lrclk_pin: GPIO27
  i2s_bclk_pin: GPIO26

media_player:
  - platform: i2s_audio
    name: "Media player"
    dac_type: external
    i2s_dout_pin:
      number: GPIO25
      allow_other_uses: true
    mode: mono
    on_play:
      - logger.log: "Playback started!"
    on_pause:
      - logger.log: "Playback paused!"
    on_idle:
      - logger.log: "Playback finished!"

speaker:
  - platform: i2s_audio
    dac_type: external
    i2s_dout_pin:
      number: GPIO25
      allow_other_uses: true
    mode: mono

select:
  - platform: template
    id: doorbell
    name: "Doorbell sound"
    options:
      - "Doorbell 1"
      - "Doorbell 2"
      - "Doorbell 3"
    optimistic: true
    set_action:
      - logger.log:
          format: "Doorbell sound set to '%s'"
          args: ["x.c_str()"]

The doorbell_sounds.h file contains the raw binary audio (using the ffmpeg line in the yaml comment, then using xxd according to the page mentioned above) and looks like this:

std::array<std::vector<unsigned char>, 3> doorbell_sounds = {
  std::vector<unsigned char> {{
    ...
  }},
  std::vector<unsigned char> {{
    ...
  }},
  std::vector<unsigned char> {{
    ...
  }}
};

The project compiles and flashes successfully, however it immediately boot loops saying that something is corrupt (it has memory addresses listed). If I keep only the first std::vector<unsigned char> that’s defined in the array, it flashes and boots ok. However, trying to add more than just that first sound seems to push it over the edge. I tested with various setups to confirm that it seems to be data size related.

When it compiles, though, it says that the binary takes only 73% of the space, so it doesn’t seem to be exceeding the available space. Also, the board is configured as esp32dev, which defaults to 320 KB RAM, but the board I have apparently has 520 KB (I’m assuming the SRAM is the RAM). I don’t know whether that has any impact.

From the ESPHome docs, PSRAM is supposedly auto-enabled when needed, but I don’t believe my board has PSRAM on it (if it’s needed, I do have some different boards with 16Mb).

Am I missing something in my config, or am I doing something wrong with how the audio should be stored? As it is now, it seems to work (the raw audio needs a bit of fixing; it gets cut off when it plays) but I’d like to be able to store multiple audio clips.

A std::vector always allocates its data in RAM (from the heap), so you will almost certainly need PSRAM. It’s not enabled automatically; you need at a minimum a psram: config line.

The data really should just stay in ROM but since the speaker component requires a vector that’s not possible. That’s a design (or at least implementation) error IMO.
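
To illustrate the point about std::vector, here is a minimal sketch in plain C++ (not ESPHome code; the names clip, copy_to_vector and is_separate_copy are made up for illustration). Building a vector from a const array copies every byte into heap storage, so a clip held in a vector costs RAM equal to its full size.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a clip; real audio would be tens of kilobytes.
static const uint8_t clip[] = {0x02, 0x03, 0x00, 0x02, 0x00, 0x08, 0x0d, 0x02};

// Builds a vector from the const array. The vector allocates its own
// heap buffer and copies the bytes into it.
inline std::vector<uint8_t> copy_to_vector() {
    return std::vector<uint8_t>(clip, clip + sizeof clip);
}

// True when the vector's storage is a different buffer from the const array.
inline bool is_separate_copy(const std::vector<uint8_t> &v) {
    return v.data() != clip;
}
```

With three clips stored as vectors, all three copies would sit on the heap at the same time, which fits the symptom of one clip booting fine and more than one failing.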

Thank you for this info. I’m not familiar with C++ (last did C like 15+ years ago) so this std::vector stuff is new to me.

From what I can tell in the docs, speaker.play just needs a std::vector<uint8_t> to play, but nothing says that all my audio needs to be stored that way (other than the how-to saying to do it like that). I suppose the audio could be stored so that it stays in ROM, and then gets copied into RAM as a std::vector only when it’s needed for playback. That’s way beyond my current knowledge though.
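
The copy-on-demand idea described above could look roughly like this. It is only a sketch in plain C++: doorbell_clip and load_clip are hypothetical names, and it assumes the clip is declared as a const array so it does not need heap memory until it is played.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical clip data, declared const so it does not have to live on the heap.
static const uint8_t doorbell_clip[] = {0x10, 0x20, 0x30, 0x40};

// Copies a const clip into a temporary vector only when it is needed.
// The heap memory is released when the returned vector goes out of scope.
inline std::vector<uint8_t> load_clip(const uint8_t *data, std::size_t length) {
    return std::vector<uint8_t>(data, data + length);
}
```

Only the clip being played would occupy heap memory this way, though it still needs enough free RAM for one whole clip.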

Should be as simple as declaring your data like this:

const uint8_t constexpr sound_1[] = {0, 1, 2, 4};
const uint8_t constexpr sound_2[] = {0, 1, 2, 4};
const struct sdata {
    const uint8_t * data;
    unsigned length;
}   sounds[] = {
    {sound_1, sizeof sound_1},
    {sound_2, sizeof sound_2},
};

I’m not sure that the constexpr is actually required. const definitely is.

Then just pass in the data and length (there is a separate method for this, no vector required.)

      - lambda: |-
          auto &sdata = sounds[id(doorbell).active_index().value()];
          id(my_speaker).play(sdata.data, sdata.length);
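
One thing the lambda above does not do is check the index before using it. A minimal sketch of a guarded lookup follows, in plain C++17. SoundData, kSoundCount and find_sound are hypothetical names, and std::optional is used here as a stand-in on the assumption that active_index() returns an optional-like value (it is called with .value() above).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Pairs a clip with its size in bytes (same shape as the struct above).
struct SoundData {
    const uint8_t *data;
    unsigned length;
};

static const uint8_t sound_1[] = {0, 1, 2, 4};
static const uint8_t sound_2[] = {0, 1, 2, 4, 8};

static const SoundData sounds[] = {
    {sound_1, sizeof sound_1},
    {sound_2, sizeof sound_2},
};

constexpr std::size_t kSoundCount = sizeof sounds / sizeof sounds[0];

// Returns the clip for the selected index, or nullptr when nothing is
// selected or the index is past the end of the table.
inline const SoundData *find_sound(std::optional<std::size_t> index) {
    if (!index.has_value() || *index >= kSoundCount) {
        return nullptr;
    }
    return &sounds[*index];
}
```

In the lambda, the play call would then only run when the returned pointer is not null, so adding a select option without a matching clip would not read past the array.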
external_components:
  - source: github://jesserockz/esphome-components
    components: [file]
    refresh: 0s

file:
  - id: timer_finished_wave_file
    file: https://github.com/esphome/firmware/raw/main/voice-assistant/sounds/timer_finished.wav
  - id: google_listen
    file: sounds/voice/google/listen.wav
  - id: google_heard
    file: sounds/voice/google/heard.wav
  - id: google_done
    file: sounds/voice/google/done.wav

...
          - lambda: id(my_speaker).play(id(timer_finished_wave_file), sizeof(id(timer_finished_wave_file)));

Hot damn, I like both suggestions!

@clydebarrow - Thanks! I had read in some ESP stuff that consts weren’t kept in RAM so I tried using consts myself but I didn’t know the proper syntax so compiling failed. I was in the middle of resorting to just plain C to see if that would work but never finished (It’s been too long since I did C :frowning:). What you have there looks like it could work.

@veli - That’s just plain neat… I was starting to look into some extra components, but what I was finding seemed to be for pure Arduino and other frameworks and I didn’t know how it would/could work with ESPHome. This library looks awesome. I like that I can even pass just a .wav to it so I don’t need to fiddle with it so much.

I’m going to try both and report back.


On the ESP32, const data is stored in ROM (flash), because the ESP32 has a flat address space for all memory. That is not the case on the ESP8266, which can’t read byte data from flash and needs the PROGMEM attribute stuff. There’s no good reason to be using 8266s these days when an ESP32C3 is cheaper, lower power and more powerful.

So, I can confirm that @clydebarrow’s suggestion works. I didn’t need constexpr, just const was enough (and for the record I stuck to const unsigned char).

Thank you also for mentioning the lambda play() method, which I wasn’t aware of.

An issue for a different thread (for anyone looking at this thread) is that it seems the encoded raw audio is either playing at a faster frequency (16 vs 8), or something is up with the i2s speaker feature that’s causing it to cut the audio short (can’t tell if it’s trimming a % off the end, or a static length of play time before it stops). So, heads up… this works but it’s not perfect - yet.

I am running into a similar issue. Could you please post your code?

I haven’t figured out why my audio clips aren’t playing all the way through, but here’s what I have currently (not all my project, just the bits concerning the audio):

esphome:
  includes:
    # ffmpeg -i doorbell1.mp3 -acodec pcm_s16le -f s16le -ac 1 -ar 16000 doorbell1.pcm
    # xxd -i doorbell1.pcm doorbell1.h
    - audio/doorbell1.h
    - audio/doorbell2.h
    - audio/doorbell3.h
    - audio/doorbells.h

esp32:
  board: esp32dev
  framework:
    type: arduino

i2s_audio:
  i2s_lrclk_pin: GPIO27
  i2s_bclk_pin: GPIO26

speaker:
  - platform: i2s_audio
    id: i2s_speaker
    dac_type: external
    i2s_dout_pin:
      number: GPIO25
      allow_other_uses: true
    mode: mono

button:
  - platform: template
    id: doorbell_button
    name: "Doorbell"
    icon: "mdi:doorbell"
    on_press:
      then:
        - lambda: |-
            auto &sdata = doorbells[id(doorbell).active_index().value()];
            id(i2s_speaker).play(sdata.data, sdata.length);

select:
  - platform: template
    id: doorbell
    name: "Doorbell sound"
    options:
      - "Doorbell 1"
      - "Doorbell 2"
      - "Doorbell 3"
    optimistic: true

The doorbell1.h (and 2/3) are formatted like so:

const unsigned char doorbell1_pcm[] = {
  0x02, 0x03, 0x00, 0x02, 0x00, 0x08, 0x0d, 0x02, 0xf9, 0xe5, 0xf7, 0x03,
...
};
unsigned int doorbell1_pcm_len = 36503;

And doorbells.h is this:

const struct sdata {
        const unsigned char * data;
        unsigned length;
} doorbells[] = {
        doorbell1_pcm, doorbell1_pcm_len,
        doorbell2_pcm, doorbell2_pcm_len,
        doorbell3_pcm, doorbell3_pcm_len
};

I believe that my issue is either xxd not converting the raw PCM data in the format I need, or that there’s some issue with the i2s play method. I have another thread (that you seem to have found) about that (I2S audio won’t play all raw (PCM) data).
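
One way to narrow this down is to work out how long a clip should play from its byte count and compare that with what is actually heard. This is only a sketch: expected_duration_ms is a hypothetical helper, and the numbers assume the ffmpeg settings above (16000 Hz, mono, 2 bytes per sample for s16le).

```cpp
#include <cstddef>
#include <cstdint>

// Expected playback time in milliseconds for raw PCM of a given size.
// duration = bytes / (sample_rate * bytes_per_sample * channels)
constexpr std::uint64_t expected_duration_ms(std::size_t byte_count,
                                             unsigned sample_rate_hz,
                                             unsigned bytes_per_sample,
                                             unsigned channels) {
    return static_cast<std::uint64_t>(byte_count) * 1000u /
           (static_cast<std::uint64_t>(sample_rate_hz) * bytes_per_sample * channels);
}
```

For doorbell1_pcm_len = 36503 this gives about 1140 ms at 2 bytes per sample, and about 2281 ms if the same bytes were treated as 1 byte per sample. If playback stops well short of the expected time, that would point at the playback side and not at the xxd conversion, which does not change the data.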