ESP32 Satellite keep repeating last speech command

I have been running the ESP on a onju-voice-satellite system now for months, with no problems.

Since the last update, it has stopped going idle after a command for example if I ask it the time I get the correct time, but then it repeats that time every minute.

Also if the last thing it said was “Sorry I didn’t understand that.” then that just constantly repeats every minute.

Below is the code. Ant pointer would be a huge help.

substitutions:
  name: "dr-ada"
  friendly_name: "DR Ada"
  wifi_ap_password: "password"

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  min_version: 2023.10.1
  on_boot:
    then:
      - light.turn_on:
          id: top_led
          effect: slow_pulse
          red: 100%
          green: 60%
          blue: 0%
      - wait_until:
          condition:
            wifi.connected:
      - light.turn_on:
          id: top_led
          effect: pulse
          red: 0%
          green: 100%
          blue: 0%
      - wait_until:
          condition:
            api.connected:
      - light.turn_on:
          id: top_led
          effect: none
          red: 0%
          green: 100%
          blue: 0%
      - delay: 1s
      - script.execute: reset_led

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

logger:
api:
  encryption:
   key: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  services:
    - service: start_va
      then:
        - voice_assistant.start
    - service: stop_va
      then:
        - voice_assistant.stop

ota:
  platform: esphome

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    password: "${wifi_ap_password}"

captive_portal:

globals:
  - id: thresh_percent
    type: float
    initial_value: "0.03"
    restore_value: false
  - id: touch_calibration_values_left
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_center
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_right
    type: uint32_t[5]
    restore_value: false

interval:
  - interval: 1s
    then:
      - script.execute:
          id: calibrate_touch
          button: 0
      - script.execute:
          id: calibrate_touch
          button: 1
      - script.execute:
          id: calibrate_touch
          button: 2

i2s_audio:
  - i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18

media_player:
  - platform: i2s_audio
    name: None
    id: onju_out
    dac_type: external
    i2s_dout_pin: GPIO12
    mode: mono
    mute_pin:
      number: GPIO21
      inverted: True

######
# speaker:
#   - platform: i2s_audio
#     id: onju_out
#     dac_type: external
#     i2s_dout_pin: GPIO12
#     mode: stereo
######

microphone:
  - platform: i2s_audio
    id: onju_microphone
    i2s_din_pin: GPIO17
    adc_type: external
    pdm: false

voice_assistant:
  id: va
  microphone: onju_microphone
  media_player: onju_out
######
  # speaker: onju_out
######
  use_wake_word: true
  on_start:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 0%
        green: 0%
        effect: none
  on_listening:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: pulse
  on_tts_end:
    - media_player.play_media: !lambda return x;
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 20%
        green: 100%
        effect: pulse
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          media_player.is_playing: onju_out
    - script.execute: reset_led
  on_client_connected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - voice_assistant.start_continuous:
  on_client_disconnected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - voice_assistant.stop:
  on_error:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 100%
        green: 0%
    - delay: 1s
    - script.execute: reset_led

number:
  - platform: template
    name: "Touch threshold percentage"
    id: touch_threshold_percentage
    update_interval: never
    entity_category: config
    initial_value: 1.25
    min_value: -1
    max_value: 5
    step: 0.25
    optimistic: true
    on_value:
      then:
        - lambda: !lambda |-
            id(thresh_percent) = 0.01 * x;

esp32_touch:
  setup_mode: false
  sleep_duration: 2ms 
  measurement_duration: 800us
  low_voltage_reference: 0.8V
  high_voltage_reference: 2.4V
  
  filter_mode: IIR_16
  debounce_count: 2
  noise_threshold: 0
  jitter_step: 0
  smooth_mode: IIR_2

  denoise_grade: BIT8
  denoise_cap_level: L0

binary_sensor:
  - platform: esp32_touch
    id: volume_down
    pin: GPIO4
    threshold: 539000 # 533156-551132
    on_press: 
      then:
        - light.turn_on: left_led
        - script.execute:
            id: set_volume
            volume: -0.05
        - delay: 1s
        - while:
            condition:
              binary_sensor.is_on: volume_down
            then:
              - script.execute:
                  id: set_volume
                  volume: -0.05
              - delay: 150ms
    on_release: 
      then:
        - light.turn_off: left_led
        
  - platform: esp32_touch
    id: volume_up
    pin: GPIO2
    threshold: 580000 # 575735-593064
    on_press: 
      then:
        - light.turn_on: right_led
        - script.execute:
            id: set_volume
            volume: 0.05
        - delay: 1s
        - while:
            condition:
              binary_sensor.is_on: volume_up
            then:
              - script.execute:
                  id: set_volume
                  volume: 0.05
              - delay: 150ms
    on_release: 
      then:
        - light.turn_off: right_led

  - platform: esp32_touch
    id: action
    pin: GPIO3
    threshold: 751000 # 745618-767100
    on_click:
      - if:
          condition:
            or:
              - switch.is_off: use_wake_word
              - binary_sensor.is_on: mute_switch
          then:
            - if:
                condition: voice_assistant.is_running
                then:
                  - voice_assistant.stop:
                  - script.execute: reset_led
                else:
                  - voice_assistant.start:
          else:
            - voice_assistant.stop
            - delay: 1s
            - script.execute: reset_led
            - script.wait: reset_led
            - voice_assistant.start_continuous:

  - platform: gpio
    id: mute_switch
    pin:
      number: GPIO38
      mode: INPUT_PULLUP
    name: Disable wake word
    on_press:
      - script.execute: turn_on_wake_word
    on_release:
      - script.execute: turn_off_wake_word

  - platform: status
    id: api_connection
    filters:
      - delayed_on: 1s
    on_press:
      - if:
          condition:
            and:
              - switch.is_on: use_wake_word
              - binary_sensor.is_off: mute_switch
          then:
            - voice_assistant.start_continuous:
    on_release:
      - if:
          condition:
            and:
              - switch.is_on: use_wake_word
              - binary_sensor.is_off: mute_switch
          then:
            - voice_assistant.stop:

light:
  - platform: esp32_rmt_led_strip
    id: leds
    pin: GPIO11
    chipset: SK6812
    num_leds: 6
    rgb_order: grb
    rmt_channel: 0
    default_transition_length: 0s
    gamma_correct: 2.8
  - platform: partition
    id: left_led
    segments:
      - id: leds
        from: 0
        to: 0
  - platform: partition
    id: top_led
    segments:
      - id: leds
        from: 1
        to: 4
    effects:
      - pulse:
          name: pulse
          transition_length: 250ms
          update_interval: 250ms
      - pulse:
          name: slow_pulse
          transition_length: 1s
          update_interval: 2s
      - addressable_lambda: 
          name: show_volume
          update_interval: 50ms
          lambda: |-
            int int_volume = int(id(onju_out).volume * 100.0f * it.size());
            int full_leds = int_volume / 100;
            int last_brightness = int_volume % 100;
            int i = 0;
            for(; i < full_leds; i++) {
              it[i] = Color::WHITE;
            }
            if(i < 4) {
              it[i++] = Color(0,0,0).fade_to_white(last_brightness*256/100);
            }
            for(; i < it.size(); i++) {
              it[i] = Color::BLACK;
            }
  - platform: partition
    id: right_led
    segments:
      - id: leds
        from: 5
        to: 5

script:
  - id: reset_led
    then:
      - if:
          condition:
            and:
              - switch.is_on: use_wake_word
              - binary_sensor.is_off: mute_switch
          then:
            - light.turn_on:
                id: top_led
                blue: 100%
                red: 100%
                green: 0%
                brightness: 100%
                effect: none
          else:
            - light.turn_off: top_led

  - id: set_volume
    mode: restart
    parameters:
      volume: float
    then:
      - media_player.volume_set:
          id: onju_out
          volume: !lambda return clamp(id(onju_out).volume+volume, 0.0f, 1.0f);
      - light.turn_on:
          id: top_led
          effect: show_volume
      - delay: 1s
      - script.execute: reset_led

  - id: turn_on_wake_word
    then:
      - if:
          condition:
            and:
              - binary_sensor.is_off: mute_switch
              - switch.is_on: use_wake_word
          then:
            - lambda: id(va).set_use_wake_word(true);
            - if:
                condition:
                  not:
                    - voice_assistant.is_running
                then:
                  - voice_assistant.start_continuous
            - script.execute: reset_led

  - id: turn_off_wake_word
    then:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - script.execute: reset_led

  - id: calibrate_touch
    parameters:
      button: int
    then:
      - lambda: |-
          static byte thresh_indices[3] = {0, 0, 0};
          static uint32_t sums[3] = {0, 0, 0};
          static byte qsizes[3] = {0, 0, 0};
          static int consecutive_anomalies_per_button[3] = {0, 0, 0};

          uint32_t newval;
          uint32_t* calibration_values;
          switch(button) {
            case 0:
              newval = id(volume_down).get_value();
              calibration_values = id(touch_calibration_values_left);
              break;
            case 1:
              newval = id(action).get_value();
              calibration_values = id(touch_calibration_values_center);
              break;
            case 2:
              newval = id(volume_up).get_value();
              calibration_values = id(touch_calibration_values_right);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }

          if(newval == 0) return;

          //ESP_LOGD("touch_calibration", "[%d] qsize %d, sum %d, thresh_index %d, consecutive_anomalies %d", button, qsizes[button], sums[button], thresh_indices[button], consecutive_anomalies_per_button[button]);
          //ESP_LOGD("touch_calibration", "[%d] New value is %d", button, newval);
          
          if(qsizes[button] == 5) {
            float avg = float(sums[button])/float(qsizes[button]);
            if((fabs(float(newval)-avg)/avg) > id(thresh_percent)) {
              consecutive_anomalies_per_button[button]++;
              //ESP_LOGD("touch_calibration", "[%d] %d anomalies detected.", button, consecutive_anomalies_per_button[button]);
              if(consecutive_anomalies_per_button[button] < 10)
                return;
            } 
          }

          //ESP_LOGD("touch_calibration", "[%d] Resetting consecutive anomalies counter.", button);
          consecutive_anomalies_per_button[button] = 0;

          
          if(qsizes[button] == 5) {
            //ESP_LOGD("touch_calibration", "[%d] Queue full, removing %d.", button, id(touch_calibration_values)[thresh_indices[button]]);
            sums[button] -= (uint32_t) *(calibration_values+thresh_indices[button]);// id(touch_calibration_values)[thresh_indices[button]];
            qsizes[button]--;
          }
          *(calibration_values+thresh_indices[button]) = newval;
          sums[button] += newval;
          qsizes[button]++;
          thresh_indices[button] = (thresh_indices[button] + 1) % 5;

          //ESP_LOGD("touch_calibration", "[%d] Average value is %d", button, sums[button]/qsizes[button]);
          uint32_t newthresh = uint32_t((sums[button]/qsizes[button]) * (1.0 + id(thresh_percent)));
          //ESP_LOGD("touch_calibration", "[%d] Setting threshold %d", button, newthresh);
          
          switch(button) {
            case 0:
              id(volume_down).set_threshold(newthresh);
              break;
            case 1:
              id(action).set_threshold(newthresh);
              break;
            case 2:
              id(volume_up).set_threshold(newthresh);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }

switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - script.execute: turn_on_wake_word
    on_turn_off:
      - script.execute: turn_off_wake_word

  - platform: template
    name: Listen LR
    id: listen_lr
    optimistic: true
    on_turn_on:
      - switch.turn_off: use_wake_word
      - delay: 1s
      - voice_assistant.start_continuous
    on_turn_off:
      - switch.turn_on: use_wake_word

Hi, I’m getting the exact same problem, and have been testing and also asking the same question as you on discord. From what I can tell during my own testing, the problem seems to be with the media_player component, together with the voice_assistant component, when using the esp32 framework type of arduino. I have tried swapping the framework type to esp-idf, and still had the same problems. My esp32’s, which are only runnning as a media_player, which also use arduino have the same problem. However, if I swap the media_player only esp32 to esp-idf, then the problem is gone.

Where did you make that change? I cant see in my code where that would go?

Here:

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino <###esp-idf###

I’ve been chatting some more with someone on Discord today, seems like it is a problem with 2025.4 for ESPhome and the media_player component. Only way I can make use of my ESP32 satellites, is to use an external speaker on a separate satellite, Google nest etc to route the TTS response to…

+1
same problem here