Low cost ESP32 voice platforms

RealDeco · May 30, 2025, 3:48am

could you share your working code?

hunterua · June 3, 2025, 1:14pm

hi. try to change channel to “stereo” instead of “mono”. I’ve got it working this way.

AlexDixon · June 4, 2025, 9:39pm

I think I’ve got a similar config. Mine works - wake word, listening, speaker, and PWM control of the display. The WS2812 works as a single RGB LED, and adding some functions to indicate listening status, especially with the display off, is on my to-do list. I also didn’t bother reconfiguring the buttons as they’re annoying to access anyway, though reset works as intended.

I was a bit lazy with the images - the config is only slightly modified from the S3-Box, and I kept the 320x240 images, setting the config to scale them. I didn’t see any difference between the resized images and these once I set the background colour of the four “active” functions to off-white (#F2F4F9).

The text transcription of both my request and the assistant’s response are cut off due to the round screen.

The sticker on my device says “V5-EN”.

---
substitutions:
  name: voice-assistant
  friendly_name: Voice assistant
  loading_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/loading_320_240.png
  idle_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/idle_320_240.png
  listening_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/listening_320_240.png
  thinking_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/thinking_320_240.png
  replying_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/replying_320_240.png
  error_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/error_320_240.png
  timer_finished_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/timer_finished_320_240.png

  wifi: !secret wifi_ssid
  wifi_pass: !secret wifi_password
  ota_pass: !secret ota_pass

  loading_illustration_background_color: "000000"
  idle_illustration_background_color: "000000"
  listening_illustration_background_color: "F2F4F9"
  thinking_illustration_background_color: "F2F4F9"
  replying_illustration_background_color: "F2F4F9"
  error_illustration_background_color: "000000"
  timer_finished_background_color: "F2F4F9"

  voice_assist_idle_phase_id: "1"
  voice_assist_listening_phase_id: "2"
  voice_assist_thinking_phase_id: "3"
  voice_assist_replying_phase_id: "4"
  voice_assist_not_ready_phase_id: "10"
  voice_assist_error_phase_id: "11"
  voice_assist_muted_phase_id: "12"
  voice_assist_timer_finished_phase_id: "20"

  # These unique characters have been extracted from every test file of every language available on https://github.com/home-assistant/intents (14 March 2024)
  # However, the Figtree font only contains Latin characters, so there is no point using this... unlessyou change the font configuration accordingly.
  allowed_characters: " !#%'()+,-./0123456789:;<>?@ABCDEFGHIJKLMNOPQRSTUVWYZ[]_abcdefghijklmnopqrstuvwxyz{|}°²³µ¿ÁÂÄÅÉÖÚßàáâãäåæçèéêëìíîðñòóôõöøùúûüýþāăąćčďĐđēėęěğĮįıļľŁłńňőřśšťũūůűųźŻżŽžơưșțΆΈΌΐΑΒΓΔΕΖΗΘΚΜΝΠΡΣΤΥΦάέήίαβγδεζηθικλμνξοπρςστυφχψωϊόύώАБВГДЕЖЗИКЛМНОПРСТУХЦЧШЪЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёђєіїјљњћאבגדהוזחטיכלםמןנסעפץצקרשת،ءآأإئابةتجحخدذرزسشصضطظعغفقكلمنهوىيٹپچڈکگںھہیےংকচতধনফবযরলশষস়ািু্చయలిెొ్ംഅആഇഈഉഎഓകഗങചജഞടഡണതദധനപഫബഭമയരറലളവശസഹാിീുൂെേൈ്ൺൻർൽൾაბგდევზთილმნოპრსტუფქყშჩცძჭხạảấầẩậắặẹẽếềểệỉịọỏốồổỗộớờởợụủứừửữựỳ—、一上不个中为主乾了些亮人任低佔何作供依侧係個側偵充光入全关冇冷几切到制前動區卧厅厨及口另右吊后吗启吸呀咗哪唔問啟嗎嘅嘛器圍在场執場外多大始安定客室家密寵对將小少左已帘常幫幾库度庫廊廚廳开式後恆感態成我戲戶户房所扇手打执把拔换掉控插摄整斯新明是景暗更最會有未本模機檯櫃欄次正氏水沒没洗活派温測源溫漏潮激濕灯為無煙照熱燈燥物狀玄现現瓦用發的盞目着睡私空窗立笛管節簾籬紅線红罐置聚聲脚腦腳臥色节著行衣解設調請謝警设调走路車车运連遊運過道邊部都量鎖锁門閂閉開關门闭除隱離電震霧面音頂題顏颜風风食餅餵가간감갔강개거게겨결경고공과관그금급기길깥꺼껐꼽나난내네놀누는능니다닫담대더데도동됐되된됨둡드든등디때떤뜨라래러렇렌려로료른를리림링마많명몇모무문물뭐바밝방배변보부불블빨뽑사산상색서설성세센션소쇼수스습시신실싱아안않알았애야어얼업없었에여연열옆오온완외왼요운움워원위으은을음의이인일임입있작잠장재전절정제져조족종주줄중줘지직진짐쪽차창천최추출충치침커컴켜켰쿠크키탁탄태탬터텔통트튼티파팬퍼폰표퓨플핑한함해했행혀현화활후휴힘，？"

  # Add support for non-unicode characters by using better glyphset
  font_glyphsets: "GF_Latin_Core"
  # for Greek use "Noto Sans" for other languages use a compatible font family
  font_family: Figtree

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  min_version: 2025.5.0
  name_add_mac_suffix: true
  on_boot:
    priority: 600
    then:
      - script.execute: draw_display
      - delay: 30s
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
            - script.execute: draw_display

esp32:
  board: esp32s3box
  flash_size: 16MB
  cpu_frequency: 240MHz
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"

psram:
  mode: octal
  speed: 80MHz

api:
  on_client_connected:
    - script.execute: draw_display
  on_client_disconnected:
    - script.execute: draw_display

ota:
  - platform: esphome
    id: ota_esphome

logger:
  hardware_uart: USB_SERIAL_JTAG

wifi:
  ssid: $wifi
  password: $wifi_pass
  ap:
    ssid: "ESPHome Voice Fallback Hotspot"
    password: "tMUPeiAXc2G3"
  on_connect:
    - script.execute: draw_display
  on_disconnect:
    - script.execute: draw_display

captive_portal:

button:
  - platform: factory_reset
    id: factory_reset_btn
    internal: true

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO0
      inverted: true
    id: left_top_button
    internal: true
    on_multi_click:
      - timing:
          - ON for at least 50ms
          - OFF for at least 50ms
        then:
          - switch.turn_off: timer_ringing
      - timing:
          - ON for at least 10s
        then:
          - button.press: factory_reset_btn

output:
  - platform: ledc
    pin: GPIO3
    id: backlight_output

light:
  - platform: monochromatic
    id: led
    name: Screen
    icon: "mdi:television"
    entity_category: config
    output: backlight_output
    restore_mode: RESTORE_DEFAULT_ON
    default_transition_length: 250ms
  - platform: esp32_rmt_led_strip
    rgb_order: GRB
    pin: GPIO48
    num_leds: 1
    chipset: WS2812
    name: "RGB LED"

i2s_audio:
  - id: i2s_mic # For microphone
    i2s_lrclk_pin: GPIO4 #WS 
    i2s_bclk_pin: GPIO5  #SCK
  - id: i2s_audio_bus # For speaker
    i2s_lrclk_pin: GPIO16
    i2s_bclk_pin: GPIO15
  
microphone:
  - platform: i2s_audio
    i2s_audio_id: i2s_mic
    adc_type: external
    i2s_din_pin: GPIO6 #SD
    id: box_mic
    channel: stereo
    pdm: false
    bits_per_sample: 16bit

speaker:
  - platform: i2s_audio
    id: box_speaker
    i2s_audio_id: i2s_audio_bus
    dac_type: external
    i2s_dout_pin:   
      number: GPIO7 # DIN Pin of the MAX98357A Audio Amplifier
    channel: stereo
    buffer_duration: 100ms

media_player:
  - platform: speaker
    name: None
    id: speaker_media_player
    volume_min: 0.5
    volume_max: 0.8
    announcement_pipeline:
      speaker: box_speaker
      format: WAV
      sample_rate: 48000
      num_channels: 1  # S3 Box only has one output channel
    files:
      - id: timer_finished_sound
        file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/timer_finished.flac
    on_announcement:
      # Stop the wake word (mWW or VA) if the mic is capturing
      - if:
          condition:
            - microphone.is_capturing:
          then:
            - script.execute: stop_wake_word
            # Ensure VA stops before moving on
            - if:
                condition:
                  - lambda: return id(wake_word_engine_location).state == "In Home Assistant";
                then:
                  - wait_until:
                      - not:
                          voice_assistant.is_running:
      # Since VA isn't running, this is user-intiated media playback. Draw the mute display
      - if:
          condition:
            not:
              voice_assistant.is_running:
          then:
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
            - script.execute: draw_display
    on_idle:
      # Since VA isn't running, this is the end of user-intiated media playback. Restart the wake word.
      - if:
          condition:
            not:
              voice_assistant.is_running:
          then:
            - script.execute: start_wake_word
            - script.execute: set_idle_or_mute_phase
            - script.execute: draw_display

micro_wake_word:
  id: mww
  models:
    - okay_nabu
    - hey_mycroft
    - hey_jarvis
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

voice_assistant:
  id: va
  microphone: box_mic
  media_player: speaker_media_player
  micro_wake_word: mww
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  on_listening:
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - text_sensor.template.publish:
        id: text_request
        state: "..."
    - text_sensor.template.publish:
        id: text_response
        state: "..."
    - script.execute: draw_display
  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: draw_display
  on_stt_end:
    - text_sensor.template.publish:
        id: text_request
        state: !lambda return x;
    - script.execute: draw_display
  on_tts_start:
    - text_sensor.template.publish:
        id: text_response
        state: !lambda return x;
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: draw_display
  on_end:
    # Wait a short amount of time to see if an announcement starts
    - wait_until:
        condition:
          - media_player.is_announcing:
        timeout: 0.5s
    # Announcement is finished and the I2S bus is free
    - wait_until:
        - and:
            - not:
                media_player.is_announcing:
            - not:
                speaker.is_playing:
    # Restart only mWW if enabled; streaming wake words automatically restart
    - if:
        condition:
          - lambda: return id(wake_word_engine_location).state == "On device";
        then:
          - lambda: id(va).set_use_wake_word(false);
          - micro_wake_word.start:
    - script.execute: set_idle_or_mute_phase
    - script.execute: draw_display
    # Clear text sensors
    - text_sensor.template.publish:
        id: text_request
        state: ""
    - text_sensor.template.publish:
        id: text_response
        state: ""
  on_error:
    - if:
        condition:
          lambda: return !id(init_in_progress);
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
          - script.execute: draw_display
          - delay: 1s
          - if:
              condition:
                switch.is_off: mute
              then:
                - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
              else:
                - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
          - script.execute: draw_display
  on_client_connected:
    - lambda: id(init_in_progress) = false;
    - script.execute: start_wake_word
    - script.execute: set_idle_or_mute_phase
    - script.execute: draw_display
  on_client_disconnected:
    - script.execute: stop_wake_word
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - script.execute: draw_display
  on_timer_started:
    - script.execute: draw_display
  on_timer_cancelled:
    - script.execute: draw_display
  on_timer_updated:
    - script.execute: draw_display
  on_timer_tick:
    - script.execute: draw_display
  on_timer_finished:
    - switch.turn_on: timer_ringing
    - wait_until:
        media_player.is_announcing:
    - lambda: id(voice_assistant_phase) = ${voice_assist_timer_finished_phase_id};
    - script.execute: draw_display

script:
  - id: draw_display
    then:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - if:
                condition:
                  wifi.connected:
                then:
                  - if:
                      condition:
                        api.connected:
                      then:
                        - lambda: |
                            switch(id(voice_assistant_phase)) {
                              case ${voice_assist_listening_phase_id}:
                                id(s3_box_lcd).show_page(listening_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_thinking_phase_id}:
                                id(s3_box_lcd).show_page(thinking_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_replying_phase_id}:
                                id(s3_box_lcd).show_page(replying_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_error_phase_id}:
                                id(s3_box_lcd).show_page(error_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_muted_phase_id}:
                                id(s3_box_lcd).show_page(muted_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_not_ready_phase_id}:
                                id(s3_box_lcd).show_page(no_ha_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_timer_finished_phase_id}:
                                id(s3_box_lcd).show_page(timer_finished_page);
                                id(s3_box_lcd).update();
                                break;
                              default:
                                id(s3_box_lcd).show_page(idle_page);
                                id(s3_box_lcd).update();
                            }
                      else:
                        - display.page.show: no_ha_page
                        - component.update: s3_box_lcd
                else:
                  - display.page.show: no_wifi_page
                  - component.update: s3_box_lcd
          else:
            - display.page.show: initializing_page
            - component.update: s3_box_lcd

  - id: fetch_first_active_timer
    then:
      - lambda: |
          const auto timers = id(va).get_timers();
          auto output_timer = timers.begin()->second;
          for (auto &iterable_timer : timers) {
            if (iterable_timer.second.is_active && iterable_timer.second.seconds_left <= output_timer.seconds_left) {
              output_timer = iterable_timer.second;
            }
          }
          id(global_first_active_timer) = output_timer;
  - id: check_if_timers_active
    then:
      - lambda: |
          const auto timers = id(va).get_timers();
          bool output = false;
          if (timers.size() > 0) {
            for (auto &iterable_timer : timers) {
              if(iterable_timer.second.is_active) {
                output = true;
              }
            }
          }
          id(global_is_timer_active) = output;
  - id: fetch_first_timer
    then:
      - lambda: |
          const auto timers = id(va).get_timers();
          auto output_timer = timers.begin()->second;
          for (auto &iterable_timer : timers) {
            if (iterable_timer.second.seconds_left <= output_timer.seconds_left) {
              output_timer = iterable_timer.second;
            }
          }
          id(global_first_timer) = output_timer;
  - id: check_if_timers
    then:
      - lambda: |
          const auto timers = id(va).get_timers();
          bool output = false;
          if (timers.size() > 0) {
            output = true;
          }
          id(global_is_timer) = output;

  - id: draw_timer_timeline
    then:
      - lambda: |
          id(check_if_timers_active).execute();
          id(check_if_timers).execute();
          if (id(global_is_timer_active)){
            id(fetch_first_active_timer).execute();
            int active_pixels = round( 320 * id(global_first_active_timer).seconds_left / max(id(global_first_active_timer).total_seconds , static_cast<uint32_t>(1)) );
            if (active_pixels > 0){
              id(s3_box_lcd).filled_rectangle(0 , 225 , 320 , 15 , Color::WHITE );
              id(s3_box_lcd).filled_rectangle(0 , 226 , active_pixels , 13 , id(active_timer_color) );
            }
          } else if (id(global_is_timer)){
            id(fetch_first_timer).execute();
            int active_pixels = round( 320 * id(global_first_timer).seconds_left / max(id(global_first_timer).total_seconds , static_cast<uint32_t>(1)));
            if (active_pixels > 0){
              id(s3_box_lcd).filled_rectangle(0 , 225 , 320 , 15 , Color::WHITE );
              id(s3_box_lcd).filled_rectangle(0 , 226 , active_pixels , 13 , id(paused_timer_color) );
            }
          }
  - id: draw_active_timer_widget
    then:
      - lambda: |
          id(check_if_timers_active).execute();
          if (id(global_is_timer_active)){
            id(s3_box_lcd).filled_rectangle(80 , 40 , 160 , 50 , Color::WHITE );
            id(s3_box_lcd).rectangle(80 , 40 , 160 , 50 , Color::BLACK );

            id(fetch_first_active_timer).execute();
            int hours_left = floor(id(global_first_active_timer).seconds_left / 3600);
            int minutes_left = floor((id(global_first_active_timer).seconds_left - hours_left * 3600) / 60);
            int seconds_left = id(global_first_active_timer).seconds_left - hours_left * 3600 - minutes_left * 60 ;
            auto display_hours = (hours_left < 10 ? "0" : "") + std::to_string(hours_left);
            auto display_minute = (minutes_left < 10 ? "0" : "") + std::to_string(minutes_left);
            auto display_seconds = (seconds_left  < 10 ? "0" : "") + std::to_string(seconds_left) ;

            std::string display_string = "";
            if (hours_left > 0) {
              display_string = display_hours + ":" + display_minute;
            } else {
              display_string = display_minute + ":" + display_seconds;
            }
            id(s3_box_lcd).printf(120, 47, id(font_timer), Color::BLACK, "%s", display_string.c_str());
          }
  # Starts either mWW or the streaming wake word, depending on the configured location
  - id: start_wake_word
    then:
      - if:
          condition:
            and:
              - not:
                  - voice_assistant.is_running:
              - lambda: return id(wake_word_engine_location).state == "On device";
          then:
            - lambda: id(va).set_use_wake_word(false);
            - micro_wake_word.start:
      - if:
          condition:
            and:
              - not:
                  - voice_assistant.is_running:
              - lambda: return id(wake_word_engine_location).state == "In Home Assistant";
          then:
            - lambda: id(va).set_use_wake_word(true);
            - voice_assistant.start_continuous:
  # Stops either mWW or the streaming wake word, depending on the configured location
  - id: stop_wake_word
    then:
      - if:
          condition:
            lambda: return id(wake_word_engine_location).state == "In Home Assistant";
          then:
            - lambda: id(va).set_use_wake_word(false);
            - voice_assistant.stop:
      - if:
          condition:
            lambda: return id(wake_word_engine_location).state == "On device";
          then:
            - micro_wake_word.stop:
  # Set the voice assistant phase to idle or muted, depending on if the software mute switch is activated
  - id: set_idle_or_mute_phase
    then:
      - if:
          condition:
            switch.is_off: mute
          then:
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
          else:
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};

switch:
  - platform: gpio
    name: Speaker Enable
    pin: GPIO46
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    disabled_by_default: true
  - platform: template
    name: Mute
    id: mute
    icon: "mdi:microphone-off"
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF
    entity_category: config
    on_turn_off:
      - microphone.unmute:
      - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
      - script.execute: draw_display
    on_turn_on:
      - microphone.mute:
      - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
      - script.execute: draw_display
  - platform: template
    id: timer_ringing
    optimistic: true
    internal: true
    restore_mode: ALWAYS_OFF
    on_turn_off:
      # Turn off the repeat mode and disable the pause between playlist items
      - lambda: |-
              id(speaker_media_player)
                ->make_call()
                .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_REPEAT_OFF)
                .set_announcement(true)
                .perform();
              id(speaker_media_player)->set_playlist_delay_ms(speaker::AudioPipelineType::ANNOUNCEMENT, 0);
      # Stop playing the alarm
      - media_player.stop:
          announcement: true
    on_turn_on:
      # Turn on the repeat mode and pause for 1000 ms between playlist items/repeats
      - lambda: |-
            id(speaker_media_player)
              ->make_call()
              .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_REPEAT_ONE)
              .set_announcement(true)
              .perform();
            id(speaker_media_player)->set_playlist_delay_ms(speaker::AudioPipelineType::ANNOUNCEMENT, 1000);
      - media_player.speaker.play_on_device_media_file:
          media_file: timer_finished_sound
          announcement: true
      - delay: 15min
      - switch.turn_off: timer_ringing

select:
  - platform: template
    entity_category: config
    name: Wake word engine location
    id: wake_word_engine_location
    icon: "mdi:account-voice"
    optimistic: true
    restore_value: true
    options:
      - In Home Assistant
      - On device
    initial_option: On device
    on_value:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - wait_until:
                lambda: return id(voice_assistant_phase) == ${voice_assist_muted_phase_id} || id(voice_assistant_phase) == ${voice_assist_idle_phase_id};
            - if:
                condition:
                  lambda: return x == "In Home Assistant";
                then:
                  - micro_wake_word.stop
                  - delay: 500ms
                  - if:
                      condition:
                        switch.is_off: mute
                      then:
                        - lambda: id(va).set_use_wake_word(true);
                        - voice_assistant.start_continuous:
            - if:
                condition:
                  lambda: return x == "On device";
                then:
                  - lambda: id(va).set_use_wake_word(false);
                  - voice_assistant.stop
                  - delay: 500ms
                  - if:
                      condition:
                        switch.is_off: mute
                      then:
                        - micro_wake_word.start

globals:
  - id: init_in_progress
    type: bool
    restore_value: false
    initial_value: "true"
  - id: voice_assistant_phase
    type: int
    restore_value: false
    initial_value: ${voice_assist_not_ready_phase_id}
  - id: global_first_active_timer
    type: voice_assistant::Timer
    restore_value: false
  - id: global_is_timer_active
    type: bool
    restore_value: false
  - id: global_first_timer
    type: voice_assistant::Timer
    restore_value: false
  - id: global_is_timer
    type: bool
    restore_value: false

image:
  - file: ${error_illustration_file}
    id: casita_error
    resize: 240x240
    type: RGB
    transparency: alpha_channel
  - file: ${idle_illustration_file}
    id: casita_idle
    resize: 240x240
    type: RGB
    transparency: alpha_channel
  - file: ${listening_illustration_file}
    id: casita_listening
    resize: 240x240
    type: RGB
    transparency: alpha_channel
  - file: ${thinking_illustration_file}
    id: casita_thinking
    resize: 240x240
    type: RGB
    transparency: alpha_channel
  - file: ${replying_illustration_file}
    id: casita_replying
    resize: 240x240
    type: RGB
    transparency: alpha_channel
  - file: ${timer_finished_illustration_file}
    id: casita_timer_finished
    resize: 240x240
    type: RGB
    transparency: alpha_channel
  - file: ${loading_illustration_file}
    id: casita_initializing
    resize: 240x240
    type: RGB
    transparency: alpha_channel
  - file: https://github.com/esphome/wake-word-voice-assistants/raw/main/error_box_illustrations/error-no-wifi.png
    id: error_no_wifi
    resize: 240x240
    type: RGB
    transparency: alpha_channel
  - file: https://github.com/esphome/wake-word-voice-assistants/raw/main/error_box_illustrations/error-no-ha.png
    id: error_no_ha
    resize: 240x240
    type: RGB
    transparency: alpha_channel

font:
  - file:
      type: gfonts
      family: ${font_family}
      weight: 300
      italic: true
    id: font_request
    size: 15
    glyphsets:
      - ${font_glyphsets}
  - file:
      type: gfonts
      family: ${font_family}
      weight: 300
    id: font_response
    size: 15
    glyphsets:
      - ${font_glyphsets}
  - file:
      type: gfonts
      family: ${font_family}
      weight: 300
    id: font_timer
    size: 30
    glyphsets:
      - ${font_glyphsets}

text_sensor:
  - id: text_request
    platform: template
    on_value:
      lambda: |-
        if(id(text_request).state.length()>32) {
          std::string name = id(text_request).state.c_str();
          std::string truncated = esphome::str_truncate(name.c_str(),31);
          id(text_request).state = (truncated+"...").c_str();
        }

  - id: text_response
    platform: template
    on_value:
      lambda: |-
        if(id(text_response).state.length()>32) {
          std::string name = id(text_response).state.c_str();
          std::string truncated = esphome::str_truncate(name.c_str(),31);
          id(text_response).state = (truncated+"...").c_str();
        }

color:
  - id: idle_color
    hex: ${idle_illustration_background_color}
  - id: listening_color
    hex: ${listening_illustration_background_color}
  - id: thinking_color
    hex: ${thinking_illustration_background_color}
  - id: replying_color
    hex: ${replying_illustration_background_color}
  - id: loading_color
    hex: ${loading_illustration_background_color}
  - id: error_color
    hex: ${error_illustration_background_color}
  - id: active_timer_color
    hex: "26ed3a"
  - id: paused_timer_color
    hex: "3b89e3"

spi:
  - id: spi_bus
    clk_pin: 14
    mosi_pin: 17

display:
  - platform: ili9xxx
    id: s3_box_lcd
    model: GC9A01A
    invert_colors: true
    data_rate: 40MHz
    cs_pin: 13
    dc_pin: 10
    reset_pin:
      number: 18
    update_interval: never
    dimensions:
        height: 240
        width: 240
    rotation: 0
    pages:
      - id: idle_page
        lambda: |-
          it.fill(id(idle_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_idle), ImageAlign::CENTER);
          id(draw_timer_timeline).execute();
          id(draw_active_timer_widget).execute();
      - id: listening_page
        lambda: |-
          it.fill(id(listening_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_listening), ImageAlign::CENTER);
          id(draw_timer_timeline).execute();
      - id: thinking_page
        lambda: |-
          it.fill(id(thinking_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_thinking), ImageAlign::CENTER);
          it.filled_rectangle(20 , 20 , 280 , 30 , Color::WHITE );
          it.rectangle(20 , 20 , 280 , 30 , Color::BLACK );
          it.printf(30, 25, id(font_request), Color::BLACK, "%s", id(text_request).state.c_str());
          id(draw_timer_timeline).execute();
      - id: replying_page
        lambda: |-
          it.fill(id(replying_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_replying), ImageAlign::CENTER);
          it.filled_rectangle(20 , 20 , 280 , 30 , Color::WHITE );
          it.rectangle(20 , 20 , 280 , 30 , Color::BLACK );
          it.filled_rectangle(20 , 190 , 280 , 30 , Color::WHITE );
          it.rectangle(20 , 190 , 280 , 30 , Color::BLACK );
          it.printf(30, 25, id(font_request), Color::BLACK, "%s", id(text_request).state.c_str());
          it.printf(30, 195, id(font_response), Color::BLACK, "%s", id(text_response).state.c_str());
          id(draw_timer_timeline).execute();
      - id: timer_finished_page
        lambda: |-
          it.fill(id(idle_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_timer_finished), ImageAlign::CENTER);
      - id: error_page
        lambda: |-
          it.fill(id(error_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_error), ImageAlign::CENTER);
      - id: no_ha_page
        lambda: |-
          it.image((it.get_width() / 2), (it.get_height() / 2), id(error_no_ha), ImageAlign::CENTER);
      - id: no_wifi_page
        lambda: |-
          it.image((it.get_width() / 2), (it.get_height() / 2), id(error_no_wifi), ImageAlign::CENTER);
      - id: initializing_page
        lambda: |-
          it.fill(id(loading_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_initializing), ImageAlign::CENTER);
      - id: muted_page
        lambda: |-
          it.fill(Color::BLACK);
          id(draw_timer_timeline).execute();
          id(draw_active_timer_widget).execute();

Neil_Brownlee · June 5, 2025, 3:02pm

Code works well on my 5-EN too. Thanks.

bjm · June 5, 2025, 7:13pm

Yup works well on my V5-EN,
Only issue really, but not is it tskes what seems like an age to do a voice reply, 5-8 seconds, anyone else experiencing this?
Also media playback is crazy patchy, can do away with that but would be nice if i wanted news or weathrr, etc

AlexDixon · June 5, 2025, 9:08pm

That’s expected - the wake word processes quickly, but all the voice → text → voice handling is done on the server. It takes no longer for me than the in-browser voice assistant.

For stuttering, you could try adding some buffer parameters to the media_player config. I’ve only had occasional stutters with the voice, and there’s no way I’d use the speaker as a media player.

buffer_size: 2000000
buffer_duration: 1000ms

bjm · June 5, 2025, 9:46pm

thanks, i’ll give that a try
and for my next trick, I’m going add the clock face as a screensaver type thing

Neil_Brownlee · June 6, 2025, 9:38am

90% of the time mine is rapid, even for long responses. Only now and again is it slow, which I would blame on my use of ElevenLabs and Gemini

bjm · June 6, 2025, 11:17am

OK so I have added the clock with weather for it, take a look here
you’ll need to copy the weather folder as well , the location you will fid in the yaml. Still a bit of tweaking to go

jcr1 · June 6, 2025, 2:21pm

Thank for this code
But I have an error compiling with image:

Failed config

image: [source /config/esphome/boule-assit.yaml:646]

file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/error_320_240.png
id: casita_error
resize: 240x240
type: RGB
transparency: alpha_channel

file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/idle_320_240.png
id: casita_idle
resize: 240x240
type: RGB
transparency: alpha_channel

file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/listening_320_240.png
id: casita_listening
resize: 240x240
type: RGB
transparency: alpha_channel

file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/thinking_320_240.png
id: casita_thinking
resize: 240x240
type: RGB
transparency: alpha_channel

File can’t be opened as image: /data/image/7b836626.

file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/replying_320_240.png
id: casita_replying
resize: 240x240
type: RGB
transparency: alpha_channel

file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/timer_finished_320_240.png
id: casita_timer_finished
resize: 240x240
type: RGB
transparency: alpha_channel

file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/loading_320_240.png
id: casita_initializing
resize: 240x240
type: RGB
transparency: alpha_channel

file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/error_box_illustrations/error-no-wifi.png
id: error_no_wifi
resize: 240x240
type: RGB
transparency: alpha_channel

file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/error_box_illustrations/error-no-ha.png
id: error_no_ha
resize: 240x240
type: RGB
transparency: alpha_channel

What’s wrong ?

TickTack · June 6, 2025, 6:11pm

Sorry for asking a maybe obvious question, but where exactly should I put the buffer settings?
Directly in the media_player: section or below speaker, or announcement_pipeline?

Thanks in advance

jcr1 · June 6, 2025, 6:18pm

Okay, it works fine.
I just reinstalled and this time I didn’t get any errors.
I don’t know why it didn’t work the first time.

bjm · June 6, 2025, 11:30pm

Looks like the “casita” was cached, i changed it to my own git for the pics?

bjm · June 6, 2025, 11:31pm

Cant remember off the top of my head, look at my git in the mini clock yaml, thatll tell you

bjm · June 6, 2025, 11:44pm

Also, it woks ok for text to voice and short sound files , anything too long and streaming is still unusable will look at that later some time

rediculum · June 7, 2025, 1:31pm

I am streaming radio through the xiaozhi since minutes with your mini-clock code, no issues

bjm · June 7, 2025, 8:30pm

Hmmm maybe its me, also i am in the process of tidying it up, someone graciously pointed out to me in another forum thats its crap

Teiruzu · June 7, 2025, 9:29pm

I uploaded the code and everything apparently works except the microphone, with the Xiaozhi firmware it could hear me, but with the EspHome code it didn’t work, it doesn’t do anything, I’ve already tried all the activation words

bjm · June 7, 2025, 10:06pm

What version of ball are you ysing, there are 3 different verdiobs eith diffeent GPIO pins, mine us for 2.

movecall-moji-esp32s3
Spotpear astronaut ball v1 (discussed in this thread)
Spotpear ESP32-S3-1.28-BOX (v2 with touch, sd card, and battery)

Also my cide us messy as and very ChatGPT cumbersome but it dies work,
Check this thread out

Teiruzu · June 7, 2025, 10:30pm

The sticker says V5-EN, The PCB says Spotpear Esp32-AI-1.28 V1.0, but it doesn’t have a battery and I don’t think it has a touch screen, since it didn’t do anything with the original firmware either.