could you share your working code?
hi. try to change channel to “stereo” instead of “mono”. I’ve got it working this way.
I think I’ve got a similar config. Mine works - wake word, listening, speaker, and PWM control of the display. The WS2812 works as a single RGB LED, and adding some functions to indicate listening status, especially with the display off, is on my to-do list. I also didn’t bother reconfiguring the buttons as they’re annoying to access anyway, though reset works as intended.
I was a bit lazy with the images - the config is only slightly modified from the S3-Box, and I kept the 320x240 images, setting the config to scale them. I didn’t see any difference between the resized images and these once I set the background colour of the four “active” functions to off-white (#F2F4F9).
The text transcription of both my request and the assistant’s response are cut off due to the round screen.
The sticker on my device says “V5-EN”.
---
substitutions:
name: voice-assistant
friendly_name: Voice assistant
loading_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/loading_320_240.png
idle_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/idle_320_240.png
listening_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/listening_320_240.png
thinking_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/thinking_320_240.png
replying_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/replying_320_240.png
error_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/error_320_240.png
timer_finished_illustration_file: https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/timer_finished_320_240.png
wifi: !secret wifi_ssid
wifi_pass: !secret wifi_password
ota_pass: !secret ota_pass
loading_illustration_background_color: "000000"
idle_illustration_background_color: "000000"
listening_illustration_background_color: "F2F4F9"
thinking_illustration_background_color: "F2F4F9"
replying_illustration_background_color: "F2F4F9"
error_illustration_background_color: "000000"
timer_finished_background_color: "F2F4F9"
voice_assist_idle_phase_id: "1"
voice_assist_listening_phase_id: "2"
voice_assist_thinking_phase_id: "3"
voice_assist_replying_phase_id: "4"
voice_assist_not_ready_phase_id: "10"
voice_assist_error_phase_id: "11"
voice_assist_muted_phase_id: "12"
voice_assist_timer_finished_phase_id: "20"
# These unique characters have been extracted from every test file of every language available on https://github.com/home-assistant/intents (14 March 2024)
# However, the Figtree font only contains Latin characters, so there is no point using this... unlessyou change the font configuration accordingly.
allowed_characters: " !#%'()+,-./0123456789:;<>?@ABCDEFGHIJKLMNOPQRSTUVWYZ[]_abcdefghijklmnopqrstuvwxyz{|}°²³µ¿ÁÂÄÅÉÖÚßàáâãäåæçèéêëìíîðñòóôõöøùúûüýþāăąćčďĐđēėęěğĮįıļľŁłńňőřśšťũūůűųźŻżŽžơưșțΆΈΌΐΑΒΓΔΕΖΗΘΚΜΝΠΡΣΤΥΦάέήίαβγδεζηθικλμνξοπρςστυφχψωϊόύώАБВГДЕЖЗИКЛМНОПРСТУХЦЧШЪЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёђєіїјљњћאבגדהוזחטיכלםמןנסעפץצקרשת،ءآأإئابةتجحخدذرزسشصضطظعغفقكلمنهوىيٹپچڈکگںھہیےংকচতধনফবযরলশষস়ািু্చయలిెొ్ംഅആഇഈഉഎഓകഗങചജഞടഡണതദധനപഫബഭമയരറലളവശസഹാിീുൂെേൈ്ൺൻർൽൾაბგდევზთილმნოპრსტუფქყშჩცძჭხạảấầẩậắặẹẽếềểệỉịọỏốồổỗộớờởợụủứừửữựỳ—、一上不个中为主乾了些亮人任低佔何作供依侧係個側偵充光入全关冇冷几切到制前動區卧厅厨及口另右吊后吗启吸呀咗哪唔問啟嗎嘅嘛器圍在场執場外多大始安定客室家密寵对將小少左已帘常幫幾库度庫廊廚廳开式後恆感態成我戲戶户房所扇手打执把拔换掉控插摄整斯新明是景暗更最會有未本模機檯櫃欄次正氏水沒没洗活派温測源溫漏潮激濕灯為無煙照熱燈燥物狀玄现現瓦用發的盞目着睡私空窗立笛管節簾籬紅線红罐置聚聲脚腦腳臥色节著行衣解設調請謝警设调走路車车运連遊運過道邊部都量鎖锁門閂閉開關门闭除隱離電震霧面音頂題顏颜風风食餅餵가간감갔강개거게겨결경고공과관그금급기길깥꺼껐꼽나난내네놀누는능니다닫담대더데도동됐되된됨둡드든등디때떤뜨라래러렇렌려로료른를리림링마많명몇모무문물뭐바밝방배변보부불블빨뽑사산상색서설성세센션소쇼수스습시신실싱아안않알았애야어얼업없었에여연열옆오온완외왼요운움워원위으은을음의이인일임입있작잠장재전절정제져조족종주줄중줘지직진짐쪽차창천최추출충치침커컴켜켰쿠크키탁탄태탬터텔통트튼티파팬퍼폰표퓨플핑한함해했행혀현화활후휴힘,?"
# Add support for non-unicode characters by using better glyphset
font_glyphsets: "GF_Latin_Core"
# for Greek use "Noto Sans" for other languages use a compatible font family
font_family: Figtree
esphome:
name: ${name}
friendly_name: ${friendly_name}
min_version: 2025.5.0
name_add_mac_suffix: true
on_boot:
priority: 600
then:
- script.execute: draw_display
- delay: 30s
- if:
condition:
lambda: return id(init_in_progress);
then:
- lambda: id(init_in_progress) = false;
- script.execute: draw_display
esp32:
board: esp32s3box
flash_size: 16MB
cpu_frequency: 240MHz
framework:
type: esp-idf
sdkconfig_options:
CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
psram:
mode: octal
speed: 80MHz
api:
on_client_connected:
- script.execute: draw_display
on_client_disconnected:
- script.execute: draw_display
ota:
- platform: esphome
id: ota_esphome
logger:
hardware_uart: USB_SERIAL_JTAG
wifi:
ssid: $wifi
password: $wifi_pass
ap:
ssid: "ESPHome Voice Fallback Hotspot"
password: "tMUPeiAXc2G3"
on_connect:
- script.execute: draw_display
on_disconnect:
- script.execute: draw_display
captive_portal:
button:
- platform: factory_reset
id: factory_reset_btn
internal: true
binary_sensor:
- platform: gpio
pin:
number: GPIO0
inverted: true
id: left_top_button
internal: true
on_multi_click:
- timing:
- ON for at least 50ms
- OFF for at least 50ms
then:
- switch.turn_off: timer_ringing
- timing:
- ON for at least 10s
then:
- button.press: factory_reset_btn
output:
- platform: ledc
pin: GPIO3
id: backlight_output
light:
- platform: monochromatic
id: led
name: Screen
icon: "mdi:television"
entity_category: config
output: backlight_output
restore_mode: RESTORE_DEFAULT_ON
default_transition_length: 250ms
- platform: esp32_rmt_led_strip
rgb_order: GRB
pin: GPIO48
num_leds: 1
chipset: WS2812
name: "RGB LED"
i2s_audio:
- id: i2s_mic # For microphone
i2s_lrclk_pin: GPIO4 #WS
i2s_bclk_pin: GPIO5 #SCK
- id: i2s_audio_bus # For speaker
i2s_lrclk_pin: GPIO16
i2s_bclk_pin: GPIO15
microphone:
- platform: i2s_audio
i2s_audio_id: i2s_mic
adc_type: external
i2s_din_pin: GPIO6 #SD
id: box_mic
channel: stereo
pdm: false
bits_per_sample: 16bit
speaker:
- platform: i2s_audio
id: box_speaker
i2s_audio_id: i2s_audio_bus
dac_type: external
i2s_dout_pin:
number: GPIO7 # DIN Pin of the MAX98357A Audio Amplifier
channel: stereo
buffer_duration: 100ms
media_player:
- platform: speaker
name: None
id: speaker_media_player
volume_min: 0.5
volume_max: 0.8
announcement_pipeline:
speaker: box_speaker
format: WAV
sample_rate: 48000
num_channels: 1 # S3 Box only has one output channel
files:
- id: timer_finished_sound
file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/timer_finished.flac
on_announcement:
# Stop the wake word (mWW or VA) if the mic is capturing
- if:
condition:
- microphone.is_capturing:
then:
- script.execute: stop_wake_word
# Ensure VA stops before moving on
- if:
condition:
- lambda: return id(wake_word_engine_location).state == "In Home Assistant";
then:
- wait_until:
- not:
voice_assistant.is_running:
# Since VA isn't running, this is user-intiated media playback. Draw the mute display
- if:
condition:
not:
voice_assistant.is_running:
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
- script.execute: draw_display
on_idle:
# Since VA isn't running, this is the end of user-intiated media playback. Restart the wake word.
- if:
condition:
not:
voice_assistant.is_running:
then:
- script.execute: start_wake_word
- script.execute: set_idle_or_mute_phase
- script.execute: draw_display
micro_wake_word:
id: mww
models:
- okay_nabu
- hey_mycroft
- hey_jarvis
on_wake_word_detected:
- voice_assistant.start:
wake_word: !lambda return wake_word;
voice_assistant:
id: va
microphone: box_mic
media_player: speaker_media_player
micro_wake_word: mww
noise_suppression_level: 2
auto_gain: 31dBFS
volume_multiplier: 2.0
on_listening:
- lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
- text_sensor.template.publish:
id: text_request
state: "..."
- text_sensor.template.publish:
id: text_response
state: "..."
- script.execute: draw_display
on_stt_vad_end:
- lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
- script.execute: draw_display
on_stt_end:
- text_sensor.template.publish:
id: text_request
state: !lambda return x;
- script.execute: draw_display
on_tts_start:
- text_sensor.template.publish:
id: text_response
state: !lambda return x;
- lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
- script.execute: draw_display
on_end:
# Wait a short amount of time to see if an announcement starts
- wait_until:
condition:
- media_player.is_announcing:
timeout: 0.5s
# Announcement is finished and the I2S bus is free
- wait_until:
- and:
- not:
media_player.is_announcing:
- not:
speaker.is_playing:
# Restart only mWW if enabled; streaming wake words automatically restart
- if:
condition:
- lambda: return id(wake_word_engine_location).state == "On device";
then:
- lambda: id(va).set_use_wake_word(false);
- micro_wake_word.start:
- script.execute: set_idle_or_mute_phase
- script.execute: draw_display
# Clear text sensors
- text_sensor.template.publish:
id: text_request
state: ""
- text_sensor.template.publish:
id: text_response
state: ""
on_error:
- if:
condition:
lambda: return !id(init_in_progress);
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
- script.execute: draw_display
- delay: 1s
- if:
condition:
switch.is_off: mute
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
else:
- lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
- script.execute: draw_display
on_client_connected:
- lambda: id(init_in_progress) = false;
- script.execute: start_wake_word
- script.execute: set_idle_or_mute_phase
- script.execute: draw_display
on_client_disconnected:
- script.execute: stop_wake_word
- lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
- script.execute: draw_display
on_timer_started:
- script.execute: draw_display
on_timer_cancelled:
- script.execute: draw_display
on_timer_updated:
- script.execute: draw_display
on_timer_tick:
- script.execute: draw_display
on_timer_finished:
- switch.turn_on: timer_ringing
- wait_until:
media_player.is_announcing:
- lambda: id(voice_assistant_phase) = ${voice_assist_timer_finished_phase_id};
- script.execute: draw_display
script:
- id: draw_display
then:
- if:
condition:
lambda: return !id(init_in_progress);
then:
- if:
condition:
wifi.connected:
then:
- if:
condition:
api.connected:
then:
- lambda: |
switch(id(voice_assistant_phase)) {
case ${voice_assist_listening_phase_id}:
id(s3_box_lcd).show_page(listening_page);
id(s3_box_lcd).update();
break;
case ${voice_assist_thinking_phase_id}:
id(s3_box_lcd).show_page(thinking_page);
id(s3_box_lcd).update();
break;
case ${voice_assist_replying_phase_id}:
id(s3_box_lcd).show_page(replying_page);
id(s3_box_lcd).update();
break;
case ${voice_assist_error_phase_id}:
id(s3_box_lcd).show_page(error_page);
id(s3_box_lcd).update();
break;
case ${voice_assist_muted_phase_id}:
id(s3_box_lcd).show_page(muted_page);
id(s3_box_lcd).update();
break;
case ${voice_assist_not_ready_phase_id}:
id(s3_box_lcd).show_page(no_ha_page);
id(s3_box_lcd).update();
break;
case ${voice_assist_timer_finished_phase_id}:
id(s3_box_lcd).show_page(timer_finished_page);
id(s3_box_lcd).update();
break;
default:
id(s3_box_lcd).show_page(idle_page);
id(s3_box_lcd).update();
}
else:
- display.page.show: no_ha_page
- component.update: s3_box_lcd
else:
- display.page.show: no_wifi_page
- component.update: s3_box_lcd
else:
- display.page.show: initializing_page
- component.update: s3_box_lcd
- id: fetch_first_active_timer
then:
- lambda: |
const auto timers = id(va).get_timers();
auto output_timer = timers.begin()->second;
for (auto &iterable_timer : timers) {
if (iterable_timer.second.is_active && iterable_timer.second.seconds_left <= output_timer.seconds_left) {
output_timer = iterable_timer.second;
}
}
id(global_first_active_timer) = output_timer;
- id: check_if_timers_active
then:
- lambda: |
const auto timers = id(va).get_timers();
bool output = false;
if (timers.size() > 0) {
for (auto &iterable_timer : timers) {
if(iterable_timer.second.is_active) {
output = true;
}
}
}
id(global_is_timer_active) = output;
- id: fetch_first_timer
then:
- lambda: |
const auto timers = id(va).get_timers();
auto output_timer = timers.begin()->second;
for (auto &iterable_timer : timers) {
if (iterable_timer.second.seconds_left <= output_timer.seconds_left) {
output_timer = iterable_timer.second;
}
}
id(global_first_timer) = output_timer;
- id: check_if_timers
then:
- lambda: |
const auto timers = id(va).get_timers();
bool output = false;
if (timers.size() > 0) {
output = true;
}
id(global_is_timer) = output;
- id: draw_timer_timeline
then:
- lambda: |
id(check_if_timers_active).execute();
id(check_if_timers).execute();
if (id(global_is_timer_active)){
id(fetch_first_active_timer).execute();
int active_pixels = round( 320 * id(global_first_active_timer).seconds_left / max(id(global_first_active_timer).total_seconds , static_cast<uint32_t>(1)) );
if (active_pixels > 0){
id(s3_box_lcd).filled_rectangle(0 , 225 , 320 , 15 , Color::WHITE );
id(s3_box_lcd).filled_rectangle(0 , 226 , active_pixels , 13 , id(active_timer_color) );
}
} else if (id(global_is_timer)){
id(fetch_first_timer).execute();
int active_pixels = round( 320 * id(global_first_timer).seconds_left / max(id(global_first_timer).total_seconds , static_cast<uint32_t>(1)));
if (active_pixels > 0){
id(s3_box_lcd).filled_rectangle(0 , 225 , 320 , 15 , Color::WHITE );
id(s3_box_lcd).filled_rectangle(0 , 226 , active_pixels , 13 , id(paused_timer_color) );
}
}
- id: draw_active_timer_widget
then:
- lambda: |
id(check_if_timers_active).execute();
if (id(global_is_timer_active)){
id(s3_box_lcd).filled_rectangle(80 , 40 , 160 , 50 , Color::WHITE );
id(s3_box_lcd).rectangle(80 , 40 , 160 , 50 , Color::BLACK );
id(fetch_first_active_timer).execute();
int hours_left = floor(id(global_first_active_timer).seconds_left / 3600);
int minutes_left = floor((id(global_first_active_timer).seconds_left - hours_left * 3600) / 60);
int seconds_left = id(global_first_active_timer).seconds_left - hours_left * 3600 - minutes_left * 60 ;
auto display_hours = (hours_left < 10 ? "0" : "") + std::to_string(hours_left);
auto display_minute = (minutes_left < 10 ? "0" : "") + std::to_string(minutes_left);
auto display_seconds = (seconds_left < 10 ? "0" : "") + std::to_string(seconds_left) ;
std::string display_string = "";
if (hours_left > 0) {
display_string = display_hours + ":" + display_minute;
} else {
display_string = display_minute + ":" + display_seconds;
}
id(s3_box_lcd).printf(120, 47, id(font_timer), Color::BLACK, "%s", display_string.c_str());
}
# Starts either mWW or the streaming wake word, depending on the configured location
- id: start_wake_word
then:
- if:
condition:
and:
- not:
- voice_assistant.is_running:
- lambda: return id(wake_word_engine_location).state == "On device";
then:
- lambda: id(va).set_use_wake_word(false);
- micro_wake_word.start:
- if:
condition:
and:
- not:
- voice_assistant.is_running:
- lambda: return id(wake_word_engine_location).state == "In Home Assistant";
then:
- lambda: id(va).set_use_wake_word(true);
- voice_assistant.start_continuous:
# Stops either mWW or the streaming wake word, depending on the configured location
- id: stop_wake_word
then:
- if:
condition:
lambda: return id(wake_word_engine_location).state == "In Home Assistant";
then:
- lambda: id(va).set_use_wake_word(false);
- voice_assistant.stop:
- if:
condition:
lambda: return id(wake_word_engine_location).state == "On device";
then:
- micro_wake_word.stop:
# Set the voice assistant phase to idle or muted, depending on if the software mute switch is activated
- id: set_idle_or_mute_phase
then:
- if:
condition:
switch.is_off: mute
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
else:
- lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
switch:
- platform: gpio
name: Speaker Enable
pin: GPIO46
restore_mode: RESTORE_DEFAULT_ON
entity_category: config
disabled_by_default: true
- platform: template
name: Mute
id: mute
icon: "mdi:microphone-off"
optimistic: true
restore_mode: RESTORE_DEFAULT_OFF
entity_category: config
on_turn_off:
- microphone.unmute:
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
- script.execute: draw_display
on_turn_on:
- microphone.mute:
- lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
- script.execute: draw_display
- platform: template
id: timer_ringing
optimistic: true
internal: true
restore_mode: ALWAYS_OFF
on_turn_off:
# Turn off the repeat mode and disable the pause between playlist items
- lambda: |-
id(speaker_media_player)
->make_call()
.set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_REPEAT_OFF)
.set_announcement(true)
.perform();
id(speaker_media_player)->set_playlist_delay_ms(speaker::AudioPipelineType::ANNOUNCEMENT, 0);
# Stop playing the alarm
- media_player.stop:
announcement: true
on_turn_on:
# Turn on the repeat mode and pause for 1000 ms between playlist items/repeats
- lambda: |-
id(speaker_media_player)
->make_call()
.set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_REPEAT_ONE)
.set_announcement(true)
.perform();
id(speaker_media_player)->set_playlist_delay_ms(speaker::AudioPipelineType::ANNOUNCEMENT, 1000);
- media_player.speaker.play_on_device_media_file:
media_file: timer_finished_sound
announcement: true
- delay: 15min
- switch.turn_off: timer_ringing
select:
- platform: template
entity_category: config
name: Wake word engine location
id: wake_word_engine_location
icon: "mdi:account-voice"
optimistic: true
restore_value: true
options:
- In Home Assistant
- On device
initial_option: On device
on_value:
- if:
condition:
lambda: return !id(init_in_progress);
then:
- wait_until:
lambda: return id(voice_assistant_phase) == ${voice_assist_muted_phase_id} || id(voice_assistant_phase) == ${voice_assist_idle_phase_id};
- if:
condition:
lambda: return x == "In Home Assistant";
then:
- micro_wake_word.stop
- delay: 500ms
- if:
condition:
switch.is_off: mute
then:
- lambda: id(va).set_use_wake_word(true);
- voice_assistant.start_continuous:
- if:
condition:
lambda: return x == "On device";
then:
- lambda: id(va).set_use_wake_word(false);
- voice_assistant.stop
- delay: 500ms
- if:
condition:
switch.is_off: mute
then:
- micro_wake_word.start
globals:
- id: init_in_progress
type: bool
restore_value: false
initial_value: "true"
- id: voice_assistant_phase
type: int
restore_value: false
initial_value: ${voice_assist_not_ready_phase_id}
- id: global_first_active_timer
type: voice_assistant::Timer
restore_value: false
- id: global_is_timer_active
type: bool
restore_value: false
- id: global_first_timer
type: voice_assistant::Timer
restore_value: false
- id: global_is_timer
type: bool
restore_value: false
image:
- file: ${error_illustration_file}
id: casita_error
resize: 240x240
type: RGB
transparency: alpha_channel
- file: ${idle_illustration_file}
id: casita_idle
resize: 240x240
type: RGB
transparency: alpha_channel
- file: ${listening_illustration_file}
id: casita_listening
resize: 240x240
type: RGB
transparency: alpha_channel
- file: ${thinking_illustration_file}
id: casita_thinking
resize: 240x240
type: RGB
transparency: alpha_channel
- file: ${replying_illustration_file}
id: casita_replying
resize: 240x240
type: RGB
transparency: alpha_channel
- file: ${timer_finished_illustration_file}
id: casita_timer_finished
resize: 240x240
type: RGB
transparency: alpha_channel
- file: ${loading_illustration_file}
id: casita_initializing
resize: 240x240
type: RGB
transparency: alpha_channel
- file: https://github.com/esphome/wake-word-voice-assistants/raw/main/error_box_illustrations/error-no-wifi.png
id: error_no_wifi
resize: 240x240
type: RGB
transparency: alpha_channel
- file: https://github.com/esphome/wake-word-voice-assistants/raw/main/error_box_illustrations/error-no-ha.png
id: error_no_ha
resize: 240x240
type: RGB
transparency: alpha_channel
font:
- file:
type: gfonts
family: ${font_family}
weight: 300
italic: true
id: font_request
size: 15
glyphsets:
- ${font_glyphsets}
- file:
type: gfonts
family: ${font_family}
weight: 300
id: font_response
size: 15
glyphsets:
- ${font_glyphsets}
- file:
type: gfonts
family: ${font_family}
weight: 300
id: font_timer
size: 30
glyphsets:
- ${font_glyphsets}
text_sensor:
- id: text_request
platform: template
on_value:
lambda: |-
if(id(text_request).state.length()>32) {
std::string name = id(text_request).state.c_str();
std::string truncated = esphome::str_truncate(name.c_str(),31);
id(text_request).state = (truncated+"...").c_str();
}
- id: text_response
platform: template
on_value:
lambda: |-
if(id(text_response).state.length()>32) {
std::string name = id(text_response).state.c_str();
std::string truncated = esphome::str_truncate(name.c_str(),31);
id(text_response).state = (truncated+"...").c_str();
}
color:
- id: idle_color
hex: ${idle_illustration_background_color}
- id: listening_color
hex: ${listening_illustration_background_color}
- id: thinking_color
hex: ${thinking_illustration_background_color}
- id: replying_color
hex: ${replying_illustration_background_color}
- id: loading_color
hex: ${loading_illustration_background_color}
- id: error_color
hex: ${error_illustration_background_color}
- id: active_timer_color
hex: "26ed3a"
- id: paused_timer_color
hex: "3b89e3"
spi:
- id: spi_bus
clk_pin: 14
mosi_pin: 17
display:
- platform: ili9xxx
id: s3_box_lcd
model: GC9A01A
invert_colors: true
data_rate: 40MHz
cs_pin: 13
dc_pin: 10
reset_pin:
number: 18
update_interval: never
dimensions:
height: 240
width: 240
rotation: 0
pages:
- id: idle_page
lambda: |-
it.fill(id(idle_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_idle), ImageAlign::CENTER);
id(draw_timer_timeline).execute();
id(draw_active_timer_widget).execute();
- id: listening_page
lambda: |-
it.fill(id(listening_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_listening), ImageAlign::CENTER);
id(draw_timer_timeline).execute();
- id: thinking_page
lambda: |-
it.fill(id(thinking_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_thinking), ImageAlign::CENTER);
it.filled_rectangle(20 , 20 , 280 , 30 , Color::WHITE );
it.rectangle(20 , 20 , 280 , 30 , Color::BLACK );
it.printf(30, 25, id(font_request), Color::BLACK, "%s", id(text_request).state.c_str());
id(draw_timer_timeline).execute();
- id: replying_page
lambda: |-
it.fill(id(replying_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_replying), ImageAlign::CENTER);
it.filled_rectangle(20 , 20 , 280 , 30 , Color::WHITE );
it.rectangle(20 , 20 , 280 , 30 , Color::BLACK );
it.filled_rectangle(20 , 190 , 280 , 30 , Color::WHITE );
it.rectangle(20 , 190 , 280 , 30 , Color::BLACK );
it.printf(30, 25, id(font_request), Color::BLACK, "%s", id(text_request).state.c_str());
it.printf(30, 195, id(font_response), Color::BLACK, "%s", id(text_response).state.c_str());
id(draw_timer_timeline).execute();
- id: timer_finished_page
lambda: |-
it.fill(id(idle_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_timer_finished), ImageAlign::CENTER);
- id: error_page
lambda: |-
it.fill(id(error_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_error), ImageAlign::CENTER);
- id: no_ha_page
lambda: |-
it.image((it.get_width() / 2), (it.get_height() / 2), id(error_no_ha), ImageAlign::CENTER);
- id: no_wifi_page
lambda: |-
it.image((it.get_width() / 2), (it.get_height() / 2), id(error_no_wifi), ImageAlign::CENTER);
- id: initializing_page
lambda: |-
it.fill(id(loading_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_initializing), ImageAlign::CENTER);
- id: muted_page
lambda: |-
it.fill(Color::BLACK);
id(draw_timer_timeline).execute();
id(draw_active_timer_widget).execute();
Code works well on my 5-EN too. Thanks.
Yup works well on my V5-EN,
Only issue really, but not is it tskes what seems like an age to do a voice reply, 5-8 seconds, anyone else experiencing this?
Also media playback is crazy patchy, can do away with that but would be nice if i wanted news or weathrr, etc
That’s expected - the wake word processes quickly, but all the voice → text → voice handling is done on the server. It takes no longer for me than the in-browser voice assistant.
For stuttering, you could try adding some buffer parameters to the media_player config. I’ve only had occasional stutters with the voice, and there’s no way I’d use the speaker as a media player.
buffer_size: 2000000
buffer_duration: 1000ms
thanks, i’ll give that a try
and for my next trick, I’m going add the clock face as a screensaver type thing
90% of the time mine is rapid, even for long responses. Only now and again is it slow, which I would blame on my use of ElevenLabs and Gemini
OK so I have added the clock with weather for it, take a look here
you’ll need to copy the weather folder as well , the location you will fid in the yaml. Still a bit of tweaking to go
Thank for this code
But I have an error compiling with image:
Failed config
image: [source /config/esphome/boule-assit.yaml:646]
- file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/error_320_240.png
id: casita_error
resize: 240x240
type: RGB
transparency: alpha_channel- file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/idle_320_240.png
id: casita_idle
resize: 240x240
type: RGB
transparency: alpha_channel- file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/listening_320_240.png
id: casita_listening
resize: 240x240
type: RGB
transparency: alpha_channel- file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/thinking_320_240.png
id: casita_thinking
resize: 240x240
type: RGB
transparency: alpha_channelFile can’t be opened as image: /data/image/7b836626.
- file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/replying_320_240.png
id: casita_replying
resize: 240x240
type: RGB
transparency: alpha_channel- file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/timer_finished_320_240.png
id: casita_timer_finished
resize: 240x240
type: RGB
transparency: alpha_channel- file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/casita/loading_320_240.png
id: casita_initializing
resize: 240x240
type: RGB
transparency: alpha_channel- file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/error_box_illustrations/error-no-wifi.png
id: error_no_wifi
resize: 240x240
type: RGB
transparency: alpha_channel- file: |-
https://github.com/esphome/wake-word-voice-assistants/raw/main/error_box_illustrations/error-no-ha.png
id: error_no_ha
resize: 240x240
type: RGB
transparency: alpha_channel
What’s wrong ?
Sorry for asking a maybe obvious question, but where exactly should I put the buffer settings?
Directly in the media_player: section or below speaker, or announcement_pipeline?
Thanks in advance
Okay, it works fine.
I just reinstalled and this time I didn’t get any errors.
I don’t know why it didn’t work the first time.
Looks like the “casita” was cached, i changed it to my own git for the pics?
Cant remember off the top of my head, look at my git in the mini clock yaml, thatll tell you
Also, it woks ok for text to voice and short sound files , anything too long and streaming is still unusable will look at that later some time
I am streaming radio through the xiaozhi since minutes with your mini-clock code, no issues
Hmmm maybe its me, also i am in the process of tidying it up, someone graciously pointed out to me in another forum thats its crap
I uploaded the code and everything apparently works except the microphone, with the Xiaozhi firmware it could hear me, but with the EspHome code it didn’t work, it doesn’t do anything, I’ve already tried all the activation words
What version of ball are you ysing, there are 3 different verdiobs eith diffeent GPIO pins, mine us for 2.
- movecall-moji-esp32s3
- Spotpear astronaut ball v1 (discussed in this thread)
- Spotpear ESP32-S3-1.28-BOX (v2 with touch, sd card, and battery)
Also my cide us messy as and very ChatGPT cumbersome but it dies work,
Check this thread out