And read the docs for tips.
You will see that the buffer size defaults to 1000000, so even if it's not written in the config it is still there.
The docs also explain a bit about the Wi-Fi.
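If you would rather see that default spelled out than rely on it being implicit, it can be written into the config. A hypothetical fragment, assuming the buffer in question is the speaker media player's `buffer_size` (exactly where the option lives may differ between ESPHome versions, so check the docs for yours):

```yaml
# Hypothetical sketch: writing out the implicit 1000000-byte default.
media_player:
  - platform: speaker
    id: speaker_media_player_id
    buffer_size: 1000000  # the default; present even when omitted
```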
Yes, you need a rock-solid Wi-Fi connection, at least 45 dB or better. I even added an external antenna to remove every possible stutter.
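One way to verify you actually have that kind of margin is to let the puck report its own link quality. A small sketch using ESPHome's standard `wifi_signal` sensor platform:

```yaml
# Exposes the RSSI (in dBm) to Home Assistant, so you can check the
# link at the device's final mounting spot before chasing audio stutter.
sensor:
  - platform: wifi_signal
    name: "WiFi Signal"
    update_interval: 60s
```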
How do I connect the ESP32-S3-WROOM to all the other components? Can I follow the diagram with an ESP32-S3 board, or do I need to change something because of the Freenove WROOM?
The configuration code from the article will not work for a standard ESP32. You should look at the example for the Atom Echo. The connection diagram can be almost anything; most of the data pins are interchangeable.
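To illustrate "most of the data pins are interchangeable": the I2S pins are declared in the YAML, so the only requirement is that the wiring matches the declaration. A sketch with example GPIO numbers (these are illustrative assumptions, not a recommendation for any particular board):

```yaml
# Example pin mapping; pick any free data-capable GPIOs and wire to match.
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO18  # WS
    i2s_bclk_pin: GPIO2    # SCK

microphone:
  - platform: i2s_audio
    id: mic_id
    i2s_audio_id: i2s_in
    adc_type: external
    i2s_din_pin: GPIO4     # SD
    channel: left
```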
If it were me, I would not bother connecting the WROOM; it will at best be a poorer solution, and you will end up using the N16R8 in the end anyway, so you may as well buy one and start with that.
I agree with that. I have made around 5 speakers and 3 voice pucks by now, and the ESP32-S3 N16R8 is the preferred choice with the least trouble.
Oh… really. Damn. I thought I had chosen the better board.
I asked ChatGPT to explain the difference:
…but obviously there is one. What is the difference, and where did I go wrong?
The wroom version will do just fine.
Thanks
I thought that I messed that up.
Now I'm returning to my original question. @mchk, you mentioned that the Atom Echo example would be the one for me.
You mean this one:
wake-word-voice-assistants/m5stack-atom-echo/m5stack-atom-echo.yaml at a342c909639d423c936a6020a43e9c8aca213cc9 · kahrendt/wake-word-voice-assistants · GitHub
Yes, I was talking about that config.
In the original message, you didn’t specify the board type (WROOM is more commonly associated with the original ESP32), but if you have an S3, you can safely ignore it and just use the example from Smart Home Circle.
Which instructions? There are lots here, and plenty more ways to do it.
I notice in post after post that I am asking things with really incomplete information. Sorry for this; I am at the beginning of the so-called journey.
I also referred to this link in the instructions, which I talked about a few posts ago:
So I connect the USB cable that came with the WROOM board to the UART port and the other end to the laptop. In Home Assistant, in the ESPHome Builder, I click "New device" → Open ESPHome Web. I click "Connect", and the device is found on the COM3 port. After this, I click "Prepare for first use", then press the boot button on the WROOM board, and the installation starts. The percentage climbs to 100%: congratulations on the successful installation. Now, at the point where I should have configured the Wi-Fi, I guess that's where it ends and I get that error message.
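For the record, "Prepare for first use" only installs a bare firmware; the Wi-Fi credentials come from the YAML you adopt and install afterwards. A minimal first configuration, assuming an S3 devkit-style board and `wifi_ssid`/`wifi_password` entries in your secrets file:

```yaml
esphome:
  name: esp-test

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

logger:

api:

ota:
  - platform: esphome

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
```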
Why would you follow outdated tutorials when you could get up-to-date info from the official site at https://esphome.io/?
Hmm… that seemed like a complete set of instructions for me to follow. That is the reason why I referred to SmartHomeCircle's page.
Now I'm a little bit lost, as I followed the link you posted. I probably need to figure that one out.
EDIT: actually I managed to flash one of the WROOMs, and it's now installed in Home Assistant with minimum settings. I guess the next step is to take the voice assistant YAML and upload it to the device.
As I mentioned when you first posted that link, and as AshaiRey said, do not follow 2-year-old tutorials for something that has been in constant development over those 2 years. It will at best only cause issues, but more likely just not work.
Starting out with ESPHome by following the docs will be straightforward for simple projects; something like getting voice working will take some experience. There are up-to-date working YAML examples in this thread, but there are no step-by-step instructions, and even if there were, without a basic understanding of how to set up a device and modify the YAML you will struggle.
What I am trying to say is: start with something a bit simpler, and master that before attempting to get 2000 lines of code working.
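In that spirit, a "simpler first project" can be about 15 lines: one GPIO exposed to Home Assistant as a switch. The pin number below is a guess (on many S3 boards the on-board LED is an addressable WS2812 rather than a plain GPIO LED, in which case wire a plain LED to a free pin instead):

```yaml
# Minimal practice config: one switchable output visible in Home Assistant.
switch:
  - platform: gpio
    name: "Test LED"
    pin: GPIO21  # hypothetical; use any free pin with an LED attached
```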
I get your point, but I'm not sure how outdated the instructions we are talking about actually are. Here's a snip from the page (it says 2025.07):
So I'm going to skip that training part you mentioned.
I know it's probably a bad idea, but if I managed to flash it after the struggle with, let's say, 15 lines of code… why not with 1000 lines as well…
Well, in that case I will post a YAML that uses the same components as you have. To get it working you need to get the GPIO pins right, either in the YAML or on the hardware side.
This YAML was last successfully compiled 3 weeks ago.
# ESP Voice Assistant
# 07 Oct 2024 (v 3.0.0)
# - Based on the ESP32-S3_BOX version
# adapted to work without a screen but with I2s amp and mic
# Added some LEDs for interacting
# Request and response sensors in HA
# Made to work in continuous mode
#
# by A.A. van Zoelen
# Based on the work of Giants
# Voice puck location: WOONKAMER
substitutions:
# Phases of the Voice Assistant
# IDLE: The voice assistant is ready to be triggered by a wake-word
voice_assist_idle_phase_id: '1'
# LISTENING: The voice assistant is ready to listen to a voice command (after being triggered by the wake word)
voice_assist_listening_phase_id: '2'
# THINKING: The voice assistant is currently processing the command
voice_assist_thinking_phase_id: '3'
# REPLYING: The voice assistant is replying to the command
voice_assist_replying_phase_id: '4'
# NOT_READY: The voice assistant is not ready
voice_assist_not_ready_phase_id: '10'
# ERROR: The voice assistant encountered an error
voice_assist_error_phase_id: '11'
# MUTED: The voice assistant is muted and will not reply to a wake-word
voice_assist_muted_phase_id: '12'
esphome:
name: "esp-assistant-vp"
friendly_name: ESP Assistant VP
project:
name: AA_van_Zoelen.VoicePuck
version: '4.0.0'
on_boot:
priority: 600
then:
- light.turn_on:
id: led_strip
blue: 0%
red: 0%
green: 100%
brightness: 50%
effect: "scanning"
- delay: 30s
- if:
condition:
lambda: return id(init_in_progress);
then:
- lambda: id(init_in_progress) = false;
- light.turn_off:
id: led_strip
esp32:
board: esp32-s3-devkitc-1
cpu_frequency: 240MHz
variant: esp32s3
flash_size: 16MB
framework:
type: esp-idf
version: recommended
sdkconfig_options:
CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB: "y"
CONFIG_SPIRAM_RODATA: "y"
CONFIG_SPIRAM_FETCH_INSTRUCTIONS: "y"
CONFIG_BT_ALLOCATION_FROM_SPIRAM_FIRST: "y"
CONFIG_BT_BLE_DYNAMIC_ENV_MEMORY: "y"
CONFIG_MBEDTLS_EXTERNAL_MEM_ALLOC: "y"
CONFIG_MBEDTLS_SSL_PROTO_TLS1_3: "y"
#network:
# enable_ipv6: true
# Enable logging
logger:
# Enable Home Assistant API
api:
# encryption:
# key: !secret api_key
actions:
- action: start_va
then:
- voice_assistant.start
- action: stop_va
then:
- voice_assistant.stop
ota:
- platform: esphome
# password: "----------------"
wifi:
ssid: !secret wifi_ssid
password: !secret wifi_password
output_power: 8.5dB
# If the device connects, or disconnects, to the Wifi: Run the script to refresh the LED status
on_connect:
# - script.execute: led_off
- light.turn_off:
id: led_strip
on_disconnect:
# - script.execute: control_led
- light.turn_on:
id: led_strip
blue: 0%
red: 100%
green: 0%
brightness: 98%
effect: "Fast Pulse"
# Enable fallback hotspot (captive portal) in case wifi connection fails
ap:
ssid: !secret fallback_ssid
password: !secret fallback_password
psram:
mode: octal
speed: 80MHz
i2s_audio:
- id: i2s_in
i2s_lrclk_pin: GPIO18 #WS
i2s_bclk_pin: GPIO2 #SCK
- id: i2s_out
i2s_lrclk_pin: GPIO6
i2s_bclk_pin: GPIO7
microphone:
- platform: i2s_audio
id: mic_id
adc_type: external
i2s_audio_id: i2s_in
i2s_din_pin: GPIO4 #SD
channel: left
speaker:
- platform: i2s_audio
id: speaker_id
i2s_audio_id: i2s_out
dac_type: external
i2s_dout_pin:
number: GPIO8 #DIN Pin of the MAX98357A Audio Amplifier
sample_rate: 48000
buffer_duration: 90ms
- platform: mixer
id: mixer_speaker_id
output_speaker: speaker_id
source_speakers:
- id: announcement_spk_mixer_input
- id: media_spk_mixer_input
- platform: resampler
id: media_spk_resampling_input
output_speaker: media_spk_mixer_input
- platform: resampler
id: announcement_spk_resampling_input
output_speaker: announcement_spk_mixer_input
globals:
# Global initialisation variable. Initialized to true and set to false once everything is connected. Only used to have a smooth "plugging" experience
- id: init_in_progress
type: bool
restore_value: no
initial_value: 'true'
# Global variable tracking the phase of the voice assistant (defined above). Initialized to not_ready
- id: voice_assistant_phase
type: int
restore_value: no
initial_value: ${voice_assist_not_ready_phase_id}
# Variable for tracking TTS triggering
- id: is_tts_active
type: bool
restore_value: no
initial_value: 'false'
# Variable for tracking built-in continued conversations
- id: question_flag
type: bool
restore_value: no
initial_value: 'false'
# Variable for tracking ww
- id: last_wake_word
type: std::string
restore_value: no
initial_value: '""'
light:
- platform: esp32_rmt_led_strip
rgb_order: GRB
pin: GPIO17 #GPIO48 # On board light
num_leds: 3
chipset: WS2812
name: "Status LED"
id: led_strip
disabled_by_default: True
entity_category: config
icon: mdi:led-on
default_transition_length: 0s
effects:
- pulse:
name: "Slow Pulse"
transition_length: 770ms
update_interval: 770ms
min_brightness: 10%
max_brightness: 20%
- pulse:
name: "Fast Pulse"
transition_length: 100ms
update_interval: 100ms
min_brightness: 60%
max_brightness: 80%
- addressable_scan:
name: "Scanning"
move_interval: 120ms
scan_width: 1
- pulse:
name: "Waiting for wake word"
min_brightness: 15%
max_brightness: 35%
transition_length: 3s # defaults to 1s
update_interval: 3s
media_player:
- platform: speaker
name: None
id: speaker_media_player_id
media_pipeline:
speaker: media_spk_resampling_input
num_channels: 1
announcement_pipeline:
speaker: announcement_spk_resampling_input
num_channels: 1
on_announcement:
- mixer_speaker.apply_ducking:
id: media_spk_mixer_input
decibel_reduction: 25
duration: 0.2s
on_state:
- delay: 0.7s
- if:
condition:
and:
- not:
voice_assistant.is_running:
- not:
media_player.is_announcing:
then:
- mixer_speaker.apply_ducking:
id: media_spk_mixer_input
decibel_reduction: !lambda |-
return id(ducking_decibel).state;
duration: 1.0s
files:
- id: alarm_sound
#file: https://github.com/mitrokun/esp32s3-voice_assistant/raw/main/alarm.flac # 48000 Hz sample rate, mono or stereo audio, and 16 bps
file: sounds/alarm.flac
- id: beep
#file: https://github.com/mitrokun/esp32s3-voice_assistant/raw/main/r2d2d.flac
file: sounds/r2d2d.flac
voice_assistant:
id: va
microphone:
microphone: mic_id
gain_factor: 16
media_player: speaker_media_player_id
micro_wake_word: mww
noise_suppression_level: 2.0
auto_gain: 0 dbfs
volume_multiplier: 1
# When the voice assistant connects to HA:
# Set init_in_progress to false (Initialization is over).
# If the switch is on, start the voice assistant
on_client_connected:
- lambda: id(init_in_progress) = false;
- if:
condition:
switch.is_on: voice_enabled
then:
- micro_wake_word.start
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
else:
- lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
- light.turn_on:
id: led_strip
blue: 10%
red: 10%
green: 100%
effect: "Waiting for wake word"
- delay: 5s
- light.turn_off:
id: led_strip
# When the voice assistant disconnects to HA:
# Stop the voice assistant
on_client_disconnected:
- lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
- micro_wake_word.stop
on_listening:
# Reset flags
- lambda: |-
id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
id(is_tts_active) = false;
id(question_flag) = false;
# Microphone operation indicator (red led)
- light.turn_on:
id: led_strip
blue: 0%
red: 100%
green: 0%
brightness: 50%
effect: "Fast Pulse"
# Waiting for speech for 4 seconds, otherwise exit
- script.execute: listening_timeout
on_stt_vad_start:
# Turn off the script if speech is detected
- script.stop: listening_timeout
on_stt_vad_end:
- light.turn_off:
id: led_strip
- lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
on_stt_end:
# Event for HA with recognized speech
- homeassistant.event:
event: esphome.stt_text
data:
text: !lambda return x;
on_intent_progress:
- if:
condition:
# A nonempty x variable means a streaming TTS url was sent to the media player
lambda: 'return !x.empty();'
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
# Set the flag when the stage is reached
- lambda: |-
id(is_tts_active) = true;
# Start a script that would potentially enable the stop word if the response is longer than a second
- script.execute: activate_stop_word_once
on_tts_start:
- if:
condition:
# The intent_progress trigger didn't start the TTS response
lambda: 'return id(voice_assistant_phase) != ${voice_assist_replying_phase_id};'
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
# Start a script that would potentially enable the stop word if the response is longer than a second
- script.execute: activate_stop_word_once
# Finding a question mark at the end of a sentence.
- lambda: |-
bool is_question = false;
if (!x.empty() && x.back() == '?') {
is_question = true;
}
id(question_flag) = is_question;
# - logger.log:
# format: "question_flag: %d (0=false, 1=true)"
# args:
# - id(question_flag)
on_tts_end:
- if:
condition:
switch.is_on: extended_dialog
then:
- lambda: |-
id(is_tts_active) = true;
on_timer_finished:
then:
- switch.turn_on: timer_ringing
on_end:
# Additional check for microphone LED
- if:
condition:
- light.is_on: led_strip
then:
- light.turn_off:
id: led_strip
- wait_until:
condition:
- media_player.is_announcing:
timeout: 0.5s
- wait_until:
not:
voice_assistant.is_running:
- delay: 0.5s
# New start of the pipeline if the conditions are met
- if:
condition:
and:
- switch.is_on: continued_conversation_enabled
- lambda: 'return !id(question_flag);'
- lambda: 'return id(is_tts_active);'
- lambda: 'return id(last_wake_word) != "Stop";'
then:
- voice_assistant.start:
wake_word: !lambda return id(last_wake_word);
else:
# Stop ducking audio.
- mixer_speaker.apply_ducking:
id: media_spk_mixer_input
decibel_reduction: !lambda |-
return id(ducking_decibel).state;
duration: 1.0s
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
# When the voice assistant encounters an error:
# Wait 1 second and set the correct phase (idle or muted depending on the state of the switch)
on_error:
- if:
condition:
lambda: return !id(init_in_progress);
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
- delay: 1s
- if:
condition:
switch.is_on: voice_enabled
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
else:
- lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
micro_wake_word:
models:
- model: https://github.com/kahrendt/microWakeWord/releases/download/okay_nabu_20241226.3/okay_nabu.json
id: okay_nabu
- model: https://raw.githubusercontent.com/Darkmadda/ha-v-pe/refs/heads/main/hey_glados.json
id: hey_glados
- model: https://github.com/kahrendt/microWakeWord/releases/download/stop/stop.json
id: stop
internal: true
vad:
model: https://github.com/kahrendt/microWakeWord/releases/download/v2.1_models/vad.json
id: mww
stop_after_detection: false
on_wake_word_detected:
- if:
condition:
switch.is_on: timer_ringing
then:
- switch.turn_off: timer_ringing
else:
- if:
condition:
switch.is_on: voice_enabled
then:
- if:
condition:
voice_assistant.is_running:
# Restart the pipeline if Continued conversation is enabled
# Or stop it completely by saying “Stop”
then:
- lambda: id(last_wake_word) = wake_word;
- delay: 100ms
- voice_assistant.stop:
# Stop any other media player announcement
else:
- if:
condition:
media_player.is_announcing:
then:
- media_player.stop:
announcement: true
# Start the voice assistant and play the wake sound, if enabled
else:
- lambda: id(last_wake_word) = wake_word;
- script.execute:
id: play_sound
priority: true
sound_file: !lambda return id(beep);
- delay: 280ms
# - media_player.speaker.play_on_device_media_file:
# media_file: beep
# announcement: true
# - delay: 300ms
- voice_assistant.start:
wake_word: !lambda return wake_word;
script:
# - id: led_off
# then:
# - light.turn_off:
# id: led_strip
- id: listening_timeout
mode: restart
then:
- delay: 3s
- if:
condition:
lambda: |-
return id(voice_assistant_phase) == 2;
then:
# BARM - switch.turn_off: wake_led
- light.turn_off:
id: led_strip
- voice_assistant.stop:
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
- id: activate_stop_word_once
then:
- delay: 1s
# Enable stop wake word
- if:
condition:
switch.is_off: timer_ringing
then:
- micro_wake_word.enable_model: stop
- wait_until:
not:
media_player.is_announcing:
- if:
condition:
switch.is_off: timer_ringing
then:
- micro_wake_word.disable_model: stop
- id: play_sound
parameters:
priority: bool
sound_file: "audio::AudioFile*"
then:
- lambda: |-
if (priority) {
id(speaker_media_player_id)
->make_call()
.set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
.set_announcement(true)
.perform();
}
if ( (id(speaker_media_player_id).state != media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING ) || priority) {
id(speaker_media_player_id)
->play_file(sound_file, true, false);
}
select:
- platform: template
name: "Wake word sensitivity"
optimistic: true
initial_option: Slightly sensitive
restore_value: true
entity_category: config
options:
- Slightly sensitive
- Slightly+ sensitive
- Moderately sensitive
- Very sensitive
on_value:
# Sets specific wake word probabilities computed for each particular model
# Note probability cutoffs are set as a quantized uint8 value, each comment has the corresponding floating point cutoff
# False Accepts per Hour values are tested against all units and channels from the Dinner Party Corpus.
# These cutoffs apply only to the specific models included in the firmware: [email protected], hey_jarvis@v2, hey_mycroft@v2
lambda: |-
if (x == "Slightly sensitive") {
id(okay_nabu).set_probability_cutoff(217); // 0.85 -> 0.000 FAPH on DipCo (Manifest's default)
} else if (x == "Slightly+ sensitive") {
id(okay_nabu).set_probability_cutoff(191); // 0.75
} else if (x == "Moderately sensitive") {
id(okay_nabu).set_probability_cutoff(176); // 0.69 -> 0.376 FAPH on DipCo
} else if (x == "Very sensitive") {
id(okay_nabu).set_probability_cutoff(143); // 0.56 -> 0.751 FAPH on DipCo
}
- platform: logger
id: logger_select
name: Logger Level
disabled_by_default: true
button:
- platform: restart
name: reboot
number:
- platform: template
name: "Decibel Reduction"
id: ducking_decibel
min_value: 0
max_value: 12
step: 1
initial_value: 0
unit_of_measurement: "dB"
set_action:
- mixer_speaker.apply_ducking:
id: media_spk_mixer_input
decibel_reduction: !lambda 'return (int)x;'
duration: 0.2s
switch:
- platform: template
name: Enable Voice Assistant
id: voice_enabled
optimistic: true
restore_mode: RESTORE_DEFAULT_ON
icon: mdi:assistant
# When the switch is turned on (on Home Assistant):
# Start the voice assistant component
on_turn_on:
- if:
condition:
lambda: return !id(init_in_progress);
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
- if:
condition:
not:
- voice_assistant.is_running
then:
- micro_wake_word.start
- light.turn_on:
id: led_strip
blue: 15%
red: 0%
green: 15%
brightness: 20%
effect: "Slow Pulse"
# When the switch is turned off (on Home Assistant):
# Stop the voice assistant component
on_turn_off:
- if:
condition:
lambda: return !id(init_in_progress);
then:
- voice_assistant.stop
- micro_wake_word.stop
- lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
- light.turn_off:
id: led_strip
- platform: template
name: "Ring Timer"
id: timer_ringing
optimistic: true
restore_mode: ALWAYS_OFF
on_turn_off:
# Stop playing the alarm
- media_player.stop:
announcement: true
on_turn_on:
- while:
condition:
switch.is_on: timer_ringing
then:
# Play the alarm sound as an announcement
- media_player.speaker.play_on_device_media_file:
media_file: alarm_sound
announcement: true
# Wait until the alarm sound starts playing
- wait_until:
media_player.is_announcing:
# Wait until the alarm sound stops playing
- wait_until:
not:
media_player.is_announcing:
- delay: 1000ms
- platform: template
name: Continued Conversation
id: continued_conversation_enabled
optimistic: true
restore_mode: RESTORE_DEFAULT_ON
icon: mdi:chat-processing-outline
- platform: template
name: Continued Conversation+
id: extended_dialog
optimistic: true
restore_mode: RESTORE_DEFAULT_OFF
icon: mdi:chat-plus-outline
Thanks, @AshaiRey.
The only things I changed in the configuration were the WLAN SSID and password, as well as the OTA and API passwords. I also added a static IP to the configuration.
I managed to get the configuration onto the board, but I still couldn't get anything to work.
So the ESP is online, the configuration is in, and the device is more or less controllable in Home Assistant. I tried to turn on the status LED, but nothing happened. In addition, I set the wake word to "Okay Nabu" in the device's management interface, but no matter how many times I repeat it, not a single line appears in the log.
Of course, the speaker is still missing from my setup, but the MAX amplifier is connected correctly. I actually made the pin connections exactly according to the configuration without changing anything. The reboot button in the management interface does work and restarts the board, so there is clearly something alive there.
And now I apologize in advance, because I haven't had time to go through the configuration beyond the network and the pins. If there is, for example, something commented out in there that explains the current inoperability, I expect it will become clear once I start investigating. If you have any ideas about what could be wrong, feel free to chime in.
EDIT: but wait. Are we running exactly the same boards (ESP32-S3-WROOM)? In the configuration, pins 6, 7 and 8 are used… but aren't those flash pins that cannot be used?