Need help with the basics of testing an ESP32-C3 voice assistant

Hi folks,
I think I am 90% of the way there, but am struggling to find the right way to troubleshoot further…

What I have:
Home Assistant (HASS) and ESPHome in Docker on Ubuntu. HASS is 2025.5.2, ESPHome is 2025.6.0.
An AliExpress-special ESP32-C3-Zero dev board (https://www.waveshare.com/wiki/ESP32-C3-Zero) with a MAX98357 I2S amplifier, an INMP441 I2S microphone, and the onboard RGB LED, which is a WS2812.

Everything is wired according to this config:

ESPHome voice assistant with local wake word – Tristam

suitably modified for the C3-Zero

mDNS is working, and HASS sees both the ESPHome container and the device.
I can open the device’s web page and switch the RGB LED on and off. On the web page I can see the logs, and the device correctly detects when I say the “Hey Jarvis” wake word…

I see this:

|Time|Level|Component|Message|
| --- | --- | --- | --- |
|22:05:29|[D]|[micro_wake_word:325]|Detected 'Hey Jarvis' with sliding average probability is 0.98 and max probability is 1.00|
|22:05:29|[D]|[voice_assistant:456]|State changed from IDLE to START_MICROPHONE|
|22:05:29|[D]|[voice_assistant:463]|Desired state set to START_PIPELINE|
|22:05:29|[W]|[light:475]|'LED bar' - No such effect 'scan'|
|22:05:29|[D]|[light:036]|'LED bar' Setting:|
|22:05:29|[D]|[light:047]|State: ON|
|22:05:29|[D]|[light:051]|Brightness: 30%|
|22:05:29|[D]|[light:058]|Red: 100%, Green: 100%, Blue: 100%|
|22:05:29|[D]|[light:085]|Transition length: 1.0s|
|22:05:29|[D]|[micro_wake_word:370]|Stopping wake word detection|
|22:05:29|[D]|[voice_assistant:186]|Starting Microphone|
|22:05:29|[D]|[ring_buffer:034]|Created ring buffer with size 16384|
|22:05:29|[D]|[voice_assistant:456]|State changed from START_MICROPHONE to STARTING_MICROPHONE|
|22:05:29|[D]|[micro_wake_word:378]|State changed from DETECTING_WAKE_WORD to STOPPING|
|22:05:29|[D]|[voice_assistant:456]|State changed from STARTING_MICROPHONE to START_PIPELINE|
|22:05:29|[D]|[micro_wake_word:273]|Inference task is stopping, deallocating buffers|
|22:05:29|[D]|[micro_wake_word:278]|Inference task is finished, freeing task resources|
|22:05:29|[D]|[micro_wake_word:378]|State changed from STOPPING to STOPPED|
|22:05:29|[D]|[voice_assistant:207]|Requesting start|
|22:05:29|[D]|[voice_assistant:456]|State changed from START_PIPELINE to STARTING_PIPELINE|
|22:05:29|[D]|[voice_assistant:478]|Client started, streaming microphone|
|22:05:29|[D]|[voice_assistant:456]|State changed from STARTING_PIPELINE to STREAMING_MICROPHONE|
|22:05:29|[D]|[voice_assistant:463]|Desired state set to STREAMING_MICROPHONE|
|22:05:58|[D]|[light:036]|'LED bar' Setting:|
|22:05:58|[D]|[light:047]|State: OFF|
|22:05:58|[D]|[light:085]|Transition length: 1.0s|
|22:10:29|[D]|[voice_assistant:159]|reset conversation ID|
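The log reaches `STREAMING_MICROPHONE` and then shows nothing more from the voice pipeline until the LED switches off about 30 seconds later, which may mean the streamed audio never produces a speech-to-text result back from Home Assistant. A minimal sketch for getting more detail on the device side, assuming the standard ESPHome `logger` options; the per-component levels chosen here are a suggestion, not part of the original config:

```yaml
# Sketch: more detailed logging while debugging the voice pipeline.
# Merge into the existing logger: block rather than adding a second one.
logger:
  level: VERBOSE            # messages more verbose than this are not compiled in
  logs:
    light: INFO             # hide the LED state messages
    micro_wake_word: DEBUG  # wake word detection already works
    voice_assistant: VERBOSE
```

On the Home Assistant side, the Debug view of the assistant (under Settings > Voice assistants) lists recent pipeline runs, which can show whether audio arrived and at which stage a run stopped.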

So, what isn’t working:

  1. When I add the device under ESPHome in HASS, it adds 6 entities: “assist satellite”, “assistant”, “finished speaking detection”, “LED bar”, “mute microphone” and “va” (the smart-speaker wake word entity). The wake word entity is “unavailable”.

  2. It doesn’t make a sound.
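To see which stage of the pipeline is reached (speech-to-text, text-to-speech, or an error), the `voice_assistant` automation triggers can each write a log line. A minimal sketch, assuming the device config already has a `voice_assistant:` block; these keys would be merged into it, and if any of the triggers already exist, the `logger.log` step goes into the existing action list:

```yaml
voice_assistant:
  # ...existing voice_assistant options (microphone, speaker, etc.) stay here...
  on_stt_end:          # x holds the recognised text
    - logger.log:
        format: "STT result: %s"
        args: ["x.c_str()"]
  on_tts_start:        # x holds the text to be spoken
    - logger.log:
        format: "TTS text: %s"
        args: ["x.c_str()"]
  on_tts_end:          # x holds the URL of the generated audio
    - logger.log:
        format: "TTS audio URL: %s"
        args: ["x.c_str()"]
  on_error:            # code and message describe the failure
    - logger.log:
        format: "Voice assistant error %s: %s"
        args: ["code.c_str()", "message.c_str()"]
```

If a TTS audio URL is logged but nothing is heard, the speaker side is the likely suspect; if nothing at all appears after `STREAMING_MICROPHONE`, the audio stream or the pipeline in Home Assistant is the place to look.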

What I don’t understand, and need some help troubleshooting, is:

  1. How do I make the wake word entity available? I can’t seem to “connect” it to the HASS assistant, as it is greyed out.

  2. How do I make the speaker play a sound or say something, so I can test the sound output?
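For question 2, one way to test the speaker without involving Home Assistant's pipeline is ESPHome's `rtttl` component, which can play a ringtone-style melody on a `speaker`. A minimal sketch, assuming the existing speaker has the hypothetical ID `spk` and that your ESPHome version supports `rtttl` with a speaker output:

```yaml
rtttl:
  id: test_tone
  speaker: spk   # hypothetical: the id of the existing i2s_audio speaker

button:
  - platform: template
    name: "Speaker test tone"
    on_press:
      - rtttl.play: "siren:d=8,o=5,b=100:d,e,d,e,d,e,d,e"
```

The template button should appear both in Home Assistant and on the device's web page. If the microphone and speaker share one I2S bus, the tone may not play while the microphone is running.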

I suspect the sound hardware is “kinda” working: I originally had the I2S data lines for the mic and speaker swapped, which made the speaker crackle a lot, so it’s clearly capable of making some sound :smiley:
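On the swapped data lines: the two I2S data lines are directional. The INMP441's SD pin is an output and goes to the GPIO configured as `i2s_din_pin` on the microphone, while the MAX98357's DIN pin is an input fed from the GPIO configured as `i2s_dout_pin` on the speaker. The ESP32-C3 has a single I2S peripheral, so both devices typically share one bus (common BCLK and LRCLK). A minimal sketch, with hypothetical GPIO numbers and assuming the INMP441's L/R pin is tied to GND:

```yaml
# Hypothetical GPIOs: replace with the pins actually wired on your C3-Zero.
i2s_audio:
  id: i2s_shared
  i2s_lrclk_pin: GPIO5   # INMP441 WS and MAX98357 LRC
  i2s_bclk_pin: GPIO6    # INMP441 SCK and MAX98357 BCLK

microphone:
  - platform: i2s_audio
    id: mic
    i2s_audio_id: i2s_shared
    adc_type: external
    pdm: false
    i2s_din_pin: GPIO4   # INMP441 SD (mic output) -> ESP32 input
    channel: left        # assumes the INMP441 L/R pin is tied to GND

speaker:
  - platform: i2s_audio
    id: spk
    i2s_audio_id: i2s_shared
    dac_type: external
    i2s_dout_pin: GPIO7  # ESP32 output -> MAX98357 DIN
```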

You are unlikely to achieve acceptable results with the ESP32-C3. Compared with the ESP32-S3, it has a weaker, single-core processor and less memory, and some components may have architectural issues on it.

Thanks for replying, but honestly… this is just the Home Assistant / ESPHome “community” experience through and through…

I was really asking for incredibly basic help with how to make an ESPHome device do something: how to tell it to play a sound, how to troubleshoot, whether there are logs to check, and so on…

Lots of views, no actual help, and when I see a reply and check whether someone is helping me out… no, it’s just a post telling me that I am doing it wrong,

but with no suggestion of how to actually do it right.

What would have helped: rather than simply saying what’s bad about the module I was trying to use, how about suggesting a module that WOULD work better?

As I say, thanks for replying at all (that’s more engagement than I expected), but this wasn’t really helpful.

I’ve been playing with an ESP32-S3 and a MAX98357A I²S DAC/amp today, and that part is fairly easy (it should be similar on the ESP32-C3, although I would also be concerned about the ESP32-C3’s performance for this task):

```yaml
substitutions:
  name: esphome-XXXX
  friendly_name: esp32-s3

esphome:
  name: ${name}
  friendly_name: ${friendly_name}

esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  cpu_frequency: 240MHz
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"

psram:
  mode: octal
  speed: 80MHz

# Enable logging
logger:
  hardware_uart: USB_SERIAL_JTAG

# Enable Home Assistant API
api:
  encryption:
    key: !secret esp32-s3_api_key

ota:
  - platform: esphome
    id: ota_esphome
    password: !secret esp32-s3_ota_password

wifi:
  ssid: !secret wifi_ssid_wpa3
  password: !secret wifi_password_wpa3

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esp32-S3 Fallback Hotspot"
    password: !secret wifi_password_fallback

captive_portal:

web_server:
  port: 80

i2s_audio:
  id: i2s_audio_bus      # I2S bus used by the speaker
  i2s_lrclk_pin: GPIO16  # LRC(LK) pin of the MAX98357A
  i2s_bclk_pin: GPIO15   # BCLK pin of the MAX98357A

speaker:
  - platform: i2s_audio
    id: box_speaker
    i2s_audio_id: i2s_audio_bus
    dac_type: external
    i2s_dout_pin:
      number: GPIO7 # DIN pin of the MAX98357A audio amplifier
    channel: stereo
    buffer_duration: 1000ms

media_player:
  - platform: speaker
    name: None
    id: speaker_media_player
    volume_min: 0.5
    volume_max: 0.8
    announcement_pipeline:
      speaker: box_speaker
      format: FLAC
      sample_rate: 48000
      num_channels: 1  # mono, single MAX98357A
```

3 GPIOs needed:

  • GPIO7: DIN
  • GPIO15: BCLK
  • GPIO16: LRC(LK)

GAIN and GND are connected to ground; VIN and 5V are connected to 3.3V.

//done.

Useful schematics: https://cdn.static.spotpear.com/uploads/picture/learn/ESP32/ESP32-S3-1.28inch-AI/ESP32S3-1.28inch-AI.pdf

Interesting read: SpotPear "DeepSeek Voice Chat" config


Ignore the rotary encoder; that’s still a work in progress.

Sound quality, not great.
Volume, too low (active speaker needed).
Works.

Thanks so much for this!

I had ordered the same S3 module you showed, and it’s due to arrive today, so I’ll give that a spin. I had some C3-Zeros on hand, so I used one as a test.

A (possibly daft) question: when you add this in Home Assistant, how do you actually interact with it? Does it just show up as a regular media player entity?

Yes, it does.
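As a media player entity, it can be driven from Home Assistant actions, for example from Developer tools > Actions in YAML mode. A minimal sketch using `tts.speak`; both entity IDs below are hypothetical and need to be replaced with the TTS entity and the ESPHome media player that exist in your instance:

```yaml
# Hypothetical entity IDs: replace with your own TTS and media player entities.
action: tts.speak
target:
  entity_id: tts.piper
data:
  media_player_entity_id: media_player.esp32_s3
  message: "Hello from Home Assistant"
```

This is also a quick way to check the speaker wiring, since it exercises only the media player and not the wake word or speech-to-text stages.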