Low cost ESP32 voice platforms

In the last couple of weeks (presumably driven by the release of DeepSeek), a number of low-cost ESP32 voice assistants have appeared, e.g.:

https://www.aliexpress.com/item/1005008600891141.html

https://www.aliexpress.com/item/1005008634826817.html

https://www.aliexpress.com/item/1005008634051137.html

Has anyone tried any of these? Any comments on how well they work, or on how far we are from using them with ESPHome and microWakeWord?

Interesting… looks good. It’s based on the ESP32-S3 (like the Voice PE), so in theory it shouldn’t be too hard to run ESPHome on it.

Interesting. I’ve bought one. Will have a play and report back.

Which one did you get?

I’ll be very interested to hear; sadly, I currently lack the time to go and hack at them…

The one in the first link. Thought it looked the best. Haven’t received it yet though.

OK. So I got this device: DeepSeek XiaoZhi AI Voice Chat Robot ESP32-S3 1.28 inch LCD N16R8 Development Board Astronaut Clock Desktop Ornament - AliExpress

It appears that the documentation for this device is available here: Moji XiaoZhi AI Derivative Edition - LCSC Open Source Hardware Platform

This is very useful, as it provides all of the pinouts etc. Edit: see my later post for the correct information. I’m afraid that what I posted above was for a similar but different device, which meant I wasted time working with the wrong pinouts.

I have successfully installed ESPHome on the device. All the standard stuff works, like Wi-Fi, OTA, etc., but in terms of device-specific hardware I have so far only managed to get the backlight to work. The display is listed as a GC9A01, but none of the various permutations I have tried have yielded any results. If anyone knows which ESPHome component I should be using with this display, I would be most grateful. In the meantime, I’ll continue with my trial-and-error approach. Is there a better way? Advice from googling and asking Copilot has so far proved incorrect.

Unfortunately the devices sold on AliExpress are inexpensive clones that just use a MAX98357, INMP441, GC9A01 and a single WS2812.

You can find more details of the product here: DeepSeek XiaoZhi AI Voice Chat Robot ESP32-S3 1.28 inch LCD N16R8 Development Board Astronaut Clock Desktop Ornament

Schematic: http://cdn.static.spotpear.com/uploads/picture/learn/ESP32/ESP32-S3-1.28inch-AI/ESP32S3-1.28inch-AI.pdf

I have successfully got the ESPHome Voice PE template and the S3-BOX-3 template running with a few minor tweaks.

Hi. Thanks for that. I’d independently found the same information, after quite a lot of digging. One of the problems is that there are a lot of versions of this device around, and it is quite hard to find the correct one. I’ve found the source code accessible here, and I’ve successfully compiled it and flashed it to the device, so I can confirm it is the correct one for that device. It also took me a little while to figure out which of the many boards listed in the source code was the right one: the answer is “bread-compact-wifi”. Having the names in Chinese makes things entertaining!

Anyway, once I figured out the board type, that gave me the pin assignments, and I was able to make quite a lot of progress mapping this onto an ESPHome project. I have the display working fine, and also the speaker. (Curiously, at first this worked in Music Assistant but not in Home Assistant, and now it is the other way round! Still tracking that one down.) At the moment my microphone doesn’t seem to work, but I’ll follow up your tip on using the S3-BOX-3 template and see whether that helps.
While I had the device on the stock firmware, I couldn’t get the wake word functionality to work; it only started listening when I pressed the BOOT button on the back. I suspect this is all I’ll be able to do with ESPHome as well.
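For reference, that push-to-talk behaviour can be reproduced in ESPHome with a GPIO binary sensor that starts the voice assistant. This is a minimal sketch, assuming the BOOT button is on GPIO0 (the ESP32-S3 boot strapping pin), that an api: section is present, and that the microphone and speaker use the ids va_mic and va_speaker from the configs posted in this thread; the ids va and boot_button are hypothetical. ESPHome may print a strapping-pin warning for GPIO0, which is expected. If the speaker is already driven by a speaker media player, point voice_assistant at that media player (its media_player: option) instead of the raw speaker.

```yaml
# Push-to-talk sketch: assumes BOOT = GPIO0 and the va_mic / va_speaker ids
voice_assistant:
  id: va                   # hypothetical id
  microphone: va_mic
  speaker: va_speaker      # or media_player: <id> if a speaker media player owns it

binary_sensor:
  - platform: gpio
    id: boot_button        # hypothetical id
    name: "Boot button"
    pin:
      number: GPIO0
      inverted: true       # pressing the button pulls GPIO0 low
      mode:
        input: true
        pullup: true
    on_press:
      # Start a single Assist run when the button is pressed
      - voice_assistant.start:
```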

If you flip your device over, does it have three screws or a single one? Does it have a sticker that says V4?

The schematic shows the pinouts for the components. Once I mapped the ESPHome templates to the correct GPIOs, it ‘just worked’. You should be able to get wake word detection working, but I found that the microphone’s position inside the case makes it struggle to hear unless you’re right next to it.
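On-device wake word detection is handled by ESPHome’s micro_wake_word component, which hands off to voice_assistant when a wake word fires. A rough sketch, assuming ESPHome 2025.2 or later (where micro_wake_word accepts its own microphone: option; older releases are configured differently), the va_mic/va_speaker ids used in this thread, and the stock okay_nabu model; the mww id is hypothetical.

```yaml
micro_wake_word:
  id: mww                  # hypothetical id
  microphone: va_mic       # ESPHome 2025.2+ syntax (assumption for older versions)
  models:
    - model: okay_nabu     # one of the stock microWakeWord models
  on_wake_word_detected:
    # Start an Assist run, passing along which wake word fired
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

voice_assistant:
  microphone: va_mic
  speaker: va_speaker
  # Start wake word detection once Home Assistant has connected
  on_client_connected:
    - micro_wake_word.start:
```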

Yes, it has three screws and a sticker that says V4. Which template are you using? Is it this one?

https://github.com/esphome/wake-word-voice-assistants/blob/fcb1345b8271b32221e14ad3ea3b783d129240aa/esp32-s3-box-3/esp32-s3-box-3.yaml

You will need to remove the i2c, audio_adc/es7210 and audio_dac/es8311 sections, and switch the images to use https://github.com/jptrsn/wake-word-voice-assistants/tree/main/casita

https://github.com/esphome/home-assistant-voice-pe/blob/7f5aaf00fa7aeb88aa455b461942e92a904bb4a0/home-assistant-voice.yaml

You will need to remove voice_kit, audio_dac and external_components. This will have no screen output, but will otherwise work similarly to the official device.

I have based my template on the Voice PE one, with the display bits ported across from the S3-BOX-3, so ideally you will want to merge the pieces once you’ve tested that each one works on its own.

I have a “V2” version…
Could you share your ESPHome config?

@lareeth I just received the XiaoZhi “V5 EN” device. I’ve got most of it working, but audio is still missing. How did you manage to interface with the audio IC? Is it using the ES8311?

V5 is v1.62 of the software; nothing changed in the hardware, and the pins are the same for all versions of this cube, as well as for some other models such as this one:
https://www.aliexpress.com/item/1005008720707304.html
except, of course, that one has no display.

They also use the same pins for their standard breadboard version.
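Since the pins are shared across versions, it can help to collect them in one substitutions block so the audio and display sections stay consistent. This is a sketch assembled only from the GPIOs that come up in this thread (not verified against every hardware revision); the substitution names and the mic_i2s bus id are hypothetical. It puts the speaker on its own bus, the one the posted configs define for the MAX98357A (BCLK GPIO15, LRCLK GPIO16).

```yaml
# Pin map gathered from this thread (XiaoZhi "bread-compact-wifi" layout);
# double-check against the schematic for your unit.
substitutions:
  mic_ws_pin: GPIO4        # INMP441 WS
  mic_sck_pin: GPIO5       # INMP441 SCK
  mic_sd_pin: GPIO6        # INMP441 SD
  spk_din_pin: GPIO7       # MAX98357A DIN
  spk_bclk_pin: GPIO15     # MAX98357A BCLK
  spk_lrclk_pin: GPIO16    # MAX98357A LRC
  lcd_clk_pin: GPIO14      # display SPI clock
  lcd_mosi_pin: GPIO17     # display SPI data
  lcd_cs_pin: GPIO13
  lcd_dc_pin: GPIO10
  lcd_reset_pin: GPIO18

i2s_audio:
  - id: mic_i2s            # hypothetical id; one bus per direction
    i2s_lrclk_pin: ${mic_ws_pin}
    i2s_bclk_pin: ${mic_sck_pin}
  - id: speaker_i2s
    i2s_lrclk_pin: ${spk_lrclk_pin}
    i2s_bclk_pin: ${spk_bclk_pin}

microphone:
  - platform: i2s_audio
    id: va_mic
    i2s_audio_id: mic_i2s
    adc_type: external
    i2s_din_pin: ${mic_sd_pin}
    # add channel: left or right to match how the INMP441's L/R pin is strapped

speaker:
  - platform: i2s_audio
    id: va_speaker
    i2s_audio_id: speaker_i2s  # the bus clocked on GPIO15/GPIO16
    dac_type: external
    i2s_dout_pin: ${spk_din_pin}
```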

So, I opened up the device — hardware looks identical.

It’s still using the MAX98357A instead of the ES8311, just like yours.

But for some reason, I still can’t get any audio output working. Any idea what’s wrong?

Here’s my code:

i2s_audio:
  - id: i2s # For microphone
    i2s_lrclk_pin: GPIO4 #WS 
    i2s_bclk_pin: GPIO5  #SCK
  - id: speaker_i2s # For speaker
    i2s_lrclk_pin: GPIO16
    i2s_bclk_pin: GPIO15

microphone:
  - platform: i2s_audio
    i2s_audio_id: i2s
    adc_type: external
    i2s_din_pin: GPIO6 #SD
    id: va_mic
    channel: left
    pdm: false
    bits_per_sample: 16bit

speaker:
    platform: i2s_audio
    id: va_speaker
    i2s_audio_id: i2s
    dac_type: external
    i2s_dout_pin:   
      number: GPIO7 # DIN Pin of the MAX98357A Audio Amplifier
    channel: mono

Same hardware and circuit diagram.
This configuration produces sound normally in my environment:


esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
    

i2s_audio:
  - id: i2s # For microphone
    i2s_lrclk_pin: GPIO4 #WS 
    i2s_bclk_pin: GPIO5  #SCK
  - id: speaker_i2s # For speaker
    i2s_lrclk_pin: GPIO16
    i2s_bclk_pin: GPIO15

microphone:
  - platform: i2s_audio
    i2s_audio_id: i2s
    adc_type: external
    i2s_din_pin: GPIO6 #SD
    id: va_mic

speaker:
    platform: i2s_audio
    id: va_speaker
    i2s_audio_id: speaker_i2s # bus with BCLK GPIO15 / LRCLK GPIO16, wired to the MAX98357A
    dac_type: external
    i2s_dout_pin:
      number: GPIO7 # DIN Pin of the MAX98357A Audio Amplifier

media_player:
  - platform: speaker
    name: None
    announcement_pipeline:
      speaker: va_speaker
      format: WAV
    volume_min: 0.4

Hello everyone,

I’m thinking of replacing my Alexas without breaking the bank. Do you think this is a good solution?

How is the sound, and how well does the microphone pick up wake words?

Hope you can help me out.

I’ll add the display part to your code:

time:
  - platform: homeassistant
    id: homeassistant_time

spi:
  clk_pin: GPIO14
  mosi_pin: GPIO17
  interface: hardware
  id: spihwd

font:
  - file: "gfonts://Roboto"
    id: my_font
    size: 28
  - file: "gfonts://Roboto"
    id: roboto_52
    size: 52

image:
  - file: images/ow480x480.jpg
    id: my_image
    resize: 240x240
    type: RGB

display:
  - platform: ili9xxx
    invert_colors: true
    model: GC9A01A
    rotation: 0
    reset_pin: GPIO18 # (Pin on display - RESET)
    cs_pin: GPIO13 # (Pin on display - CS)
    dc_pin: GPIO10 # (Pin on display - DC)
    lambda: |-
      auto red = Color(255,0,0), green = Color(0,255,0), blue = Color(0,0,255), yellow = Color(255,255,0), white = Color(255,255, 255), cyberlightgreen = Color(0,255,159), cyberlightblue = Color(0,184,255), cyberpink = Color(214,0,255);
      it.image(0, 0, id(my_image));
      it.strftime(120, 120, id(roboto_52), cyberlightgreen, TextAlign::CENTER, "%H:%M:%S", id(homeassistant_time).now());

This will show an image with the time on top.

Is this the V5-EN version?

It’s a V4 updated to V5, since the version refers to the software, not the hardware: V4 was 1.60 and V5 is 1.62 (current) of the XiaoZhi code. But since the one in that image is flashed with ESPHome, it’s neither of them; if anything, its version is ESPHome 2025.5.
