Espressif's EchoEar ESP32-S3 voice-controlled AI chatbot runs esp-brookesia

This just popped up on my Reddit feed… It looks smart and cute, and I wondered if, for the price, it was worth a punt? I’m looking at slowly getting rid of all my Alexa devices and moving to a local Home Assistant solution, but I’ve only been having the occasional read of any Voice PE posts!

ANNND discuss… :slight_smile:

Looks like it could be a very nice candidate for the ESPHome Ready-Made Projects as both voice assistant and media player reference hardware:

Espressif Systems’ EchoEar ESP32-S3 has a rechargeable battery and a dock as well as a round display, which are unique features among the existing ready-made ESPHome projects on that page:

It is missing a dedicated MCU for advanced DSP, so it would not make perfect Home Assistant Voice Satellite hardware, but it is good enough as a development kit and for use in kids’ rooms. Might the era of open-source voice assistants (as in plural) finally be here?

Based on its specification, it looks to basically be a development kit built around Espressif’s ESP32-S3-WROOM-1 Wi-Fi 4 and Bluetooth 5 module; however, the enclosure design makes it seem ready for retail?

Espressif’s EchoEar key features are:

  • ESP32-S3.
  • 2 microphones in an array (Dual LMA3729T381-OY3S microphone array).
  • Built-in 3W mini speaker.
    • NS4150B 3W Class-D amplifier
    • ES7210 audio ADC (4-ch)
    • ES8311 audio codec (ADC + DAC)
  • 1.85-inch circular touch display (ST77916 driver).
  • Bosch BMI270 6-axis IMU Sensor
  • microSD card slot for data storage
  • USB Type-C port for programming, power, and log printing
  • 3.7 V Li-ion battery (connector)
    • BQ27220 battery management chip
    • TP4057 Li-ion charging IC (250 mA charging current)
    • TLV62569 / SY8088AAC buck converters (5 V → 3.3 V)
    • SAM8108 power control chip (power on/off management)
  • magnetic connector for pogo pins (serial + 5 V) to use UART pass-through docking
    • Internal I2C header

The Espressif EchoEar hardware can be purchased from AliExpress for $40 and on Amazon for $55.00.

What functionality would DSP add?

It can be used in the pipeline for audio input or audio output; the relevant use case here is audio input from the microphones.

Home Assistant Voice Preview Edition features an XMOS XU316 as a dedicated DSP (really just a powerful MCU) that runs algorithms which clean up the audio coming from the far-field microphone array. That makes it much easier for a speech-to-text engine to understand what you are actually saying, especially if you are in a noisy room or far away from the microphone. The DSP could also be used on the audio output pipeline to enable a digital equalizer, but then you really want a better speaker.

For the audio input pipeline, the XMOS xCORE DSP chip acts like a sound-card co-processor, off-loading in-line audio noise removal (voice clean-up) from the microphone(s): Interference Cancellation (IC), Acoustic Echo Cancellation (AEC), Noise Suppression (NS), Automatic Gain Control (AGC), and other audio post-processing algorithms that improve the solution’s voice recognition capabilities. (Depending on which XMOS chip is used, the XCORE-VOICE framework could technically also allow up to 16 PDM microphones to be connected to a single xCORE device with a different PCB design.)
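To make one of those clean-up stages a bit more concrete, here is a toy sketch of block-based automatic gain control in plain Python. It is only an illustration of the idea: it is not the algorithm XMOS ships, and the function names, the target level, and the gain cap are all assumptions picked for the example.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples (floats in -1.0..1.0)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def apply_agc(samples, target_rms=0.1, max_gain=20.0):
    """Scale one block so its RMS approaches target_rms, capping the gain.

    target_rms and max_gain are illustrative values, not tuned ones.
    """
    level = rms(samples)
    if level == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = min(target_rms / level, max_gain)
    # Clip to the valid range so boosted peaks cannot overflow.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


# A quiet 440 Hz tone at 16 kHz, standing in for a voice far from the mic.
quiet = [0.01 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
boosted = apply_agc(quiet)
print(f"RMS before: {rms(quiet):.4f}, after: {rms(boosted):.4f}")
```

A real AGC smooths the gain over time and works together with noise suppression, so that background noise is not boosted along with the voice. That kind of continuous per-sample work is what a dedicated DSP takes off the main MCU.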

For more details and practical use cases, check out these two other threads with different XMOS development boards:

I have this device! Just arrived a few days ago. I think I must have been one of the first to order it when it went live on Espressif’s AliExpress store… :smile:

Happy to answer any questions!

Build quality is good (though there’s a tiny bit of backlight leakage on the bottom of the screen from certain angles, but not a deal-breaker). Speaker is LOUD & good for voice assistant text-to-speech, but obviously :grimacing: for listening to music.

It’s cute, and I really like the placement of the mics in the cat ears.

I have voice assistant and microWakeWord working perfectly with ESPHome – happy to share my draft configuration YAML so far. There’s a LOT of power amplifiers you have to switch on manually for the sound and screen backlight to work; on the flip side, since the device supports running off battery, these will provide plenty of opportunities to optimize sleep modes.

I have the backlight working, but unfortunately I haven’t figured out the QSPI screen yet. I have the display’s initialization codes, and I’ve triple-checked the pin assignments based on the official schematic and documentation, but it remains blank, with both LVGL and ESPHome’s native lambda drawing.

Hopefully I’m just missing something stupid and obvious in the config… :thinking:

Bottom line: I plan to get more of these. Thinking of building some privacy-conscious voice assistants for my nieces and nephews. :slightly_smiling_face:

I hope some of the fun colors become available soon (yellow and red are pictured in the product photography; AliExpress only has black).

Also, for those who might not be aware, in the Chinese market, there’s a base add-on accessory that adds 360º rotation! :scream::bangbang: (It snaps on using the magnetic pogo pins on the bottom.) I’m hoping to get my hands on one…

I would love to see that draft config! Amazon says mine is going to arrive today, and I’m excited to finally test out HA’s voice assistant.

My first EchoEar arrived with a bad screen, but my replacement just arrived. Not sure how to get it past “invalid configuration” at this point though. Have you put your config up on GitHub yet? I would love to play around with something that’s actually working.

I’m watching this too, although my EchoEar is repeating itself in an almost endless loop with the stock firmware.

I have working YAML for EchoEar v1.0 here:

edit: now also for v1.2

Hello, I’m new here, but I have almost fully reverse engineered the Coze AI API in Python, and it works with the esp-brookesia repository.

I did it because the firmware that mine shipped with (OpenAI) is not available and support is zero at the moment. Even the ESP32 subreddit removes all topics related to it.

(Yes, I did it a bit out of spite.)

You only need to dive into the firmware to replace the web address with your local IP, and you’ll need to create your own keys.

It’s not super hard with the right instructions, if you know how to work VS Code a little. I could share the Python code here if anyone is interested.

Hello, I bought that EchoEar and cannot find a guide on how to connect it to Home Assistant and use it as my local voice assistant.
Regards.

RealDeco posted the code at the link two posts above yours.
Install ESPHome Device Builder.
Connect the EchoEar to your PC by USB.
Use ESPHome Device Builder to flash echoear1.2.yaml from the link in RealDeco’s post.

You will need to edit the YAML with your Wi-Fi details.
You should add an API key and OTA password as well. You need to read the ESPHome docs for that.
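If you just need a value for the API key, ESPHome’s native API encryption expects a base64-encoded 32-byte key. Below is a minimal sketch that generates one, plus a random OTA password, with Python’s standard library. The function names are made up for this example, and where the values go in the YAML is still something to check in the ESPHome docs.

```python
import base64
import secrets


def make_api_encryption_key():
    """Return a random 32-byte key, base64-encoded (44 characters)."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def make_ota_password(num_bytes=16):
    """Return a random URL-safe string to use as an OTA password."""
    return secrets.token_urlsafe(num_bytes)


key = make_api_encryption_key()
ota_password = make_ota_password()
print("API encryption key:", key)
print("OTA password:", ota_password)
```

Keep both values out of anything you post publicly, for example by putting them in ESPHome’s secrets file instead of directly in the device YAML.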

hi @Kwisss

My device just arrived and I’m 100% interested in making it work. Please do share the information.

Thank you @RealDeco for the wonderful instructions, the video was extremely helpful.
However, I’m stuck now after adding the device to Home Assistant, I see it in Home Assistant, the logs look just like yours, everything seems fine, but the wake word doesn’t work, neither one. I tried all the available wake words, I also tried to click on the screen to put it in listening mode (is that what’s supposed to happen?) but it’s like the microphone isn’t used at all here. Any idea what could be the issue? Any help will be greatly appreciated…

Edit: I tried both devices/Espressif/echoear1.2.yaml and devices/Espressif/echoear1.0.yaml, in both cases, I see other interactions like tapping on it etc, being detected in Home Assistant, but the wake words do nothing

To troubleshoot issues with ESP devices, it is best to look at the device’s live log in the ESPHome Device Builder. There you will see if a component failed or what other issue the device may have encountered.

For now, a test you can try is to send a TTS message to the device to see if the speaker is working. I would do this while looking at the log as well.
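The TTS test can be run from the Home Assistant UI, but if you prefer a script, here is a rough sketch that builds the same call against Home Assistant’s REST API using the `tts.speak` action. The host name, token, and both entity IDs are placeholders I made up; substitute the ones from your own setup.

```python
import json
import urllib.request


def build_tts_request(base_url, token, tts_entity, player_entity, message):
    """Build (but do not send) a POST request for the tts.speak action."""
    payload = {
        "entity_id": tts_entity,                  # your TTS engine entity
        "media_player_entity_id": player_entity,  # the EchoEar media player
        "message": message,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/tts/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All four values below are hypothetical placeholders.
req = build_tts_request(
    "http://homeassistant.local:8123",
    "YOUR_LONG_LIVED_ACCESS_TOKEN",
    "tts.piper",
    "media_player.echoear",
    "Speaker test",
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

The token is a long-lived access token created from your Home Assistant user profile. Keep the device’s live log open while sending so you can see how the speaker component reacts.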

I did some investigation and realized that my device is most likely echoear 1.0 so I flashed that yaml.
I can successfully send TTS messages from Home Assistant to the device, and this is the log I get when I do that:

[08:22:39]ESP-ROM:esp32s3-20210327
[08:22:39][I][logger:121]: Log initialized
[08:22:39][C][safe_mode:084]: Unsuccessful boot attempts: 0
[08:22:39][D][esp32.preferences:149]: Writing 1 items: 0 cached, 1 written, 0 failed
[08:22:39][I][app:077]: Running through setup()
[08:22:39][I][i2c.idf:200]: Performing bus recovery
[08:22:39][C][component:208]: Setup i2c took 21ms
[08:22:39][C][component:208]: Setup spi took 0ms
[08:22:39][C][component:208]: Setup preferences took 0ms
[08:23:04][C][component:208]: Setup power_supply took 1[D][media_player:084]: 'EchoEar' - Setting
[08:23:04][D][media_player:091]:   Media URL: http://192.168.1.229:8123/api/esphome/ffmpeg_proxy/f5728b273089c5848d9498bca9b5f2df/d7hzhfuCdx32Mrlb4Tntcg.flac
[08:23:04][D][media_player:097]:  Announcement: yes
[08:23:04][D][micro_wake_word:367]: Stopping wake word detection
[08:23:04][D][speaker_media_player:406]: State changed to ANNOUNCING
[08:23:04][W][component:490]: speaker.media_player took a long time for an operation (344 ms)
[08:23:04][W][component:493]: Components should block for at most 30 ms
[08:23:04][D][micro_wake_word:375]: State changed from DETECTING_WAKE_WORD to STOPPING
[08:23:04][E][i2s_audio.speaker:521]: Parent I2S bus not free
[08:23:04][E][i2s_audio.speaker:148]: Driver failed to start; retrying in 1 second
[08:23:04][E][component:362]: i2s_audio.speaker set Error flag: unspecified
[08:23:04][D][speaker_media_player.pipeline:114]: Reading FLAC file type
[08:23:04][D][speaker_media_player.pipeline:124]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[08:23:04][D][ring_buffer:034][ann_read]: Created ring buffer with size 250000
[08:23:04][D][micro_wake_word:271]: Inference task is stopping, deallocating buffers
[08:23:04][D][micro_wake_word:276]: Inference task is finished, freeing task resources
[08:23:04][D][micro_wake_word:375]: State changed from STOPPING to STOPPED
[08:23:05][E][component:379]: i2s_audio.speaker cleared Error flag
[08:23:05][D][i2s_audio.speaker:102]: Starting
[08:23:05][D][i2s_audio.speaker:106]: Started
[08:23:05][D][ring_buffer:034][speaker_task]: Created ring buffer with size 48000
[08:23:06][D][i2s_audio.speaker:111]: Stopping
[08:23:06][D][i2s_audio.speaker:116]: Stopped
[08:23:06][D][micro_wake_word:357]: Starting wake word detection
[08:23:06][D][speaker_media_player:406]: State changed to IDLE
[08:23:06][D][micro_wake_word:375]: State changed from STOPPED to STARTING
[08:23:06][D][micro_wake_word:259]: Inference task has started, attempting to allocate memory for buffers
[08:23:06][D][micro_wake_word:264]: Inference task is running
[08:23:06][D][micro_wake_word:375]: State changed from STARTING to DETECTING_WAKE_WORD
[08:23:06][D][ring_buffer:034][mww]: Created ring buffer with size 3840
[08:23:16][D][sensor:135]: 'Battery Voltage': Sending state 3.67600 V with 2 decimals of accuracy
[08:23:16][D][sensor:135]: 'Battery Percentage': Sending state 47.60001 % with 0 decimals of accuracy

So it specifically says it’s stopping the wake word detection to announce, and then it’s starting it again when it’s done announcing, and there are no errors in the log… What else can I do?

EDIT: I notice when I tap on the screen it also supposedly goes into “listening mode” of sorts, instead of the wake word, and even then nothing gets picked up. So the issue here is the microphone, not something specific to the wake word itself.

EDIT2: I didn’t want to, but I opened the casing. Is what I’m seeing possible? What does V1.1 mean?

EDIT3: I got it to work. I’m not sure exactly what fixed it: I reflashed the original firmware and then reflashed this Home Assistant v1.0 YAML again after that. I also recharged the device to 100% and added a microSD card. One of those things did the trick; it’s now working.
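For anyone combing through a long log like the one above, a small filter can help surface the warning and error lines. This is only a sketch: the regular expression is my assumption about the line format, based on the lines pasted in this thread, and the sample lines are copied from that log. Note that the log does contain a few `[E]` lines from `i2s_audio.speaker` at 08:23:04, although the error flag is cleared one second later.

```python
import re

# Assumed line format: [HH:MM:SS][LEVEL][component:line](optional [task]): message
LINE_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\]"
    r"\[(?P<level>[A-Z])\]"
    r"\[(?P<component>[^\]:]+):(?P<line>\d+)\]"
    r"(?:\[[^\]]*\])?: (?P<msg>.*)$"
)


def find_problems(lines, levels=("E", "W")):
    """Return (time, level, component, message) for each matching log line."""
    found = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match and match.group("level") in levels:
            found.append(
                (match["time"], match["level"], match["component"], match["msg"])
            )
    return found


sample = """\
[08:23:04][D][micro_wake_word:367]: Stopping wake word detection
[08:23:04][W][component:490]: speaker.media_player took a long time for an operation (344 ms)
[08:23:04][E][i2s_audio.speaker:521]: Parent I2S bus not free
[08:23:05][E][component:379]: i2s_audio.speaker cleared Error flag
[08:23:06][D][micro_wake_word:375]: State changed from STARTING to DETECTING_WAKE_WORD
""".splitlines()

found = find_problems(sample)
for time, level, component, msg in found:
    print(time, level, component, msg)
```

Lines that do not match the assumed format, such as the boot banner, are skipped.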

