This just popped up on my Reddit feed… it looks smart and cute, and I wondered if, for the price, it was worth a punt? I’m looking at slowly getting rid of all my Alexa devices and moving to a local Home Assistant solution, but I’ve only been having the occasional read of Voice PE posts!
The Espressif Systems EchoEar ESP32-S3 has a rechargeable battery, a dock, and a round display, which are unique features among the existing ready-made ESPHome projects on that page:
It is missing a dedicated MCU for advanced DSP, so it would not make perfect Home Assistant Voice Satellite hardware, but it is good enough as a development kit and for use in kids’ rooms. Might the era of open-source voice assistants (as in plural) finally be here?
Based on its specification, it looks to basically be a development kit built around Espressif’s ESP32-S3-WROOM-1 Wi-Fi 4 and Bluetooth 5 module; however, the enclosure design makes it seem ready for retail?
A dedicated DSP can be used in the pipeline for audio input or audio output; the relevant use case here is audio input from the microphones.
Home Assistant Voice Preview Edition features an XMOS XU316 as a dedicated DSP (really just a powerful MCU) that runs algorithms to clean up the audio coming from the far-field microphone array. That makes it much easier for a speech-to-text engine to understand what you are actually saying, especially if you are in a noisy room or far away from the microphone. The DSP could also be used on the audio output pipeline to enable a digital equalizer, but then you really want a better speaker.
For the audio input pipeline, the XMOS xCORE DSP chip acts like a sound-card co-processor, adding in-line off-loading of audio noise removal (voice clean-up) from the microphone(s), such as Interference Cancellation (IC), Acoustic Echo Cancellation (AEC), Noise Suppression (NS), Automatic Gain Control (AGC), and/or other audio post-processing algorithms that improve the solution’s voice recognition capabilities. (Depending on which XMOS chip is used, the XCORE-VOICE framework could technically also allow up to 16 PDM microphones to be connected to a single xCORE device with a different PCB design.)
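To give a feel for what one of those stages does, here is a toy Automatic Gain Control in plain Python. It is only a sketch of the general idea (measure the level of each short frame, then nudge a gain toward a target level), not what the XMOS firmware actually runs. The function names, frame size, target level, and smoothing factor are all assumptions picked for illustration.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of float samples in the range -1.0..1.0."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def agc(samples, frame_size=160, target_rms=0.1, max_gain=20.0, smoothing=0.9):
    """Scale each frame toward target_rms, smoothing the gain between frames."""
    out = []
    gain = 1.0
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        frame_rms = rms(frame)
        # Gain this frame would need; capped so silence is not amplified without limit.
        wanted = min(target_rms / frame_rms, max_gain) if frame_rms > 0 else max_gain
        # Move only part of the way to the wanted gain to avoid audible pumping.
        gain = smoothing * gain + (1.0 - smoothing) * wanted
        # Apply the gain and clamp to the valid sample range.
        out.extend(max(-1.0, min(1.0, s * gain)) for s in frame)
    return out

# A quiet 440 Hz tone at 16 kHz, standing in for speech from across the room.
quiet = [0.01 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
levelled = agc(quiet)
print("before:", round(rms(quiet[-1600:]), 4), "after:", round(rms(levelled[-1600:]), 4))
```

After the gain has settled, the quiet tone comes out close to the target level, which is roughly why a far-away voice becomes easier for speech-to-text to handle. A real voice front end combines this with echo cancellation and noise suppression, which are considerably more involved.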
For more details and practical use cases, check out these two other threads with different XMOS development boards:
I have this device! Just arrived a few days ago. I think I must have been one of the first to order it when it went live on Espressif’s AliExpress store…
Happy to answer any questions!
Build quality is good (though there’s a tiny bit of backlight leakage on the bottom of the screen from certain angles, but not a deal-breaker). Speaker is LOUD & good for voice assistant text-to-speech, but obviously not for listening to music.
It’s cute, and I really like the placement of the mics in the cat ears.
I have voice assistant and microWakeWord working perfectly with ESPHome – happy to share my draft configuration YAML so far. There’s a LOT of power amplifiers you have to switch on manually for the sound and screen backlight to work; on the flip side, since the device supports running off battery, these will provide plenty of opportunities to optimize sleep modes.
I have the backlight working, but unfortunately I haven’t figured out the QSPI screen yet. I have the display’s initialization codes, and I’ve triple-checked the pin assignments based on the official schematic and documentation, but it remains blank, with both LVGL and ESPHome’s native lambda drawing.
Hopefully I’m just missing something stupid and obvious in the config…
Bottom line: I plan to get more of these. Thinking of building some privacy-conscious voice assistants for my nieces and nephews.
I hope some of the fun colors become available soon (yellow and red are pictured in the product photography; AliExpress only has black).
Also, for those who might not be aware, in the Chinese market, there’s a base add-on accessory that adds 360º rotation! (It snaps on using the magnetic pogo pins on the bottom.) I’m hoping to get my hands on one…
My first EchoEar arrived with a bad screen, but my replacement just arrived. Not sure how to get it past “invalid configuration” at this point though. Have you put your config up on GitHub yet? I would love to play around with something that’s actually working.
RealDeco posted code at the link two posts above yours.
Install ESPHome Device Builder.
Connect the EchoEar to your PC by USB.
Use ESPHome Device Builder to flash echoear1.2.yaml from the link in RealDeco’s post.
You will need to edit the YAML with your Wi-Fi details.
You should add an API key and OTA password as well; read the ESPHome docs for that.
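If you want to generate the API key and OTA password yourself, a short Python script will do. This is a minimal sketch: ESPHome expects the API encryption key to be 32 random bytes encoded as base64, while the OTA password can be any string, so the 24-character length here is an arbitrary choice. The secret names printed at the end are hypothetical; use whatever names your YAML references.

```python
import base64
import secrets

def new_api_key():
    """32 random bytes, base64-encoded, the format used for ESPHome API encryption keys."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")

def new_ota_password(length=24):
    """Random URL-safe password; the length is an arbitrary choice."""
    return secrets.token_urlsafe(length)[:length]

# Hypothetical secret names; match them to the !secret references in your YAML.
print(f'api_encryption_key: "{new_api_key()}"')
print(f'ota_password: "{new_ota_password()}"')
```

Paste the two printed lines into your secrets file (or directly into the device YAML) and keep a copy, since Home Assistant will ask for the API key when you add the device.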
Thank you @RealDeco for the wonderful instructions, the video was extremely helpful.
However, I’m stuck now after adding the device to Home Assistant. I see it in Home Assistant, the logs look just like yours, and everything seems fine, but none of the wake words work. I tried all the available wake words, and I also tried tapping the screen to put it in listening mode (is that what’s supposed to happen?), but it’s like the microphone isn’t used at all. Any idea what could be the issue? Any help will be greatly appreciated…
Edit: I tried both devices/Espressif/echoear1.2.yaml and devices/Espressif/echoear1.0.yaml. In both cases, I see other interactions, like tapping on it, being detected in Home Assistant, but the wake words do nothing.
To troubleshoot issues with ESP devices, it is best to look at the device’s live log in ESPHome Device Builder. There you will see whether a component failed or the device ran into some other issue.
For now, a test you can try is to send a TTS message to the device to see if the speaker is working. I would do this while watching the log as well.
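If the live log scrolls by too fast, you can copy it out and filter it for warnings and errors. Below is a rough Python sketch that assumes the line shape seen in ESPHome logs, `[time][level][component:line]: message`, with an optional extra tag after the component. The function name `problems` and the regular expression are my own, so adjust them if your log looks different.

```python
import re

# Matches lines shaped like "[08:23:04][E][i2s_audio.speaker:521]: Parent I2S bus not free".
# An optional extra "[task]" tag may follow the component, as in ring_buffer lines.
LOG_LINE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\]"
    r"\[(?P<level>[A-Z])\]"
    r"\[(?P<component>[^\]:]+):(?P<line>\d+)\]"
    r"(?:\[[^\]]*\])?: (?P<message>.*)$"
)

def problems(log_text, levels=("E", "W")):
    """Return (time, level, component, message) for lines at the given levels."""
    found = []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match and match["level"] in levels:
            found.append((match["time"], match["level"], match["component"], match["message"]))
    return found

sample = """\
[08:23:04][D][micro_wake_word:367]: Stopping wake word detection
[08:23:04][W][component:490]: speaker.media_player took a long time for an operation (344 ms)
[08:23:04][E][i2s_audio.speaker:521]: Parent I2S bus not free
[08:23:04][D][ring_buffer:034][ann_read]: Created ring buffer with size 250000
[08:23:05][E][component:379]: i2s_audio.speaker cleared Error flag
"""

for entry in problems(sample):
    print(*entry)
```

On this sample it keeps the one warning and the two error-level lines and drops the debug lines. Note that an `[E]` line is not always a fatal problem; some errors are retried and cleared a moment later, so read them in context.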
I did some investigation and realized that my device is most likely an EchoEar 1.0, so I flashed that YAML.
I can successfully send TTS messages from Home Assistant to the device, and this is the log I get when I do that:
[08:22:39]ESP-ROM:esp32s3-20210327
[08:22:39][I][logger:121]: Log initialized
[08:22:39][C][safe_mode:084]: Unsuccessful boot attempts: 0
[08:22:39][D][esp32.preferences:149]: Writing 1 items: 0 cached, 1 written, 0 failed
[08:22:39][I][app:077]: Running through setup()
[08:22:39][I][i2c.idf:200]: Performing bus recovery
[08:22:39][C][component:208]: Setup i2c took 21ms
[08:22:39][C][component:208]: Setup spi took 0ms
[08:22:39][C][component:208]: Setup preferences took 0ms
[08:23:04][C][component:208]: Setup power_supply took 1[D][media_player:084]: 'EchoEar' - Setting
[08:23:04][D][media_player:091]: Media URL: http://192.168.1.229:8123/api/esphome/ffmpeg_proxy/f5728b273089c5848d9498bca9b5f2df/d7hzhfuCdx32Mrlb4Tntcg.flac
[08:23:04][D][media_player:097]: Announcement: yes
[08:23:04][D][micro_wake_word:367]: Stopping wake word detection
[08:23:04][D][speaker_media_player:406]: State changed to ANNOUNCING
[08:23:04][W][component:490]: speaker.media_player took a long time for an operation (344 ms)
[08:23:04][W][component:493]: Components should block for at most 30 ms
[08:23:04][D][micro_wake_word:375]: State changed from DETECTING_WAKE_WORD to STOPPING
[08:23:04][E][i2s_audio.speaker:521]: Parent I2S bus not free
[08:23:04][E][i2s_audio.speaker:148]: Driver failed to start; retrying in 1 second
[08:23:04][E][component:362]: i2s_audio.speaker set Error flag: unspecified
[08:23:04][D][speaker_media_player.pipeline:114]: Reading FLAC file type
[08:23:04][D][speaker_media_player.pipeline:124]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[08:23:04][D][ring_buffer:034][ann_read]: Created ring buffer with size 250000
[08:23:04][D][micro_wake_word:271]: Inference task is stopping, deallocating buffers
[08:23:04][D][micro_wake_word:276]: Inference task is finished, freeing task resources
[08:23:04][D][micro_wake_word:375]: State changed from STOPPING to STOPPED
[08:23:05][E][component:379]: i2s_audio.speaker cleared Error flag
[08:23:05][D][i2s_audio.speaker:102]: Starting
[08:23:05][D][i2s_audio.speaker:106]: Started
[08:23:05][D][ring_buffer:034][speaker_task]: Created ring buffer with size 48000
[08:23:06][D][i2s_audio.speaker:111]: Stopping
[08:23:06][D][i2s_audio.speaker:116]: Stopped
[08:23:06][D][micro_wake_word:357]: Starting wake word detection
[08:23:06][D][speaker_media_player:406]: State changed to IDLE
[08:23:06][D][micro_wake_word:375]: State changed from STOPPED to STARTING
[08:23:06][D][micro_wake_word:259]: Inference task has started, attempting to allocate memory for buffers
[08:23:06][D][micro_wake_word:264]: Inference task is running
[08:23:06][D][micro_wake_word:375]: State changed from STARTING to DETECTING_WAKE_WORD
[08:23:06][D][ring_buffer:034][mww]: Created ring buffer with size 3840
[08:23:16][D][sensor:135]: 'Battery Voltage': Sending state 3.67600 V with 2 decimals of accuracy
[08:23:16][D][sensor:135]: 'Battery Percentage': Sending state 47.60001 % with 0 decimals of accuracy
So it specifically says it’s stopping the wake word detection to announce, and then it’s starting it again when it’s done announcing, and there are no errors in the log… What else can I do?
EDIT: I notice when I tap on the screen it also supposedly goes into “listening mode” of sorts, instead of the wake word, and even then nothing gets picked up. So the issue here is the microphone, not something specific to the wake word itself.
EDIT2: I didn’t want to, but I opened the casing. Is what I’m seeing possible? What does V1.1 mean?
EDIT3: I got it to work. I’m not sure exactly what fixed it: I reflashed the original firmware and then reflashed this Home Assistant v1.0 YAML again, and I also recharged the device to 100% and added a microSD card. One of those things did the trick; it’s now working.