ESPHome Voice Assistant

Thanks Chris. I’ve really been struggling to get my Ali ESP-WROOM-32 board to work, and I’ve been trying every combination I come across on the 'tinternet.

I’ll take a gander at it when I get back home… :crossed_fingers:

Chris, you legend!! I got the mic working on HA and I can see from the logs that all is functioning. Thank you so much, I’d almost given up hope!!

Still some work to go, as I plan on adding an LD2410 mmWave sensor and a decent temperature sensor to my config.

@philimon121 I wouldn’t bother with the temperature sensor UNLESS you can somehow thermally isolate it. As a few people have previously mentioned on various threads, the LD2410 generates quite a bit of heat.

I tried it when I first got the sensors and I can confirm they really mess up any sensible room temperature reading! I’ve just added a BH1750 to each of my LD2410s, but with the sensor inside an enclosure I’m not getting a sensible value, just a reading that I can use.

Hope that helps.

You can find some of my config in here:

https://github.com/Nerivec/SmartHomeEnhanced/tree/main/VoiceAssist

I used an ESP32-S3-WROOM-1-N16R8 with dual INMP441 mics and a PCM5102A for jack output. It works pretty well (with the workarounds mentioned in the link above), though I did have to automate periodic restarts of the boards to avoid issues: the voice pipeline is not really stable or optimized yet (wake word detection).

Details on the board can be found here (mainly folder #5):

https://github.com/vcc-gnd/YD-ESP32-S3
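
For the periodic restarts mentioned above, one possible approach is to let the board reboot itself on a schedule. This is only a minimal sketch and not necessarily how the linked config does it: the ids, the entity name and the 04:00 schedule are made up for illustration, and it assumes the device already has `api:` configured so it can get the time from Home Assistant.

```yaml
# Hypothetical sketch: ids, names and the 04:00 schedule are assumptions.
button:
  - platform: restart
    id: restart_button
    name: "Voice Satellite Restart"

time:
  - platform: homeassistant
    id: ha_time
    on_time:
      # Fire once a day at 04:00:00 local time.
      - seconds: 0
        minutes: 0
        hours: 4
        then:
          - button.press: restart_button
```

Exposing the restart as a button also means a Home Assistant automation can trigger it instead, if keeping the schedule on the HA side is preferred.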

Yes @ChrisThomas, I’ve got a couple of BME280s outputting works of fiction, with them being in the same case as the two LD2410 sensors I’ve deployed. I had to install separate Zigbee temp/humidity sensors:

(TMP01 is the Zigbee device reading, HPD is the BME280 reading)

Just added a BH1750 to my breadboard too :grin:. Next experiment is to see if the ESP can handle the mic, amp, LD2410, BH1750 and SHT31 temp/humidity sensor all at the same time. The PCB I’m designing has it on the rear of the board, so I’m hoping that will help isolate it.
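
As a rough idea of what the sensor side of that experiment could look like in ESPHome, here is a sketch with the BH1750 and SHT31 sharing one I2C bus and the LD2410 on a UART. All GPIO numbers, names and update intervals are placeholders rather than anyone's actual wiring, and the I2C addresses are the usual defaults for these breakout boards, so they may differ on yours. The mic and amp (I2S) are left out.

```yaml
# Sketch only: pins, names and intervals are assumptions for illustration.
i2c:
  sda: GPIO21
  scl: GPIO22
  scan: true          # logs the addresses found at boot, handy on a breadboard

uart:
  id: ld2410_uart
  tx_pin: GPIO17
  rx_pin: GPIO16
  baud_rate: 256000   # LD2410 factory default
  parity: NONE
  stop_bits: 1

ld2410:

binary_sensor:
  - platform: ld2410
    has_target:
      name: "Presence"

sensor:
  - platform: bh1750
    name: "Illuminance"
    address: 0x23
    update_interval: 60s
  - platform: sht3xd   # ESPHome platform used for SHT3x sensors such as the SHT31
    address: 0x44
    update_interval: 60s
    temperature:
      name: "Temperature"
    humidity:
      name: "Humidity"
```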

Thanks again for your help and input - much appreciated!

I had the same problem after flashing my M5 Atom Echo with the smart speaker/mediaplayer yaml.
Playing music works fine, but using the voice assistant feature gives the “stt-no-text-recognized” error.
The solution turned out to be very simple: the button on the front of the M5 Atom Echo needs to be HELD DOWN, not pressed and released immediately.
The speaker only listens to your voice while the button is held down, which is why it does not recognize any text when the button is pressed only briefly. I am sure the yaml could be adjusted to change this behavior, but for me holding works fine.
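
For anyone who would rather tap than hold, the button handling could probably be changed along these lines. This is an untested sketch, not the stock Atom Echo yaml: it assumes the usual Atom Echo wiring (button on GPIO39), an existing `voice_assistant:` block elsewhere in the config, and that it replaces whatever handlers the stock yaml attaches to the button. The `id` and `name` are arbitrary.

```yaml
# Sketch only: tap once to start listening, tap again to cancel.
binary_sensor:
  - platform: gpio
    pin:
      number: GPIO39
      inverted: true
    id: echo_button
    name: "Button"
    on_click:
      - if:
          condition: voice_assistant.is_running
          then:
            - voice_assistant.stop:
          else:
            # With the default silence detection, the pipeline should end
            # on its own once you stop speaking.
            - voice_assistant.start:
```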

After some trial & error, I finally got the INMP441 mic working but the quality is really shit…

I am using 32-bit samples, but for anything to be audible and for STT to work, I have to shout loudly enough that my neighbours can probably hear me. Additionally, it only works if I’m just a few cm away from the mic.
Even then the recording contains so much noise that my shouting is barely recognizable, just enough for STT to work in 70-80% of cases.

The wires are not longer than 10cm.
The device I’m working on should function as a multisensor containing temp/hum, lux, CO2/AQI, and a mmWave sensor. For prototyping, the sensors are located approx. 10-15cm apart from each other, but the whole setup is in a room with quite a few electronic devices (phones, computers, displays, wifi access points, …). That is nothing out of the ordinary for real-world conditions, especially considering the sensors should also fit in a smaller enclosure later on.

Is there anything I’m missing here?
Listening to the recording, the quality and noise filtering need to improve considerably to be usable.
I’m happy to hear about any suggestions.
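
One thing that may be worth experimenting with is the gain and noise suppression settings on the `voice_assistant` component, since a raw INMP441 signal tends to be quite low in level. The fragment below is only a sketch: every GPIO number is a placeholder, the values are starting points to play with rather than recommendations, and whether these options are available can depend on the ESPHome version in use.

```yaml
# Sketch only: pins are placeholders, tuning values are guesses to iterate on.
i2s_audio:
  i2s_lrclk_pin: GPIO25   # INMP441 WS
  i2s_bclk_pin: GPIO26    # INMP441 SCK

microphone:
  - platform: i2s_audio
    id: inmp441_mic
    adc_type: external
    i2s_din_pin: GPIO33   # INMP441 SD
    pdm: false
    channel: left         # depends on how the L/R pin is wired; try right if silent
    bits_per_sample: 32bit

voice_assistant:
  microphone: inmp441_mic
  noise_suppression_level: 2   # 0 (off) to 4
  auto_gain: 31dBFS            # 0dBFS (off) to 31dBFS
  volume_multiplier: 2.0       # plain software gain applied to the samples
```

Changing one value at a time and comparing recordings makes it easier to tell which setting actually helps, since too much gain can also amplify the noise.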