A voice assistant solution with an ESP32, 2x INMP441 mics, and a TPA3118 amplifier on a 50 W speaker, leveraging AEC/VAD/NS capabilities: how do I get clear voice pickup even when playing music?

I want to automate my home using a Raspberry Pi 5 running Home Assistant, with one ESP32 in each room. So far I've chosen the ESP32-S3-N16R8, driving a TPA3116 amplifier tied to a 50 W speaker. I plan on adding more sensors later (temperature/humidity, camera, lux), but I'm tackling the most important part first. I have never done this before, and the more I research, the more questions I have.

ESP32-S3-N16R8 > MAX98357(DAC) > TPA3116 > 50w speaker

And on the same ESP32-S3-N16R8 > 2x INMP441 (one next to the speaker and the other far away from it)

First, I have questions about the pinout. As far as I understand, GPIO25 and GPIO26 are DACs. Do I need the MAX98357 between the ESP and the amplifier, or can I feed these GPIOs directly into the amplifier?

In the INMP441 documentation I saw a schematic for a stereo configuration. Is this how I should wire the 2 mics to be able to use AEC/VAD/NS? I mention those because I want to play music on the speaker and still have it hear me when I say the activation word for the voice assistant. Am I on the right path?

For the mics, I plan to drill a hole in the speaker case in the ceiling for one, and mount the other on one of the ceiling lights. Can I have more than two mics? Would that be beneficial? How should I wire them?

I read that each device has an address, but since the mics are identical they would have the same address, so do I need to wire them to different pins on the ESP? I also read that some pins can be shared between devices, such as the clock, and something called SPI? I'm really confused about how to wire everything. Do I need more than one ESP?
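To make the stereo question concrete, here is my current understanding written as a small Python sketch, which may well be wrong. From the INMP441 datasheet, both mics share the SCK, WS and SD lines, and the L/R pin decides which half of the I2S frame each mic uses (L/R to GND for the left slot, L/R to VDD for the right slot), with each 24-bit sample sitting in the top bits of a 32-bit slot. The function name `split_stereo_i2s` is just something I made up, and the little-endian, left-slot-first buffer layout is an assumption about how the data would arrive from the driver, not something I have verified on the ESP32-S3.

```python
import struct


def split_stereo_i2s(raw: bytes):
    """Split an interleaved stereo I2S buffer into (left, right) sample lists.

    Assumptions (hypothetical, not verified on real hardware):
      - each slot is a 32-bit little-endian signed word
      - slots alternate left, right, left, right, ...
      - the 24-bit mic sample occupies the top 24 bits of each slot
    """
    if len(raw) % 8 != 0:
        raise ValueError("buffer must contain whole stereo frames (8 bytes each)")

    n_slots = len(raw) // 4
    slots = struct.unpack("<%di" % n_slots, raw)

    # Arithmetic shift right by 8 drops the unused low byte and keeps the sign.
    samples = [slot >> 8 for slot in slots]

    left = samples[0::2]   # mic with L/R tied to GND
    right = samples[1::2]  # mic with L/R tied to VDD
    return left, right


# Two made-up stereo frames: left = 1, -2 and right = 100, -200
raw = struct.pack("<4i", 1 << 8, 100 << 8, -2 << 8, -200 << 8)
left, right = split_stereo_i2s(raw)
print(left, right)
```

If this is right, the two mics do not need separate addresses or separate data pins, since each one only drives the data line during its own half of the frame. Please correct me if I have misread the datasheet.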

Also, I heard that I should use Squeezelite. Can I run Squeezelite on the ESP32 alongside other software to drive cameras and other sensors on the same ESP?
