All that is needed is a 3.5mm TRS jack plug that ends in Dupont connectors, and then no soldering is needed.
I have just never found one.
It isn't only the MAX9814 board; any analogue preamp with silicon AGC can extend any USB sound card into near/far field, as those cards are really expecting close-field mics, which is why the input volume is often low.
So any mic preamp, or even a MEMS mic with built-in AGC, will do. It's just that the MAX9814 with controllable gain and AGC is widely available, and all we lack is 3.5mm TRS jack plugs ending in Duponts, which surely must exist or be easy to obtain.
My fave USB sound card, because it has a very rare stereo ADC, is the Plugable USB Audio Adapter ($9.95).
That simple analogue 2-mic array with 71.45mm spacing could be used on various devices from a Pi to an ESP32-S3 with a low-cost ADC, so that you can use the special Alexa audio sauce in
https://docs.espressif.com/projects/esp-sr/en/latest/esp32s3/audio_front_end/README.html
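As a rough sanity check on that 71.45mm spacing, here is a small sketch of the largest delay sound can have between the two mics. The speed of sound (343 m/s) and the sample rates are my assumptions, not something from the Espressif docs, and the function name is just for illustration. At 48 kHz it works out to almost exactly 10 samples, which may be why that spacing is handy for sample-aligned processing.

```python
# Assumption: speed of sound in dry air at roughly 20 C.
SPEED_OF_SOUND_M_S = 343.0
# The 71.45 mm mic spacing mentioned above, in metres.
MIC_SPACING_M = 0.07145


def max_delay_samples(sample_rate_hz, spacing_m=MIC_SPACING_M, c=SPEED_OF_SOUND_M_S):
    """Largest inter-mic delay in samples (sound arriving end-on to the array)."""
    return spacing_m / c * sample_rate_hz


# Assumed sample rates: 16 kHz (typical for KWS/ASR) and 48 kHz (typical USB ADC).
for rate in (16000, 48000):
    print(rate, "Hz ->", round(max_delay_samples(rate), 2), "samples")
```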
Send the stereo channels to a Pi and run 2x KWS, using the KW hit to select the best channel from the BSS output.
Then you actually have a far-field mic that uses BSS for audio preprocessing.
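The channel selection is simple enough to sketch. This is only my rough idea of it, not Espressif's actual code: each BSS output channel gets its own KWS, and the channel whose keyword softmax is highest (and above a threshold) wins. The logits layout `[not_keyword, keyword]`, the 0.8 threshold and the function names are all made up for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_channel(logits_per_channel, keyword_index=1, threshold=0.8):
    """Return (channel, score) for the channel with the highest keyword
    probability, or None if no channel reaches the threshold."""
    scores = [softmax(logits)[keyword_index] for logits in logits_per_channel]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < threshold:
        return None
    return best, scores[best]


# Hypothetical KWS outputs for the two BSS channels: channel 1 is the voice.
print(pick_channel([[0.2, 0.4], [-1.0, 3.0]]))
```

Whichever channel wins is the one you would then forward as the voice command audio.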
Likely we can do the 2x KWS on an ESP32-S3 with TFLite4Micro and just send the ‘Voice Command Audio’.
But yeah, getting some easily available components would be a massive plus, considering we are not talking about much more than 3.5mm TRS jack plugs ending in Duponts and premade HomeAssistant housings.
At least then we would be a little closer to commercial performance in the quality of recognition. Surely that is better than pushing units that are extremely poor in function and recognition purely because they already come in a housing, when for this purpose they are relative e-waste.
On a Pi I am still searching for a BSS alg to make a nice efficient C/C++ routine from, and it's just a shame the BSS from Espressif is a blob.
I did do a simple delay-sum beamformer, but a KWS needs to lock on to the command sentence, and even the ReSpeaker 2-mic, which is better than some think, still needs a case.
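For anyone who has not seen one, a delay-sum beamformer for 2 mics is only a few lines. This is a minimal sketch in plain Python, not the realtime code I mentioned: it assumes an integer sample delay that you already know (a real one has to estimate it, for example from cross-correlation), and the function name and test signal are made up.

```python
def delay_and_sum(left, right, delay_samples):
    """Steer a 2-mic array by delaying `right` by an integer number of
    samples (negative values advance it) and averaging with `left`.
    Samples shifted in from outside the buffer are treated as silence."""
    n = min(len(left), len(right))
    out = []
    for i in range(n):
        j = i - delay_samples
        r = right[j] if 0 <= j < n else 0.0
        out.append(0.5 * (left[i] + r))
    return out


# Hypothetical source that reaches the right mic 2 samples before the left.
right = [0, 1, 0, -1, 0, 1, 0, -1]
left = [0, 0] + right[:-2]

print(delay_and_sum(left, right, 2))  # steered at the source: channels add up
print(delay_and_sum(left, right, 0))  # not steered: the signal partly cancels
```

Sound from the steered direction adds coherently while sound from elsewhere is attenuated, which is why the steering delay has to lock on to the voice giving the command.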
So for me, what is needed is someone who is conversant with ESP32 to hack out the Audio Front-end Framework linked above (ESP-SR documentation) and just couple it to TFLite4Micro the same way Espressif do, which is really just 2x KWS where the BSS stream with the highest softmax is used.
Whether the 2-mic preamp with AGC is made or assembled from parts, using electrets or MEMS, I don’t really care, just that one exists.
Otherwise we are still at the same point with Year of the Voice, where an Alexa or Google likely just works so much better and, compared with the poorly fitting speakerphones and webcams being advocated, is even much cheaper.
PS: this was just a hack of 2 existing projects that I converted to realtime, but if anyone would like to clean the code up and optimise the FFT to use Neon, please do, as at least it is some initial audio processing, which is a massive part of what smart speakers do.
I name-drop the MAX9814 and that USB adapter because, apart from the 3.5mm TRS jack to Duponts, the parts are available and relatively easy: you can drill a hole in a case and push-fit the mics into 9.5mm rubber grommets.
The honest truth, though, is that devices and systems for voice control of the standard many are used to on other devices may not arrive this year.
I have a hunch that Espressif have some form of DUET BSS alg, because as opposed to many other algs its computational load is lower, but the maths and C/C++ skills are way beyond my simple hack ability.
Noise with smart speakers is often command voice vs media noise, and the sources are often clearly spatially different.
BSS is not perfect, but by the 80/20 rule, where static noise filters or AEC only process ‘own’ noise, BSS will cover them all. Likely Google use a variation of it in their VoiceFilter-Lite, as they scrapped beamforming and now have just 2 mics and lower cost.
I am not an ESP32-S3 fanboy either, as I keep dodging learning how to use their IDF, but they do have an Alexa-certified Audio Front-end Framework, and the bits needed are actually fewer than they put in their S3 Box systems.
You cannot just stick a single mic input with no audio processing into a synthesized-voice KWS, feed that into a full-vocab ASR, use simple word stemming for control, and say Voilà ‘Year of the Voice’.
Not with the many engineered systems most people are now used to.
Maybe call it the HomeAssistant AIY voice kit and declare scope of intent.