I know I’m late to the party, but I have just started working on this project.
Looking at the ESP32-Korvo-V1.1 (and also the ESP32-LyraTD-MSC), I believe this off-the-shelf board, which is still in production, meets the needs of the project. They can currently be purchased for around $20.
Its hardware is fully documented and open source. It looks like it matches (or at least comes very close to) the quality of existing commercial voice assistants (Google Home/Alexa), though I would much rather the device had a VPU (voice processing unit) with Acoustic Echo Cancellation. Maybe the ESP32-LyraTD-MSC could be a better alternative?
Circuit schematics can be found here:
ESP32-Korvo V1.1
The core parts I believe a polished voice assistant needs to have:
-Far-field voice detection, using a VPU and an array of at least 3 microphones
-Fast response time from Spoken intent to audio Response
-One wake: if multiple satellites are in range of the wake word, only the closest satellite should be used to interact with the user.
-Context-aware communication: the area in which a satellite picks up a command should be used to infer what the command refers to, e.g. "turn on the light" acts on the light in that satellite's area
-A tabletop-ready enclosure, 3D printable so there is no reliance on specific retail parts that aren’t available globally. I find 3D printed objects unpleasing to the eye, so this would be covered in a speaker grille textile to provide an elegant finish
-LED feedback in the form of a ring, to show the direction the voice is being picked up from by the array, and other statuses
-Music playback, the device should function as a DLNA media renderer
-A decent quality speaker and amplifier, able to render audio at an enjoyable quality and volume.
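To make the "one wake" and context-aware points above a bit more concrete, here is a rough sketch in Python of how a server could arbitrate between satellites. It is only an illustration under assumptions: it treats the loudest wake-word detection as the closest satellite (which will not always be true), and every name in it (WakeEvent, pick_satellite, resolve_target, the satellite IDs and the light entity) is hypothetical rather than taken from any existing voice assistant project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WakeEvent:
    satellite_id: str   # hypothetical satellite identifier
    area: str           # room the satellite is assigned to
    timestamp: float    # seconds, when the wake word was detected
    level_db: float     # wake-word signal level; louder is ASSUMED to mean closer


def pick_satellite(events, window_s=0.5):
    """Return the event from the satellite assumed closest to the speaker.

    Only events within window_s of the earliest detection are treated as
    the same utterance; among those, the loudest one wins.
    """
    if not events:
        return None
    first = min(e.timestamp for e in events)
    same_utterance = [e for e in events if e.timestamp - first <= window_s]
    return max(same_utterance, key=lambda e: e.level_db)


def resolve_target(command, winner, lights_by_area):
    """Fill in the area for a command that does not name one."""
    if command == "turn on the light":
        return lights_by_area.get(winner.area)
    return None


events = [
    WakeEvent("sat-kitchen", "kitchen", 10.00, -32.0),
    WakeEvent("sat-lounge", "lounge", 10.05, -18.5),
    WakeEvent("sat-office", "office", 12.40, -10.0),  # later, separate utterance
]
winner = pick_satellite(events)
print(winner.satellite_id)
print(resolve_target("turn on the light", winner, {"lounge": "light.lounge_main"}))
```

The short collection window trades a little response time for the chance to hear from every satellite in range, so it pulls against the fast response goal and would need tuning on real hardware.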
Optional extra goals:
-A flush ceiling mount, and other mounting options. Possibly using an Echo Dot v3 form factor to take advantage of the ecosystem of mounting options already available
-A couple of options for microphone array layout: circular is good for the centre of a room, but linear is good against a wall (e.g. under a TV)
-Breakout for spare GPIO (if any) from the MCU/CPU to allow easy end user modification and customization
-Be compatible with multiple different self-hosted voice assistant systems, e.g. Willow, Rhasspy, etc.
-Basic sensors: lux, humidity, presence, temperature, etc. (Room temperature is probably not a good fit, as the interior of the unit will likely have poor air flow and generate heat)
I have 3 of these boards on order to work with, but if they don’t meet the needs of the project, I will get a new set of boards designed and tested.
Concept case designs are being drafted, and will be posted here over the next couple of days. Expect multiple edits.