Voice assist hardware recommendations 2024

The year of the voice has ended and I bet we all had a lot of fun experimenting with all the possibilities. I’ve started with an M5Atom Echo and went through different ESP32 boards with several microphones, like the INMP441, MSM261S4030H0 and others I’ve found on Amazon, eBay and AliExpress. Most of them worked pretty well up close, but poorly from 3 meters/10 feet away, especially with other noise around, like a TV.
So I’m now stuck at hardware recommendations and settings to kick out Alexa. Anyone else working on all-day, all-environment solutions? Own PCB with custom components? My 2024 goal is to develop my own PCB with an ESP32 (S3), WS2812 LED, mic and MAX98357A to have one in every room (20+ needed) for full voice control in the new house.
Also, has anyone tried to use more than one microphone? I know Alexa is using 7. Wouldn’t this improve pickup? But how do I set up ESPHome for multiple microphones?

Best regards Moritz

Willow Voice Assistant - Hardware - Home Assistant Community (home-assistant.io)

I’ve stumbled over that as well and read all of it, including the hardware support and HA integration.
It’s nice, but I don’t want to buy 20+ S3 boxes at $50+ each, and I don’t want to run my own WIS server or send information to a cloud. So my question is really about HA “year of the voice” recommendations and not other solutions.

Gotcha. Just throwing it out there.

How far have you gotten on this, Moritz? I’m in the same boat, I’ve been using AtomEchos but they’re just a starting point and I’d like something with better audio pickup and response.

I too would like one per room, so $50/pc is a bit much. I have a PCB mill and was thinking of just starting to design something to prototype, preferably with PoE and presence detection on it.

This is my current state:


The PCB is ready and I’ll order them after CNY. It’s based around an ESP32-S3-WROOM-1-N16R8 (16 MB flash, 8 MB PSRAM). I have 4 microphones on it, facing left/right and 45° to the bottom left/right, so it should pick up voice from any direction.
I also have solder points for the LD2450 and LD2410 for presence and movement detection. The BH1750 is for light intensity, the BME680 for temperature and air pressure, the MQ2 for smoke detection, and the MiCS-5524 and MiCS-4514 for additional air quality. Later on, while testing, I’ll see which ones I actually use and which not. :sweat_smile:
I also have a buzzer and 6 LEDs on board for smoke alarm and status indication.
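For anyone following along, the two I2C sensors from the list above could be brought into ESPHome roughly like this. It’s only a minimal sketch: the SDA/SCL pins and the entity names are placeholders I made up, not the actual pinout of this board, and the BME680 is left on the platform’s default address.

```yaml
# Sketch only: GPIO numbers and names are assumptions, not the real board layout.
i2c:
  sda: GPIO8    # hypothetical pin
  scl: GPIO9    # hypothetical pin
  scan: true    # log the addresses found on the bus at boot

sensor:
  # BH1750 ambient light sensor
  - platform: bh1750
    name: "Room Illuminance"
    address: 0x23
    update_interval: 60s

  # BME680 environmental sensor
  - platform: bme680
    temperature:
      name: "Room Temperature"
    pressure:
      name: "Room Pressure"
    humidity:
      name: "Room Humidity"
    update_interval: 60s
```

The mmWave modules and the analog gas sensors need their own UART/ADC setup and are left out here.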

When the PCBs are here, I’ll see what I can actually do with them, but the price is around $12/piece without additional modules. I’m not yet sure which microphones I should use and whether I’ll put them on the PCB directly or solder them on as modules. These will be $1 each, and with smoke, temperature, humidity and so on it will be around $25-30/piece, which is OK for me for an every-room-and-area device for smoke, air and presence detection.

P.S. I’m not planning to use a speaker or voice feedback. It’s nice, but I’m not using it, and when the light goes on, I don’t need someone telling me “light is on”. So I’m just using the buzzer and LEDs as a response.

That’s excellent! Adding a speaker wouldn’t be necessary though it would be a nice feature to go with all the rest you have on it. I’m guessing from the pins I see you’re planning on INMP441 mics, so correct me if I’m wrong, but one could hijack one or two of the inputs for an I2S speaker like MAX98357A?

Are you submitting that for the Voice Assistant contest?

You are wrong. :rofl: The ES7210 I want to use works with PDM mics, so I’m going with these instead of I2S. The INMP mics have been deprecated for a long time and you don’t get good ones, just China clones. The Espressif Korvo and S3-Box documents give some hints for useful microphones, but the MP34DT01 (from Adafruit) and others should work as well.
Breaking out I2S for an I2S amplifier later on is easy. The S3 can map it to any available pin. So basically I can already test it with the current layout and the 2nd I2S interface (which is used by the ES8311 on the Korvo/S3).
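As a rough idea of what that could look like in ESPHome, here is a minimal sketch of an external I2S amplifier such as the MAX98357A on its own I2S bus. The GPIO numbers and the IDs `i2s_out` and `amp` are placeholders for illustration, not the pins actually broken out on this PCB.

```yaml
# Sketch only: pins and IDs are assumptions, pick whatever is free on your board.
i2s_audio:
  - id: i2s_out
    i2s_lrclk_pin: GPIO6   # hypothetical: LRC on the amplifier
    i2s_bclk_pin: GPIO7    # hypothetical: BCLK on the amplifier

speaker:
  - platform: i2s_audio
    id: amp
    i2s_audio_id: i2s_out
    dac_type: external     # external I2S DAC/amp instead of the internal DAC
    i2s_dout_pin: GPIO8    # hypothetical: DIN on the amplifier
```

Since the S3 routes I2S through the GPIO matrix, moving the amp to other pins should only mean changing these three pin entries.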

Only if it’s working :rofl:

I haven’t run across the PDM mics yet, so that’s interesting. I was just guessing on I2S from the number of pins rather than trying to trace the connections back to the MCU.

I had looked at using something like the S3Box but the expense and availability were a downside.

Well, I look forward to seeing your progress. Hopefully you update the topic so I get notifications as you progress.

Nice work! I think you will see a lot of interest in the contest if you decide to post, this has a lot of features that would be popular.

Every PDM mic has data, clock and L/R channel pins, same as I2S. The difference is how they transmit data. The ESPHome microphone configuration in HA already has the pdm true/false option.
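To make that concrete, here is a minimal sketch of a single PDM mic wired directly to the ESP32 in ESPHome. The pins and the IDs `i2s_in` and `pdm_mic` are made up for illustration. As far as I know, the PDM clock goes on the lrclk pin (that is how the M5Atom Echo config does it), but treat that as an assumption and check it against your own wiring.

```yaml
# Sketch only: pins and IDs are assumptions, not a tested layout.
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO4   # hypothetical: PDM clock line of the mic

microphone:
  - platform: i2s_audio
    id: pdm_mic
    i2s_audio_id: i2s_in
    adc_type: external
    i2s_din_pin: GPIO5     # hypothetical: PDM data line of the mic
    pdm: true              # set to false for a plain I2S mic such as the INMP441

voice_assistant:
  microphone: pdm_mic
```

This only covers one microphone. Combining several mics is not something this config does.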

I hope I don’t forget to update the topic if there is progress. :sweat_smile:

You’ve made me rethink my project and I’ve changed it in some parts:
The I2S header is now broken out so a speaker can be connected. (Bottom left above R9)
The tested devices are directly on the main PCB (including 4 microphones). 4 more pins are on the back of the PCB to attach something else. :smiley: There is 1 spare I2S, so you could even connect a 2nd amp and mic with it.

I’ve submitted it to the contest, and I’m working on the code/YAML while I wait for the prototypes to arrive.

I’m pretty excited about this idea! I came here looking for an all-in-one kinda box for every room to have voice control and environmental sensors. Please keep us updated. I’d definitely be interested in buying a few of these if you end up making them (or having them made).

Well, PCBs arrived today.
I had a quick look, plugged them in and tested the light sensor and LEDs; these are working. Now I need to test mmWave and voice over the weekend.

P.S. More information can be found in the voice contest section.

I’m on the hunt for a solution on the mic end. Has this progressed any further?

Check the follow-up post here @Duck72790

Any news? This looks great.

I am a bit jealous. I recently learned how to create my own PCBs and was looking forward to replacing the Alexas in my house. The issue I ran into (I didn’t even get a chance to test mic functionality) was that my setup would have to be in Spanish, and speech-to-text (I think Whisper) was basically unusable. I did try it in English for a bit, and was pretty impressed by how well it seemed to work.

If you’re OK with less flexibility / no LLM, try Picovoice Rhino. It supports Spanish and works locally.