Best setup for voice control in a modern new house?

meliborn · November 12, 2024, 11:51pm

I’m building a new house and want to implement voice control for home automation (HA). My goals are as follows:

To have a separate microphone in each room (approximately 11 rooms).
To use multiple microphones in a large room without interference issues.
To have high-quality microphones with noise reduction, similar to those found in smartphones.
To discreetly integrate the microphones into a modern, minimalistic interior design—no LEDs or audible voice responses.
If using wired microphones, they should function effectively at distances greater than 50 meters from the HA server.
The system should operate without internet connectivity. Whisper and Piper are acceptable options.

I have looked into devices like the M5Stack ATOM Echo and similar products, but it seems their microphones may not meet the required range and quality for practical use.

I will be setting up a multi-room audio system, such as Sonos, and I want the HA system to respond to commands directed to it rather than through localized microphones (as the ATOM does).

Cost is not a concern; I am simply looking for the best setup. Any suggestions?

jackjourneyman · November 13, 2024, 12:19am

Welcome to the forum.

In a smart home context, I don’t think any of your goals are achievable at the moment - particularly the microphones. It would all have to be purpose built and would probably be out of date in six months.

Voice control is still in its infancy, changing very rapidly and it’s far from clear what direction it’s going to move in. Amazon took a punt very early on with Alexa, and to some extent set standards, but that has all been upended by the arrival of LLMs.

Best advice would be to make sure that anything you install can be removed easily and upgraded without doing any damage to the fabric of the house.

Sonos is another matter. I use them for TTS and they’re great. Two issues, though:

Where they have microphones, there doesn’t seem to be any third-party access to them. You can mute them with a physical button, but that’s it. As far as I know they can’t be used by HA.
Where you use them solely for TTS, there is the added complexity of deciding which speakers to use - it usually has to be the room where the command was issued, but it may not be obvious which one that is in a system like HA which has a central server.

I assume you’re already familiar with HA. If not I’d suggest you play with it for a few months to see what’s possible.

meliborn · November 18, 2024, 3:18pm

That’s my hunch too - make the system as flexible as possible.
Let’s leave voice recognition to the server and talk about the mic setup. I tried the M5Stack ATOM Echo; it’s not bad, but its pickup range is short.

Back to Sonos - I don’t want to use their mics for voice recognition (I don’t think it’s possible). I want to use their speakers for the sound response.

So the question is: which mics exactly should I use?

P.S. I’m pretty familiar with HA.

jackjourneyman · November 18, 2024, 3:48pm

In that case, the main issue I’ve had is deciding which speakers should respond. I use a combination of movement and Bluetooth tracking to decide which room is in use, but the more people there are, the more complicated it gets if you can’t detect which microphone was used.

baudneo · November 18, 2024, 4:16pm

You’re a bit ahead of the development of this feature. What you want is what everyone is aiming for, but we aren’t there yet.

Your use case would require testing several microphone setups. On top of the microphones, you also need audio signal processing. From what I know, the only option for that with (some) support is something like the reSpeaker Lite board with an ESP32-S3 as the MCU. The S3 allows for on-device wake word detection and VAD. The reSpeaker uses an XMOS chip for signal processing and passes the result to the ESP, which in turn sends it to Hass to process. Hass may then pass the audio data on to Whisper and the like via Wyoming.
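To illustrate what VAD (voice activity detection) means in the paragraph above, here is a minimal sketch of the simplest possible approach: flag a frame as "speech" when its RMS energy crosses a threshold. This is only a conceptual example, not what the ESP32-S3 or the XMOS chip actually runs; the frame length (320 samples, i.e. 20 ms at an assumed 16 kHz) and the threshold are arbitrary assumptions.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[int]) -> float:
    """Root-mean-square level of one frame of signed 16-bit samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_speech(
    samples: Sequence[int],
    frame_len: int = 320,      # assumed: 20 ms at 16 kHz
    threshold: float = 500.0,  # assumed: arbitrary energy threshold
) -> List[bool]:
    """Return one flag per full frame: True if the frame is loud enough."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_rms(frame) >= threshold)
    return flags


# One silent frame followed by one frame containing a 440 Hz tone.
silence = [0] * 320
tone = [int(3000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(320)]
flags = detect_speech(silence + tone)
```

A plain energy threshold like this triggers on any loud noise, which is part of why dedicated signal processing hardware matters for far-field use.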

If you don’t do signal processing, you’ll never have a good experience. If you set up a voice assistant that doesn’t do wake word detection on device, it will be streaming audio 24/7 to Hass for it to shuffle around and listen for the wake word, so if you want 11 microphones, that’s 11 audio streams running 24/7 into the wake word setup.
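As a rough back-of-the-envelope figure for what 11 always-on streams add up to, the sketch below assumes uncompressed 16 kHz, 16-bit, mono PCM. That format is an assumption for illustration; the actual format depends on the device and firmware.

```python
SAMPLE_RATE_HZ = 16_000   # assumed sample rate
BYTES_PER_SAMPLE = 2      # assumed 16-bit samples
CHANNELS = 1              # assumed mono


def stream_bitrate_kbps(
    sample_rate: int = SAMPLE_RATE_HZ,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
    channels: int = CHANNELS,
) -> float:
    """Bitrate of one uncompressed PCM stream in kilobits per second."""
    return sample_rate * bytes_per_sample * channels * 8 / 1000


def daily_gigabytes(kbps_per_stream: float, streams: int) -> float:
    """Total data per day (in GB, 10**9 bytes) for continuous streaming."""
    bytes_per_second = kbps_per_stream * 1000 / 8 * streams
    return bytes_per_second * 86_400 / 1e9


per_stream = stream_bitrate_kbps()           # 256.0 kbit/s per microphone
total_kbps = per_stream * 11                 # 2816.0 kbit/s for 11 microphones
per_day_gb = daily_gigabytes(per_stream, 11)
```

Under these assumptions the network load is modest (under 3 Mbit/s), so the bigger cost is likely the server running wake word detection on 11 streams at once rather than the bandwidth itself.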

If I were in your shoes, I would plumb everything needed like speaker wires, aux cords, power, etc. and wait a bit for the dust to settle on how this all plays out.

Another thing to be aware of is that there are microphone arrays meant for directional pickup and others for omni-directional pickup. So, the type and placement of the microphones is quite important. Also, you would want to have the microphone somewhat separate from the speaker, to increase wake word detection chances. Alexa and Google do a great job with an all-in-one device, but we don’t have the R&D budget for purpose-built enclosures that help with far-field wake word detection and keep the output somewhat segregated from the input.

meliborn · November 18, 2024, 5:45pm

Every microphone is placed in its own room and has its own unique device in HA. Is it really too complicated for HA to determine which microphone the signal is coming from? When I was testing the M5Stack ATOM Echo, the setup step asked me for an area to associate it with.
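If each microphone device is tied to an area, as described above, picking the speaker that should respond can be reduced to two lookups. The sketch below is plain Python rather than Home Assistant's own API, and every entity ID and area name in it is hypothetical.

```python
from typing import Dict, Optional

# Hypothetical mapping: which area each microphone device was assigned to.
SATELLITE_AREA: Dict[str, str] = {
    "assist_satellite.kitchen_mic": "kitchen",
    "assist_satellite.living_room_mic_1": "living_room",
    "assist_satellite.living_room_mic_2": "living_room",
}

# Hypothetical mapping: which speaker should answer in each area.
AREA_SPEAKER: Dict[str, str] = {
    "kitchen": "media_player.kitchen_sonos",
    "living_room": "media_player.living_room_sonos",
}


def speaker_for_satellite(
    satellite_id: str,
    satellite_area: Dict[str, str] = SATELLITE_AREA,
    area_speaker: Dict[str, str] = AREA_SPEAKER,
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Return the speaker in the same area as the microphone that heard the command."""
    area = satellite_area.get(satellite_id)
    if area is None:
        return fallback
    return area_speaker.get(area, fallback)


target = speaker_for_satellite("assist_satellite.living_room_mic_2")
```

Several microphones in one large room can share an area, so they all map to the same speaker.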

andrewballinger · February 4, 2025, 9:53pm

I have something semi-functional by putting the HA app on my phone and configuring it as the default voice assistant. It’s not perfect (I still need to grab my communicator and hold the power button to issue commands) but it’s a step in the right direction (ToS computers, but not TNG computers).

This isn’t a direct answer to your question, since it’s not what you asked for, but this is the best setup I have so far.