Help to decide on/understand the hardware for a voice assistant

Hi all
I need some help figuring out what the hardware options are for a voice assistant. My goal is to be able to use it like, e.g., Google Home or Alexa, in my native language. I have HAOS installed on a NUC.

I have read about some ESPHome devices, but ideally I would prefer something plug and play (at a reasonable price) that I could install in the living room, for example, and that looks nice or at least goes unnoticed.

Ideally I would like to use, for example, an older Android phone as a microphone, and why not as a speaker too, but I don’t think I can.

thanks

Hey there! :slight_smile:

At the moment, you can use a few different ways to connect with your voice assistant:

  • A microphone and speaker based on an ESP chipset that can be flashed via ESPHome. An example would be the M5Stack ATOM Echo.
  • Any old Android or Apple phone, tablet or smartwatch that can run the HA companion app.
  • A separate computer that runs the microphone and speaker and sends the signal in some way to HA. An example would be a Rhasspy server and/or client that runs on a Pi (Pi Zero, Pi2, Pi3, Pi4).

There are other devices out there that can be connected as well, but in the end, these boil down to the three things above. E.g., there is a nice little speaker out there, the ESPmuse, that works nicely, but it is still an ESP-based device.

What I’d do is check where you want to put your mic/speaker and, more importantly, how you’d use it. E.g., I have an ATOM Echo on my living room table, but I’ll change its location sooner rather than later, as we always have our phones and one tablet right there, too. It doesn’t make sense there, as all mobile devices are running the companion app. So one click on the ATOM Echo is nearly the same as touching two buttons on my phone.

On the other hand, you’d likely not want to touch your phone right after you have left the shower. That would be a good place for an Echo. :slight_smile:

So my recommendation is: buy one or two of the ATOM Echo, flashing is really easy, and try them out. Get one or more (what you have at hand) Android or Apple devices and install the companion app. Give it a few weeks test run, to see, how you and your other household members are working with it. You’ll likely find some things totally different than expected, in terms of how people use these voice assistants.

I for one was sure, the mic on the livingroom table will be the most used device. Turns out, I was totally wrong. My wife uses her phone/companion app to speak her voice commands, and the next one in line is the tablet in the hallway. The Echos are fine and work great, but they are not yet located at the right places.

What I want to say is this: take your time, buy one mic/speaker beside the app, and test it. You’ll see quite quickly, where you need a mic, and where a mic is totally useless. And an ATOM Echo is around $15 each, so no real harm done in trying. If you order via the manufacturer shop, there is a discount code from NC available. :wink: Depends on where you live. :slight_smile:

I am going to order the ATOM Echo, but until then, is there a link or something where I could see how to use an old Android phone?

Just install the Home Assistant Companion App from the Play Store and configure it.

Configure means the wizard will guide you through setting up the connection to HA, and afterwards you can configure which sensors you want to show up in HA. That’s it, simple as that. :slight_smile:

If you now open the app, you have the Assist icon in the header; just click it and start speaking. :slight_smile: Depending on your phone, you can even set it up as the default assistant, so a long press on the home button opens the voice assistant directly and you can start speaking. :slight_smile:

I will try it on Friday when I get home.
But is there a way to “wake” the assistant on the phone with my voice? Like “Hey Google”?

Not yet. :slight_smile: A so-called “wake word” is in development, rumors say, but I’m sure it is. :laughing:

The last “Year of the voice” video on YT was expected to announce “wake word” functionality, but it seems, we have to wait for that. I’m fine with it, I’m very happy without.

I used to use Rhasspy, where wake word functionality is already implemented, but I didn’t like it. It had too many false positives, especially in the living room when the TV was on. So for me, the far better approach is one-click-and-talk or a long press on the home button on Android.

Usually, almost everything I need is automated, and this is my preferred method. However, sometimes there is a need to ask HA by voice to do something, e.g. turn on a light while my hands are wet, so the button wouldn’t suit me.

Just wondering if a Bluetooth conference call microphone/speaker combination would work with a Raspberry Pi as a voice assistant. Has anyone tried that?

This is my big issue too, and I believe it always will be when the TV audio out doesn’t come from the Voice Assistant device. Subtracting the audio being played out the speaker from the audio signal input heard by the microphone is necessary to identify what sound is being created in the room. That’s why they demo Google/Alexa/Siri playing music from its integrated speaker.
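To make the "subtracting" idea a bit more concrete, here is a minimal Python sketch of an adaptive echo canceller (NLMS). It is only an illustration, not what Google/Alexa/Siri or Rhasspy actually run: the room echo path, the signal levels and all function names are made up for the demo. It also assumes the assistant has access to the exact audio being played (the reference signal), which is precisely what is missing when the TV plays through its own speakers.

```python
import math
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-6):
    """Remove the estimated echo of `ref` from `mic` with an NLMS adaptive filter."""
    weights = [0.0] * taps   # current estimate of the room echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    cleaned = []
    for mic_sample, ref_sample in zip(mic, ref):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate          # what is left after subtraction
        norm = eps + sum(x * x for x in history)    # normalise by reference energy
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def simulate(n=4000, seed=0):
    """Build a toy scene: loud speaker audio echoing in the room plus a quiet voice."""
    rng = random.Random(seed)
    speaker = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # audio being played
    room = [0.6, 0.3, -0.2]                                # hypothetical echo path
    echo = [
        sum(room[k] * speaker[i - k] for k in range(len(room)) if i - k >= 0)
        for i in range(n)
    ]
    voice = [0.05 * math.sin(2 * math.pi * i / 50) for i in range(n)]
    mic = [e + v for e, v in zip(echo, voice)]
    return mic, speaker, voice


def power(samples):
    """Mean signal power."""
    return sum(s * s for s in samples) / len(samples)


if __name__ == "__main__":
    mic, speaker, voice = simulate()
    cleaned = nlms_echo_cancel(mic, speaker)
    print("mic power (last 1000 samples):    ", power(mic[-1000:]))
    print("cleaned power (last 1000 samples):", power(cleaned[-1000:]))
```

Without the `speaker` reference there is nothing to subtract, and the wake word detector hears the TV at full volume, which matches the false positives described above.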

Personally I use Rhasspy and turn off the satellite device (a RasPi 3A with reSpeaker 4-mic HAT) in the living room when the TV is turned on. I have added a button to pause my media centre/TV and enable Rhasspy when pressed; and to un-pause on double-tap. Not an especially elegant solution, but it works for me better than having false positives from the TV :wink:

As you are probably aware, the new HA voice assistant is the next generation by the developer of Rhasspy, and I’m sure it will be well worth waiting for. Home Assistant will provide most of the processing centrally, with small cheap satellite devices (mic + speaker + communications + minimal processing) spread around the house.

Re: voice assistant hardware … currently for Rhasspy, a RasPi with reSpeaker HAT is commonly recommended, but it is more expensive than the Google/Alexa options :frowning:

I believe the limitation for ATOM Echo with Rhasspy is that it doesn’t have enough memory or CPU power to detect wakeword locally - so has to send its audio stream 24/7 over the LAN to the more powerful server to detect its wakeword. The current HA voice assistant release gets around this by using a button instead of a wakeword.

I am personally hoping that Nabu Casa will use its ESPHome expertise to develop an integrated ESPHome based satellite device at a competitive price.

Yes, in the Rhasspy forum several people are using Jabra conferencing devices. I understand the audio quality is excellent - but these are expensive, and designed to be used in closed conference rooms without background noise, so back to the first point :wink:

This you should be able to solve with an automation. While giving back the response to Rhasspy, you could also unpause the TV.
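If you would rather trigger the unpause from a script (for example from whatever handles the Rhasspy response) instead of an automation, a rough Python sketch using Home Assistant's REST API could look like this. The host name, the token and the entity id `media_player.living_room_tv` are placeholders I made up, and `build_service_call` is just a helper name for this example; you would use a long-lived access token from your HA profile. The function only builds the request, so nothing is sent until you pass it to `urlopen`.

```python
import json
import urllib.request


def build_service_call(base_url, token, domain, service, entity_id):
    """Build (but do not send) a POST request to /api/services/<domain>/<service>."""
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # All values below are placeholders; replace them with your own setup.
    request = build_service_call(
        "http://homeassistant.local:8123",
        "YOUR_LONG_LIVED_TOKEN",
        "media_player",
        "media_play",
        "media_player.living_room_tv",
    )
    print(request.get_method(), request.full_url)
    # To actually send it, uncomment:
    # with urllib.request.urlopen(request, timeout=10) as response:
    #     print(response.status)
```

Swapping `media_play` for `media_pause` gives you the matching pause call for when the button is pressed.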

You shouldn’t look at this as two different things; it is all coming together nicely. :slight_smile: HA will still provide functionality that lets you use the Echos, but as you said, I highly doubt wake word will be functional on these devices. But the combination Rhasspy/HA Assist will solve this for us:

  • Assist for push-to-talk-mics
  • Rhasspy for wake-word (but, see the next paragraph!)

That’s why Paulus announced (in a side sentence), that the device they are working on right now will be an ESP S3 probably with a display. This device has enough power and space to use a wake word on the device. :slight_smile: But he also said, these aren’t available broadly, at least for now, so the team isn’t in a hurry to release anything here. :slight_smile:

As Don said, in the Rhasspy forum there are some people using these, but for mic arrays, what I recommend most highly are PlayStation Eye cameras.

They go for $20 a pop and are technically very good and reliable. They are really a quality product and run without any problems. Added bonus: you can still use the camera inside these to get a video feed or something like that. Just needs a Pi that can handle it (aka not a Pi2 :wink: ).

Do you have any instructions (or a link to some) on how to use the PlayStation Eye as a voice assistant?

I am also looking for hardware to create a couple of voice assistants spread throughout the house, but would definitely want to have wake word as that is the use case I am seeing in our house.

What options are there currently for having a wake word?
Also, when using a wake word, what range do these devices have, and how good are they at picking up the wake word? I believe Google Assistant is really good in this regard, even when audio from other sources is playing.

+1, looking for a cheap, efficient, nice-looking and not power-hungry device.
As already said earlier in this thread, a Nabu Casa speaker based on ESP32 would be great (I personally do not need a display).
Right now the ESP32-S3-BOX-3 seems to be out of stock everywhere…

Regards