Home Assistant Voice Preview Edition weak mic?

Hi there. I’ve just got my Home Assistant Voice PE, and it picks up “Hey Jarvis” only when I’m at most 1 m from the device. Is this by design, or could there be something wrong with my specific unit?


I’m also experiencing the same issue.

Almost have to shout at the device even when in the same room.

Using the “Ok nabu” wake word.

After it has woken up, it also often misinterprets or doesn’t understand my commands at all unless I speak very loudly (using Swedish with HA Cloud).

Anyone else having this issue?

If this is by design, I’ll sadly have to return the device.


It’s probably not “by design”; it’s just how it is. As much as the XMOS chip is hyped up, none of the hardware out there using it works particularly well, especially if you’re used to the far better setups in the big-name commercial VA hardware.

In my testing, I think there are a few issues. First, the XMOS chip is really bad at filtering out audio generated by the device itself: above about half volume, wake words stop working, so “stop” doesn’t work with alarms unless you’re right on top of it. I have it on my list to remove it from the case and see if it still has that problem; it could be distorted audio conducting through the case, or it could be room echoes.

Second, microWakeWord is an amazing bit of code, but it really doesn’t work very well. “Okay Nabu” works reasonably, but it’s just such a dumb wake word. I don’t especially like the whole “hey ____” thing, but I get how it helps with false positives. But “Okay”? Yeah, no. And “Hey Jarvis” is a lot less accurate. Really, though, I get it: Amazon and Google have hundreds of millions of samples of people using their wake words to train with, both positive and negative, and no one has figured out how to do automatic training that works well without real-world data. IMO, a lot of the struggle using the VPE stems directly from this. I have one of mine reconfigured to use openwakeword, and it is far more reliable. Still not even close to a $25 Google Hub Mini, but… better.
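For anyone wanting to try the same openwakeword setup: one way is to run the Wyoming openwakeword service and point Home Assistant at it. A minimal docker-compose sketch, assuming the rhasspy project’s image name, flag, and default port (check the project README for your version):

```yaml
# Sketch: running wyoming-openwakeword as a container.
# Image name, --preload-model flag, and port 10400 are my
# assumptions based on the rhasspy project defaults.
services:
  openwakeword:
    image: rhasspy/wyoming-openwakeword
    command: --preload-model 'ok_nabu'
    ports:
      - "10400:10400"   # default Wyoming port for openwakeword
    restart: unless-stopped
```

Once it’s running, you can add it via the Wyoming Protocol integration and then switch the VPE’s wake word from running on-device to running in Home Assistant.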

Lastly, I’ve found that Whisper’s STT quality drops off enormously as the signal gets noisier, far more than Nabu Casa’s hosted STT does. (Which is weird, as I would’ve assumed they were running Whisper too; maybe they’ve got non-default settings?) So even when it wakes and the XMOS chip is doing automatic gain control and boosting levels, commands don’t work from across the room without basically shouting to keep the AGC in check.
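If you’re running Whisper locally, a bigger model and beam search can help with noisier audio. A sketch, assuming the rhasspy Wyoming faster-whisper image and its `--model`/`--beam-size` flags (verify against the version you run):

```yaml
# Sketch: Wyoming faster-whisper with a larger model and beam search.
# Image name and flags are assumptions based on the rhasspy project;
# model "small-int8" trades speed for accuracy vs. the tiny default.
services:
  whisper:
    image: rhasspy/wyoming-whisper
    command: --model small-int8 --language en --beam-size 5
    ports:
      - "10300:10300"   # default Wyoming port for whisper
    restart: unless-stopped
```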

Now, some of those things can be fixed in software, but a lot can’t. But there really aren’t better options out there that are “open”, so you have to pick your poison: use Alexa or Google to feed commands into your HA and be limited to the configuration they’ve got, or DIY and be limited to the hardware you can get.

Personally, I like having the VPEs. So far, though, my wife is not as sold on them. I think part of that is that the training for “Hey Jarvis” is especially biased towards male voices; she can’t get it to activate 3/4 of the time.

IMO, that’s the biggest thing that needs improving – they really, really need wake words that work reliably.


All three wake words work 4 or 5 times and then just stop working. I have to change the word in the settings, and then it works 4 or 5 times again. I haven’t found a solution yet.


I’m experiencing the same issue with the “Okay Nabu” wake word. I have to speak very loudly for it to activate, which is quite frustrating. I’ve tried different setups, but nothing seems to help.

I’m seriously considering returning the device…


Same, wake words are hit or miss. I have to resort to hitting the button.

Same issues here. The wake words are awful and do not work 9 times out of 10. I know it’s a preview but currently it’s a £60 paperweight. :person_facepalming:

I just got the PE a few days ago too. “Hey Jarvis” only works for me 1 in 20 times, it feels like, though my wife can trigger it 1 in 3 times (lower-pitched voice). My kids both can’t trigger it at all, even from 10 cm away or so. I have yet to try the other two, but I prefer “Hey Jarvis” anyway.

Just my experience with this so far. I have not really used it much yet, but the future looks promising.

Edit: I am Australian, if that matters. Using HA Cloud for TTS/STT both ways.
Running beta firmware.

Same problem.

I’m also Australian, and am finding similar issues. For my voice, “Okay Nabu” is recognised about half the time, but my wife (also Australian) isn’t able to trigger it at all with her voice.

I have been experiencing this. At times it’s OK, then at times I have to either yell, repeat myself, or just press the button. 5 out of 6 times it just doesn’t respond.
I have been searching for a way to add something via the jack, perhaps, to make it hear better. So far, no cigar.

Same issue here. I have to yell, say the wake word several times, or hit the button after the 3rd or 4th try. Also, I have to be pretty much on top of the PE for it to wake. I was hoping I would be able to wake it from another room, like Alexa/Google can be.

+1 - hopefully not a hardware issue

For me, the same issue happens with “Okay Nabu”. It even somehow seems to have a preference for male voices: it works very well for me, but it struggles with my wife.

Also, I am using Whisper locally for STT, and when I use my phone to talk to the HA Assistant it understands me perfectly, but talking to the HA Voice PE, it struggles most of the time to understand exactly what I want to say. Hopefully it’s not a hardware issue, but it looks like it :cry:

Same here. It’s basically unusable. It’s not just wake words, it barely understands any spoken phrase. I wonder if it’s because it has just 2 mics, while the cheap Echo Dot has four. But still, this XMOS chip should do the trick with 2 mics, right? Or maybe not.

Same. It understands basically zero… Full disappointment.

“Okay Nabu, what’s the temperature?” Response: “No entity temporturiaamme found.”

It’s confusing how different the experience with the VPE is.

Yes, the wake word detection isn’t on par with the big companies, but I don’t have problems with the VPE not understanding what I say once the wake word has been recognized.

It works very well for me, both when set to English and to German, even with pretty loud music playing in the same room.

I just looked through the recognized texts in the Assist debug view, and I only found one wrong word from the last few days (and I use it quite heavily at the moment, as I’m writing some new scripts/tools for Assist).
In this case I didn’t even recognize the error, as the LLM assistant still figured out what I meant and gave me the correct answer. :smile:

Thought I’d chime in with my experience with the VPE in Dutch. My issues are mainly on the STT side. I tried multiple engines, but not even the paid cloud versions (Whisper-1/gpt4-transcribe) gave consistently good results, which makes me suspect issues with the hardware.

The acoustics of the room and the placement affect the quality of the VPE microphone more than I expected.

Initial testing with the VPE in the bedroom on a bedside table gave quite good results. This gave me a very false sense of confidence…

Because when I set the VPE in the living room on a closet against a wall (at around 120 cm height), it completely crapped the bed: 0% success rate. Wake word detection was still great, but NONE of the commands were recognized correctly. I was at a complete loss, since the bedroom tests were quite OK. Not perfect, but definitely usable.

I started debugging and listening to the various .wav files, and while I was able to clearly discern what I was saying, there was noise going on. I’m not an expert in this kind of thing, so I can’t judge where the noise might be coming from. Maybe the STT engines I used struggle with noise?

For people who have no issues with STT using the VPE: which STT do you use? Did you tweak any settings on either the STT or the VPE?

I want to support the cause, so I will keep my VPE. It’s fun to play around with anyway, but if I can’t get this to work, I’ll have to rely on Google Home a bit longer.

Chiming in with my experience over the past couple of months trying to get this up and running. Using English, with the Voice PE hanging on a wall in a quiet environment. The wake word works most of the time (using “Hey Jarvis”), but after that the success rate is pretty abysmal. I have some scripts set up to play music, and it almost never hears what I’m saying correctly. For example, if I try to say “Play mix one”, it will hear variations such as “plate/plane/played” and “mixed/next/knicks”. I have edited the automations I’m using to try to account for the different commands, but it seems like each day it finds something new to mishear. I know this is a preview edition, so I’m not expecting perfection out of the box, and I don’t mind tinkering a bit to get things working well, but this is rather disappointing. Are there no other solutions with better microphones available?
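For what it’s worth, the “account for misheard commands” approach above can be done with alternatives in a single sentence trigger rather than many separate automations. A sketch, where `script.play_mix_one` is a hypothetical placeholder for your own script:

```yaml
# Sketch: one automation that tolerates common mishearings by using
# (a|b|c) alternatives in a conversation trigger sentence.
# script.play_mix_one is a hypothetical placeholder.
automation:
  - alias: "Play mix one (tolerant of mishearings)"
    trigger:
      - platform: conversation
        command:
          - "(play|plate|plane|played) (mix|mixed|next) one"
    action:
      - service: script.play_mix_one
```

This keeps the mishearing list in one place, so when it “finds something new to mishear” you only have to extend one sentence.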

Also, someone mentioned that they were able to listen to .wav files of their own commands. Is that available by default? The only thing I can find is in \config\tts, and those are .flac files of the assistant’s responses. I’m aware of the debug Assist section that shows the text of what it hears, which is how I’ve updated my automations; I just haven’t been able to locate the sound files of the commands I’ve given.

You can do that by adding the following to your configuration.yaml file:

```yaml
assist_pipeline:
  debug_recording_dir: /share/assist_pipeline
```

My own experience with the Voice Assistant PE has been weird. I’m not sure which parts of this are the device itself vs Home Assistant, ESPHome, or other things.

When I first got the device and was using it in February 2025, I had decent luck with it. The wake word sometimes didn’t pick up, but the transcription (via faster-whisper with, I think, default configuration settings) worked well. It worked so well that I was able to say commands to the device from upstairs.

Now, it does an OK job understanding certain phrases, but some stuff it really struggles with. I created an alias “day bright all.”

I’ve gotten the following, from the voice assistant debug:

  • Daybreak’s all.
  • D-Bright’s ball.
  • D-Bright’s ball.
  • D-Brate All
  • Date writes all.
  • Dave Bright Hall.
  • Day bright all

That’s not cherry picked; that’s the actual series of attempts, with the last few being after I did a factory reset & reuploaded the firmware to the device.

Some of this is likely down to the fact that this is a nonsensical phrase; Whisper maybe does less well with these sorts of things out of context. I also ran those tests with my “Day bright all” alias missing, but the issue was with the actual STT, not the behavior after STT. (Does HA somehow pre-load Whisper with the aliases you have, such that transcription works better for phrases you have saved? I would assume not, but I certainly don’t know.)

Regardless, my experience with this went from “it works pretty well” at the start to not working well for the last several months.