Very low wake word detection and speech recognition rate on Voice PE

Got my Voice PE a few days ago and my biggest issue with it is the very low wake word recognition (I am using German, btw). I know comparing the Voice PE with Alexa is not fair, but that is what I have here to test. With the PE I have to be at most 1.5 meters away and I have to raise my voice quite a bit for it to even get the wake word. Understanding the command is a 1 in 10 chance. With the Alexa device, on the other hand, I can be 5+ meters away and successfully give commands without having to raise my voice. The difference between the two devices in recognition and command execution is pretty hefty.

I did enable the debug option so it saves the WAV files to disk. I can personally understand the command from the WAV, though there is a constant noise floor throughout the recording that could prevent the STT from recognising the speech.
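To put a number on that noise floor, something like the following could be run against one of the saved debug recordings. It is only a rough sketch: it assumes the files are 16-bit PCM WAV, and the function names, the 30 ms frame size, and the 10th/90th percentile cut-offs are illustrative choices, not anything Voice PE or Home Assistant defines. Levels are in dBFS (relative to digital full scale), so they are not comparable to room noise measured with a sound level meter.

```python
import array
import math
import sys
import wave


def read_mono_16bit(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Keep only the first channel if the file has more than one.
    return list(samples[::channels]), rate


def frame_levels_dbfs(samples, rate, frame_ms=30):
    """Return the RMS level of each full frame in dBFS (0 dBFS = full scale)."""
    size = max(1, rate * frame_ms // 1000)
    levels = []
    for start in range(0, len(samples) - size + 1, size):
        frame = samples[start:start + size]
        rms = math.sqrt(sum(s * s for s in frame) / size)
        # Clamp to 1.0 so digital silence does not produce log10(0).
        levels.append(20 * math.log10(max(rms, 1.0) / 32768.0))
    return levels


def estimate_snr(levels):
    """Rough estimate: 10th percentile = noise floor, 90th = speech level."""
    if not levels:
        raise ValueError("recording is shorter than one frame")
    ordered = sorted(levels)
    noise = ordered[int(0.10 * (len(ordered) - 1))]
    speech = ordered[int(0.90 * (len(ordered) - 1))]
    return noise, speech, speech - noise


# Example use (the file name is hypothetical):
#   samples, rate = read_mono_16bit("debug_recording.wav")
#   noise, speech, snr = estimate_snr(frame_levels_dbfs(samples, rate))
#   print(f"noise {noise:.1f} dBFS, speech {speech:.1f} dBFS, gap {snr:.1f} dB")
```

A small gap between the two figures would at least be consistent with the noise floor getting in the way of the STT, and comparing recordings from different placements would show whether moving the device changes anything.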

Now, is there an option to tweak the noise suppression and the mic gain without having to take over the device in ESPHome? I don’t really fancy that process, given that I could run into all sorts of trouble or incompatibilities later down the road.

You’ll have to take over the device.

Hi
I have had my Voice PE for roughly a week. I think it does a great job.

I also have 5 homemade Wyoming Satellites. They trigger on false wake words so often that I had to disable the microphone on the one in the bedroom, because even my snoring makes it think I said “Hey Jarvis”.

For the moment I am still using Okay Nabu on the Voice PE. Which wake word are you using?
For the first hour I had problems because I kept saying “Hey Nabu”, and that worked 1 time in 10. Then I realized that it is “Okay Nabu”, and since then it has reacted 100% reliably as long as I am in the same room.

One thing Amazon Echoes do much better is picking up commands while the TV is talking. But that is also a really difficult task, and I am deeply impressed with the Amazon devices for that.

I wish it was that sensitive here. Funnily enough, Hey Jarvis works even worse than Okay Nabu. I also tried it without background music or anything, and it’s not really any better. The ambient noise floor in my place is between 38 and 42 dB, so pretty average I guess. It’s a bit of a letdown.

I have noticed that some IT devices with multiple microphones and echo cancelling can have problems when the device is placed in front of surfaces that reflect a lot of sound. You may want to try a placement away from the wall. Even rotating the device so the two microphones pick up reflected sound differently can make a difference.

Try the stand I made and posted upstream. Really improves detection.

I just changed my wake word to Hey Jarvis. I could not make it work until I tried saying Hey Jarvis without much pause between “Hey” and “Jarvis”. I think Germans and Danes naturally put a long pause between the two words when we try to speak clearly, and the wake word was obviously trained mostly on native English speakers. Try saying the wake word faster, with very little pause between the two words.