HA Voice Preview vs Pi Zero

I really would like to know how others are finding their voice assistants in regards to reliability and operation. After just over a week I have become disillusioned with the whole Pi Zero 2 & Wyoming satellite setup (which is almost like a pre-alpha).

One significant issue I am encountering is that the Zeros stop responding: I cannot SSH in to reboot them, but they do respond to pings. So how are the HA devices in terms of 24/7 operation? Is anyone finding they need power cycling, or are they reliable?

The second issue is that I often have to repeat the wake word. Sometimes I see the blinking and think all is good, it hears the question or command…and goes blank. Worse still, it definitely has problems understanding. “Kitchen” is OK, but “Bathroom” or “Work Light” can be frustrating and often need to be repeated. I can clearly say “Bathroom” and it replies that I have no device called “Bunroom” or something equally stupid.

The third and final issue I find more annoying. The one unit that does work OK (Pi3B) is always hearing the TV or a YouTube video and thinks it hears the wake word. So during a typical evening watching a movie it will go off every half hour at least. So how does the HA device actually work with its wake word? Is it reliable, does it usually respond the first time? Does it have a lot of false positives?
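To make the false-positive question concrete, here is a rough Python sketch of one common mitigation: only firing when several consecutive audio frames score above a threshold. It assumes the detector emits a per-frame score between 0 and 1; the function name, the `threshold` and `trigger_level` parameters, and the sample scores are all made up for illustration and are not taken from any particular wake word project.

```python
from typing import Iterable, List


def detect_triggers(
    scores: Iterable[float],
    threshold: float = 0.5,   # hypothetical per-frame score cutoff
    trigger_level: int = 3,   # hypothetical number of consecutive frames required
) -> List[int]:
    """Return the frame indices at which the wake word would fire.

    A trigger fires once per run of consecutive frames whose score is
    at or above `threshold`, at the moment the run reaches `trigger_level`.
    """
    triggers: List[int] = []
    streak = 0
    for index, score in enumerate(scores):
        if score >= threshold:
            streak += 1
            if streak == trigger_level:
                triggers.append(index)
        else:
            streak = 0  # any frame below the threshold resets the run
    return triggers


if __name__ == "__main__":
    # Invented scores: one short spike (like a TV blip) and one sustained run.
    sample = [0.1, 0.9, 0.2, 0.8, 0.9, 0.95, 0.9, 0.1]
    print(detect_triggers(sample, trigger_level=1))
    print(detect_triggers(sample, trigger_level=3))
```

Requiring a longer run should suppress short spikes from TV audio, at the cost of sometimes missing a quickly spoken wake word, so it is a trade-off rather than a fix.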


I have to say that this mirrors my experience exactly.

To try to deal with the Pi Zero lock-ups I used a cron job to reboot the device at midnight, but long term that wasn’t much of an improvement. Repeatedly pulling the plug eventually kills the SD card, which then leads to the hassle of re-imaging - but at least it is a re-image and not a re-install!

An ESP32 Box and a Pi3B Wyoming satellite both work exceptionally well on my desk but struggle in the ‘real’ world. Repeated false wake word detections because the TV is on or the kids have music playing rapidly lowered the approval rating for non-Amazon devices in the house.

What are people’s experiences of the HA Voice Preview in real life - not just in a study or workspace?

I would also really like to hear about real-life experience. I have a wyoming-satellite running on a Pi Zero 2W with a basic USB conference speaker. I do not have any issues with lock-ups though, and my satellite runs fine 24/7 for days in a row.

I did notice that the openWakeWord repo for the Wyoming satellite is quite old and has not been updated in quite a while (GitHub - rhasspy/wyoming-openwakeword: Wyoming protocol server for openWakeWord wake word detection system). I initially had openWakeWord running on the Pi Zero 2W using this repo and had very poor performance. So I configured my satellite to just stream audio to my HA server, and I now run openWakeWord on the RPi 4B which runs HA.

I have noticeably better performance. It can’t be a compute issue, since openWakeWord running on the Pi Zero 2W was not stressing it at all. I just assume there has been further development on openWakeWord which didn’t trickle down to the Wyoming satellite repo?

I connected a USB conference speaker to my Pi Zero 2W for the satellite. The specific conference speaker I use has only 1 microphone and its ability to pick up my voice beyond 2m is rather poor. The speaker itself can pick up sound well enough from beyond 2m. If I just record an audio sample it is clearly audible, but I suspect the AGC kicks in too much and amplifies other sounds (like keyboard typing, kitchenware sounds etc.) too strongly, which I suspect makes openWakeWord fail beyond 2m. While the performance for voice activation at less than 2m is OK, it falls far short of what a Google Nest speaker can achieve - that is just eerie, I only need to whisper “OK Google” and it picks it up.

Hence I am mostly wondering how well the two-microphone setup with this XMOS chip performs in comparison with a fairly basic single-microphone USB conference speaker box.

Apart from voice activation further than 2m away, it all works quite well when there is no background noise, but if anyone else is speaking or the TV or radio is playing, the satellite basically picks up everything. I use Google STT and that does an admirable job: I can see in the debug logs that it (mostly) correctly transcribes all the audio. It just is not able to isolate the speaker’s voice from all the interfering voices, and whatever conversation agent I set up (I mostly use OpenAI) just cannot make sense of the resulting transcribed messages. Granted, that sounds like a very challenging thing to do, but from what I understand at least some of that is what the XMOS chip coupled with decent far-field microphones claims to do better, no?

So any input on real-life experience is appreciated.

I ended up setting up a systemd timer to reboot every 12 hours and to restart wyoming-satellite every 6 hours, and yet I still find that after a few hours the Zeros become unresponsive.
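Since the Zeros answer pings but refuse SSH, a ping check alone will not catch the hang. Below is a minimal Python sketch that probes TCP ports instead, so a log on another machine could show when the lock-up actually happens. The hostname `satellite.local`, the satellite port 10700, and the 60-second interval are assumptions for illustration; adjust them to your own setup.

```python
import socket
import sys
import time


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def classify(ssh_ok: bool, satellite_ok: bool) -> str:
    """Turn the two probe results into a short status label."""
    if ssh_ok and satellite_ok:
        return "healthy"
    if ssh_ok:
        return "satellite service down"
    if satellite_ok:
        return "ssh down, satellite still listening"
    return "unresponsive (hung or offline)"


if __name__ == "__main__":
    # Hypothetical defaults: change the host and ports to match your device.
    host = sys.argv[1] if len(sys.argv) > 1 else "satellite.local"
    ssh_port, satellite_port = 22, 10700
    while True:
        status = classify(
            tcp_port_open(host, ssh_port),
            tcp_port_open(host, satellite_port),
        )
        print(time.strftime("%Y-%m-%d %H:%M:%S"), host, status, flush=True)
        time.sleep(60)
```

Run it from the HA server or any always-on machine; the timestamps of the first "unresponsive" line would at least show whether the hang lines up with the scheduled restarts.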

And yet the voice assistant through the app on my phone has worked flawlessly so far, and the Pi3 is more stable (though far from perfect), so I am leaning towards the view that using Zeros is just not the best idea. Out of the box, so to speak, i.e. following just the tutorials on the Git page, they will not produce a stable platform.

I think we have the same issue with the Pi Zero 2W Wyoming satellite:

  1. My satellite goes offline after a period of use. I cannot ping it, and I have to reboot it by unplugging the Pi Zero 2W. I think openWakeWord running on a 64-bit OS overloads the Pi Zero 2W.
  2. The experience with openWakeWord is not good. The error rate is quite high when I watch TV.

I don’t have an HA Voice PE to test, but from what I tested on an ESP32-S3 voice device (without the XMOS chip), the experience with microWakeWord is better than with openWakeWord. So I believe the HA Voice PE will have better wake word detection than a Pi Zero 2W Wyoming satellite. But I don’t know how it will compare to a Google Home Mini or an Amazon Echo.