Hardware required for Assist with wake word

Are there any hardware recommendations for running an assist pipeline with wake word?
I’ve bought an M5Stack Atom Echo and set up everything according to the blog guide “$13 voice assistant for Home Assistant”.

I’m running Home Assistant on an HP Thin Client t520 (AMD GX-212JC 1.2 GHz dual core).

It’s “working”, but the experience is terrible: enabling wake word detection takes my CPU from under 10% to constantly above 70%. Is it expected to be this resource hungry, or is my server way, way too slow?

Besides that, the rest of the experience is poor too: STT processing is slow and it barely ever recognizes a word. Sometimes what it picks up isn’t even words, and that’s with me articulating loudly and close to the mic. I’m using Swedish with default settings.

So is my server to blame and I should upgrade, or should I just give up on a local voice assistant altogether?

Are you doing video processing? Plex, an NVR, Frigate, etc.?

I’d imagine voice processing can eat resources, at least intermittently, and the experience may suffer if resources are insufficient.

This may be a good guideline to follow.

Thanks for the link, I will look into the specifics. I can certainly understand that voice processing takes resources, and it seems to require more resources when I select a “bigger” speech model.
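For what it’s worth, the model/resource trade-off is exposed in the Whisper add-on’s configuration. The option names below are how I understand the faster-whisper-based add-on; treat the exact values as an assumption and check your add-on’s documentation tab. For a non-English language like Swedish, the default tiny model tends to be weak, and a larger multilingual model recognizes noticeably better at a correspondingly higher CPU cost:

```yaml
# Whisper add-on options (Settings -> Add-ons -> Whisper -> Configuration).
# Option names assumed from the faster-whisper add-on; verify against
# your installed version's documentation.
model: small-int8   # default tiny-int8 is fast but weak for non-English speech
language: sv
beam_size: 1        # raising this can improve accuracy at more CPU cost
```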

But my first hurdle is really that wake word detection alone puts a strain on my CPU; from the log it seems to just listen, time out, and restart. That took me a bit by surprise.
I don’t know the details of how “Hey Siri” and similar systems work, but isn’t it a tiny low-power chip listening for the wake word, which then wakes the main chip? Otherwise my phone’s battery would be drained in a few hours.
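The reason server-side wake word detection is never idle is that the detector has to score every incoming audio frame, around the clock, whether anyone is speaking or not. Here is a toy sketch of that streaming loop; it is not openWakeWord (which runs a neural model), just a dummy energy-based scorer, and the frame size and threshold are illustrative assumptions:

```python
# Toy sketch of streaming wake-word detection (NOT openWakeWord itself).
# The point: the model must score EVERY frame, so at 16 kHz with 80 ms
# frames the scorer runs ~12.5 times per second, continuously.
import math

SAMPLE_RATE = 16_000
FRAME_SAMPLES = 1_280   # 80 ms of audio per frame (assumed frame size)
THRESHOLD = 0.5

def frame_score(frame: list[float]) -> float:
    """Stand-in for a neural wake-word model: returns a 0..1 score.
    Here we just use clipped RMS energy as a dummy score."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return min(rms, 1.0)

def detect(stream: list[list[float]]) -> list[int]:
    """Score every frame; return the indices where the detector fired."""
    return [i for i, f in enumerate(stream) if frame_score(f) >= THRESHOLD]

# Simulated stream: mostly near-silence, one loud frame.
silence = [0.01] * FRAME_SAMPLES
loud = [0.9] * FRAME_SAMPLES
print(detect([silence, silence, loud, silence]))  # -> [2]
```

A real detector replaces `frame_score` with model inference, which is why a slow CPU sits at a constant load even when the room is quiet.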

The iPhone has a dedicated hardware chip for voice processing; the phone and its chips are designed around this task. That is not the same as an off-the-shelf mic and a PC.

I haven’t spent much time learning about this, but “Hey Siri”/Alexa detection is an extremely difficult process, greatly supported by audio preprocessing and dedicated hardware along with optimized software. I believe this is why adding additional wake words took them so long.

After several weeks of on & off effort, I finally got all the right chips and the kinks out of my Voice test setup yesterday. I’m using a remote node for processing input and output, and built that node using an ESP32-WROOM, an ICS43434 i2s microphone, a MAX98357 i2s 3 W Class D amplifier, a small speaker, and ESPHome.

This is the second most difficult task I’ve undertaken in quite a while, second only to the 1000-line ESPHome ESP32 app I wrote to control my pellet stove. Wiring the voice hardware was challenging only because all the i2s lines are called something different on each device. I thought i2s was a “standard” in the formal sense… guess not…
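To illustrate the naming mess: the word-select line is “WS” on the ICS43434 datasheet but “LRC” on the MAX98357, the bit clock is “SCK” vs. “BCLK”, and the data line is “SD” vs. “DIN”, yet they are all the same i2s signals. A rough sketch of how that maps onto ESPHome’s fixed key names follows; the GPIO numbers are placeholders for my wiring, not a recommendation, so substitute your own:

```yaml
# i2s signal names vary by datasheet; ESPHome's keys are fixed:
#   i2s_lrclk_pin = WS (ICS43434) = LRC (MAX98357)  -- word select
#   i2s_bclk_pin  = SCK (ICS43434) = BCLK (MAX98357) -- bit clock
# GPIO numbers below are placeholders from my build.
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO26
    i2s_bclk_pin: GPIO25
  - id: i2s_out
    i2s_lrclk_pin: GPIO19
    i2s_bclk_pin: GPIO18

microphone:
  - platform: i2s_audio
    id: voice_mic
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO33   # the ICS43434 "SD" pin
    adc_type: external
    pdm: false

speaker:
  - platform: i2s_audio
    id: voice_speaker
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO22  # the MAX98357 "DIN" pin
    dac_type: external
```

Once you realize WS/LRC/LRCLK are one signal and SCK/BCLK are another, the wiring itself is straightforward.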

Then there’s Piper, Whisper, openWakeWord, pipelines, etc., and dozens of options for each of those that all need to be set correctly.

It’s a heavy lift.

Don’t get me wrong… this is early stuff, and as a formally trained audio electronics engineer I completely and deeply understand the massive complexity involved here. Most people think that just because Amazon and Google have cute, little, privacy-invasive devices that mostly work, it should be easy to replicate that on the Home Assistant ecosystem. Hardly.

While I can say “Hey Jarvis” (the LED signals recognition of the wake word, then lights up), then “turn on light”, and it responds “Turned on light” and the light goes on, I’m seeing VERY odd behavior in the ESPHome logs, and it takes a few reboots of the ESP for everything to settle down, connect all the pipelines & APIs, and be ready to accept voice input. Only then does the flakiness of Assist really show its colors. But… the light did turn on after I told it to.

We are all witnessing the birth of real independence from Amazon and Google. It’s about time. I feel that this proof of concept some of us are now replicating will spawn a massive push to simplify, streamline and improve this amazing core voice functionality of our beloved Home Assistant Platform. We will also start seeing custom i2s voice processing chips that are designed with all of us makers in mind (think Wake Words in a chip) start coming to market. Give it time.

I for one am really excited to see how this all matures and I’ll be experimenting & testing all winter long and for as long as it takes to truly unplug & replace Alexa with Home Assistant. Her days are now actually numbered. --Jeff
