Looking for the best hardware for a local voice assistant

Hi all, I’d really like to ditch my Echos and go fully local. I got the Voice PE in December (I think I was one of the first :D) and was left heavily disappointed. I’ve now ordered a Satellite1 in the hope that it handles voice a bit better (it has 4 mics). However, a big chunk of the disappointment is Whisper: it’s just extremely hungry for resources if we want decent results, I get it.

So, I’m in the market for a new Home Assistant server that can handle STT properly, meaning sub-second responses with accuracy on par with Alexa. A big part of the requirements, however, is power consumption. It’s a machine that runs 24/7, so that’s extremely important. Ideally, Whisper (or maybe someone will suggest another STT engine) should run on an NPU, like AMD’s XDNA2 (there are really nice mobile CPUs coming from AMD now, e.g. Ryzen AI MAX+ 395) or Intel’s Core Ultra NPU.

Thing is, until recently everything was NVIDIA and CUDA, which isn’t the case anymore. The question is, which is the best option right now, as of 23 May 2025? And is anyone playing with it? :)

Curious what specifically left you disappointed with Voice PE?

  1. Wake words are recognised 1 out of 5 times (in the case of my wife, it’s 1 out of 10 times). I tried all of them, but the difference was small.
  2. Extremely slow response after that. But I’m running Whisper on a low-power N5105 CPU now, so that’s kind of expected. I know this is not related to the Voice PE.
  3. Poor STT success ratio. It barely understands any spoken command. It could be because of bad sound quality (the Voice PE being unable to provide clean audio), or it could be because of Whisper. I don’t know.

Let me know your experience with wake word detection.

P.S. According to the docs (Introduction - FutureProofHomes Docs), while there are 4 mics, the XMOS firmware currently only supports 2, with support for all 4 mics expected later via a firmware update. So keep that in mind for your initial evaluations :) (assuming the docs are up to date).

Mac mini M4

What about a semi-recent USFF system from one of the big four (Dell, Fujitsu, HP, Lenovo)? That should do the job and hopefully remain around 15 watts (or less) idle.
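For a rough sense of what a given idle draw means for a machine that runs 24/7, here is a minimal sketch of the arithmetic. The 15 W figure is just the ballpark mentioned above, the function name `annual_kwh` is made up for illustration, and real consumption will be higher whenever the box is busy with STT.

```python
HOURS_PER_YEAR = 24 * 365  # always-on machine, ignoring leap years


def annual_kwh(average_watts: float) -> float:
    """Convert an average power draw in watts to energy per year in kWh."""
    return average_watts * HOURS_PER_YEAR / 1000


if __name__ == "__main__":
    # Compare a few hypothetical average draws, including the ~15 W idle target.
    for watts in (5, 15, 30):
        print(f"{watts} W average -> {annual_kwh(watts):.1f} kWh/year")
```

Multiply the result by your local price per kWh to get a yearly cost; at 15 W it works out to about 131 kWh per year.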

I’m kind of sad to hear this, as I planned to replace all my Echos with Voice PEs/Satellite1s.
I currently plan on waiting a little bit and want to invest in a Mac mini M4 as the server for this.