[Interest Check] Building the "Goldilocks" Local Voice Node (Orange Pi 5 vs. Jetson)

Hi everyone,

I’m writing this because, like many of you, I have been chasing the “perfect” local voice assistant setup for Home Assistant—and I’ve been pretty frustrated with the existing options.

I wanted a voice assistant that was fully local (no cloud API fees/privacy leaks) but actually fast.

  • Raspberry Pi 5/N100: I tried these, but waiting 5-10 seconds for a response makes the assistant feel “dumb” and robotic.
  • Gaming PC: I didn’t want to run a 500W GPU server 24/7 just to turn on my lights.
  • Cloud: Fast, but defeats the purpose of self-hosting.

I’ve spent the last few months prototyping dedicated hardware to find the “Goldilocks” zone—devices with dedicated NPUs (Neural Processing Units) that sip power but run LLMs fast enough to feel conversational.

I’ve finally got a setup that works reliably using the Wyoming protocol, and I’m considering building a small batch for the community. I would offer these essentially at-cost (hardware + shipping) for anyone else who is tired of the latency struggle.

I wanted to gauge interest on the two “winning” configurations I’ve found:

Option 1: The “Budget” Sweet Spot (Orange Pi 5 / RK3588)

  • The Hardware: Rockchip RK3588 with 8GB RAM.
  • The Performance: Runs Llama 3.2 3B at ~15–20 tokens/sec.
  • My Take: This is the baseline for a usable voice assistant. It’s significantly faster than a Pi 5. The NPU drivers were a pain to configure, but now that it’s running, the experience is solid. It feels like a smart speaker, not a science experiment.
  • Estimated Cost: ~$130 range.

Option 2: The “Premium” Experience (NVIDIA Jetson Orin Nano)

  • The Hardware: NVIDIA Orin Nano (8GB) with a fast NVMe SSD.
  • The Performance: Runs Llama 3.2 3B at ~40+ tokens/sec.
  • My Take: To be honest, this is my personal favorite. The response is near-instant (sub-second latency). It feels just as snappy as Alexa or Google, but it’s 100% yours.
  • Why NVMe? I strictly use NVMe drives for these builds. I tested SD cards, and the “cold start” delay (loading the model into RAM) was nearly 30 seconds. With NVMe, it’s instant.
  • Estimated Cost: ~$250 range.
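To make the gap between the two options concrete, here is the back-of-envelope math I use. The reply length, model size, and disk speeds below are my own rough assumptions, not benchmarks:

```python
# Rough latency estimates for the two boards (illustrative numbers only).

def response_time(tokens, tok_per_sec):
    """Seconds to generate a reply of `tokens` length at a given decode rate."""
    return tokens / tok_per_sec

# Assume a typical short voice reply is ~30 tokens.
reply = 30
print(round(response_time(reply, 15), 1))    # RK3588 low end: ~2.0 s
print(round(response_time(reply, 40), 2))    # Orin Nano: ~0.75 s

# Cold start: a 4-bit 3B model is roughly 2 GB on disk (assumption).
model_gb = 2.0
sd_mbps, nvme_mbps = 70, 2000                # rough sequential read speeds
print(round(model_gb * 1024 / sd_mbps))      # SD card: ~29 s to load
print(round(model_gb * 1024 / nvme_mbps, 1)) # NVMe: ~1 s to load
```

That ~2 s vs ~0.75 s generation gap (before speech-to-text and text-to-speech overhead) is roughly the difference between "usable" and "snappy" in daily use.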

(Note: I know used M1 Mac Minis are a popular alternative in the $200+ range. They are great, but since the used market is a gamble, I can’t really “build” a consistent, reliable batch of them for the community, so I’m focusing on new embedded hardware here.)

Why I’m doing this

If you’ve ever tried to set up rkllm or JetPack drivers manually, you know it’s not exactly plug-and-play. My goal is to pre-build these “AI Nodes” so you can just plug them into Ethernet, point Home Assistant at the IP, and finally have a voice assistant that doesn’t lag.

Would you be interested in picking one of these up?

If so, does the budget-friendly Orange Pi appeal to you, or is the instant-response of the Jetson worth the extra cost?

Thanks!

Would probably be interested in your faster option.

1 Like

The world seems to be moving towards bigger LLMs, so 8 GB would be too low for me to even consider it.
16 GB would be the absolute minimum, and 32 GB or more would be interesting, but that means quite a price jump.

2 Likes

Have you also looked into the option of using a mini PC/NUC for low energy usage, with VRAM via an eGPU?

I would be interested in the results if any

1 Like

I’ve definitely considered it, but while the NUC would use little energy, you’re essentially moving the power usage to the eGPU. Also, you’ll lose a bit of performance because of the connection overhead and the PCIe x4 bottleneck. That said, as an end user using it as a voice assistant, you may not notice the difference.

I’m running a NUC with an eGPU and there is zero latency. Use Thunderbolt or OCuLink.

I’m with wally: neither a Pi nor a Jetson will cut it long term for serious LLM work. You will be severely limited in context window. 8k is the absolute minimum context window size, and realistically you’ll need a tool-use model like gpt-oss-20b for long-term success. That Llama model is good for simple light control but can’t tool-chain well at all.

You want a real GPU with at least 16 GB of VRAM if you want to do this long term and do anything more than simple voice control.

I’m happy it’s working, but I also don’t want people thinking this setup is adequate as a general-use LLM setup long term.

I own a reComputer J4011 with a Jetson Orin NX 8GB, but I found LLMs to be extremely slow and limited due to only 8 GB, even when providing only a small number of entities. As for Whisper/Piper, I struggled to find information on how to run those with acceleration, so I can definitely see the use if you got that working. But I doubt I’d be willing to buy again the hardware I already have. I’m happily running Frigate on it now after adding a beefier NVMe. But that too required some hefty trial and error. The JetPack stuff is a horror to maintain, and sudo apt upgrade will break things.

Completely agree that it’s not meant to be a general-purpose LLM server. This is simply meant to help you control your home or set up automations. I’m working on a few ideas, including storing entities and/or automation script scaffolding in a local vector DB, with scripts to then handle certain flows deterministically. Basically, the LLM would answer you in a reasonable manner, but the actual workflows would be deterministic and the LLM would simply make a decision on which one to run, hopefully keeping the complexity and hallucinations at bay (for the most part).

Happy to share all of this as open source, or just load it on the hardware for free. I just feel we’re starting to see the price of hardware and the capabilities of models meet at a reasonable place for the casual, privacy-minded user. What I personally want out of this is for folks to use these in creative ways and push us further forward, since I’m not good at coming up with cool ideas.

1 Like

I’m afraid you probably won’t have enough to do that. You’re going to blow the context out of the water with fewer than 30 entities exposed. I’m sorry, but it’s going to be a rough road.

LLM response quality is all about context, and you’re going to be tight on that from the start. I’d urge you to beef up the LLM side of things before you even try any RAG work. Adding RAG needs even more context on your LLM, not less; you’ll spend your time fighting an install, not enjoying any results.

So, for context management, I found that cutting the process up into small pieces and storing a plan in a file (or files) works fairly well. You do have to include that part as instructions in the system prompt, but it’s completely doable.

For RAG, unless you mean actual embedding, querying the information is a simple DB query, which is not complex at all. I’m using Postgres with the pgvector extension.
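To illustrate the routing idea I described above: the LLM (or an embedding of the user’s request) just has to pick the closest stored workflow, and the workflow itself runs deterministically. A toy sketch in pure Python (the workflow names and 3-d vectors are made up; in my setup pgvector does the same nearest-neighbor step server-side):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical stored workflows with pre-computed embeddings (toy 3-d vectors).
workflows = {
    "lights_off":   [0.9, 0.1, 0.0],
    "morning_wake": [0.1, 0.8, 0.3],
}

def route(query_vec):
    """Pick the deterministic workflow closest to the query embedding.
    With pgvector, the equivalent cosine-distance query is roughly:
      SELECT name FROM workflows ORDER BY embedding <=> %s LIMIT 1;
    """
    return max(workflows, key=lambda name: cosine(query_vec, workflows[name]))

print(route([0.85, 0.2, 0.05]))  # lights_off
```

The point is that the LLM only makes a small, bounded decision; the actual home control stays deterministic.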

I am very familiar with what it takes. And you simply don’t have enough grunt in that machine to do anything besides basic light control. Vector search makes it worse. (This is my day job.)

We’ve been having success with pre-summarized content. But even there, you still need more than what you have.

Bring the LLM up to a real GPU with a reasonable amount of VRAM and that all changes. But please don’t recommend people run LLMs on an Orange Pi or a Jetson Nano in low-RAM conditions; it just leads to pain. They are good for voice response and will work great for Wyoming, but the LLM part is simply lacking for any serious control. You must ground an LLM for decent results, and the minute you start throwing anything bigger than “you are an assistant” at that thing, it will fold.

If you want to join the research we’ve been doing in Friday’s Party to see what it takes, please do. All are welcome. But we already determined that the bare minimum to be somewhat successful with a local LLM is an actual GPU or NPU with an absolute minimum of 8 GB VRAM (16 GB for anything serious), with a context window in GPU memory of 8k or preferably more.
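The 8 GB / 8k-context arithmetic is worth spelling out. A rough fp16 KV-cache estimate for Llama 3.2 3B, using its published config (28 layers, 8 KV heads via GQA, head dim 128) — treat this as a sketch, not a measurement:

```python
def kv_cache_gb(layers, kv_heads, head_dim, context, bytes_per_val=2):
    """fp16 KV cache: 2 tensors (K and V) per layer, per token."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_val / 1024**3

# Llama 3.2 3B: 28 layers, 8 KV heads (GQA), head_dim 128.
print(round(kv_cache_gb(28, 8, 128, 8192), 2))  # ~0.88 GB of cache at 8k context
```

So even a small 3B model spends close to a gigabyte on cache alone at 8k, on top of ~2 GB of Q4 weights, the OS, and the speech pipeline. Scale the model up and an 8 GB board runs out of room fast, which is the core of the argument above.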

Can you share the link to Friday’s Party? I would be interested in joining. My day job is building systems in a highly regulated environment, so there are certainly learnings there (around auditability and risk management), but I have these hobby projects to let me explore beyond that. I am also curious what the goals are for the Friday’s Party sessions — meaning, is the goal specifically to build a device for Home Assistant, or something more general?

For more general LLM use, I agree with you. In fact, I have an RTX 3060 in my “general LLM” machine and even that is not good enough. It’s a constant uphill battle because newer and better models come out so quickly, and it’s hard to un-see the cutting-edge models. I’m looking to do a “minimum upgrade” to a 4090, or at least to a 16 GB GPU. But I digress.

As long as you accept the limitations of this smaller hardware and are happy to use it solely for private control over your home, my tests show adequate results.

certainly - Friday's Party: Creating a Private, Agentic AI using Voice Assistant tools

Almost installable. ALMOST.

1 Like