I have the HA Voice Preview Edition with Whisper STT running on an Intel N5105, and performance is very poor. It's basically unusable. So I'm thinking of upgrading my HA server to something a bit more decent that can run a local LLM as well. I want the full Jarvis experience.
Problem is, I don't want to pay the electricity bills for it. I see there are a few Copilot+ NUCs recently introduced (mainly around CES), for example the ASUS NUC 14 Pro AI. My question here is: is this going to be enough for the experience I want to achieve?
What do you use? Are you happy with it? What would you recommend? Anything would be of help here, I’m literally at ground zero. And I believe a lot of people are at this point as well now, so it would be a potentially interesting thread.
A NUC 14 Pro AI runs at approximately 100 TOPS, split between GPU, NPU, and CPU processing.
For context…
That's BARELY enough to run Llama3:8b (Llama3 8B models are the typical size for local AI running HA… I can currently run one successfully on a NUC 14 AI, but with challenges), and only if you jump through hoops for an Intel Arc based installation. If you don't do this often: almost everyone uses Nvidia iron, so Intel is an odd setup that needs Intel IPEX and is more complex. Doable, but complex.
So for full Jarvis: you're just BARELY on the leading edge of capable hardware. Doable, but it will require care and feeding.
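To put rough numbers on why an 8B quantized model is about the ceiling for that class of box, here's a back-of-the-envelope footprint sketch (my assumptions: ~1.2x overhead for KV cache and runtime buffers on top of the raw weights):

```python
# Rough VRAM/RAM footprint for an 8B-parameter model at different precisions.
PARAMS = 8e9  # Llama3:8b parameter count

def footprint_gb(bits_per_weight: float, overhead: float = 1.2) -> float:
    # weights * overhead for KV cache and runtime buffers (assumed 1.2x)
    return PARAMS * bits_per_weight / 8 / 1e9 * overhead

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit quant", 4.5)]:
    print(f"Llama3-8B @ {label}: ~{footprint_gb(bits):.1f} GB")
# FP16 (~19 GB) is out of reach on this hardware; a 4-bit quant (~5-6 GB)
# is what actually fits, which is why 8B quantized is roughly the ceiling.
```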
Thanks for the quick reply, @NathanCu. What kind of challenges and "care and feeding" are we talking about here? I understand that kind of horsepower just scratches the surface, but at the end of the day, if the next best thing is a $3K GPU setup and "divorce level" electricity bills, I'll take it.
Short version: I have to drastically cut back my expectations of what I can make it do.
Biggest problem… To be successful I had to limit the number of entities I expose for control and limit how much context I can stuff into the prompt. I'm also working with a very limited chat context window. I may be able to do something about that part with settings, but in short, if you have a large, complex install, going local with that hardware will be difficult.
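To make the entity/context problem concrete, a rough sketch (the per-entity token figure is my assumption; the real number depends on your integration and prompt template):

```python
# Why exposing every entity blows up the prompt.
TOKENS_PER_ENTITY = 40       # assumed: name, state, aliases per exposed entity
SYSTEM_PROMPT_TOKENS = 600   # assumed base instructions
CONTEXT_WINDOW = 8192        # typical Llama3-8B window

for entities in (25, 100, 400):
    used = SYSTEM_PROMPT_TOKENS + entities * TOKENS_PER_ENTITY
    print(f"{entities:>3} entities -> ~{used} tokens "
          f"({used / CONTEXT_WINDOW:.0%} of an 8k window)")
# 400 entities would need roughly double an 8k window before the user even
# says anything, which is why a large install forces you to cut back.
```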
A few numbers here.
I run a small LLM on a dedicated PC with a 12 GB VRAM GPU. I can get acceptable responses within a 3-9 second time frame.
The PC draws 54 W idle but peaks at 250-300 W when processing a request.
Does it work? Yes.
Now, for me that idle power consumption is far too high for a few requests a day, and I am looking/waiting for a better solution. The peak only lasts 15 seconds or so per request, so that doesn't worry me much.
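For anyone wanting to put a price on that idle draw, a quick sketch (the electricity price is an assumption, adjust for your own tariff):

```python
# What a constant idle draw costs over a year; peaks are ignored since
# 15-ish seconds per request barely moves the total.
IDLE_W = 54
PRICE_PER_KWH = 0.30  # assumed tariff

idle_kwh_year = IDLE_W / 1000 * 24 * 365   # ~473 kWh/year
print(f"Idle: {idle_kwh_year:.0f} kWh/year "
      f"-> ~{idle_kwh_year * PRICE_PER_KWH:.0f}/year at {PRICE_PER_KWH}/kWh")
```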
Yes it is. It does nothing more than take voice and convert it to text; that's the light lift here. Well, as long as you have enough horsepower to do it, it's lighter than the LLM requirements.
Then you have two choices. You can let the HA voice intent system take a stab at matching the request (the part the OP is disappointed with, and I agree: it's pattern matching and it's cumbersome).
Or you can hand that text to an LLM to interpret and then do the things. That's what an LLM does well… if you can give it enough context and enough horsepower.
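A minimal sketch of that second path, assuming a local OpenAI-compatible endpoint like the one Ollama exposes (the URL, model name, and prompt are placeholders):

```python
# Hand the STT transcript to a local LLM for interpretation.
import requests

LLM_URL = "http://localhost:11434/v1/chat/completions"  # assumed local Ollama

def interpret(transcript: str) -> str:
    resp = requests.post(LLM_URL, json={
        "model": "llama3:8b",
        "messages": [
            {"role": "system",
             "content": "You control a smart home. Decide what action the user wants."},
            {"role": "user", "content": transcript},
        ],
    }, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# e.g. interpret("turn off the kitchen lights") returns the model's reading of
# the request, which the HA conversation agent then maps onto service calls.
print(interpret("turn off the kitchen lights"))
```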
If I were building a machine for hardware-accelerated AI use right now, I would use Nvidia-based hardware. Why? Because yes, it's a cheap mini PC, but you'll spend most of your time just getting it to do your work, because the Intel-specific parts of this still need (a lot of) work, while Nvidia, being the big dog, just works.
My rig is totally experimental because I do this sort of thing for a paycheck. The Intel gear requires a LOT of extra work to get AI acceleration through the Intel IPEX stack and an Alchemist or Battlemage GPU. It totally WORKS, but unless you're a tinkerer, or Intel makes IPEX WAY easier, I wouldn't recommend it right now. Everything is built for Nvidia's CUDA architecture at the moment. Yes, Whisper/Piper isn't in that category, but let's face it: if you buy the box, you're going to try AI here too.
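To illustrate the "built for CUDA" point: most stacks just probe for an Nvidia GPU and fall back to CPU, and the Intel Arc (XPU) path only shows up in IPEX/XPU-enabled PyTorch builds, which is the extra setup work I mean. Sketch only; availability varies by version:

```python
# Pick an inference device: CUDA is the well-trodden default, XPU is the
# Intel Arc path that needs an IPEX/XPU-enabled build.
import torch

if torch.cuda.is_available():                              # Nvidia
    device = "cuda"
elif hasattr(torch, "xpu") and torch.xpu.is_available():   # Intel Arc via IPEX/XPU builds
    device = "xpu"
else:
    device = "cpu"

print(f"Inference device: {device}")
```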
Ultimately I plan on running the local AI stack on Nvidia DIGITS hardware.
So tl;dr: unless you're a pro, or otherwise have a reason, I'd stick to the Nvidia stack for AI stuff right now.
Well, $3K for an NVIDIA DIGITS might not be affordable… spending $20-40 a month on ChatGPT Plus would be cheaper for more than 6 years… (without even considering the electricity bill).
I can then run DIGITS flat out, continually processing EVERYTHING, whereas right now I'm being VERY stingy with my tokens.
The operative point here: as far as usefulness goes for me, a machine capable of running a ~100B model unquantized is probably as close to 'Jarvis' as anything I've ever expected to achieve. To do that with cloud tokens today would cost me well over $100/month.
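Rough break-even math on that tradeoff, using the figures in this thread (the earlier $20-40/month ChatGPT Plus numbers and my $100+/month estimate; electricity ignored, as above):

```python
# One-time local hardware cost vs recurring cloud spend.
LOCAL_HW_COST = 3000   # DIGITS-class box (assumed)

print(f"vs heavy use (~$100/month in tokens): ~{LOCAL_HW_COST / 100:.0f} months to break even")
for sub in (20, 40):
    print(f"vs ${sub}/month ChatGPT Plus: the same money buys ~{LOCAL_HW_COST / sub:.0f} months of subscription")
# ~30 months against heavy usage; 75-150 months against a light subscription,
# which is exactly why the answer depends on how hard you plan to drive it.
```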
With capable local hardware I don't care anymore, and a lot of things I used to write software for, I would instead send EVERYTHING to the AI to analyze in the background and have the interactive agent just summarize those states. Hard to describe, but it changes the usage pattern into a very different workload. More churn and burn than transactional.
Right now the ChatGPT / cloud model economy favors "how much work can I do within this relatively small budget per time scale". Local (even expensive) favors pushing as much through the pipe as fast and hard as you can, all the time. Subtle difference, sure. You want AI? Then I need this processor at 80% all the time, or I overpaid.
In my current plan it'll be at 80% or better, and I may need the option of tethering a second one (I really hope to avoid that).
I think long term it's really something like an HX370-based mini PC + DIGITS that gets a lot of heavy long-term use through… 2030ish? Basically $3000-3500 USD, depending on your best deal at the target time frame, for probably 5 years of use? Heck yes, sign me up. Mayyyybe a booster to the mini PC at 3 years for another 500 bucks…
I've seen people pay that on just the watercooling, and I GUARANTEE I'll pump way more tokens through than I could pay for in the same time period with OpenAI.
There's already enough ML pattern data from companies like OTIS to use for failure prediction on resistive and motor loads.
Suddenly you feed in that ML model data plus your electrical usage data, tell the LLM it can get all of that with these three tags… and run it hourly…
I'd blow the doors off my token budget at OpenAI doing that. But running it locally? It'll tell me the refrigerator is about to fail months before anyone would know anything is wrong…
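As a very rough sketch of what "run it hourly" could look like: pull a day of fridge power readings from Home Assistant's REST API and hand them to the local model (the entity ID, token, URLs, and prompt are all placeholders; the failure-pattern reference data would come from your own sources):

```python
# Hourly appliance check: fetch power history from HA, ask a local LLM
# whether the duty cycle looks abnormal.
import requests
from datetime import datetime, timedelta, timezone

HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"      # placeholder
ENTITY = "sensor.fridge_power"                 # hypothetical entity
LLM_URL = "http://localhost:11434/v1/chat/completions"  # assumed local Ollama

def fetch_history(hours: int = 24) -> list:
    start = (datetime.now(timezone.utc) - timedelta(hours=hours)).isoformat()
    resp = requests.get(
        f"{HA_URL}/api/history/period/{start}",
        params={"filter_entity_id": ENTITY},
        headers={"Authorization": f"Bearer {HA_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return [s["state"] for s in resp.json()[0]]

def analyze(readings: list) -> str:
    resp = requests.post(LLM_URL, json={
        "model": "llama3:8b",
        "messages": [
            {"role": "system",
             "content": "You watch appliance power signatures for early signs of failure."},
            {"role": "user",
             "content": f"Last 24h of fridge power readings (W): {readings}. "
                        "Does the duty cycle look normal? Flag anything unusual."},
        ],
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(analyze(fetch_history()))
```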