I’ve been running HA for the past few years on a Home Assistant Blue. I’ve mostly used it for simple things like controlling lights, volume on different devices, etc. Nothing too demanding or complicated. However, I want to transition away from Google (which I currently use primarily for updating shopping lists and playing music on minis and hubs, using voice commands). Additionally, I would like to integrate a local LLM (Ollama).
I have been researching what it will take to make the transition; there’s just a ton of information, and it’s a bit much to wrap my head around. Here is the new hardware I intend to use:
Intel NUC8 - Core i5 8295U, 16GB DDR4, 500GB M.2 SSD
Home Assistant Voice PE
I had initially planned to keep using HA on the Blue, but looking into the requirements for Whisper, I suspect that the hardware wouldn’t be powerful enough to give me timely responses. That means I’ll need HAOS on the NUC, the same machine I was planning to install Ollama on.
How do I go about having both installed on the same machine? Would I need to install Proxmox first and then have HAOS and Ollama both in VMs?
Given the capabilities of the NUC, what model should I choose for Ollama?
Will there be any issues with me having two instances of HA running on the same network? I’d like to keep the current setup on Blue going while I spin up the NUC since I expect it is going to take quite some time to figure everything out.
I’ve never used Proxmox or an LLM before (or even installed HAOS from scratch) so I’m sure this is going to be a learning process.
If you do not have an adequate GPU/NPU, do not attempt.
Adequate means an Nvidia 3xxx-series card with 12 GB of video/GPU RAM or better. Ollama needs capable hardware, or inference performance isn’t worth the effort.
I’m running local inference on an Intel IPEX-ARC (NUC14ProAI), and it’s OK (barely) with careful workload planning. I run that machine as inference-only; I could probably run HA on it too, but the box would be heavily stressed.
If you’re running HASS and Ollama on the same low-power system, you’ll want to go for something as simple as Qwen 3 (1.7b), Gemma 3 (4b), or DeepSeek R1 (7b). Heck, even a Raspberry Pi 5 can deliver passable results for controlling your home lab once you arm it with one of the 0.6b or 1.5b models.
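Once you have a model pulled, a quick way to sanity-check response time is to hit Ollama’s REST API on its default port (11434). Here’s a rough Python sketch; the model tag and prompt are just examples, so swap in whatever you actually pull:

```python
# Rough latency check against a local Ollama instance.
# Assumes Ollama is listening on its default port (11434) and that the
# model tag below has already been pulled (adjust to whatever you use).
import time
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
MODEL = "qwen3:1.7b"  # example tag; try gemma3:4b, etc.

prompt = "Turn on the kitchen lights and tell me the current time."

start = time.time()
resp = requests.post(
    OLLAMA_URL,
    json={"model": MODEL, "prompt": prompt, "stream": False},
    timeout=300,
)
resp.raise_for_status()
elapsed = time.time() - start

print(f"Model reply ({elapsed:.1f}s):")
print(resp.json()["response"])
```

If a small model answers in a couple of seconds on your hardware, it’s probably usable for voice control; if it takes tens of seconds, you’ll feel it on every command.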
I don’t plan on pushing the LLM very hard or expecting a lot out of it, so I thought the NUC8 should be okay since it’s way more powerful than a Pi 5. Unless I’m misunderstanding something?
Ollama, and LLMs in general, don’t really run on the CPU. They use a GPU/NPU (think graphics card). Even the 4b models need approximately 8-12 GB of dedicated video (GPU/NPU) RAM just to load with a 4-8k token context window, which is absolutely the minimum I’d attempt to control HA with.
So no, it’s not the CPU; it’s the NPU/GPU. If you took that box and hung an Nvidia-based eGPU with 12 GB or better on it, sure! Until then you aren’t in the same neighborhood. Sorry.
It’s like trying to play Far Cry without a video accelerator.
I detail my inference box in the Friday’s Party post.
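If it helps, here’s the very rough back-of-envelope math behind that 8-12 GB figure. All the numbers below are ballpark assumptions for illustration, not the specs of any particular model:

```python
# Very rough VRAM estimate for loading a small LLM: weights + KV cache.
# Every number here is an assumed ballpark value, not a real model's spec.
params = 4e9            # a "4b" model
bytes_per_weight = 2    # fp16; a 4-bit quant would cut this to ~0.5
weights_gb = params * bytes_per_weight / 1e9

# KV cache grows with context length; layer count and hidden size are
# rough guesses for a model in this size class.
layers = 32
hidden = 2560
context = 8192          # 8k token context window
kv_bytes_per_token = 2 * layers * hidden * 2   # K and V, fp16
kv_cache_gb = kv_bytes_per_token * context / 1e9

print(f"weights  ~{weights_gb:.1f} GB")
print(f"KV cache ~{kv_cache_gb:.1f} GB at {context} tokens")
print(f"total    ~{weights_gb + kv_cache_gb:.1f} GB (plus runtime overhead)")
```

Quantized builds shrink the weight term a lot, but the KV cache still scales with context, and you still need headroom for the runtime.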
I see. But you’re saying I could add an eGPU to the NUC? Pardon my ignorance, as it’s been a long time since I cared about graphics cards, but how would I connect it? I don’t imagine it’s something I could just plug into one of the NUC’s available USB ports?
They need a high-bandwidth connection, so your board needs to support something like mPCIe, expose an M.2 slot, or have Thunderbolt (Thunderbolt is probably the easiest, but you need a machine less than roughly two years old for that).
Local inference is NOT for the faint of heart, and it’s not cheap.