WTH: Framework and hardware recommendations for local inference and RAG

There have been a couple of posts about LLMs and RAG, and there is community work on add-ons like Ollama and n8n, however…

For those of us in the community who want to go further with voice, local inference, and advanced automation with AI, I would like to see some recommendations for hardware and frameworks.

I quickly outgrew Pis when I joined the community and have never regretted moving to a NUC. However, it is getting a bit old, and I'm looking at a new platform with more capability but am unsure which way to go.

Intel and AMD now have NPUs onboard, and the 6.14 Linux kernel will support the AMD NPU. Or do I look at the new Jetson Nano Super?

It would be great to have local inference and RAG to maintain the “local” philosophy of HA whilst providing the more advanced voice features currently handled by cloud fallback. However, it needs to be as good as using cloud fallback, and ideally quicker.

Core support for using the NPU or offloading to a GPU, so that add-ons can run the LLM and RAG on suitable hardware, would be a huge leap for Home Assistant.

2025 Year of AI :smiley:

Open WebUI uses Ollama and has a RESTful API that allows you to submit documents and use them for RAG when sending a query in, all via its API.
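Roughly, the flow looks like the sketch below (the host/port, model name, and file are placeholders, and it assumes an API key generated in your Open WebUI account settings):

```python
# Rough sketch of the Open WebUI upload-then-ask RAG flow described above.
# Assumes Open WebUI on its default port 3000 and an API key in OPENWEBUI_API_KEY.
import os
import requests

BASE_URL = "http://localhost:3000"            # assumption: default Open WebUI address
HEADERS = {"Authorization": f"Bearer {os.environ['OPENWEBUI_API_KEY']}"}

# 1. Upload a document so Open WebUI can index it for retrieval.
with open("manual.pdf", "rb") as f:           # placeholder document
    upload = requests.post(f"{BASE_URL}/api/v1/files/", headers=HEADERS, files={"file": f})
upload.raise_for_status()
file_id = upload.json()["id"]

# 2. Send a chat completion and attach the uploaded file so the answer
#    is grounded in that document (RAG).
chat = requests.post(
    f"{BASE_URL}/api/chat/completions",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={
        "model": "llama3.2",                  # assumption: an Ollama model you have pulled
        "messages": [{"role": "user", "content": "Summarise the key points of this document."}],
        "files": [{"type": "file", "id": file_id}],
    },
)
chat.raise_for_status()
print(chat.json()["choices"][0]["message"]["content"])
```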

You should use a GPU for this. I'm hoping the price of the 4090 will drop soon with the 5090 and 5080 being released this month. I'm running a 3060 with 12 GB locally and it works fine, but I'd like something with more VRAM.
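If you want to sanity-check whether a model is actually sitting in VRAM rather than spilling to the CPU, you can ask Ollama directly. A minimal sketch, assuming Ollama on its default port:

```python
# Minimal sketch: ask the local Ollama server which models are loaded
# and how much of each model is resident on the GPU.
import requests

resp = requests.get("http://localhost:11434/api/ps")  # assumption: default Ollama port
resp.raise_for_status()

for model in resp.json().get("models", []):
    size = model["size"]               # total memory used by the loaded model
    vram = model.get("size_vram", 0)   # portion of that memory held in GPU VRAM
    pct = 100 * vram / size if size else 0
    print(f"{model['name']}: {pct:.0f}% of {size / 1e9:.1f} GB in VRAM")
```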

Thanks Kludge

Possibly I didn't explain myself well. I am asking about a single physical piece of hardware running HAOS that can host a local LLM and RAG: either a new mini PC with an AI CPU/GPU/NPU, or perhaps the new NVIDIA Jetson Nano Super.