Local Nvidia supercomputer HA with AI LLM

Nvidia has just come out with some affordable supercomputers. I would LOOOVE to be able to run HA on one of those, to then be able to include numerous cameras and Frigate, as well as locally hosted large language models for AI processing…

Is that possible with any of their new models?

Aren’t they just PCs with an Nvidia GPU? If so, then they will run HA.

But will HA use the GPU?

I have a gaming machine with an Nvidia card on which I run a fully local voice assistant. The details are way over my head, but my understanding is that it was specifically written to exploit the speed of the GPU - which HA and the rest aren’t.

It’s an Nvidia Grace (ARM) CPU married directly to a Blackwell GPU that drives in excess of 5000 TOPS (100x what qualifies a machine to be a Copilot+ PC).

It’s more like a gigantic Jetson Nano. And no, I don’t expect it to run HA directly. Although I DO expect it to run Ollama and 30B models locally (Jensen says it runs Nvidia’s AI stack…). Basically, you could in theory run a foundational model at home, and if it’s capable of running a test-time-compute or test-time-training (reasoning) model… you almost have something that could theoretically run an o1-class model or comparable on your desk. :exploding_head:
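
To make that concrete, here’s a minimal sketch of what querying a locally hosted ~30B model looks like once Ollama is running. The model name is my assumption (substitute whatever you’ve pulled); the endpoint and payload are Ollama’s standard REST API:

```python
import requests

# Minimal sketch: ask a locally hosted ~30B model a question through
# Ollama's REST API. The model name is an example; pull it first with
# `ollama pull qwen2.5:32b` or substitute whatever you run.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:32b",
        "prompt": "In one sentence, what is Home Assistant?",
        "stream": False,  # one JSON object instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```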

I know 3K is a lot of scratch but for what I think this box will do - it will be a steal.

I’ve been looking at the latest CES reveals and agree. The computing power in such a small form factor is amazing. Considering I have several GPUs I paid 2K for, a price of 3K seems reasonably cheap.

I wouldn’t use it as a dedicated HA device though :grinning:

HA won’t use the GPU, but maybe Whisper, Piper or Ollama can be persuaded to do so. HAOS does not ship with the NVIDIA driver, so that won’t work there.
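
They can be. As an example, the faster-whisper library that the Whisper add-on is built on takes a device argument, so on a proper Linux install with the NVIDIA driver you can push speech-to-text onto the card; a minimal sketch (the audio file is a placeholder):

```python
from faster_whisper import WhisperModel

# Minimal sketch: run Whisper speech-to-text on the Nvidia GPU.
# device="cuda" is what actually engages the card; it needs the NVIDIA
# driver and CUDA libraries, which is exactly what stock HAOS lacks.
model = WhisperModel("large-v1", device="cuda", compute_type="float16")

segments, _info = model.transcribe("command.wav")  # placeholder audio file
for segment in segments:
    print(segment.text)
```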

Docker or virtualized HA would be best.

HA has no use for such power. Any integration would need a separate service; it is the separate service that would use the processing power, and the HA integration simply connects to it.
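
To illustrate the split, here’s a hypothetical sketch of the service side: the heavy lifting lives in a small HTTP service on the GPU box, and HA only ever sends it a cheap request. Endpoint name and payload are made up for illustration:

```python
from flask import Flask, jsonify, request

# Hypothetical sketch of the division of labor: this service runs on the
# GPU box and owns the expensive compute; HA's integration just POSTs to
# it over the network and gets a small JSON answer back.
app = Flask(__name__)

@app.route("/describe", methods=["POST"])
def describe():
    prompt = request.json["prompt"]
    # GPU-heavy inference (LLM, vision model, ...) would happen here.
    answer = f"(model output for: {prompt!r})"  # stand-in for real inference
    return jsonify({"answer": answer})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```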

What is best for home AI: an RTX 5090, or waiting for the DIGITS box?

A 5090 will push approximately 3300 TOPS at max throughput and will cost nearly as much as a DIGITS, but you get to install it in your machine of choice and game on it. It’s not a direct comparison.

The DIGITS is pure AI, all the time, and there are still a lot of questions. A 5090? Well, if you like PUBG… it’s gonna be a banger.

HA add-ons do not support the nvidia-runtime. You would need to (and likely very much want to) run a standard Linux installation and run everything in Docker.
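
A minimal sketch of that with the Docker SDK for Python, assuming the NVIDIA Container Toolkit is installed on the host; it’s the programmatic equivalent of `docker run --gpus all -p 11434:11434 ollama/ollama`:

```python
import docker

# Minimal sketch: on a standard Linux install with the NVIDIA Container
# Toolkit, hand all GPUs to a container (here: the stock Ollama image).
client = docker.from_env()
client.containers.run(
    "ollama/ollama",
    name="ollama",
    detach=True,
    device_requests=[
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
    ports={"11434/tcp": 11434},
    volumes={"ollama": {"bind": "/root/.ollama", "mode": "rw"}},
)
```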

Honestly, I’d run the box purely as suggested by Nvidia and call to it from off-box. It’s a different animal, purpose-built for what it does. I don’t think it’s going to run non-AI workloads worth a crud. OK, yeah, it can brute-force compute, but if you want this box to do what it does, you’ll want it to be a purpose-built machine.

I have a HA Yellow. Could a setup be to keep HA on the Yellow and add a DIGITS to my network with an LLM on it, then have the Yellow call to it when needed?

Absolutely! That would be supported by the Ollama integration. I have a media center HTPC with a 2080Ti, and that’s how we use it with HA — I run the LLMs there.
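
Under the hood it’s nothing exotic; HA just talks to the Ollama box over the LAN, roughly like this (hostname and model are placeholders for whatever your DIGITS ends up serving):

```python
import requests

# Rough sketch of the kind of chat call HA's Ollama integration makes
# across the LAN: the Yellow stays in charge of the home, the GPU box
# only answers prompts. "digits.local" is a placeholder hostname.
resp = requests.post(
    "http://digits.local:11434/api/chat",
    json={
        "model": "llama3.1",
        "messages": [
            {"role": "user", "content": "Suggest an evening lighting scene."}
        ],
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```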

Build ollama + open-webui now.

Add digits as another ollama endpoint to the mix when it’s ready.

Profit?
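
The first step could look like this with the Docker SDK; when DIGITS ships, it’s just a second entry in open-webui’s OLLAMA_BASE_URLS list (the env var is my assumption from open-webui’s multi-backend support; hostnames are placeholders):

```python
import docker

# Sketch: open-webui in front of two Ollama backends. Today only the
# first URL exists; when DIGITS is ready it becomes the second entry
# (semicolon-separated). Hostnames are placeholders for your machines.
client = docker.from_env()
client.containers.run(
    "ghcr.io/open-webui/open-webui:main",
    name="open-webui",
    detach=True,
    ports={"8080/tcp": 3000},
    environment={
        "OLLAMA_BASE_URLS": "http://gpu-box:11434;http://digits:11434"
    },
    volumes={"open-webui": {"bind": "/app/backend/data", "mode": "rw"}},
)
```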

Yeah. I would say that it’s more cost-effective to start with something like a used M3 Mac now.

I got an Nvidia Jetson Orin NX 16GB. Nabu Casa and Nvidia worked together to fork Piper and Whisper to GPU-based versions, using the large-v1 model for Whisper; Ollama already supports the Jetson lineup.

It’s essentially a carrier board with a proprietary socket for the module, which has an ARM CPU, GPU and RAM in one package. The carrier board is just ports, 2 SSD slots, and some MIPI connectors for cameras.

Typical time for a general question is 2 to 5 seconds depending on the response. Right now it’s poor at actually controlling HA, so I use the fallback-to-local option for local control, and that works great. Small models aren’t good at controlling smart homes compared to something like Extended OpenAI Conversation, though. Maybe someday with MCP, which HA already supports. MCP is essentially a protocol layer between your LLM and third-party APIs that “translates” everything so your LLM knows how to handle them (sketch below).

You can run HA in a Docker container on the Jetson, but it’s easier to just point HA at Whisper, Piper and openWakeWord via IP by adding them manually as Wyoming services. Here are the docker containers. I am sure the DIGITS will use the same OS, Jetson Linux.
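
To make the MCP idea concrete, here’s a minimal sketch of an MCP tool server using the official Python SDK’s FastMCP helper; the tool itself is a made-up example, a real one would call HA’s API:

```python
from mcp.server.fastmcp import FastMCP

# Minimal sketch of an MCP tool server. The LLM discovers the tool's
# name, arguments and docstring through the protocol, so it "knows how
# to handle it" without any model-specific glue code.
mcp = FastMCP("home-tools")

@mcp.tool()
def light_state(room: str) -> str:
    """Report the state of the lights in a room (made-up example)."""
    # A real server would query Home Assistant's REST API here.
    return f"The {room} lights are off."

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```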

The DIGITS will basically run like a grown-up Jetson Orin as far as I understand, so if you’re there, you’re good.

Also, Open-webui’s 0.6.0 build yesterday started supporting MCP (stdio, not SSE yet) natively. So in the very near future you’ll be able to call an open-webui endpoint that’s:

- Running a local model
- Brokering other local models
- Connected to MCP tools (one of them could be HA itself for more control options, or the HA DB for history data without having to create a billion SQL sensors)
- Running RAG and able to implement LangChain, so we can stop storing data in sensors
- Able to put the chat on the appropriate pipeline based on complexity and privacy requirements (there’s a tool for open-webui that can redirect chats based on workload need)
- Able to escalate the call to a more powerful model if necessary (DIGITS, OpenAI, etc.)
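
When that lands, calling the broker could be as simple as pointing any OpenAI-style client at open-webui; a sketch assuming its OpenAI-compatible API, with host, key and model name as placeholders:

```python
from openai import OpenAI

# Sketch: one endpoint in front of everything above (local model, MCP
# tools, RAG, escalation). Host, API key and model name are placeholders.
client = OpenAI(
    base_url="http://openwebui.local:3000/api",
    api_key="sk-your-open-webui-key",
)
reply = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Summarize today's HA events."}],
)
print(reply.choices[0].message.content)
```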

Oh yeah it’s comin…