Home Assistant OS installation build for the NVIDIA Jetson family of single-board computers and compute modules with native AI acceleration

Home Assistant Operating System already has official installation builds for a few other ARM-based single-board computers (most notably the Raspberry Pi and Hardkernel Odroid hardware models), so I am wondering: could we get an official Home Assistant OS build for the NVIDIA Jetson family of single-board computers and compute modules?

My guess as to why we do not have this already is that NVIDIA Jetson products have historically simply been too costly, in combination with low demand for generative AI acceleration. That will likely change now: with the first official Home Assistant voice hardware about to be launched later today, there will surely be growing interest in using an AI conversation agent backed by an LLM (Large Language Model) running locally via the Ollama integration (instead of via cloud services), and it would then be convenient to run everything natively on the same computer that runs Home Assistant OS.

And maybe more interestingly, NVIDIA just launched their NVIDIA Jetson Orin Nano Super Developer Kit for only $249 (US), which includes a Jetson Orin Nano compute module and a carrier board for it:

At the announcement, NVIDIA’s CEO marketed this kit as “The World’s Most Affordable Generative AI Computer”:

That specific developer kit is based on the 8GB variant of the new NVIDIA Jetson Orin Nano module series, which could potentially be a good choice for such a suggested mid-range model with built-in AI acceleration capable of running a smaller LLM locally.

NVIDIA also makes a 4GB variant, two 8GB variants (with different memory bandwidth), and a 16GB variant in that same NVIDIA Jetson Orin Nano series. The other modules do not look to be available yet (or at least I could not find prices for them), but I believe a compute module like the 16GB model could potentially be good enough to run a larger LLM locally.

The same NVIDIA Jetson Orin Nano module is, by the way, also used in the reComputer J3011 (Edge AI Computer with NVIDIA Jetson Orin Nano 8GB) sold for $599 (US) by Seeed Studio:

…could be a good device to run local LLMs for a voice assistant, as well as to help with video processing from surveillance cameras so you can see the real-time video stream in HA…?


Any thoughts on this? Also thinking about getting one of these.

It would be useful to know just how much better the AI voice performance might be with HA running on this.
Knowing that might allow us to make a better call on whether it’s worth the effort to get it running. I expect there would be some effort involved in getting HA to use the Jetson AI core.

The thing I think would be really relevant to know is exactly which LLM models you could run locally on the 8GB NVIDIA Jetson Orin Nano. On hardware like this with an embedded AI accelerator, and for our purpose of getting accurate generative AI responses in Home Assistant today, the bottleneck is not raw performance but the amount of VRAM (video RAM, or unified memory where available); that is the practical showstopper that will prevent you from running larger LLM models.
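To make that concrete, here is a rough back-of-the-envelope sketch of how much memory differently quantized models need. The ~25% overhead factor and the model sizes are illustrative assumptions, not measured values:

```python
# Rough rule of thumb: quantized weights take (parameters * bits / 8)
# bytes, plus extra for the KV cache, activations and runtime overhead.
# The ~25% overhead factor and the model list are illustrative assumptions.

def approx_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    weights_gb = params_billions * bits_per_weight / 8  # 8B @ 4-bit ~= 4 GB
    return weights_gb * 1.25  # add ~25% for KV cache etc.

for name, params, bits in [
    ("3B model @ 4-bit", 3, 4),
    ("8B model @ 4-bit", 8, 4),
    ("8B model @ 8-bit", 8, 8),
]:
    print(f"{name}: ~{approx_memory_gb(params, bits):.1f} GB")
# -> ~1.9 GB, ~5.0 GB, ~10.0 GB respectively
```

By that estimate an 8B model at 4-bit quantization (~5 GB) is about the practical ceiling on an 8GB Orin Nano once the OS (and Home Assistant, if co-hosted) takes its share, while the same model at 8-bit simply will not fit.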

That is, if we could get some other inexpensive single-board computer with both an LLM-compatible embedded AI accelerator and more unified RAM, that would be much better. A relatively slower AI server with 96GB of unified RAM would be more effective for accuracy, since a larger LLM model could be loaded into RAM where it can be accessed directly by the AI accelerator (such a unified RAM design is used in, for example, the Apple Mac Mini M1/M2).

Home Assistant core developers’ demos (including a demo shown in the Home Assistant Voice Preview Edition release video) have already shown that using Whisper for STT (Speech-To-Text) in a local Wyoming voice pipeline is practically too slow to be useful when running on Home Assistant Green or Raspberry Pi 4 class hardware. Even if STT for a short command sentence takes only 5-10 seconds on such hardware, that is still annoyingly slow. Running the same Whisper STT engine on an NVIDIA Jetson is hundreds of times faster, which makes it more than fast enough to be useful.
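For reference, the speed gap is mostly CPU vs. GPU inference of the same model. A minimal sketch with the faster-whisper library (which the Wyoming Whisper add-on builds on); whether CTranslate2’s CUDA build runs out of the box on a Jetson is an assumption here, and "audio.wav" is a placeholder file name:

```python
# Minimal faster-whisper sketch. Only device/compute_type change between
# a Pi (CPU) and a GPU machine; "audio.wav" is a placeholder.
from faster_whisper import WhisperModel

# On a Raspberry Pi you would use device="cpu", compute_type="int8";
# on CUDA-capable hardware the same code runs on the GPU instead.
model = WhisperModel("small", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.wav", language="en")
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")
```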

As for having generative AI conversations with a small LLM model, in combination with STT and TTS (Text-To-Speech), all running locally, you can simply forget about Home Assistant Green and Raspberry Pi 4 class hardware. Even on a Raspberry Pi 5 with 8GB RAM, the smallest LLM model will likely not be able to generate more than a token/word or two per second if the same hardware is also running Home Assistant Operating System and STT + TTS locally too.
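That claim is easy to sanity-check, because LLM decoding is mostly memory-bandwidth bound: every generated token has to stream roughly the whole set of quantized weights through memory once. A back-of-the-envelope sketch (the bandwidth numbers are approximate spec-sheet figures, not benchmarks):

```python
# Back-of-the-envelope decode speed: each generated token reads roughly
# the full (quantized) weights from memory, so
#   tokens/s  <~  memory bandwidth / model size.
# Bandwidth values below are approximate spec-sheet figures.

def max_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 4.7  # e.g. an 8B model at 4-bit quantization, weights only

for device, bw in [
    ("Raspberry Pi 5 (LPDDR4X, ~17 GB/s)", 17),
    ("Jetson Orin Nano 8GB (~68 GB/s)", 68),
    ("Orin Nano Super mode (~102 GB/s)", 102),
]:
    print(f"{device}: <= ~{max_tokens_per_s(bw, MODEL_GB):.0f} tokens/s")
```

So even before Home Assistant, STT, and TTS steal any cycles, the Pi 5 tops out at a few tokens per second for an 8B model, while the Orin Nano has roughly an order of magnitude more headroom.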

There is a good reason why the Home Assistant core developers are now officially recommending running Home Assistant OS on at least Intel N100 class hardware (or better) if you want to run fully local STT + TTS for voice control using just the built-in Home Assistant Assist intents, and that is without also running an LLM locally (the Intel N100 is too slow for that, unless you add some type of additional Ollama-compatible AI accelerator hardware to it).

I suggest that you check out this video review of the NVIDIA Jetson Orin Nano Super developer kit, which has a good demo of a small LLM running locally:

For comparison, also check out this reference on running smaller LLMs locally on a Raspberry Pi 5 with 8GB of RAM:

No, that part is actually already relatively easy with the Ollama integration in combination with the Ollama server add-on, but of course it could be streamlined even further.
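For anyone curious what that combination is doing under the hood: the Ollama server add-on exposes the standard Ollama HTTP API on port 11434, and the integration just needs its URL. A minimal sketch of a direct request (the model name and prompt are placeholders):

```python
# Minimal sketch of talking to an Ollama server directly; the Home
# Assistant Ollama integration does essentially this for you once you
# point it at the server URL. Model and prompt are placeholders.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama port

payload = {
    "model": "llama3.2",                # any model fetched with `ollama pull`
    "prompt": "Turn on the kitchen lights.",
    "stream": False,                    # return one JSON object, not a stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```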

@dumbdevice What type of hardware did you run the llama and mistral models on? And did you try running on the same computer where you run Home Assistant?

Asking to further help answer the questions posted above by @Psi-nz, to explain that running an LLM locally on the same computer as Home Assistant OS will be very limited unless you have better hardware than what Home Assistant Green and Home Assistant Yellow (Raspberry Pi 4/5) can offer.

It’s a pretty old PC with a relatively new AMD 6000-series GPU. I am running Ollama in a Docker container and the GPU does most of the work.

And did you try running on the same computer where you run Home Assistant?

No, my Home Assistant is running on a mini PC; I don’t think it would handle Ollama.


Yeah.

I’m starting to think a much better approach is for HA to offload all AI/TTS work in real time to another PC on the LAN that has a proper GPU.
E.g., you have one PC in the house with an RTX 4070 or similar, and every device in the house that wants to do any AI/TTS tasks just uses that.
It doesn’t make sense for everything in the house that wants AI/TTS to require its own high-end GPU.
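That pattern already works today with the pieces mentioned above: run Ollama (and the Wyoming STT/TTS services) on the GPU box and point Home Assistant and everything else at its LAN address. A quick sketch to sanity-check reachability and round-trip time ("gpu-box.local" is a placeholder hostname, and the server side needs to listen on the LAN, e.g. with OLLAMA_HOST=0.0.0.0):

```python
# Sanity check that a LAN GPU box running Ollama is reachable, and
# measure the round trip for a short generation.
# "gpu-box.local" is a placeholder hostname.
import json
import time
import urllib.request

URL = "http://gpu-box.local:11434/api/generate"  # Ollama on the GPU box
payload = {"model": "llama3.2", "prompt": "Say OK.", "stream": False}

start = time.monotonic()
req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
elapsed = time.monotonic() - start

print(data["response"])
print(f"round trip: {elapsed:.2f}s")  # LAN overhead is milliseconds
```

The LAN round trip adds only milliseconds; essentially all the time is GPU inference, so one decent GPU really can serve every satellite in the house.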