Home Assistant OS installation build for the NVIDIA Jetson family of single-board computers and compute modules with native AI acceleration

Home Assistant Operating System already has official installation builds for a few other ARM-based single-board computers (most notably the Raspberry Pi and Hardkernel Odroid hardware models), so I am wondering: could we get an official Home Assistant OS build for the NVIDIA Jetson family of single-board computers and compute modules?

My guess as to why we do not have this already is that NVIDIA Jetson products have historically simply been too costly, in combination with low demand for generative AI acceleration. That will likely change now: with the first official Home Assistant voice hardware about to be launched later today, there will surely be growing interest in using an AI conversation agent backed by an LLM (Large Language Model) running locally via the Ollama integration (instead of via cloud services), and it would then be convenient to run everything natively on the same computer that runs Home Assistant OS.

And maybe more interestingly, NVIDIA just launched their NVIDIA Jetson Orin Nano Super Developer Kit for only $249 (US), which includes a Jetson Orin Nano compute module and a carrier board for it:

At the announcement, NVIDIA’s CEO marketed this kit as “The World’s Most Affordable Generative AI Computer”:

That specific developer kit is based on the 8GB variant of the new NVIDIA Jetson Orin Nano module series, which could potentially be a good choice for such a suggested mid-range model with built-in AI acceleration capable of running a smaller LLM locally.

NVIDIA also makes a 4GB variant, two 8GB variants (with different memory bandwidth), and a 16GB variant in that same NVIDIA Jetson Orin Nano series. The other modules do not look to be available yet (or at least I could not find prices for them), but I believe a compute module like the 16GB model could potentially be good enough to run a larger LLM locally.

The same NVIDIA Jetson Orin Nano module is, by the way, also used in the reComputer J3011 (Edge AI Computer with NVIDIA Jetson Orin Nano 8GB) sold for $599 (US) by Seeed Studio:

…could be a good device to run local LLMs for a voice assistant, as well as to help with video processing from surveillance cameras so you can see the real-time video stream in HA…?


Any thoughts on this? Also thinking about getting one of these.

It would be useful to know just how much better the AI voice performance might be with HA running on this.
Knowing that might allow us to make a better call on whether it’s worth the effort to get it running. I expect there would be some effort involved in getting HA to use the Jetson AI core.

The thing I think would be really relevant to know is exactly which LLM models you could run locally on the 8GB NVIDIA Jetson Orin Nano. On hardware like this with an embedded AI accelerator, and for our purpose of getting accurate generative AI responses in Home Assistant today, the bottleneck is not raw performance but the amount of VRAM (video RAM, or unified memory where available); that is the practical showstopper that will prevent you from running larger LLM models.
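To make that concrete, here is a rough back-of-the-envelope sketch of how much memory differently quantized models need. The ~25% overhead factor and the model sizes are illustrative assumptions, not measured values:

```python
# Rough rule of thumb: quantized weights take (parameters * bits / 8)
# bytes, plus extra for the KV cache, activations and runtime overhead.
# The ~25% overhead factor and the model list are illustrative assumptions.

def approx_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    weights_gb = params_billions * bits_per_weight / 8  # 8B @ 4-bit ~= 4 GB
    return weights_gb * 1.25  # add ~25% for KV cache etc.

for name, params, bits in [
    ("3B model @ 4-bit", 3, 4),
    ("8B model @ 4-bit", 8, 4),
    ("8B model @ 8-bit", 8, 8),
]:
    print(f"{name}: ~{approx_memory_gb(params, bits):.1f} GB")
# -> ~1.9 GB, ~5.0 GB, ~10.0 GB respectively
```

By that estimate an 8B model at 4-bit quantization (~5 GB) is about the practical ceiling on an 8GB Orin Nano once the OS (and Home Assistant, if co-hosted) takes its share, while the same model at 8-bit simply will not fit.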

That is, if we could get some other inexpensive single-board computer with both an LLM-compatible embedded AI accelerator and more unified RAM, that would be much better. A relatively slower AI server with 96GB of unified RAM would be more effective for accuracy, since a larger LLM model could be loaded into RAM where it can be accessed directly by the AI accelerator (such a unified RAM design is used in, for example, the Apple Mac Mini M1/M2).

Home Assistant core developers’ demos (including a demo shown in the Home Assistant Voice Preview Edition release video) have already shown that using Whisper for STT (Speech-To-Text) in a local Wyoming voice pipeline is practically too slow to be useful when running on Home Assistant Green or Raspberry Pi 4 class hardware. Even if STT for a short command sentence takes only 5-10 seconds on such hardware, that is still annoyingly slow. Running the same Whisper STT engine on an NVIDIA Jetson is hundreds of times faster, which makes it more than fast enough to be useful.
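For reference, the speed gap is mostly CPU vs. GPU inference of the same model. A minimal sketch with the faster-whisper library (which the Wyoming Whisper add-on builds on); whether CTranslate2’s CUDA build runs out of the box on a Jetson is an assumption here, and "audio.wav" is a placeholder file name:

```python
# Minimal faster-whisper sketch. Only device/compute_type change between
# a Pi (CPU) and a GPU machine; "audio.wav" is a placeholder.
from faster_whisper import WhisperModel

# On a Raspberry Pi you would use device="cpu", compute_type="int8";
# on CUDA-capable hardware the same code runs on the GPU instead.
model = WhisperModel("small", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.wav", language="en")
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")
```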

As for having generative AI conversations with a small LLM model, in combination with STT and TTS (Text-To-Speech), all running locally, you can simply forget about Home Assistant Green and Raspberry Pi 4 class hardware. Even on a Raspberry Pi 5 with 8GB RAM, the smallest LLM model will likely not be able to generate more than a token/word or two per second if the same hardware is also running Home Assistant Operating System and STT + TTS locally too.
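That claim is easy to sanity-check, because LLM decoding is mostly memory-bandwidth bound: every generated token has to stream roughly the whole set of quantized weights through memory once. A back-of-the-envelope sketch (the bandwidth numbers are approximate spec-sheet figures, not benchmarks):

```python
# Back-of-the-envelope decode speed: each generated token reads roughly
# the full (quantized) weights from memory, so
#   tokens/s  <~  memory bandwidth / model size.
# Bandwidth values below are approximate spec-sheet figures.

def max_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 4.7  # e.g. an 8B model at 4-bit quantization, weights only

for device, bw in [
    ("Raspberry Pi 5 (LPDDR4X, ~17 GB/s)", 17),
    ("Jetson Orin Nano 8GB (~68 GB/s)", 68),
    ("Orin Nano Super mode (~102 GB/s)", 102),
]:
    print(f"{device}: <= ~{max_tokens_per_s(bw, MODEL_GB):.0f} tokens/s")
```

So even before Home Assistant, STT, and TTS steal any cycles, the Pi 5 tops out at a few tokens per second for an 8B model, while the Orin Nano has roughly an order of magnitude more headroom.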

There is a good reason why the Home Assistant core developers are now officially recommending running Home Assistant OS on at least Intel N100 class hardware (or better) if you want to run fully local STT + TTS for voice control using just the built-in Home Assistant Assist intents, and that is without also running an LLM locally (the Intel N100 is too slow for that, unless you add some type of additional Ollama-compatible AI accelerator hardware to it).

I suggest that you check out this video review of the NVIDIA Jetson Orin Nano Super developer kit, which has a good demo of a small LLM running locally:

For comparison, also check out this reference on running smaller LLMs locally on a Raspberry Pi 5 with 8GB of RAM:

No, that part is actually already relatively easy with the Ollama integration in combination with the Ollama server add-on, but of course it could be streamlined even further.
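For anyone curious what that combination is doing under the hood: the Ollama server add-on exposes the standard Ollama HTTP API on port 11434, and the integration just needs its URL. A minimal sketch of a direct request (the model name and prompt are placeholders):

```python
# Minimal sketch of talking to an Ollama server directly; the Home
# Assistant Ollama integration does essentially this for you once you
# point it at the server URL. Model and prompt are placeholders.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama port

payload = {
    "model": "llama3.2",                # any model fetched with `ollama pull`
    "prompt": "Turn on the kitchen lights.",
    "stream": False,                    # return one JSON object, not a stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```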

@dumbdevice What type of hardware did you run the llama and mistral models on? And did you try running on the same computer where you run Home Assistant?

Asking to further help answer the questions posted above by @Psi-nz, to explain that running an LLM locally on the same computer as Home Assistant OS will be very limited unless you have better hardware than what Home Assistant Green and Home Assistant Yellow (Raspberry Pi 4/5) can offer.

It’s a pretty old PC with a relatively new AMD 6000-series GPU. I am running Ollama in a Docker container and the GPU does most of the work.

And did you try running on the same computer where you run Home Assistant?

No, my Home Assistant is running on a mini PC; I don’t think it would handle Ollama.


Yeah.

I’m starting to think a much better approach is for HA to offload all AI/TTS work in real time to another PC on the LAN that has a proper GPU.
E.g., you have one PC in the house with an RTX 4070 or similar, and every device in the house that wants to do any AI/TTS tasks just uses that.
It doesn’t make sense for everything in the house that wants AI/TTS to require its own high-end GPU.
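That pattern already works today with the pieces mentioned above: run Ollama (and the Wyoming STT/TTS services) on the GPU box and point Home Assistant and everything else at its LAN address. A quick sketch to sanity-check reachability and round-trip time ("gpu-box.local" is a placeholder hostname, and the server side needs to listen on the LAN, e.g. with OLLAMA_HOST=0.0.0.0):

```python
# Sanity check that a LAN GPU box running Ollama is reachable, and
# measure the round trip for a short generation.
# "gpu-box.local" is a placeholder hostname.
import json
import time
import urllib.request

URL = "http://gpu-box.local:11434/api/generate"  # Ollama on the GPU box
payload = {"model": "llama3.2", "prompt": "Say OK.", "stream": False}

start = time.monotonic()
req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read())
elapsed = time.monotonic() - start

print(data["response"])
print(f"round trip: {elapsed:.2f}s")  # LAN overhead is milliseconds
```

The LAN round trip adds only milliseconds; essentially all the time is GPU inference, so one decent GPU really can serve every satellite in the house.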