Returning AI to Home Assistant: Running LLMs Locally

With cloud AI getting more expensive, less private, and increasingly rate-limited, I wanted to share hard, verified data on what it actually takes to bring LLMs back inside Home Assistant: fully local, offline-capable, and exposed via your own private API.

This post compares the Jetson Orin Nano (8GB) and Jetson AGX Orin (32GB / 64GB) with proven model size limits, power draw, and real feasibility, not guesses.

If your goal is:

  • Voice assistants
  • Natural-language automations
  • Vision + language (cameras, doorbells, robots)
  • Zero cloud dependency

…this is the current state of the art for local edge AI.


Why Jetson for Home Assistant?

Jetson Orin devices are one of the few platforms that:

  • Run real LLMs locally
  • Stay within home-friendly power budgets
  • Support TensorRT-LLM / CUDA acceleration
  • Can expose models via local REST / OpenAI-compatible APIs
  • Work well as a dedicated HA AI coprocessor

Think of this as the AI equivalent of moving from cloud MQTT back to local Mosquitto.


Hardware Comparison (Verified Specs Only)

| Feature | Jetson Orin Nano (8GB) | Jetson AGX Orin 32GB | Jetson AGX Orin 64GB |
|---|---|---|---|
| CPU | 6-core Cortex-A78AE | 8-core Cortex-A78AE | 12-core Cortex-A78AE |
| GPU | Ampere, 1024 CUDA / 32 Tensor | Ampere, 1792 CUDA / 56 Tensor | Ampere, 2048 CUDA / 64 Tensor |
| RAM | 8GB LPDDR5 (128-bit) | 32GB LPDDR5 (256-bit) | 64GB LPDDR5 (256-bit) |
| Memory Bandwidth | 102 GB/s | 204.8 GB/s | 204.8 GB/s |
| AI Performance | 67 TOPS (INT8 sparse) | 200 TOPS (INT8 sparse) | 275 TOPS (INT8 sparse) |
| Power Envelope | 7–25 W | 15–40 W | 15–60 W (MAXN) |
| Typical Use | Small local LLMs | Medium LLMs + vision | Large multimodal models |

Definitive LLM Size Limits (On-Device, Proven)

This is the part most posts get wrong. Below are models that are confirmed to run fully on-device when compressed/quantized, not "might work" guesses.

Jetson Orin Nano (8GB)

Maximum practical LLM size:
:arrow_right: Up to ~4 billion parameters (quantized)

  • NVIDIA officially demonstrates 3B–4B class models on Orin Nano
  • Memory ceiling is the hard limit, not compute
  • Larger models fail without swap/offloading (not HA-friendly)
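To see why memory, not compute, is the ceiling, here is a back-of-envelope estimator. The formula and the 1.2× runtime overhead factor are my own rough assumptions, not NVIDIA-published figures:

```python
# Rough resident-memory estimate for a quantized LLM:
# weight bytes plus a fudge factor for activations, KV cache, and runtime buffers.

def model_memory_gb(params_b: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Approximate memory in GB for a model with `params_b` billion
    parameters quantized to `bits` bits per weight."""
    weight_gb = params_b * bits / 8  # 1B params at 8-bit is roughly 1 GB
    return weight_gb * overhead

# A 4B model at INT4 vs a 13B model at INT4:
print(round(model_memory_gb(4), 1))   # ~2.4 GB: fits an 8GB Orin Nano with OS headroom
print(round(model_memory_gb(13), 1))  # ~7.8 GB: already crowding 8GB, wants AGX-class memory
```

Plug in your own model size and quantization to sanity-check a config before buying.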

Examples that fit well:

  • Gemma 2–4B (INT4/INT8)
  • Qwen 2.5 3B
  • Phi-3 Mini
  • Small VLMs for camera summaries

HA use case fit:

  • Voice intent parsing
  • Natural-language automations
  • Local chat agent
  • Camera event summaries

Jetson AGX Orin (32GB / 64GB)

Confirmed working range:
:arrow_right: 4B up to ~20B+ parameters (quantized)

  • NVIDIA showcases 7B–13B models on AGX Orin
  • Academic benchmarks confirm 20B+ class inference
  • 64GB model allows much larger KV cache + context windows
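The KV cache point above can be made concrete with a quick sizing sketch. The layer and hidden-dimension values are the commonly published 7B-class (LLaMA-style, full multi-head attention) shapes; treat them as illustrative assumptions:

```python
# Back-of-envelope KV-cache sizing: why 64GB buys longer context windows.

def kv_cache_gb(context_len: int, n_layers: int = 32,
                hidden_dim: int = 4096, bytes_per_elem: int = 2) -> float:
    """Full-attention KV cache: 2 tensors (K and V) per layer,
    hidden_dim values per token, fp16 (2 bytes) by default."""
    per_token = 2 * n_layers * hidden_dim * bytes_per_elem
    return context_len * per_token / 1024**3

print(round(kv_cache_gb(4096), 2))   # ~2 GB at 4K context
print(round(kv_cache_gb(32768), 2))  # ~16 GB at 32K: on top of weights, 64GB territory
```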

Examples:

  • LLaMA / Qwen / Gemma 7B–13B
  • Vision-language models (LLaVA, VILA)
  • Multi-camera + LLM reasoning pipelines

HA use case fit:

  • Conversational voice assistant
  • Multi-room context awareness
  • Camera + language fusion
  • “Jarvis-style” home reasoning

Power Draw (Realistic for a Home)

| Device | Idle / Light | Typical AI Load | Peak |
|---|---|---|---|
| Orin Nano | ~7–10 W | ~15–20 W | 25 W |
| AGX Orin 32GB | ~15–20 W | ~30 W | 40 W |
| AGX Orin 64GB | ~20–25 W | ~40–50 W | 60 W |
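To turn the table into a yearly running cost, here is a quick calculator. The electricity price and 10% duty cycle are assumptions; plug in your own tariff:

```python
# Annual running-cost estimate for a dedicated Jetson AI box,
# using the idle/load figures from the table above.

def annual_cost_eur(idle_w: float, load_w: float, load_fraction: float,
                    price_per_kwh: float = 0.30) -> float:
    """Cost per year assuming the box idles except for `load_fraction`
    of the time spent under AI load."""
    avg_w = idle_w * (1 - load_fraction) + load_w * load_fraction
    kwh_per_year = avg_w * 24 * 365 / 1000
    return kwh_per_year * price_per_kwh

# Orin Nano: ~8 W idle, ~18 W under load, busy 10% of the day
print(round(annual_cost_eur(8, 18, 0.10), 2))  # ~23.65 EUR/year at 0.30/kWh
```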

For comparison:
A cloud-dependent AI assistant = always on + always external
A Jetson-based one = local, predictable, private


Hosting Your Own Private AI API

Both platforms can expose models as:

  • Local REST API
  • OpenAI-compatible endpoint
  • gRPC or WebSocket

This lets Home Assistant:

  • Call the model directly
  • Keep voice/audio/camera data local
  • Work offline
  • Avoid API costs forever

Architecture example:

Home Assistant
   ↓
Local AI API (Jetson Orin)
   ↓
LLM / VLM / ASR / TTS

No cloud. No telemetry. No surprises.
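As a sketch of the middle layer in that diagram, here is a stdlib-only Python call against a local OpenAI-compatible endpoint. The hostname, port, and model name are placeholder assumptions; use whatever your serving stack on the Jetson actually exposes:

```python
# Minimal client for a local OpenAI-compatible chat endpoint.
import json
import urllib.request

def build_chat_payload(prompt: str, model: str = "local-model") -> dict:
    """OpenAI-style chat completion request body."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}]}

def ask_local_llm(prompt: str,
                  endpoint: str = "http://jetson.local:8000/v1/chat/completions") -> str:
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # First choice's message content, per the OpenAI response schema.
    return body["choices"][0]["message"]["content"]
```

Home Assistant (or anything else on your LAN) can then call this endpoint directly, with no traffic ever leaving the house.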


Which One Should You Choose?

Choose Orin Nano if:

  • You want affordable, low-power local AI
  • Your models are ≤ 4B parameters
  • Voice + automations are the main goal

Choose AGX Orin if:

  • You want “real assistant” behavior
  • You use cameras + language together
  • You want larger context windows
  • You plan to grow over time

Final Thought

Home Assistant started as a reaction against cloud lock-in.

Running local LLMs on Jetson feels like the next logical step:

  • Your house
  • Your data
  • Your AI

No subscriptions. No rate limits. No external dependency.

If people are interested, I can follow up with:

  • Exact models + memory footprints
  • OpenAI-compatible API setup
  • HA voice assistant integration examples
  • Camera → LLM pipelines

Curious to hear what others are running locally :eyes:


A Mac may be a better choice for running LLM inference with Home Assistant. Jetson is a better choice for power-constrained edge computing and where CUDA is needed.

Neither the Jetson nor the Mac Mini is as fast as the cloud when running medium-size models locally. The small models (8B) are pretty dumb.

Hey, I'm not sure that response gives people an unbiased opinion regarding models and tech. That said, you do make a point, and I actually should add Mac specs to the list, as they can be a good option for a more novice user. But from a developer's perspective like mine, I'm always gonna lean into what gives me more freedom…


“The small models (8B) are pretty dumb.”

Not at all. The majority of Home Assistant tasks are really "dumb" tasks which don't require a 20B-parameter model (which a Jetson AGX Orin can run moderately well, by the way). It's common on YouTube for people to compare a Mac Mini to a Jetson Orin Nano (the baby of the lineup), which shows how foolish the comparison is: a $600–$1,400 Mac Mini vs. a $235 Jetson Orin Nano from Micro Center :slight_smile: when it should really be compared to the Jetson AGX Orin. And there's more: if you're running things vanilla, yeah, expect numbers like 10–20 tokens/s, but real user feedback has shown you can push the AGX Orin to output 300 tokens per second with a 7B Mistral model.


“A Mac may be a better choice for running LLM inference”

hmm…


“Neither the Jetson or Mac Minis are as fast as cloud”

True, a singular device cannot compete with a network of graphics cards.

In my opinion I'd still go AGX Jetson if I wanted truly local for Home Assistant. In my situation, even the way I use Home Assistant is pretty uncommon: I have many servers which work in conjunction with Home Assistant, communicating via MQTT, so my smart home sits far outside Home Assistant's bubble. And I want a device which is open enough to let me tinker around so I can practice my uncommon ways lol. But if you do run a Mac Mini, you can do cool things like run this sort of server

Well, for one thing I am glad you also added the power consumption when running idle, because that is what it will do most of the time (here, I mean). I have the impression that the Jetsons are okay but just a tiny bit short on computing power for an LLM. To me they look like great entry-level machines, but not for future growth once you start to find out all the neat things you can do with them.
At the moment I am looking more towards the Minisforum line of AI PCs. But they still carry a hefty price tag, and I do not believe that will change soon given the constantly rising RAM prices at the moment.
