With cloud AI getting more expensive, less private, and increasingly rate-limited, I wanted to share hard, verified data on what it actually takes to bring LLMs back inside Home Assistant: fully local, offline-capable, and exposed via your own private API.
This post compares the Jetson Orin Nano (8GB) and Jetson AGX Orin (32GB / 64GB) with proven model size limits, power draw, and real feasibility, not guesses.
If your goal is:
- Voice assistants
- Natural-language automations
- Vision + language (cameras, doorbells, robots)
- Zero cloud dependency
…this is the current state of the art for local edge AI.
Why Jetson for Home Assistant?
Jetson Orin devices are one of the few platforms that:
- Run real LLMs locally
- Stay within home-friendly power budgets
- Support TensorRT-LLM / CUDA acceleration
- Can expose models via local REST / OpenAI-compatible APIs
- Work well as a dedicated HA AI coprocessor
Think of this as the AI equivalent of moving from cloud MQTT back to local Mosquitto.
Hardware Comparison (Verified Specs Only)
| Feature | Jetson Orin Nano (8GB) | Jetson AGX Orin 32GB | Jetson AGX Orin 64GB |
|---|---|---|---|
| CPU | 6-core Cortex-A78AE | 8-core Cortex-A78AE | 12-core Cortex-A78AE |
| GPU | Ampere, 1024 CUDA / 32 Tensor | Ampere, 1792 CUDA / 56 Tensor | Ampere, 2048 CUDA / 64 Tensor |
| RAM | 8GB LPDDR5 (128-bit) | 32GB LPDDR5 (256-bit) | 64GB LPDDR5 (256-bit) |
| Memory Bandwidth | 102 GB/s | 204.8 GB/s | 204.8 GB/s |
| AI Performance | 67 TOPS (INT8 sparse) | 200 TOPS (INT8 sparse) | 275 TOPS (INT8 sparse) |
| Power Envelope | 7–25 W | 15–40 W | 15–60 W (MAXN) |
| Typical Use | Small local LLMs | Medium LLMs + vision | Large multimodal models |
Definitive LLM Size Limits (On-Device, Proven)
This is the part most posts get wrong. The models below are confirmed to run fully on-device (compressed/quantized), not "might work" guesses.
Jetson Orin Nano (8GB)
Maximum practical LLM size:
Up to ~4 billion parameters (quantized)
- NVIDIA officially demonstrates 3B–4B class models on Orin Nano
- Memory ceiling is the hard limit, not compute
- Larger models fail without swap/offloading (not HA-friendly)
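A quick back-of-envelope check on why 8GB caps you near 4B parameters. This is a sketch, not a measurement: `weight_memory_gb` is a helper I made up, and real runtimes add KV cache, activations, and framework overhead on top of the weights.

```python
# Rough VRAM needed just for quantized LLM weights:
# bytes ≈ params × bits_per_weight / 8

def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB (1 GiB = 2**30 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# A 4B model at INT4 vs FP16:
print(f"4B @ INT4: {weight_memory_gb(4, 4):.1f} GiB")   # ~1.9 GiB
print(f"4B @ FP16: {weight_memory_gb(4, 16):.1f} GiB")  # ~7.5 GiB
```

At INT4 a 4B model leaves headroom on an 8GB board; at FP16 the weights alone nearly fill it, which is why quantization is non-negotiable here.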
Examples that fit well:
- Gemma 2–4B (INT4/INT8)
- Qwen 2.5 3B
- Phi-3 Mini
- Small VLMs for camera summaries
HA use case fit:
- Voice intent parsing
- Natural-language automations
- Local chat agent
- Camera event summaries
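For the intent-parsing use case, one workable pattern is to constrain a small model to emit structured JSON that HA can act on, and validate it before touching any device. This is a sketch; the schema, model name, and prompt are all illustrative, not from any particular integration.

```python
import json

# Constrain the model to a machine-readable reply instead of free text.
SYSTEM_PROMPT = (
    "You control a smart home. Reply ONLY with JSON: "
    '{"intent": "<turn_on|turn_off|query>", "entity": "<entity_id>"}'
)

def build_request(user_text: str) -> dict:
    """Payload for an OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": "qwen2.5-3b-instruct",  # placeholder name
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
        "temperature": 0.0,  # deterministic parsing, not creative chat
    }

def parse_intent(model_reply: str) -> dict:
    """Validate the model's JSON reply before acting on it."""
    intent = json.loads(model_reply)
    assert intent["intent"] in {"turn_on", "turn_off", "query"}
    return intent

# Example with a canned reply (no network involved):
reply = '{"intent": "turn_on", "entity": "light.kitchen"}'
print(parse_intent(reply)["entity"])  # light.kitchen
```

Small 3B–4B models handle this kind of constrained extraction far better than open-ended conversation, which is why it's a good fit for the Nano.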
Jetson AGX Orin (32GB / 64GB)
Confirmed working range:
4B up to ~20B+ parameters (quantized)
- NVIDIA showcases 7B–13B models on AGX Orin
- Academic benchmarks confirm 20B+ class inference
- 64GB model allows much larger KV cache + context windows
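To see why the 64GB board matters for context length, here's a rough KV-cache estimate. The formula is standard (2 tensors × layers × KV heads × head dim × sequence length × bytes per element), but the per-model numbers below are illustrative, loosely shaped like a 7B-class model with grouped-query attention.

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache in GiB for one sequence (default FP16, 2 bytes/element)."""
    total = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem
    return total / 2**30

# 32 layers, 8 KV heads, head_dim 128, 8k context, FP16:
print(f"{kv_cache_gb(32, 8, 128, 8192):.2f} GiB")  # 1.00 GiB
```

One gigabyte per 8k-token sequence is nothing on a 64GB board but a real squeeze next to 13B weights on 32GB, and it scales linearly with context length and concurrent sessions.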
Examples:
- LLaMA / Qwen / Gemma 7B–13B
- Vision-language models (LLaVA, VILA)
- Multi-camera + LLM reasoning pipelines
HA use case fit:
- Conversational voice assistant
- Multi-room context awareness
- Camera + language fusion
- “Jarvis-style” home reasoning
Power Draw (Realistic for a Home)
| Device | Idle / Light | Typical AI Load | Peak |
|---|---|---|---|
| Orin Nano | ~7–10 W | ~15–20 W | 25 W |
| AGX Orin 32GB | ~15–20 W | ~30 W | 40 W |
| AGX Orin 64GB | ~20–25 W | ~40–50 W | 60 W |
For comparison:
A cloud-dependent AI assistant = always on + always external
A Jetson-based one = local, predictable, private
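To put those wattages in money terms, here's the arithmetic for 24/7 operation. The $0.15/kWh rate is an assumption; plug in your own tariff.

```python
def monthly_cost_usd(watts: float, price_per_kwh: float = 0.15) -> float:
    """Cost of running a device continuously for a 30-day month."""
    kwh_per_month = watts * 24 * 30 / 1000
    return kwh_per_month * price_per_kwh

for name, watts in [("Orin Nano @ 15 W", 15), ("AGX Orin @ 40 W", 40)]:
    print(f"{name}: ~${monthly_cost_usd(watts):.2f}/month")
# Orin Nano @ 15 W: ~$1.62/month
# AGX Orin @ 40 W: ~$4.32/month
```

Even the AGX Orin at sustained load costs less per month than most single cloud API subscriptions.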
Hosting Your Own Private AI API
Both platforms can expose models as:
- Local REST API
- OpenAI-compatible endpoint
- gRPC or WebSocket
This lets Home Assistant:
- Call the model directly
- Keep voice/audio/camera data local
- Work offline
- Avoid API costs forever
Architecture example:
Home Assistant
↓
Local AI API (Jetson Orin)
↓
LLM / VLM / ASR / TTS
No cloud. No telemetry. No surprises.
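For concreteness, here's roughly what the HA-to-Jetson hop looks like from any Python client on the LAN. This is a sketch: the hostname, port, and model name are placeholders, though both llama.cpp's server and Ollama expose this OpenAI-compatible route shape.

```python
import json
import urllib.request

# Placeholder address for a Jetson-hosted OpenAI-compatible server.
API_URL = "http://jetson.local:8000/v1/chat/completions"

def build_body(prompt: str) -> bytes:
    """JSON body for the /v1/chat/completions route."""
    return json.dumps({
        "model": "local-model",  # placeholder; use your loaded model's name
        "messages": [{"role": "user", "content": prompt}],
    }).encode()

def chat(prompt: str, timeout: float = 60.0) -> str:
    """Send one prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=build_body(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# chat("Summarize today's doorbell events")  # needs the server running
```

Because the endpoint speaks the OpenAI wire format, any HA integration or library that targets OpenAI can be pointed at the Jetson by swapping the base URL.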
Which One Should You Choose?
Choose Orin Nano if:
- You want affordable, low-power local AI
- Your models are ≤ 4B parameters
- Voice + automations are the main goal
Choose AGX Orin if:
- You want “real assistant” behavior
- You use cameras + language together
- You want larger context windows
- You plan to grow over time
Final Thought
Home Assistant started as a reaction against cloud lock-in.
Running local LLMs on Jetson feels like the next logical step:
- Your house
- Your data
- Your AI
No subscriptions. No rate limits. No external dependency.
If people are interested, I can follow up with:
- Exact models + memory footprints
- OpenAI-compatible API setup
- HA voice assistant integration examples
- Camera → LLM pipelines
Curious to hear what others are running locally.