Hi everyone,
I was wondering if anyone has thoughts on the NVIDIA Jetson AGX Orin 64GB Developer Kit for local AI? If you are using it, I would love to hear how. Is there something better/cheaper on the horizon?
I have this exact device.
Currently it only runs Ollama, which I have connected Home Assistant to; Home Assistant itself runs on different hardware.
It takes quite some time when working with Voice Assistant, as it seems to go Voice Assistant (STT) → Ollama → Home Assistant → TTS → Voice Assistant.
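To narrow down where that time actually goes, something like the following could be used to time just the Ollama leg on its own. This is only a rough sketch: it assumes Ollama's default HTTP API on port 11434, the Python `requests` package, and `orin.local` is just a placeholder hostname.

```python
import time
import requests

# Rough sketch: time only the LLM leg of the pipeline, so Ollama latency can
# be separated from STT/TTS latency. Assumes Ollama listens on its default
# port 11434; "orin.local" is just a placeholder hostname.
OLLAMA_URL = "http://orin.local:11434/api/generate"

start = time.monotonic()
resp = requests.post(
    OLLAMA_URL,
    json={
        "model": "llama3.1:8b",
        "prompt": "How many lights are on in the living room?",
        "stream": False,
    },
    timeout=120,
)
elapsed = time.monotonic() - start
data = resp.json()

# Ollama reports its own timings (in nanoseconds) alongside the response text.
print(f"wall clock: {elapsed:.1f} s")
print(f"model load: {data.get('load_duration', 0) / 1e9:.1f} s")
print(f"generation: {data.get('eval_duration', 0) / 1e9:.1f} s")
print(data.get("response", ""))
```

If most of the wall-clock time is not in the Ollama response itself, the delay is probably in the STT/TTS hops instead.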
I am planning on installing the jetson-containers (link) for Home Assistant (HA itself, Whisper, Piper, OWW) to see whether things speed up significantly when everything runs on the Orin. However, I am still setting up other AI projects, which might require me to re-flash the Orin via the NVIDIA SDK Manager, so I don’t want to set everything up just yet.
Once everything else works as expected, I am hoping to give it a try.
About the “how”… What exactly do you mean? You can install Ollama on the Orin (either as a service or via Docker). Then you can use the Ollama integration in Home Assistant to communicate with it. I guess there are many parameters you can modify, for example which LLM to use, the prompt in Home Assistant, the number of text tokens, … Unfortunately, I cannot tell you much about these yet. I used the defaults and llama3.1:8b, and I am not that happy with the replies at the moment. Perhaps this is due to speaking German rather than English? Maybe another model would be better?
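For reference, those knobs roughly map onto Ollama's API options. A minimal sketch (hostname, prompt, and values are only placeholders) of picking the model, setting a German system prompt, and capping the number of generated tokens via the `/api/chat` endpoint:

```python
import requests

# Minimal sketch of the parameters mentioned above: which model to use, a
# (German) system prompt, and a cap on the number of generated tokens.
# Uses Ollama's /api/chat endpoint; hostname and values are placeholders.
resp = requests.post(
    "http://orin.local:11434/api/chat",
    json={
        "model": "llama3.1:8b",
        "messages": [
            {
                "role": "system",
                "content": "Du bist ein Sprachassistent für Home Assistant. "
                           "Antworte kurz und auf Deutsch.",
            },
            {"role": "user", "content": "Wie viele Lampen sind eingeschaltet?"},
        ],
        "stream": False,
        "options": {
            "num_predict": 128,  # roughly the "number of text tokens" setting
            "temperature": 0.7,
        },
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```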
Sometimes I get one reply when asking something, then a different one when asking again right away. For example: “Who is Joe Biden?” - “Nobody.” Ten seconds later: “Who is Joe Biden?” - “Joe Biden is (…)”. This is kind of weird. Why don’t I get the correct reply right away, if the model obviously knows it (since it can answer the second time I ask)?
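I don’t know why this happens; one thing that might be worth ruling out (just a guess) is the model being unloaded and cold-loaded again between requests. Ollama’s API accepts a `keep_alive` value that controls how long a model stays in memory; a minimal sketch, with the hostname again only a placeholder:

```python
import requests

# Just a guess: if the model is unloaded between requests, the first question
# after a pause also pays the model load time. keep_alive controls how long
# Ollama keeps the model in memory ("30m", "24h", or -1 for "until restart").
requests.post(
    "http://orin.local:11434/api/generate",   # placeholder hostname
    json={
        "model": "llama3.1:8b",
        "prompt": "Who is Joe Biden?",
        "stream": False,
        "keep_alive": "30m",
    },
    timeout=120,
)
```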
Perhaps somebody else on here has a good workflow already and might be willing to share it?
@prankousky - Did you ever do anything more with this? If so, I’d love to hear any further experience you’ve had.
I’ve been wanting to speed up my local Ollama-based Voice Assistant (as well as dabble in a few other AI projects) and was looking at picking up one of the Jetson AGX Orin 64GB Developer Kits when I found this thread.
I’ve been running my Ollama container on an Unraid server with an NVIDIA Tesla P4, but the Voice Assistant responses are too slow for my better half’s liking (or, at least, that’s my excuse for buying more tech toys).
Sorry, no I did not.
I am running Ollama on the device, with Open WebUI; I also have ComfyUI running in a conda environment.
The little voice stuff I tried simply took too long. I don’t remember the details, but say I asked “OK Nabu, what causes the tides?”, I would then wait 20 or 30 seconds for a reply. The reply was alright, but that is too long considering I would need quite a lot of testing if I wanted to implement voice commands like this (“how many lights are on?”, “in which room was the most recent movement detected?”, etc.).
While I could in theory wait a few seconds more than usual for a reply, this is not something I would want to iterate on: ask something, wait half a minute, see the result, modify, ask again, wait again, see if it worked; if necessary, modify again, wait again, and so on.
I guess another issue is that I want different things from the machine. If it only handled Ollama/voice, perhaps that would be fine. But when I want to use ComfyUI to generate content, the entire VRAM is needed, and in those moments Ollama doesn’t have enough resources to load its models as well. So I gave up on voice on this device. Perhaps at some point there will be a dedicated device for voice/local AI only that is more affordable than the Jetson; then I’ll go for that. Until then, I need to stick with regular voice without AI.
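One thing that might be worth trying before giving up completely (untested on the Orin): asking Ollama to unload its model right before a ComfyUI run frees the memory without restarting the service. Sending a request with no prompt and `keep_alive` set to 0 unloads the model; the hostname and model name below are only placeholders.

```python
import requests

# Untested idea: before a ComfyUI run, ask Ollama to drop its model so the
# memory is freed without restarting the service. A request with no prompt
# and keep_alive set to 0 unloads the model immediately.
requests.post(
    "http://orin.local:11434/api/generate",   # placeholder hostname
    json={"model": "llama3.1:8b", "keep_alive": 0},
    timeout=60,
)
```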
But I hope you have more success with it than I did. I expected things to work much faster on the Jetson; I saw a video on YouTube where a developer only had the Jetson mini and his voice stuff worked faster than mine (I already looked, but wasn’t able to find the video link just now). So either we are both doing something wrong, or the device is just not the way to go.
I am using this same device. I am running Ollama + Open WebUI + Kokoro FastAPI, with deepseek-r1:14b, deepseek-r1:7b_q4_k_m, and llava:7b. I even managed to get Orpheus running, kind of, but I would not recommend it right now. jetson-voice for STT/ASR is running pretty smoothly. I did try a 32b DeepSeek; not recommended. Conversational response time is fairly decent, but I do have reasoning for Ollama turned off in HA. With any reasoning model you are not going to beat the reasoning time no matter what you do; the average for DeepSeek is 12 s. By the way, all three LLMs are loaded in memory at the same time and used in HA. If you don’t really have a special need for your own STT, stick with HA’s, and the same goes for TTS. I’m not saying don’t do it, just that it’s a pain in the @$$; I am only doing it because I need it for an additional project add-on that I am working on. And bam, try using:
GitHub - tboby/wyoming-onnx-asr: Wyoming protocol server for the onnx-asr speech-to-text system. I ran the Docker install with no problem.
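If it helps anyone reproduce this setup, a quick way to confirm that all three models really are resident at the same time is Ollama’s `/api/ps` endpoint (assuming the default port; run on the Orin itself or swap in its hostname):

```python
import requests

# Quick check (Ollama's default port assumed) that all three models really
# are loaded at the same time, and how much memory each one occupies.
for m in requests.get("http://localhost:11434/api/ps", timeout=10).json()["models"]:
    print(f"{m['name']}: {m['size_vram'] / 1e9:.1f} GB in GPU memory")
```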