From what I have found lately, I have two choices.
1- Make the user run Ollama on their own computer and connect it to the Raspberry Pi 4 (where HA is located), so the heavy thinking is done by the computer, acting as a server. However, this would require solving the problem of finding the IPv4 address to connect to when using the ZeroTier app.
2- Run Ollama inside the Raspberry Pi alongside HA. I believe this is where the endpoints would come in, but I guess some configuration would be needed?
Probably the address for Ollama would always be the same (localhost) inside the Raspberry Pi 4.
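From HA's point of view, the only real difference between the two options is the base URL of the Ollama endpoint; the HTTP API is the same either way. A minimal sketch (the ZeroTier IP and the `tinyllama` model name are placeholders, not taken from this thread) of building a request against Ollama's `/api/generate` endpoint:

```python
import json
from urllib.request import Request

# Ollama listens on port 11434 by default.
# Option 1: Ollama on a PC reached over ZeroTier (placeholder IP; yours will differ).
REMOTE_BASE = "http://192.168.191.10:11434"
# Option 2: Ollama on the same Raspberry Pi as HA.
LOCAL_BASE = "http://127.0.0.1:11434"

def generate_request(base_url: str, model: str, prompt: str) -> Request:
    """Build (but do not send) a request to Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return Request(
        f"{base_url}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Whichever option you pick, only base_url changes; the request is identical.
req = generate_request(LOCAL_BASE, "tinyllama", "Turn on the kitchen light?")
```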
If I were you, I would start at one end first. You are now talking about ZeroTier! (Maybe that should be in your topic header.)
Have you even got the ZeroTier network working on your local network, not to mention from outside?
The pics you show of HA's network info show that your HA is on Ethernet and has a local address from your LAN router. And your "obscure" IP in the pic above (I guess it's from your installed ZeroTier) is on another subnet than both your router and HA. No, they will not see it, much less register it on their NICs.
For your Windows machine, I assume it is also on your local LAN, using your router and mDNS.
Try to get ZeroTier working with two local machines before you add HA, another piece of software (Ollama), and an HA add-on on top of that.
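That "two local machines first" check can be done from a terminal on each box. A sketch of the usual diagnostic sequence, assuming the zerotier-one service is installed (the peer IP below is a placeholder; use the managed IP ZeroTier assigned to your *other* machine):

```shell
# 1. Is the ZeroTier service up and this node ONLINE?
sudo zerotier-cli status

# 2. Is this machine joined to your network, and did it get a managed IP?
sudo zerotier-cli listnetworks

# 3. Can the two machines reach each other over their ZeroTier IPs?
#    (placeholder address; substitute the other machine's managed IP)
ping -c 3 192.168.191.20
```

Only once step 3 works both ways is it worth pointing HA or Ollama at a ZeroTier address.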
Let me help you out there. This won't be happening.
You MUST have a GPU powerful enough to drive the LLM, and an RPi simply doesn't have one. If your consumer is using Ollama, vLLM, llama.cpp, or something similar to serve a model, I guarantee it is NOT on the Pi. If they have it on the same iron as HA, they most likely run them as separate VMs. This (the LLM) is not a job for an RPi. It won't be on the same box. You're looking for an off-box IP address.
And on my network you wouldn't be able to do it unless you asked me what you're looking for and had credentials.
Suppose I upgrade my Raspberry Pi to the most powerful version available on the market. I have been reading about them being able to at least run small models (which I would train for specific tasks) rather than bigger or the latest models.
For example, I have been reading about this here: Adding Ollama to the Raspberry
If I have enough space for both HA and Ollama, and it's powerful enough to run both at the same time (due to really small models), is it actually unsupported?
Has anyone tried it?
Both at the same time isn't gonna happen, even if you get the hottest Pi you can find right now.
Those models won't do anything except base utility. They won't hold enough context to use tools; you might get one to do small utility jobs like tiny summaries or a small single-use vision job, that's all. You might get it to drive a camera for a single-use job. Maybe.
So you're taking one of the most expensive-per-CPU-cycle ways to run (an unaccelerated Pi) and getting the least effect. Yes, you can RUN a tiny model, but that tiny model then takes over the entire box for what? It's not going to get you where you want to be. Save the money for a future GPU.
I don't mean to sound harsh, but people need to have realistic expectations. Pis aren't suitable for LLM work in most cases. It ends in pain in hobby land.
Yes, you can call Ollama running on the network. But you should plan on something other than a Pi.
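Calling Ollama from another box only works if the Ollama host is told to listen beyond localhost; by default it binds to 127.0.0.1 only. On Linux this is typically done with the `OLLAMA_HOST` environment variable via a systemd override. A sketch (treat the unit name and the IP as assumptions for your setup):

```shell
# On the machine running Ollama: make it listen on all interfaces.
sudo systemctl edit ollama.service
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl daemon-reload
sudo systemctl restart ollama

# From the HA box, a quick reachability check (placeholder IP):
curl http://192.168.1.50:11434/api/tags
```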
For HA you need a tool-using model of at LEAST 7-8B parameters, and for that you must have (as in, NOT optional) a GPU with at least 8 GiB of dedicated VRAM. So this is NOT a Pi, by any stretch of the imagination.
A Mac mini or a PC with a GPU, sure. It's not going to be a Pi; see my previous post.
I have been thinking about all your answers. Is it actually possible to set everything up so that my computer (or any device where my Ollama is ready) acts as the "server"? Then each user could download the add-on, and when they want to talk to my LLM assistant, they wouldn't need Ollama installed on their own Raspberry Pi / computer. They would just send a request to my "server".
Would that work? If not, is there any way to keep it running without relying on an internet connection?
But you see, there's a whole lot of nuance. There's plenty out there, but I didn't want you chasing after running all of that on a Pi. Remember, you're ultimately replacing an entire Amazon datacenter; it needs a little chooch. The HA box isn't where the big crunching happens.
You essentially have to account for…
Device / wake word processing,
STT / ASR
Inference (the actual llm job)
TTS
Any or all of those pieces can be local, but the inference job needs special hardware.
I do
- Wake word: on device, microWakeWord on a Voice PE.
- STT: a server running Parakeet on one of my servers, Taran. HA has a perfectly good Whisper add-on (you could use the HA box for this part).
- Inference: OpenAI (cloud currently; this is being moved to a local server, Metis, which is a DGX Spark. Overkill for most people, and this is where you would make a smaller hardware choice).
- TTS: Taran again, running Kokoro. You could also use Piper or a service.
As soon as the inference moves 100% to Metis, I'm 100% local. A year and $6,000 later... Do you need $6,000 in gear? No. But you will also spend some cash on a graphics accelerator if you want to run the inference part of that chain locally.