Is it practical or even possible to run Ollama on the same PC as I’m running HA? My HA is running under HA OS on an Intel i5 NUC.
Yes, you can. You could run two virtual machines on the same hardware: one running HAOS and the other running some other OS with Ollama on it.
I haven't tried this and have no plans to, so that's as far as my help can go.
While it is possible, and actually fairly easy to set up, without a CUDA-compatible GPU Ollama will likely be too slow to be useful with the voice assistant. If you wanted to use it to generate text for a less time-sensitive use case, that might be tenable.
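If you do try it, the HA side only needs network access to the Ollama API (port 11434 by default; note that Ollama binds to localhost out of the box, so you'd set `OLLAMA_HOST=0.0.0.0` on the Ollama machine). A rough sketch like this, with a made-up 192.168.1.50 standing in for your Ollama VM's address, will confirm the API is reachable and list the models you've pulled:

```python
# Quick check that an Ollama instance is reachable over the LAN.
# 192.168.1.50 is a made-up address; substitute the IP of the VM (or host)
# actually running Ollama. Ollama listens on port 11434 by default.
import json
import urllib.request

OLLAMA_URL = "http://192.168.1.50:11434"

with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=5) as resp:
    data = json.load(resp)

# /api/tags lists the models that have been pulled on that instance.
for model in data.get("models", []):
    print(model["name"])
```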
As mentioned previously, whether Ollama is remotely usable depends completely on the hardware you have available. I think that for you to get useful output, you'll need a minimum of about 20 tok/sec; otherwise, latency will make the experience quite poor. I have Llama 3.1 8B running on my little Lenovo m75q with an AMD APU in it, and I get about 12 tok/sec with that model with the GPU enabled. It's just too slow, and it has me looking at more narrowly scoped 4B-parameter models that can actually work in an assistant pipeline.
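If you want to see where your hardware lands relative to that ~20 tok/sec figure, the Ollama API returns generation stats you can turn into a tokens-per-second number. A rough sketch, where the host address and model name are just placeholders for whatever you actually run:

```python
# Rough tokens-per-second check against an Ollama instance.
# The URL and model name are assumptions; use your own host and model.
import json
import urllib.request

OLLAMA_URL = "http://192.168.1.50:11434"
MODEL = "llama3.1:8b"

payload = json.dumps({
    "model": MODEL,
    "prompt": "Turn on the kitchen lights and tell me tomorrow's weather.",
    "stream": False,
}).encode()

req = urllib.request.Request(
    f"{OLLAMA_URL}/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=300) as resp:
    result = json.load(resp)

# eval_count is the number of generated tokens, eval_duration is nanoseconds.
tok_per_sec = result["eval_count"] / (result["eval_duration"] / 1e9)
print(f"{tok_per_sec:.1f} tok/sec")
```

(Running `ollama run <model> --verbose` at a shell prints an eval rate directly, if you'd rather not script it.)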
Would using an eGPU help in this case?
I really have no idea how you would set up a NUC with HAOS on it to use an eGPU for the LLM.