High CPU load using Ollama with Llama 3.2 - is the Coral TPU an alternative?

Dear Community,

I recently installed the Ollama integration and added Llama 3.2.
HAOS runs on an HP EliteDesk 800 G3 | Intel® Pentium® G4400T 2x2.90 GHz | 16 GB DDR4 | 256 GB M.2 | with a Coral USB TPU (originally for Frigate).
The overall system is fast and stable, but Ollama with Llama is unusable: answers are not generated even after several minutes, and the CPU is maxed out.
Is there a way to make use of the Coral TPU to improve performance?

No. You need a GPU to run an LLM. The minimum requirement is about 3 GB of video RAM for the smallest models, and 12 GB or more for a more capable model. The Coral Edge TPU will not help: it is built for small, 8-bit quantized vision models and has nowhere near the memory an LLM needs.
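As a rough sanity check on that 3 GB figure, here is a back-of-envelope estimate of the memory a quantized model needs. The quantization width and overhead factor below are illustrative assumptions, not exact Ollama figures:

```python
# Rough memory estimate for a quantized LLM: parameters * bytes per weight,
# plus ~20% overhead for the KV cache and activations (all figures approximate).
def vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    bytes_total = params_billion * 1e9 * (bits_per_weight / 8) * overhead
    return bytes_total / 1e9

# Llama 3.2 3B at assumed 4-bit quantization: about 1.8 GB including overhead,
# which is consistent with ~3 GB of VRAM being quoted as the practical floor.
print(round(vram_gb(3.0, 4), 1))  # → 1.8
```

The same arithmetic shows why the Coral is a non-starter: its on-chip memory is measured in megabytes, three orders of magnitude short of what even the smallest model requires.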


Here is an insightful discussion of why it does not work: