High CPU load using Ollama with Llama 3.2 - is the Coral TPU an alternative?

Dear Community,

I recently installed the Ollama integration and added Llama 3.2.
HAOS runs on an HP EliteDesk 800 G3 | Intel® Pentium® G4400T 2x2.90 GHz | 16 GB DDR4 | 256 GB M.2 | with a Coral USB TPU (originally for Frigate).
The overall system is fast and stable, but Ollama with Llama is unusable: answers are not generated even after several minutes, and the CPU is maxed out.
Is there a way to make use of the Coral TPU to improve performance?

No. You need a GPU to run an LLM. The minimum requirement is about 3 GB of video RAM for the smallest models, and 12 GB or more for a more capable model. The Coral Edge TPU will not help: it is built for small, 8-bit quantized vision models and has nowhere near the memory an LLM needs.
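As a rough sanity check on that 3 GB figure, here is a back-of-envelope estimate of the memory a quantized model needs. The quantization width and overhead factor below are illustrative assumptions, not exact Ollama figures:

```python
# Rough memory estimate for a quantized LLM: parameters * bytes per weight,
# plus ~20% overhead for the KV cache and activations (all figures approximate).
def vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    bytes_total = params_billion * 1e9 * (bits_per_weight / 8) * overhead
    return bytes_total / 1e9

# Llama 3.2 3B at assumed 4-bit quantization: about 1.8 GB including overhead,
# which is consistent with ~3 GB of VRAM being quoted as the practical floor.
print(round(vram_gb(3.0, 4), 1))  # → 1.8
```

The same arithmetic shows why the Coral is a non-starter: its on-chip memory is measured in megabytes, three orders of magnitude short of what even the smallest model requires.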


Here is an insightful discussion of why it does not work: