Anyone know how to see how many tokens are being sent to Ollama for context?

I’m erring on the side of caution and using a 30K context window, which ends up with Ollama splitting the model between system RAM and GPU RAM, which isn’t ideal.

Is there a way to see how big the call is in tokens, so I can set the context size more precisely and maybe fit everything onto the GPU?

Thanks!

You can make your own copy of the integration (or find one on GitHub) and add more logging to it. Then use debug mode.
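If you just want the raw numbers, Ollama’s own API already reports them: the final (non-streaming) response from /api/chat or /api/generate includes prompt_eval_count (tokens consumed by the prompt) and eval_count (tokens generated). A minimal sketch, assuming Ollama is on the default localhost:11434 and using a placeholder model name:

```python
import json
import urllib.request

# Assumes the default Ollama endpoint; adjust host/port if yours differs.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "llama3"  # placeholder -- use whichever model your integration calls

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,  # a single JSON response that includes the token counts
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# How many tokens the prompt used, and how many were generated.
print("prompt tokens:", body.get("prompt_eval_count"))
print("generated tokens:", body.get("eval_count"))
```

If you replay the same prompt the integration sends (or log these fields from inside your copy of it), prompt_eval_count tells you how close you actually get to your configured context size, so you can shrink it until the model fits entirely in VRAM.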
