[SOLVED] Cerebras/llama-3.3: token length(`150`) exceeded. Increase maximum token to avoid the issue

PieBru · February 17, 2025, 9:44am

Hi there,
I’m using llama3.3-70B as the LLM. IMO it’s very good and open-source.

Normally, I use Cerebras/llama-3.3-70b, which is blazing fast and free (generous daily limits). When the internet connection goes down, the same LLM is served, slowly, by my old i7 notebook. That gives the same functionality apart from the delay, which I tolerate given the “emergency” state.

No problems while offline, but online it works only for short replies; otherwise it returns the error quoted in the title.

Is that “150” value tunable on my side?
Thank you,
Piero

WallyR · February 17, 2025, 10:04am

Probably yes.
With a credit card.

Tokens are the currency computing power is oft measured in.
So your “generous” daily limit is too low to handle that request.

PieBru · February 17, 2025, 11:28am

Don’t think so. I tried using curl with the same free API, and it works even for larger replies:

export CEREBRAS_API_KEY=csk-.............
curl --location 'https://api.cerebras.ai/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
--data '{
  "model": "llama-3.3-70b",
  "stream": false,
  "messages": [{"content": "Tell me a 100 words fairy tale.", "role": "user"}],
  "temperature": 0,
  "max_completion_tokens": -1,
  "seed": 0,
  "top_p": 1
}'

Reply:

{"id":"chatcmpl-01976f3b-3b82-4660-bd06-f6038a7de47d","choices":[{"finish_reason":"stop","index":0,"message":{"content":"In a tiny village, a kind fairy named Luna lived. She had a magical flower that bloomed only once a year, granting a single wish to whoever possessed it. One day, a poor girl named Sophia found the flower and wished for the ability to heal any sickness. Luna appeared, granting Sophia's wish and tasking her with helping those in need. Together, they brought joy and health to the village, and Sophia's heart remained full of kindness and love, inspiring others to do the same. The village prospered, and Luna's magic lived on through Sophia.","role":"assistant"}}],"created":1739791211,"model":"llama-3.3-70b","system_fingerprint":"fp_be75108397","object":"chat.completion","usage":{"prompt_tokens":44,"completion_tokens":116,"total_tokens":160},"time_info":{"queue_time":8.2611e-05,"prompt_time":0.002378067,"completion_time":0.08181674,"total_time":0.0855550765991211,"created":1739791211}}

PieBru · February 17, 2025, 11:33am

Got it! Now I feel dumb. Sorry for the silly question.

The “150” is the maximum-tokens setting in the Extended OpenAI Conversation configuration.
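For anyone hitting the same thing, here is a minimal sketch of what I assume happens: when the reply reaches the configured token cap, the API marks the choice with `finish_reason` set to `"length"`, and the integration turns that into the error. The helper and exception names below are hypothetical and are not the integration's actual code. The first sample is a trimmed copy of the curl reply from this thread, the second one is made up.

```python
# Hypothetical helper, NOT the integration's real code: it only illustrates
# how a truncated reply can be detected in a chat-completion response.
class TokenLimitReached(Exception):
    """Raised when a reply was cut off by the completion-token cap."""


def reply_text(response: dict) -> str:
    """Return the assistant text, or raise if the reply was truncated."""
    choice = response["choices"][0]
    if choice["finish_reason"] == "length":
        used = response["usage"]["completion_tokens"]
        raise TokenLimitReached(f"reply truncated after {used} completion tokens")
    return choice["message"]["content"]


# Trimmed version of the curl reply quoted in this thread (finished normally).
complete = {
    "choices": [{"finish_reason": "stop",
                 "message": {"content": "In a tiny village, a kind fairy named Luna lived."}}],
    "usage": {"completion_tokens": 116},
}

# Made-up reply that ran into a 150-token cap.
truncated = {
    "choices": [{"finish_reason": "length",
                 "message": {"content": "Once upon a"}}],
    "usage": {"completion_tokens": 150},
}

print(reply_text(complete))
try:
    reply_text(truncated)
except TokenLimitReached as err:
    print(err)
```

Raising the maximum-tokens value in the integration options should make longer replies finish with `"stop"` instead.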

dvbit · February 17, 2025, 9:58pm

Ciao Piero
I cannot find their offer.
Could you share the link?

mterry63 · February 17, 2025, 11:40pm

It’s in the curl command: https://cerebras.ai

PieBru · February 18, 2025, 11:15am

@dvbit I now log in at https://cloud.cerebras.ai/

I’m not affiliated in any way. I activated the free dev tier a while ago by joining a waitlist; the confirmation email arrived a few days later.
Given the high demand, I don’t know if the free dev tier is still available.

Hope you all can use it, because it’s fast and “generous”.

dvbit · April 16, 2025, 11:29am

Hi
I was admitted.
Any specific guide to follow?

PieBru · April 22, 2025, 2:35pm

@dvbit sorry for the delay. Did you solve it?

Anyway:

API endpoint: https://api.cerebras.ai/v1
Model: llama-3.3-70b
API Key: csk-**************
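
If you want to check the three values outside Home Assistant first, here is a rough Python sketch using only the standard library. It assumes the same `/chat/completions` path and JSON fields as the curl command earlier in this thread; the function name and the default of 1024 tokens are my own choices for illustration, and I have not verified the sending part against the live API.

```python
import json
import os
import urllib.request

API_BASE = "https://api.cerebras.ai/v1"   # API endpoint from this post
MODEL = "llama-3.3-70b"                   # model name from this post


def build_chat_request(prompt: str, api_key: str,
                       max_completion_tokens: int = 1024) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request.

    max_completion_tokens=1024 is an arbitrary illustrative cap.
    """
    payload = {
        "model": MODEL,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
        "max_completion_tokens": max_completion_tokens,
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Only talks to the network when CEREBRAS_API_KEY is set.
    key = os.environ.get("CEREBRAS_API_KEY")
    if key:
        request = build_chat_request("Tell me a 100 words fairy tale.", key)
        with urllib.request.urlopen(request) as resp:
            reply = json.load(resp)
        print(reply["choices"][0]["finish_reason"])
        print(reply["choices"][0]["message"]["content"])
```

If this prints `stop` and a full reply, the key and endpoint are fine, and any truncation in Home Assistant comes from the integration's own token setting.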