[SOLVED] Cerebras/llama-3.3: token length(`150`) exceeded. Increase maximum token to avoid the issue

Hi there,
I’m using llama3.3-70B as the LLM. IMO it’s very good and open-source.

Normally, I use Cerebras/llama-3.3-70b, which is blazing fast and free (generous daily limits). When my internet connection occasionally goes down, the same LLM is served, slowly, by my old i7 notebook. It gives the same functionality apart from the delay, which I tolerate given the “emergency” state.

No problems while offline, but online it works only for short replies; otherwise it returns the error quoted in the title.

Is that “150” value tunable on my side?
Thank you,
Piero

Probably yes.
With a credit card.

Tokens are the currency computing power is oft measured in.
So your “generous” daily limit is too low to handle that request.

I don’t think so. I tried curl with the same free API, and it works even for longer replies:

export CEREBRAS_API_KEY=csk-.............
curl --location 'https://api.cerebras.ai/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
--data '{
  "model": "llama-3.3-70b",
  "stream": false,
  "messages": [{"content": "Tell me a 100 words fairy tale.", "role": "user"}],
  "temperature": 0,
  "max_completion_tokens": -1,
  "seed": 0,
  "top_p": 1
}'

Reply:

{"id":"chatcmpl-01976f3b-3b82-4660-bd06-f6038a7de47d","choices":[{"finish_reason":"stop","index":0,"message":{"content":"In a tiny village, a kind fairy named Luna lived. She had a magical flower that bloomed only once a year, granting a single wish to whoever possessed it. One day, a poor girl named Sophia found the flower and wished for the ability to heal any sickness. Luna appeared, granting Sophia's wish and tasking her with helping those in need. Together, they brought joy and health to the village, and Sophia's heart remained full of kindness and love, inspiring others to do the same. The village prospered, and Luna's magic lived on through Sophia.","role":"assistant"}}],"created":1739791211,"model":"llama-3.3-70b","system_fingerprint":"fp_be75108397","object":"chat.completion","usage":{"prompt_tokens":44,"completion_tokens":116,"total_tokens":160},"time_info":{"queue_time":8.2611e-05,"prompt_time":0.002378067,"completion_time":0.08181674,"total_time":0.0855550765991211,"created":1739791211}}

Got it! Now I feel dumb :frowning: Sorry for the silly question.

The “150” is not a Cerebras limit: it is the maximum token value defined in the Extended OpenAI Conversation configuration.
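To see the effect of that value outside Home Assistant, the curl request from earlier in the thread can be sent with the cap set to 150 instead of -1. This is only a rough sketch: `build_body` and `ask_cerebras` are helper names made up for illustration, and it assumes the API marks a cut-off reply with `"finish_reason":"length"`, as OpenAI-compatible endpoints generally do.

```shell
# Build the JSON request body with a given completion-token cap.
# build_body is a hypothetical helper; the prompt must not contain double quotes.
build_body() {
  max_tokens="$1"
  prompt="$2"
  cat <<EOF
{
  "model": "llama-3.3-70b",
  "stream": false,
  "messages": [{"content": "${prompt}", "role": "user"}],
  "temperature": 0,
  "max_completion_tokens": ${max_tokens}
}
EOF
}

# Send the body to the same endpoint used in the curl command above.
# Requires CEREBRAS_API_KEY to be exported; the body is read from stdin.
ask_cerebras() {
  build_body "$1" "$2" | curl --silent --location 'https://api.cerebras.ai/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
    --data @-
}

# Example call (left commented out, since it needs a key and network access):
# ask_cerebras 150 "Tell me a 500 words fairy tale." | grep -o '"finish_reason":"[a-z_]*"'
```

If the assumption holds, a long prompt with a cap of 150 should come back with a finish reason other than `stop`, which is the situation the integration reports as an error. Raising the maximum token value in the integration options should avoid it.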

Ciao Piero
I cannot find their offer.
Could you share the link?

It’s in the curl command: https://cerebras.ai

@dvbit Now I log in from https://cloud.cerebras.ai/

I’m not affiliated in any way. I activated the free dev tier a while ago by joining a waitlist; the confirmation email arrived a few days later.
Given the high demand, I don’t know whether the free dev tier is still available.

I hope you can all use it, because it’s fast and “generous”.