Context truncation

Instead of returning an error for prompts that exceed the context window, truncate gracefully:
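# Assumes the OpenAI Python SDK, which rejects unknown keyword arguments,
# so the Forii extension is passed through extra_body.
response = client.chat.completions.create(
    model="forii/deepseek-v3",
    messages=long_messages,
    max_tokens=2048,
    extra_body={"context_length_exceeded_behavior": "truncate"},  # or "error"
)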
| Value | Behavior |
| --- | --- |
| "truncate" | Truncate the prompt to fit the context window. Processing continues normally. |
| "error" | Return an error if the prompt exceeds the context limit. |
"truncate" is the default. It drops the oldest messages first while preserving the system message, so the request still succeeds instead of failing with a hard error. Use "error" when you need to detect exactly when the context limit is hit.
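To make the oldest-first rule concrete, here is a minimal client-side sketch of the same idea. It is an illustration only, not Forii's server-side implementation: `truncate_oldest_first` and `approx_tokens` are hypothetical helpers, the 4-characters-per-token estimate is a rough assumption, and the sketch assumes system messages sit at the start of the conversation.

```python
from typing import Callable

Message = dict[str, str]


def approx_tokens(message: Message) -> int:
    """Hypothetical tokenizer stand-in: roughly one token per 4 characters."""
    return max(1, len(message["content"]) // 4)


def truncate_oldest_first(
    messages: list[Message],
    budget: int,
    count: Callable[[Message], int] = approx_tokens,
) -> list[Message]:
    """Drop the oldest non-system messages until the prompt fits the budget."""
    # Assumes system messages come first, so system + rest keeps the order.
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    total = sum(count(m) for m in system + rest)
    # Remove from the front (oldest) while over budget; system messages stay.
    while rest and total > budget:
        total -= count(rest.pop(0))
    return system + rest
```

Running something like this locally before sending a request can help predict which turns would survive truncation, though the real cutoff depends on the model's tokenizer and context window.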

Request metadata

Attach key-value metadata for cost attribution across teams and projects:
response = client.chat.completions.create(
    model="forii/deepseek-v3",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"metadata": {
        "user_id": "u_12345",
        "session_id": "s_abc",
        "team": "search",
        "project": "ranker",
    }},
)
Metadata appears in usage logs and the usage API, letting you attribute costs to specific teams, projects, or users. It is also useful for A/B test labeling and debugging.
Metadata is a Forii extension adopted from Fireworks’ parameters. Values must be strings.
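Because values must be strings, IDs or flags stored as other types need converting before they are sent. The sketch below shows one way to do that; `to_string_metadata` is a hypothetical helper, and rendering booleans as lowercase "true"/"false" and dropping `None` values are choices made here, not documented Forii behavior.

```python
from typing import Any, Mapping


def to_string_metadata(raw: Mapping[str, Any]) -> dict[str, str]:
    """Coerce metadata values to strings, skipping keys whose value is None."""
    out: dict[str, str] = {}
    for key, value in raw.items():
        if value is None:
            continue  # nothing meaningful to attribute, so leave the key out
        if isinstance(value, bool):
            out[key] = "true" if value else "false"
        else:
            out[key] = str(value)
    return out


metadata = to_string_metadata(
    {"user_id": 12345, "team": "search", "experiment": None, "canary": True}
)
# Pass the result the same way as in the request above.
extra_body = {"metadata": metadata}
```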

Additional Forii extensions

| Parameter | Type | Description |
| --- | --- | --- |
| reasoning_effort | string | none, low, medium, or high; see Reasoning Models |
| context_length_exceeded_behavior | string | "truncate" (default) or "error" |
| repetition_penalty | float | 0–2, applies to both prompt and output |
| top_k | integer | Top-K sampling |
| metadata | object | Key-value string metadata for tracing and cost attribution |
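As a rough sketch of how these extensions might be combined, the helper below collects them into one dictionary and checks only the constraints listed in the table. `build_extra_body` is a hypothetical name, and sending the result through `extra_body` is an assumption based on the metadata example above rather than confirmed behavior for every parameter.

```python
from typing import Any, Optional

REASONING_EFFORTS = {"none", "low", "medium", "high"}
CONTEXT_BEHAVIORS = {"truncate", "error"}


def build_extra_body(
    reasoning_effort: Optional[str] = None,
    context_length_exceeded_behavior: Optional[str] = None,
    repetition_penalty: Optional[float] = None,
    top_k: Optional[int] = None,
    metadata: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Validate the extension parameters and return only the ones that were set."""
    if reasoning_effort is not None and reasoning_effort not in REASONING_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(REASONING_EFFORTS)}")
    if (
        context_length_exceeded_behavior is not None
        and context_length_exceeded_behavior not in CONTEXT_BEHAVIORS
    ):
        raise ValueError('context_length_exceeded_behavior must be "truncate" or "error"')
    if repetition_penalty is not None and not 0 <= repetition_penalty <= 2:
        raise ValueError("repetition_penalty must be between 0 and 2")
    if metadata is not None and not all(isinstance(v, str) for v in metadata.values()):
        raise ValueError("metadata values must be strings")

    candidates = {
        "reasoning_effort": reasoning_effort,
        "context_length_exceeded_behavior": context_length_exceeded_behavior,
        "repetition_penalty": repetition_penalty,
        "top_k": top_k,
        "metadata": metadata,
    }
    # Omit unset parameters so server-side defaults apply.
    return {name: value for name, value in candidates.items() if value is not None}
```

A call such as `build_extra_body(reasoning_effort="low", top_k=40)` returns a dictionary that could then be passed as `extra_body=` in a request like the ones above.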