Context truncation
Instead of returning an error for prompts that exceed the context window, truncate gracefully:
response = client.chat.completions.create(
    model="forii/deepseek-v3",
    messages=long_messages,
    # Forii extension, sent in the request body via extra_body
    extra_body={"context_length_exceeded_behavior": "truncate"},  # or "error"
    max_tokens=2048,
)
| Value | Behavior |
|---|---|
| "truncate" | Truncate the prompt to fit the context window. Processing continues normally. |
| "error" | Return an error if the prompt exceeds the context limit. |
"truncate" is the default. It removes the oldest messages first and always preserves the system message, so an over-long conversation still gets a response instead of a hard error. Use "error" when you need to know exactly when the context limit is hit.
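To make the truncation order concrete, here is a minimal client-side sketch of the same policy: drop the oldest non-system messages until the conversation fits a budget. It illustrates the described behavior and is not Forii's server implementation. The names `truncate_messages` and `estimate_tokens` are hypothetical, and the 4-characters-per-token estimate is a crude assumption, not the model's real tokenizer.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(message: Message) -> int:
    """Crude size estimate: about 4 characters per token (assumption)."""
    return max(1, len(message["content"]) // 4)


def truncate_messages(messages: List[Message], budget: int) -> List[Message]:
    """Drop the oldest non-system messages until the estimate fits the budget."""
    kept = list(messages)  # work on a copy; the caller's list is untouched
    total = sum(estimate_tokens(m) for m in kept)
    i = 0
    while total > budget and i < len(kept):
        if kept[i]["role"] == "system":
            i += 1  # system messages are always preserved
            continue
        total -= estimate_tokens(kept.pop(i))  # remove the oldest remaining message
    return kept


history = [
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 40},
]
trimmed = truncate_messages(history, budget=120)
print([m["role"] for m in trimmed])  # → ['system', 'assistant', 'user']
```

If only system messages remain and the estimate is still over budget, this sketch returns them as they are; how the server handles that case is not specified here.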
Attach key-value metadata for cost attribution across teams and projects:
response = client.chat.completions.create(
    model="forii/deepseek-v3",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"metadata": {
        "user_id": "u_12345",
        "session_id": "s_abc",
        "team": "search",
        "project": "ranker",
    }},
)
Metadata appears in usage logs and the usage API, letting you attribute costs to specific teams, projects, or users.
Metadata is a Forii extension adopted from Fireworks’ parameters. Values must be strings. Beyond cost attribution, it is also useful for A/B test labeling and debugging.
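Because values must be strings, numeric IDs or booleans need converting before the request is sent. The helper below is one possible way to do that on the client. `build_metadata` is a hypothetical name, not part of any Forii SDK, and skipping `None` values and rejecting containers are design choices of this sketch.

```python
from typing import Any, Dict


def build_metadata(**fields: Any) -> Dict[str, str]:
    """Return a metadata dict whose values are all strings."""
    metadata: Dict[str, str] = {}
    for key, value in fields.items():
        if value is None:
            continue  # skip unset fields rather than sending the string "None"
        if isinstance(value, (dict, list, tuple, set)):
            raise TypeError(
                f"metadata[{key!r}] must be a scalar, got {type(value).__name__}"
            )
        metadata[key] = str(value)
    return metadata


extra_body = {"metadata": build_metadata(user_id=12345, team="search", experiment=None)}
print(extra_body)  # → {'metadata': {'user_id': '12345', 'team': 'search'}}
```

The resulting dict can be passed as `extra_body=` in the call shown above.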
Additional Forii extensions
| Parameter | Type | Description |
|---|---|---|
| reasoning_effort | string | none, low, medium, or high; see Reasoning Models |
| context_length_exceeded_behavior | string | "truncate" (default) or "error" |
| repetition_penalty | float | 0–2, applies to both prompt and output |
| top_k | integer | Top-K sampling |
| metadata | object | Key-value string metadata for tracing and cost attribution |
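The constraints in the table can be checked on the client before a request goes out. The sketch below assumes every extension is sent through `extra_body`, as in the metadata example. `forii_extras` is a hypothetical helper, and the rule that `top_k` must be a positive integer is an assumption of this sketch rather than a documented limit.

```python
from typing import Any, Dict, Optional

REASONING_EFFORTS = {"none", "low", "medium", "high"}
OVERFLOW_BEHAVIORS = {"truncate", "error"}


def forii_extras(
    reasoning_effort: Optional[str] = None,
    context_length_exceeded_behavior: Optional[str] = None,
    repetition_penalty: Optional[float] = None,
    top_k: Optional[int] = None,
    metadata: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Validate the extension parameters and collect the ones that were set."""
    extras: Dict[str, Any] = {}
    if reasoning_effort is not None:
        if reasoning_effort not in REASONING_EFFORTS:
            raise ValueError(f"reasoning_effort must be one of {sorted(REASONING_EFFORTS)}")
        extras["reasoning_effort"] = reasoning_effort
    if context_length_exceeded_behavior is not None:
        if context_length_exceeded_behavior not in OVERFLOW_BEHAVIORS:
            raise ValueError('context_length_exceeded_behavior must be "truncate" or "error"')
        extras["context_length_exceeded_behavior"] = context_length_exceeded_behavior
    if repetition_penalty is not None:
        if not 0 <= repetition_penalty <= 2:
            raise ValueError("repetition_penalty must be between 0 and 2")
        extras["repetition_penalty"] = repetition_penalty
    if top_k is not None:
        # Positive-integer check is an assumption of this sketch.
        if isinstance(top_k, bool) or not isinstance(top_k, int) or top_k < 1:
            raise ValueError("top_k must be a positive integer")
        extras["top_k"] = top_k
    if metadata is not None:
        if not all(isinstance(v, str) for v in metadata.values()):
            raise ValueError("metadata values must be strings")
        extras["metadata"] = dict(metadata)
    return extras


extras = forii_extras(reasoning_effort="low", repetition_penalty=1.1, top_k=40)
print(extras)
```

Only parameters that were explicitly set end up in the dict, so server-side defaults such as "truncate" stay in effect unless overridden.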