The chat completions endpoint is the heart of Forii. It is fully OpenAI-compatible: every parameter OpenAI supports, Forii supports.
POST https://api.forii.in/inference/v1/chat/completions

Structured Outputs: JSON schema and JSON object modes for guaranteed structured responses
Function Calling: Connect LLMs to external tools, APIs, and databases
Reasoning Models: Control chain-of-thought depth for DeepSeek-R1 and Qwen3
Context & Metadata: Graceful truncation for long prompts, cost attribution metadata

Basic usage

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.forii.in/inference/v1",
    api_key=os.environ["FORII_API_KEY"],
)

response = client.chat.completions.create(
    model="forii/deepseek-v3",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms"},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
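
For clients that cannot use the OpenAI SDK, the same call can be made as a plain HTTP POST to the endpoint above. The following is a minimal sketch using only the Python standard library. The Bearer authorization header is an assumption based on OpenAI compatibility, and build_chat_request is a hypothetical helper name, not part of any Forii library.

```python
import json
import os
import urllib.request

# Endpoint as documented on this page.
FORII_URL = "https://api.forii.in/inference/v1/chat/completions"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    return urllib.request.Request(
        FORII_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Bearer token auth, as in the OpenAI API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    os.environ.get("FORII_API_KEY", "YOUR_API_KEY"),
    {
        "model": "forii/deepseek-v3",
        "messages": [
            {"role": "user", "content": "Explain quantum computing in simple terms"}
        ],
        "max_tokens": 512,
    },
)

# To actually send it (needs network access and a valid key):
# with urllib.request.urlopen(request) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.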

Response

{
  "id": "chatcmpl-forii-abc123",
  "object": "chat.completion",
  "created": 1719000000,
  "model": "forii/deepseek-v3",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Quantum computing uses quantum bits..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 156,
    "total_tokens": 174,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}
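
When working with the raw JSON rather than the SDK objects, the fields of interest can be read directly from the response body. A small sketch, using an abbreviated copy of the sample response above. The summarize helper is hypothetical, and treating a finish_reason of "length" as a max_tokens cutoff is an assumption carried over from the OpenAI API.

```python
import json

# The sample response from this page, abbreviated to the fields used below.
raw = """
{
  "id": "chatcmpl-forii-abc123",
  "model": "forii/deepseek-v3",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Quantum computing uses quantum bits..."},
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 156,
    "total_tokens": 174,
    "prompt_tokens_details": {"cached_tokens": 0}
  }
}
"""


def summarize(response: dict) -> dict:
    """Pull out the reply text, the stop reason, and the token counts."""
    choice = response["choices"][0]
    usage = response["usage"]
    # prompt_tokens_details may be missing or null, so fall back to 0.
    details = usage.get("prompt_tokens_details") or {}
    return {
        "text": choice["message"]["content"],
        # Assumption: "length" means the reply hit max_tokens, as in the OpenAI API.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage["total_tokens"],
        "cached_tokens": details.get("cached_tokens", 0),
    }


summary = summarize(json.loads(raw))
print(summary["text"])
print(summary["total_tokens"])
```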

Parameters

Core parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | | Model ID, e.g. forii/deepseek-v3 |
| messages | array | Yes | | Conversation messages with role and content |
| temperature | float | No | 0.7 | Sampling randomness (0 = deterministic, 2 = creative) |
| max_tokens | integer | No | 2048 | Maximum tokens in the completion |
| top_p | float | No | 1 | Nucleus sampling threshold |
| stream | boolean | No | false | Stream tokens as they arrive |
| stop | string or array | No | | Up to 4 stop sequences |
| n | integer | No | 1 | Number of completions |
| frequency_penalty | float | No | | -2 to 2 |
| presence_penalty | float | No | | -2 to 2 |
| seed | integer | No | | Deterministic sampling |
| logprobs | boolean | No | | Return log probabilities |
| top_logprobs | integer | No | | 0–5 top log probs per position |
| user | string | No | | End-user identifier |
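
The ranges in the table can be checked client-side before a request is sent. The sketch below is illustrative only: check_core_params is a hypothetical helper, the 0 to 2 range for temperature is inferred from the table's description, and the server remains the authority on what it accepts.

```python
def check_core_params(params: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for required in ("model", "messages"):
        if required not in params:
            problems.append(f"missing required parameter: {required}")

    # (name, low, high) ranges as listed in the table above.
    # The temperature range is inferred from "0 = deterministic, 2 = creative".
    ranges = [
        ("temperature", 0, 2),
        ("frequency_penalty", -2, 2),
        ("presence_penalty", -2, 2),
        ("top_logprobs", 0, 5),
    ]
    for name, low, high in ranges:
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}")

    # stop may be a single string or a list of up to 4 sequences.
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")
    return problems


params = {
    "model": "forii/deepseek-v3",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
    "seed": 42,
    "stop": ["\n\n", "END"],
}
print(check_core_params(params))
```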

Structured output & tool parameters

| Parameter | Type | Description |
| --- | --- | --- |
| response_format | object | {"type": "json_object"} or {"type": "json_schema", "json_schema": {...}} |
| tools | array | Function/tool definitions |
| tool_choice | string or object | auto, none, required, or {"type": "function", "name": "..."} |
| parallel_tool_calls | boolean | Enable parallel function calls |

See Structured Outputs and Function Calling for full details.
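
As a rough illustration of how these parameters fit into a request body, the sketch below builds one structured-output request and one tool-calling request. The layout inside json_schema (name, strict, schema) and inside each tool definition follows the OpenAI convention and is assumed, not confirmed, for Forii. The weather_report schema and the get_weather function are hypothetical.

```python
import json

# JSON schema mode. The name/strict/schema layout is the OpenAI convention (assumed here).
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",  # hypothetical schema name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temperature_c": {"type": "number"},
            },
            "required": ["city", "temperature_c"],
            "additionalProperties": False,
        },
    },
}

# A single function tool. get_weather is a hypothetical function for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What is the weather in Pune?"}]

structured_request = {
    "model": "forii/deepseek-v3",
    "messages": messages,
    "response_format": response_format,
}

tool_request = {
    "model": "forii/deepseek-v3",
    "messages": messages,
    "tools": tools,
    "tool_choice": "auto",         # let the model decide whether to call a tool
    "parallel_tool_calls": False,  # at most one tool call per turn
}

print(json.dumps(tool_request, indent=2))
```

With the OpenAI SDK, either dictionary can be passed as keyword arguments, for example client.chat.completions.create(**structured_request).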

Forii extensions

| Parameter | Type | Description |
| --- | --- | --- |
| reasoning_effort | string | none, low, medium, or high. See Reasoning Models |
| context_length_exceeded_behavior | string | "truncate" (default) or "error". See Context & Metadata |
| repetition_penalty | float | 0–2, applies to both prompt and output |
| top_k | integer | Top-K sampling |
| metadata | object | Key-value string metadata for cost attribution |
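
Parameters that are not part of the standard OpenAI signature can be sent through the OpenAI Python SDK's extra_body argument, which merges extra keys into the request body. The sketch below validates the extension values listed above and collects them into one dictionary. forii_extras is a hypothetical helper, and the metadata values shown are made up for illustration.

```python
from typing import Optional

ALLOWED_EFFORT = {"none", "low", "medium", "high"}
ALLOWED_OVERFLOW = {"truncate", "error"}


def forii_extras(
    reasoning_effort: Optional[str] = None,
    context_length_exceeded_behavior: str = "truncate",
    repetition_penalty: Optional[float] = None,
    top_k: Optional[int] = None,
    metadata: Optional[dict] = None,
) -> dict:
    """Validate Forii extension parameters and return them as one dict."""
    if reasoning_effort is not None and reasoning_effort not in ALLOWED_EFFORT:
        raise ValueError(f"reasoning_effort must be one of {sorted(ALLOWED_EFFORT)}")
    if context_length_exceeded_behavior not in ALLOWED_OVERFLOW:
        raise ValueError(
            f"context_length_exceeded_behavior must be one of {sorted(ALLOWED_OVERFLOW)}"
        )
    if repetition_penalty is not None and not (0 <= repetition_penalty <= 2):
        raise ValueError("repetition_penalty must be between 0 and 2")
    if metadata is not None and not all(
        isinstance(k, str) and isinstance(v, str) for k, v in metadata.items()
    ):
        raise ValueError("metadata keys and values must be strings")

    extras = {
        "reasoning_effort": reasoning_effort,
        "context_length_exceeded_behavior": context_length_exceeded_behavior,
        "repetition_penalty": repetition_penalty,
        "top_k": top_k,
        "metadata": metadata,
    }
    # Drop unset values so the server-side defaults apply.
    return {key: value for key, value in extras.items() if value is not None}


extras = forii_extras(reasoning_effort="low", top_k=40, metadata={"team": "search"})
print(extras)

# Usage with the OpenAI SDK (requires a configured client and network access):
# response = client.chat.completions.create(
#     model="forii/deepseek-v3",
#     messages=[{"role": "user", "content": "Hello"}],
#     extra_body=extras,
# )
```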

Streaming

Stream tokens as they arrive using Server-Sent Events (SSE).
stream = client.chat.completions.create(
    model="forii/deepseek-v3",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

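for chunk in stream:
    # Some chunks may carry no choices, so guard before indexing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

    # Usage stats arrive in the final chunk.
    if chunk.usage:
        print(f"\nTokens: {chunk.usage.total_tokens}")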

Streaming response format

data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk",
  "choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk",
  "choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk",
  "choices":[{"index":0,"delta":{},"finish_reason":"stop"}],
  "usage":{"prompt_tokens":18,"completion_tokens":156,"total_tokens":174}}

data: [DONE]
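
Without the SDK, the stream can be consumed by reading the data: lines one at a time and stopping at the [DONE] sentinel. The following is a minimal sketch that assumes each event's JSON payload arrives on a single line (the sample above is wrapped only for readability). collect_stream is a hypothetical helper, and the input here is a hard-coded copy of the sample events rather than a live connection.

```python
import json


def collect_stream(lines):
    """Join the delta.content pieces from SSE lines and capture usage if present."""
    text_parts = []
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
        # Usage is reported on the final chunk in the sample format above.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage


sample = [
    'data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":18,"completion_tokens":156,"total_tokens":174}}',
    "",
    "data: [DONE]",
]

text, usage = collect_stream(sample)
print(text)                   # Quantum computing
print(usage["total_tokens"])  # 174
```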