Chat Completions - Forii — India's Sovereign Inference Platform

The chat completions endpoint is the heart of Forii. It’s fully OpenAI-compatible — every parameter OpenAI supports, Forii supports.

POST https://api.forii.in/inference/v1/chat/completions

Structured Outputs

JSON schema and JSON object modes for guaranteed structured responses

Function Calling

Connect LLMs to external tools, APIs, and databases

Reasoning Models

Control chain-of-thought depth for DeepSeek-R1 and Qwen3

Context & Metadata

Graceful truncation for long prompts, cost attribution metadata

Basic usage

Python

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.forii.in/inference/v1",
    api_key=os.environ["FORII_API_KEY"],
)

response = client.chat.completions.create(
    model="forii/deepseek-v3",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms"},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FORII_API_KEY,
  baseURL: "https://api.forii.in/inference/v1",
});

const response = await client.chat.completions.create({
  model: "forii/deepseek-v3",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing in simple terms" },
  ],
  temperature: 0.7,
  max_tokens: 512,
});

console.log(response.choices[0].message.content);

cURL

curl https://api.forii.in/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FORII_API_KEY" \
  -d '{
    "model": "forii/deepseek-v3",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'

Response

{
  "id": "chatcmpl-forii-abc123",
  "object": "chat.completion",
  "created": 1719000000,
  "model": "forii/deepseek-v3",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Quantum computing uses quantum bits..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 156,
    "total_tokens": 174,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}
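
As a rough sketch of how these fields fit together, the snippet below parses a body shaped like the sample above and pulls out the reply text, the finish reason, and the token counts. The `summarize_completion` helper is hypothetical (it is not part of any SDK), and it assumes the body has the same shape as the sample.

```python
import json

# Body copied from the sample response above.
SAMPLE = """
{
  "id": "chatcmpl-forii-abc123",
  "object": "chat.completion",
  "created": 1719000000,
  "model": "forii/deepseek-v3",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Quantum computing uses quantum bits..."},
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 156,
    "total_tokens": 174,
    "prompt_tokens_details": {"cached_tokens": 0}
  }
}
"""


def summarize_completion(body: str) -> dict:
    """Extract the fields most callers need (hypothetical helper)."""
    data = json.loads(body)
    choice = data["choices"][0]
    usage = data.get("usage", {})
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
        "cached_tokens": cached,
        # In the OpenAI format, "length" means max_tokens cut the reply short.
        "truncated": choice["finish_reason"] == "length",
    }


summary = summarize_completion(SAMPLE)
print(summary["text"])
```

Checking `finish_reason` before using the text is a cheap way to notice replies that stopped at the `max_tokens` limit.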

Parameters

Core parameters

Parameter	Type	Required	Default	Description
`model`	string	Yes	—	Model ID, e.g. `forii/deepseek-v3`
`messages`	array	Yes	—	Conversation messages with `role` and `content`
`temperature`	float	No	0.7	Sampling randomness, 0 to 2 (0 = near-deterministic; higher values give more varied output)
`max_tokens`	integer	No	2048	Maximum tokens in the completion
`top_p`	float	No	1	Nucleus sampling threshold
`stream`	boolean	No	false	Stream tokens as they arrive
`stop`	string|array	No	—	Up to 4 stop sequences
`n`	integer	No	1	Number of completions
`frequency_penalty`	float	No	—	-2 to 2
`presence_penalty`	float	No	—	-2 to 2
`seed`	integer	No	—	Seed for reproducible sampling
`logprobs`	boolean	No	—	Return log probabilities
`top_logprobs`	integer	No	—	0–5 top log probs per position
`user`	string	No	—	End-user identifier
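
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch: `check_core_params` is a hypothetical helper, the ranges are read off the table above (including the assumption that `temperature` is bounded by 0 and 2), and the server remains the authority on what it accepts.

```python
def check_core_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # model and messages are the only required parameters.
    for required in ("model", "messages"):
        if required not in params:
            problems.append(f"missing required parameter: {required}")

    # Assumed range, based on the 0 and 2 endpoints named in the table.
    temperature = params.get("temperature", 0.7)
    if not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")

    for name in ("frequency_penalty", "presence_penalty"):
        value = params.get(name)
        if value is not None and not -2 <= value <= 2:
            problems.append(f"{name} must be between -2 and 2")

    # stop may be a single string or a list of up to 4 strings.
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")

    top_logprobs = params.get("top_logprobs")
    if top_logprobs is not None and not 0 <= top_logprobs <= 5:
        problems.append("top_logprobs must be between 0 and 5")

    return problems


request = {
    "model": "forii/deepseek-v3",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
    "stop": ["\n\n"],
}
print(check_core_params(request))
```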

Structured output & tool parameters

Parameter	Type	Description
`response_format`	object	`{"type": "json_object"}` or `{"type": "json_schema", "json_schema": {...}}`
`tools`	array	Function/tool definitions
`tool_choice`	string|object	`auto`, `none`, `required`, or `{"type": "function", "function": {"name": "..."}}`
`parallel_tool_calls`	boolean	Enable parallel function calls

See Structured Outputs and Function Calling for full details.
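
To make the shapes concrete, here is a hedged sketch of two request bodies: one using `response_format` in JSON schema mode and one passing a tool definition. The layouts follow the OpenAI Chat Completions format; the schema, the `weather_report` name, and the `get_weather` tool are invented for illustration.

```python
import json

# Hypothetical schema, for illustration only.
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "number"},
    },
    "required": ["city", "temperature_c"],
    "additionalProperties": False,
}

# Request asking for a reply that conforms to the schema above.
structured_request = {
    "model": "forii/deepseek-v3",
    "messages": [{"role": "user", "content": "Report the weather in Pune as JSON."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "weather_report", "schema": weather_schema, "strict": True},
    },
}

# Request offering one (hypothetical) tool the model may decide to call.
tool_request = {
    "model": "forii/deepseek-v3",
    "messages": [{"role": "user", "content": "What's the weather in Pune?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
    "parallel_tool_calls": False,
}

print(json.dumps(tool_request, indent=2))
```

Either dictionary can be sent as the JSON body of the POST request, or unpacked into `client.chat.completions.create(**tool_request)` with the OpenAI SDK.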

Forii extensions

Parameter	Type	Description
`reasoning_effort`	string	`none` | `low` | `medium` | `high` (see Reasoning Models)
`context_length_exceeded_behavior`	string	`"truncate"` (default) or `"error"` (see Context & Metadata)
`repetition_penalty`	float	0–2, applies to both prompt and output
`top_k`	integer	Top-K sampling
`metadata`	object	Key-value string metadata for cost attribution
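
Because some of these parameters are not part of the OpenAI SDK's method signature, one way to send them from Python is the SDK's `extra_body` argument, which merges extra keys into the JSON request body. The sketch below assumes Forii reads the extensions from the top level of the body, as the table suggests; `forii_request` is a hypothetical helper and the metadata keys are made up.

```python
FORII_EXTENSIONS = {
    "reasoning_effort",
    "context_length_exceeded_behavior",
    "repetition_penalty",
    "top_k",
    "metadata",
}


def forii_request(messages, model="forii/deepseek-v3", **extensions):
    """Build keyword arguments for client.chat.completions.create()."""
    unknown = set(extensions) - FORII_EXTENSIONS
    if unknown:
        raise ValueError(f"not a Forii extension: {sorted(unknown)}")
    return {
        "model": model,
        "messages": messages,
        # extra_body is merged into the JSON body by the OpenAI Python SDK.
        "extra_body": extensions,
    }


kwargs = forii_request(
    [{"role": "user", "content": "Summarise this contract."}],
    context_length_exceeded_behavior="error",
    top_k=40,
    metadata={"team": "legal", "project": "contracts"},
)

# With a configured client:
# response = client.chat.completions.create(**kwargs)
print(kwargs["extra_body"])
```

Setting `context_length_exceeded_behavior` to `"error"` is useful when silently truncating the prompt would change the answer, as with legal or financial documents.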

Streaming

Stream tokens as they arrive using Server-Sent Events (SSE).

Python

stream = client.chat.completions.create(
    model="forii/deepseek-v3",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    # Guard against chunks that carry no choices.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    # Usage stats arrive on the final chunk.
    if chunk.usage:
        print(f"\nTokens: {chunk.usage.total_tokens}")

JavaScript

const stream = await client.chat.completions.create({
  model: "forii/deepseek-v3",
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

Streaming response format

data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk",
  "choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk",
  "choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk",
  "choices":[{"index":0,"delta":{},"finish_reason":"stop"}],
  "usage":{"prompt_tokens":18,"completion_tokens":156,"total_tokens":174}}

data: [DONE]
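
For clients that read the raw stream instead of using an SDK, the events can be reassembled with a few lines of parsing. This is a minimal sketch that assumes each `data:` event arrives on a single line (the sample above is wrapped for readability); `collect_stream` is a hypothetical helper, and the input here is a hard-coded list, not a live connection.

```python
import json


def collect_stream(lines):
    """Join delta content from SSE lines; return (text, usage)."""
    parts, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
        # Keep the most recent usage block; it appears on the final chunk.
        usage = chunk.get("usage") or usage
    return "".join(parts), usage


sample = [
    'data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":18,"completion_tokens":156,"total_tokens":174}}',
    "",
    "data: [DONE]",
]

text, usage = collect_stream(sample)
print(text)  # Quantum computing
print(usage["total_tokens"])  # 174
```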
