# Chat Completions

> POST /inference/v1/chat/completions — The core inference endpoint. OpenAI-compatible.

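The chat completions endpoint is the core of Forii's inference API. It is OpenAI-compatible: it accepts the same request parameters as OpenAI's chat completions API, plus the Forii extensions listed below, so an existing OpenAI SDK client works once you point it at the Forii base URL and API key.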

```
POST https://api.forii.in/inference/v1/chat/completions
```

<CardGroup cols={2}>
  <Card title="Structured Outputs" href="/docs/api-reference/chat-completions/structured-outputs">
    JSON schema and JSON object modes for guaranteed structured responses
  </Card>

  <Card title="Function Calling" href="/docs/api-reference/chat-completions/function-calling">
    Connect LLMs to external tools, APIs, and databases
  </Card>

  <Card title="Reasoning Models" href="/docs/api-reference/chat-completions/reasoning">
    Control chain-of-thought depth for DeepSeek-R1 and Qwen3
  </Card>

  <Card title="Context & Metadata" href="/docs/api-reference/chat-completions/context-and-metadata">
    Graceful truncation for long prompts, cost attribution metadata
  </Card>
</CardGroup>

## Basic usage

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.forii.in/inference/v1",
        api_key=os.environ["FORII_API_KEY"],
    )

    response = client.chat.completions.create(
        model="forii/deepseek-v3",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing in simple terms"},
        ],
        temperature=0.7,
        max_tokens=512,
    )

    print(response.choices[0].message.content)
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.FORII_API_KEY,
      baseURL: "https://api.forii.in/inference/v1",
    });

    const response = await client.chat.completions.create({
      model: "forii/deepseek-v3",
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Explain quantum computing in simple terms" },
      ],
      temperature: 0.7,
      max_tokens: 512,
    });

    console.log(response.choices[0].message.content);
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    curl https://api.forii.in/inference/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $FORII_API_KEY" \
      -d '{
        "model": "forii/deepseek-v3",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Explain quantum computing in simple terms"}
        ],
        "temperature": 0.7,
        "max_tokens": 512
      }'
    ```
  </Tab>
</Tabs>

### Response

```json theme={null}
{
  "id": "chatcmpl-forii-abc123",
  "object": "chat.completion",
  "created": 1719000000,
  "model": "forii/deepseek-v3",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Quantum computing uses quantum bits..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 156,
    "total_tokens": 174,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}
```
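To make the response shape concrete, here is a minimal sketch that reads the fields most callers need from the payload above. The helper name `summarize_completion` is hypothetical, and treating `finish_reason == "length"` as "cut off by `max_tokens`" is an assumption based on the OpenAI convention, not something this page states.

```python
import json

# Sample payload copied from the response shown above.
raw = """
{
  "id": "chatcmpl-forii-abc123",
  "object": "chat.completion",
  "created": 1719000000,
  "model": "forii/deepseek-v3",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Quantum computing uses quantum bits..."},
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 156,
    "total_tokens": 174,
    "prompt_tokens_details": {"cached_tokens": 0}
  }
}
"""


def summarize_completion(payload: dict) -> dict:
    """Extract the text, truncation flag, and token counts (hypothetical helper)."""
    choice = payload["choices"][0]
    usage = payload.get("usage", {})
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    return {
        "text": choice["message"]["content"],
        # Assumption (OpenAI convention): "length" means max_tokens was hit.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens"),
        "cached_tokens": cached,
    }


summary = summarize_completion(json.loads(raw))
print(summary)
```

With the SDK, the same fields are available as attributes, for example `response.choices[0].finish_reason` and `response.usage.total_tokens`.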

## Parameters

### Core parameters

| Parameter           | Type          | Required | Default | Description                                           |
| ------------------- | ------------- | -------- | ------- | ----------------------------------------------------- |
| `model`             | string        | Yes      | —       | Model ID, e.g. `forii/deepseek-v3`                    |
| `messages`          | array         | Yes      | —       | Conversation messages with `role` and `content`       |
| `temperature`       | float         | No       | 0.7     | Sampling randomness (0 = deterministic, 2 = creative) |
| `max_tokens`        | integer       | No       | 2048    | Maximum tokens in the completion                      |
| `top_p`             | float         | No       | 1       | Nucleus sampling threshold                            |
| `stream`            | boolean       | No       | false   | Stream tokens as they arrive                          |
| `stop`              | string\|array | No       | —       | Up to 4 stop sequences                                |
| `n`                 | integer       | No       | 1       | Number of completions                                 |
| `frequency_penalty` | float         | No       | —       | Penalizes tokens by how often they appear; -2 to 2    |
| `presence_penalty`  | float         | No       | —       | Penalizes tokens that have already appeared; -2 to 2  |
| `seed`              | integer       | No       | —       | Seed for reproducible sampling                        |
| `logprobs`          | boolean       | No       | —       | Return log probabilities                              |
| `top_logprobs`      | integer       | No       | —       | 0–5 top log probs per position                        |
| `user`              | string        | No       | —       | End-user identifier                                   |

### Structured output & tool parameters

| Parameter             | Type           | Description                                                                        |
| --------------------- | -------------- | ---------------------------------------------------------------------------------- |
| `response_format`     | object         | `{"type": "json_object"}` or `{"type": "json_schema", "json_schema": {...}}`       |
| `tools`               | array          | Function/tool definitions                                                          |
| `tool_choice`         | string\|object | `auto`, `none`, `required`, or `{"type": "function", "function": {"name": "..."}}` |
| `parallel_tool_calls` | boolean        | Enable parallel function calls                                                     |

See [Structured Outputs](/docs/api-reference/chat-completions/structured-outputs) and [Function Calling](/docs/api-reference/chat-completions/function-calling) for full details.
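As a rough illustration of how these parameters fit into a request body, the sketch below builds one tool-calling request and one JSON-schema request. The tool `get_weather`, the schema name `city_summary`, and the field layout inside `tools` and `json_schema` are assumptions that follow the OpenAI chat completions format; the linked pages are the reference for what Forii accepts.

```python
import json

# Hypothetical tool definition, laid out in the OpenAI function-tool format.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request that lets the model decide whether to call the tool.
tool_request = {
    "model": "forii/deepseek-v3",
    "messages": [{"role": "user", "content": "What's the weather in Pune?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
    "parallel_tool_calls": False,
}

# Request that constrains the reply to a JSON schema (hypothetical schema).
schema_request = {
    "model": "forii/deepseek-v3",
    "messages": [{"role": "user", "content": "Summarize Pune in one line"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "city_summary",
            "schema": {
                "type": "object",
                "properties": {"summary": {"type": "string"}},
                "required": ["summary"],
            },
        },
    },
}

print(json.dumps(tool_request, indent=2))
print(json.dumps(schema_request, indent=2))
```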

### Forii extensions

| Parameter                          | Type    | Description                                                                                                               |
| ---------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------- |
| `reasoning_effort`                 | string  | `none` \| `low` \| `medium` \| `high` — see [Reasoning Models](/docs/api-reference/chat-completions/reasoning)            |
| `context_length_exceeded_behavior` | string  | `"truncate"` (default) or `"error"` — see [Context & Metadata](/docs/api-reference/chat-completions/context-and-metadata) |
| `repetition_penalty`               | float   | 0–2, applies to both prompt and output                                                                                    |
| `top_k`                            | integer | Top-K sampling                                                                                                            |
| `metadata`                         | object  | Key-value string metadata for cost attribution                                                                            |

## Streaming

Set `stream` to `true` to receive tokens as they are generated. The response is delivered as Server-Sent Events (SSE).

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    stream = client.chat.completions.create(
        model="forii/deepseek-v3",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True,
    )

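    usage = None
    for chunk in stream:
        # Some chunks may carry no choices, so guard before indexing
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
        # Usage stats arrive in the final chunk
        if chunk.usage:
            usage = chunk.usage

    if usage:
        print(f"\nTokens: {usage.total_tokens}")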
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    const stream = await client.chat.completions.create({
      model: "forii/deepseek-v3",
      messages: [{ role: "user", content: "Tell me a story" }],
      stream: true,
    });

    for await (const chunk of stream) {
      process.stdout.write(chunk.choices[0]?.delta?.content || "");
    }
    ```
  </Tab>
</Tabs>

### Streaming response format

```
data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk",
  "choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk",
  "choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk",
  "choices":[{"index":0,"delta":{},"finish_reason":"stop"}],
  "usage":{"prompt_tokens":18,"completion_tokens":156,"total_tokens":174}}

data: [DONE]
```
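If you consume the stream without an SDK, the events above can be reassembled by hand. The sketch below is a minimal parser under two assumptions: each event arrives as a single `data:` line (the sample above is wrapped across lines for readability), and the request used `n = 1`. The function name `collect_stream` is hypothetical.

```python
import json


def collect_stream(lines):
    """Accumulate delta content from an iterable of SSE lines."""
    parts = []
    finish_reason = None
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
        usage = chunk.get("usage") or usage
    return "".join(parts), finish_reason, usage


# Events from the sample above, one per line.
sample = [
    'data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-forii-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":18,"completion_tokens":156,"total_tokens":174}}',
    "",
    "data: [DONE]",
]

text, finish_reason, usage = collect_stream(sample)
print(text, finish_reason, usage)
```

The same function works on a live response by passing it an iterator over the decoded lines of the HTTP body.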
