
# Models

> Available models, context windows, quantization, and pricing

Forii serves curated frontier models on Indian infrastructure at 30% lower cost. Your data stays in India under Indian jurisdiction, with no US routing and no CLOUD Act exposure. Every model is quantized and then evaluated on quality benchmarks, including Hindi, before deployment.

## Available models

| Model                 | Family    | Parameters            | Context Window | ₹/1K Input | ₹/1K Output |
| --------------------- | --------- | --------------------- | -------------- | ---------- | ----------- |
| `forii/deepseek-v3`   | DeepSeek  | 671B MoE (37B active) | 128K           | ₹0.14      | ₹0.42       |
| `forii/llama-4-scout` | Llama     | 109B MoE (17B active, 16 experts) | 10M            | ₹0.28      | ₹0.84       |
| `forii/gemma-3`       | Gemma     | 27B                   | 128K           | ₹0.22      | ₹0.66       |
| `forii/qwen3`         | Qwen      | 32B                   | 128K           | ₹0.18      | ₹0.54       |
| `forii/embed-v3`      | Embedding | —                     | 8K             | ₹0.003     | —           |

<Info>
  Pricing is in Indian Rupees (₹) per 1,000 tokens. All costs are shown in INR on the dashboard — no USD conversion needed.
</Info>
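
As a rough illustration of how the per-1,000-token prices translate into a request cost, the sketch below multiplies token counts by the rates in the table. The helper name `estimate_cost_inr` and the price dictionary are hypothetical and not part of the Forii API, and the result ignores any rounding or minimum charges that billing may apply.

```python
# Prices in INR per 1,000 tokens, copied from the table above.
# An output price of None means no output-token price is listed for that model.
PRICES_INR_PER_1K = {
    "forii/deepseek-v3": (0.14, 0.42),
    "forii/llama-4-scout": (0.28, 0.84),
    "forii/gemma-3": (0.22, 0.66),
    "forii/qwen3": (0.18, 0.54),
    "forii/embed-v3": (0.003, None),
}


def estimate_cost_inr(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Estimate the cost of one request in INR (hypothetical helper)."""
    input_price, output_price = PRICES_INR_PER_1K[model]
    cost = input_tokens / 1000 * input_price
    if output_price is not None:
        cost += output_tokens / 1000 * output_price
    return cost


# 2,000 input tokens and 500 output tokens on forii/deepseek-v3:
# 2 * 0.14 + 0.5 * 0.42 = 0.28 + 0.21 = 0.49
print(f"INR {estimate_cost_inr('forii/deepseek-v3', 2000, 500):.2f}")
```

For embeddings only the input side applies, so one million tokens on `forii/embed-v3` comes to 1,000 × ₹0.003 = ₹3.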

## Model selection guide

| If you need...                 | Use this model        | Why                                                               |
| ------------------------------ | --------------------- | ----------------------------------------------------------------- |
| Best overall quality           | `forii/deepseek-v3`   | MoE architecture, strong on Hindi + English, best value per token |
| Long context (documents, code) | `forii/llama-4-scout` | 10M token context window, handles massive documents               |
| Fast, efficient responses      | `forii/gemma-3`       | Smaller model, lower latency, good multilingual support           |
| Reasoning tasks                | `forii/qwen3`         | Built-in thinking mode, strong on math and logic                  |
| Semantic search / RAG          | `forii/embed-v3`      | Purpose-built for embeddings, low latency                         |

## Quantization

All chat models are served in AWQ 4-bit quantization. This provides:

* **Quality within 1–2% of FP16** — verified on MMLU, HumanEval, GSM8K, and HellaSwag benchmarks
* **\~75% memory reduction** — enables multi-model packing on a single GPU
* **Higher throughput** — more tokens per second per GPU

<Info>
  Quality is non-negotiable. Every quantized model passes our evaluation suite before deployment. Models that regress beyond quality thresholds are rejected.
</Info>
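
The ~75% memory figure above follows from the bit widths: 4-bit weights take a quarter of the space of 16-bit weights. The sketch below works through that arithmetic for a 32B-parameter model. It counts weights only and ignores quantization metadata (scales and zero points), the KV cache, and activations, so real-world savings will be somewhat lower. The helper name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


params = 32e9  # a 32B-parameter model, the size listed for forii/qwen3
fp16_gb = weight_memory_gb(params, 16)  # 16 bits per weight
awq4_gb = weight_memory_gb(params, 4)   # 4 bits per weight
reduction = 1 - awq4_gb / fp16_gb

print(f"FP16: {fp16_gb:.0f} GB, 4-bit: {awq4_gb:.0f} GB, reduction: {reduction:.0%}")
# prints "FP16: 64 GB, 4-bit: 16 GB, reduction: 75%"
```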

## Reasoning models

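For models that support chain-of-thought reasoning (DeepSeek-R1, Qwen3), set the `reasoning_effort` parameter to `low`, `medium`, or `high`:

```python theme={null}
import os
from openai import OpenAI  # assumes the OpenAI Python SDK pointed at Forii's endpoint

client = OpenAI(base_url="https://api.forii.in/inference/v1", api_key=os.environ["FORII_API_KEY"])
response = client.chat.completions.create(
    model="forii/qwen3",  # a reasoning-capable model from the table above
    messages=[{"role": "user", "content": "Solve: x² - 5x + 6 = 0"}],
    reasoning_effort="high",  # low | medium | high
)
```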

The `reasoning_content` field in the response contains the model's chain-of-thought reasoning.
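
The page does not say where in the response `reasoning_content` sits. The sketch below assumes it is on the message object next to `content`, as in some other OpenAI-compatible APIs. Check an actual response before relying on that. `split_reasoning` is a hypothetical helper, and the stand-in response lets the example run without an API call.

```python
from types import SimpleNamespace


def split_reasoning(response):
    """Return (reasoning, answer) from a chat completion response.

    Assumes reasoning_content lives on choices[0].message; returns None
    for the reasoning part when the field is absent.
    """
    message = response.choices[0].message
    reasoning = getattr(message, "reasoning_content", None)
    return reasoning, message.content


# Stand-in response with the assumed shape (not real model output).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                reasoning_content="Factor: (x - 2)(x - 3) = 0",
                content="x = 2 or x = 3",
            )
        )
    ]
)

reasoning, answer = split_reasoning(fake_response)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Using `getattr` with a default keeps the helper usable with models that do not return reasoning at all.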

## Listing models programmatically

```bash theme={null}
curl https://api.forii.in/inference/v1/models \
  -H "Authorization: Bearer $FORII_API_KEY"
```

Example response:

```json theme={null}
{
  "object": "list",
  "data": [
    {"id": "forii/deepseek-v3", "object": "model", "owned_by": "forii"},
    {"id": "forii/qwen3", "object": "model", "owned_by": "forii"},
    {"id": "forii/embed-v3", "object": "model", "owned_by": "forii"}
  ]
}
```
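
To use the list in code, for instance to check that a model ID exists before sending a request, the response body can be reduced to a list of IDs. This is a minimal sketch that parses the example response above offline. `model_ids` is a hypothetical helper, and fetching the body over HTTP is left to whichever client you already use.

```python
import json

# The example response shown above, embedded so the sketch runs offline.
SAMPLE_RESPONSE = """
{
  "object": "list",
  "data": [
    {"id": "forii/deepseek-v3", "object": "model", "owned_by": "forii"},
    {"id": "forii/qwen3", "object": "model", "owned_by": "forii"},
    {"id": "forii/embed-v3", "object": "model", "owned_by": "forii"}
  ]
}
"""


def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from a models-list response body (hypothetical helper)."""
    return [
        entry["id"]
        for entry in payload.get("data", [])
        if entry.get("object") == "model"
    ]


ids = model_ids(json.loads(SAMPLE_RESPONSE))
print(ids)  # ['forii/deepseek-v3', 'forii/qwen3', 'forii/embed-v3']
print("forii/qwen3" in ids)  # True
```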

## Coming soon

| Model                  | Type                                | Expected    |
| ---------------------- | ----------------------------------- | ----------- |
| `forii/saarika-v2`     | Speech-to-text (22 Indic languages) | Coming soon |
| `forii/bulbul-v2`      | Text-to-speech (30+ voices)         | Coming soon |
| `forii/qwen2.5-vl-72b` | Vision/multimodal                   | Coming soon |
| `forii/rerank-v3`      | Reranking                           | Coming soon |