Forii serves curated frontier models on Indian infrastructure at 30% lower cost. Your data stays in India under Indian jurisdiction, with no US routing and no CLOUD Act exposure. Every model is quantized and then evaluated on quality benchmarks, including Hindi, before deployment.
Available models
| Model | Family | Parameters | Context window | ₹/1K input tokens | ₹/1K output tokens |
|---|---|---|---|---|---|
| forii/deepseek-v3 | DeepSeek | 671B MoE (37B active) | 128K | ₹0.14 | ₹0.42 |
| forii/llama-4-scout | Llama | 109B MoE (17B active, 16 experts) | 10M | ₹0.28 | ₹0.84 |
| forii/gemma-3 | Gemma | 27B | 128K | ₹0.22 | ₹0.66 |
| forii/qwen3 | Qwen | 32B | 128K | ₹0.18 | ₹0.54 |
| forii/embed-v3 | Embedding | — | 8K | ₹0.003 | — |
Pricing is in Indian Rupees (₹) per 1,000 tokens. All costs are shown in INR on the dashboard — no USD conversion needed.
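As a rough illustration of how per-1,000-token pricing translates into a request cost, the sketch below multiplies token counts by the prices listed above. The helper name `estimate_cost_inr` and the `PRICES_INR_PER_1K` table are hypothetical and not part of any Forii SDK; actual billing may round or meter differently.

```python
from decimal import Decimal

# Prices in INR per 1,000 tokens, copied from the table above as
# (input price, output price). The embedding model has no output price.
PRICES_INR_PER_1K = {
    "forii/deepseek-v3": (Decimal("0.14"), Decimal("0.42")),
    "forii/llama-4-scout": (Decimal("0.28"), Decimal("0.84")),
    "forii/gemma-3": (Decimal("0.22"), Decimal("0.66")),
    "forii/qwen3": (Decimal("0.18"), Decimal("0.54")),
    "forii/embed-v3": (Decimal("0.003"), Decimal("0")),
}


def estimate_cost_inr(model: str, input_tokens: int, output_tokens: int = 0) -> Decimal:
    """Estimate the cost of one request in INR (hypothetical helper)."""
    input_price, output_price = PRICES_INR_PER_1K[model]
    return (input_price * input_tokens + output_price * output_tokens) / 1000


# 12,000 input tokens and 3,000 output tokens on forii/deepseek-v3:
# (0.14 * 12,000 + 0.42 * 3,000) / 1,000
example_cost = estimate_cost_inr("forii/deepseek-v3", 12_000, 3_000)
```

Under these assumptions, the example request comes to ₹2.94. `Decimal` is used instead of `float` so that small per-token prices add up without binary rounding error.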
Model selection guide
| If you need… | Use this model | Why |
|---|---|---|
| Best overall quality | forii/deepseek-v3 | MoE architecture, strong on Hindi + English, best value per token |
| Long context (documents, code) | forii/llama-4-scout | 10M token context window, handles massive documents |
| Fast, efficient responses | forii/gemma-3 | Smaller model, lower latency, good multilingual support |
| Reasoning tasks | forii/qwen3 | Built-in thinking mode, strong on math and logic |
| Semantic search / RAG | forii/embed-v3 | Purpose-built for embeddings, low latency |
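One way to apply the guide above in application code is a small lookup from task type to model ID. This is only a sketch: the task labels and the `pick_model` helper are hypothetical names chosen for illustration, while the model IDs are the ones listed in the guide.

```python
# Task labels are hypothetical; model IDs come from the selection guide above.
MODEL_FOR_TASK = {
    "general": "forii/deepseek-v3",
    "long_context": "forii/llama-4-scout",
    "low_latency": "forii/gemma-3",
    "reasoning": "forii/qwen3",
    "embeddings": "forii/embed-v3",
}


def pick_model(task: str, default: str = "forii/deepseek-v3") -> str:
    """Return the model ID for a task label, falling back to the default."""
    return MODEL_FOR_TASK.get(task, default)
```

Keeping the mapping in one place means a model can be swapped later without touching call sites.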
Quantization
All chat models are served with AWQ 4-bit quantization. Compared with FP16 serving, this provides:
- Quality within 1–2% of FP16, verified on the MMLU, HumanEval, GSM8K, and HellaSwag benchmarks
- ~75% lower weight memory, which enables multi-model packing on a single GPU
- Higher throughput: more tokens per second per GPU
Quality is non-negotiable. Every quantized model passes our evaluation suite before deployment. Models that regress beyond quality thresholds are rejected.
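The ~75% figure follows from the bit widths: a 4-bit weight takes a quarter of the space of a 16-bit one. The sketch below works this out for a 27B-parameter model such as forii/gemma-3. It counts weights only and ignores the quantization scales and zero-points that AWQ stores, the KV cache, and activations, so real savings will be somewhat lower. The function name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


fp16_gb = weight_memory_gb(27e9, 16)  # 16 bits = 2 bytes per weight
int4_gb = weight_memory_gb(27e9, 4)   # 4 bits = 0.5 bytes per weight
reduction = 1 - int4_gb / fp16_gb     # fraction of weight memory saved
```

For 27B parameters this gives roughly 54 GB at FP16 versus 13.5 GB at 4 bits, a 75% reduction in weight memory.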
Reasoning models
For models that support chain-of-thought (DeepSeek-R1, Qwen3), use the reasoning_effort parameter:
response = client.chat.completions.create(
    model="forii/deepseek-r1",
    messages=[{"role": "user", "content": "Solve: x² - 5x + 6 = 0"}],
    reasoning_effort="high",  # low | medium | high
)
The reasoning_content field in the response contains the model’s chain-of-thought reasoning.
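A minimal sketch of reading that field is shown below. It assumes `reasoning_content` sits next to `content` on the first choice's message, which this page does not confirm, so the helper falls back to `None` when the field is missing. `split_reasoning` is a hypothetical helper, and the stand-in response is hand-written for illustration, not real model output.

```python
from types import SimpleNamespace


def split_reasoning(response):
    """Return (reasoning, answer) from a chat completion response.

    Assumes reasoning_content lives on choices[0].message; the reasoning
    is None if the field is absent.
    """
    message = response.choices[0].message
    reasoning = getattr(message, "reasoning_content", None)
    return reasoning, message.content


# Stand-in response object for illustration only (not real model output).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                reasoning_content="Factor: (x - 2)(x - 3) = 0",
                content="x = 2 or x = 3",
            )
        )
    ]
)
reasoning, answer = split_reasoning(fake_response)
```

Using `getattr` with a default keeps the same code path working for models that do not return reasoning.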
Listing models programmatically
curl https://api.forii.in/inference/v1/models \
  -H "Authorization: Bearer $FORII_API_KEY"
Example response:
{
  "object": "list",
  "data": [
    {"id": "forii/deepseek-v3", "object": "model", "owned_by": "forii"},
    {"id": "forii/qwen3", "object": "model", "owned_by": "forii"},
    {"id": "forii/embed-v3", "object": "model", "owned_by": "forii"}
  ]
}
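The same request can be made from Python using only the standard library. This is a sketch under the assumption that the endpoint and response shape match the curl example above; `model_ids` and `fetch_model_ids` are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
MODELS_URL = "https://api.forii.in/inference/v1/models"


def model_ids(payload: dict) -> list[str]:
    """Extract the model IDs from a /models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(api_key: str) -> list[str]:
    """Call the models endpoint and return the IDs it lists."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return model_ids(json.load(resp))
```

Calling `fetch_model_ids(os.environ["FORII_API_KEY"])` after `import os` would perform the request. Parsing is kept in a separate function so it can be checked against a saved response without network access.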
Coming soon
| Model | Type | Expected |
|---|---|---|
| forii/saarika-v2 | Speech-to-text (22 Indic languages) | Coming soon |
| forii/bulbul-v2 | Text-to-speech (30+ voices) | Coming soon |
| forii/qwen2.5-vl-72b | Vision/multimodal | Coming soon |
| forii/rerank-v3 | Reranking | Coming soon |