Forii serves curated frontier models on Indian infrastructure at 30% lower cost. Your data stays in India under Indian jurisdiction, with no US routing and no CLOUD Act exposure. Every model is quantized and evaluated on quality benchmarks, including Hindi, before deployment.

Available models

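| Model | Family | Parameters | Context Window | ₹/1K Input | ₹/1K Output |
| --- | --- | --- | --- | --- | --- |
| forii/deepseek-v3 | DeepSeek | 671B MoE (37B active) | 128K | ₹0.14 | ₹0.42 |
| forii/llama-4-scout | LLaMA | 17B MoE (16 experts) | 10M | ₹0.28 | ₹0.84 |
| forii/gemma-3 | Gemma | 27B | 128K | ₹0.22 | ₹0.66 |
| forii/qwen3 | Qwen | 32B | 128K | ₹0.18 | ₹0.54 |
| forii/embed-v3 | Embedding | | 8K | ₹0.003 | |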
Pricing is in Indian Rupees (₹) per 1,000 tokens. All costs are shown in INR on the dashboard — no USD conversion needed.
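
As a rough illustration of how the per-1,000-token pricing adds up, the sketch below estimates the cost of a single request. The prices are copied from the table above; the `estimate_cost_inr` helper and the token counts are hypothetical and not part of any Forii SDK.

```python
# Prices in INR per 1,000 tokens as (input, output), copied from the table above.
PRICES_INR_PER_1K = {
    "forii/deepseek-v3": (0.14, 0.42),
    "forii/llama-4-scout": (0.28, 0.84),
    "forii/gemma-3": (0.22, 0.66),
    "forii/qwen3": (0.18, 0.54),
}


def estimate_cost_inr(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in INR (hypothetical helper)."""
    input_price, output_price = PRICES_INR_PER_1K[model]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price


# 2,000 input tokens and 500 output tokens on forii/deepseek-v3:
# 2 * 0.14 + 0.5 * 0.42 = 0.28 + 0.21 = 0.49
cost = estimate_cost_inr("forii/deepseek-v3", 2000, 500)
print(f"INR {cost:.2f}")  # INR 0.49
```

Actual billing may round or aggregate differently; treat the dashboard as the source of truth.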

Model selection guide

| If you need… | Use this model | Why |
| --- | --- | --- |
| Best overall quality | forii/deepseek-v3 | MoE architecture, strong on Hindi + English, best value per token |
| Long context (documents, code) | forii/llama-4-scout | 10M token context window, handles massive documents |
| Fast, efficient responses | forii/gemma-3 | Smaller model, lower latency, good multilingual support |
| Reasoning tasks | forii/qwen3 | Built-in thinking mode, strong on math and logic |
| Semantic search / RAG | forii/embed-v3 | Purpose-built for embeddings, low latency |
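
If an application routes requests by task type, the guide above could be encoded as a simple lookup. This is only a sketch: the task labels and the `pick_model` helper are hypothetical names chosen for illustration, and only the model IDs come from the guide.

```python
# Hypothetical task labels mapped to the model IDs recommended in the guide above.
MODEL_FOR_TASK = {
    "general": "forii/deepseek-v3",
    "long_context": "forii/llama-4-scout",
    "low_latency": "forii/gemma-3",
    "reasoning": "forii/qwen3",
    "embeddings": "forii/embed-v3",
}


def pick_model(task: str, default: str = "forii/deepseek-v3") -> str:
    """Return the recommended model ID for a task label, falling back to the default."""
    return MODEL_FOR_TASK.get(task, default)


print(pick_model("reasoning"))     # forii/qwen3
print(pick_model("unknown-task"))  # forii/deepseek-v3
```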

Quantization

All chat models are served with AWQ 4-bit quantization. This provides:
  • Quality within 1–2% of FP16: benchmark scores verified on MMLU, HumanEval, GSM8K, and HellaSwag
  • ~75% less weight memory than FP16 (4 bits per weight instead of 16), which enables multi-model packing on a single GPU
  • Higher throughput: more tokens per second per GPU
Quality is non-negotiable. Every quantized model passes our evaluation suite before deployment. Models that regress beyond quality thresholds are rejected.
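
The ~75% memory figure follows from storing each weight in 4 bits instead of 16. The back-of-the-envelope sketch below works this out for a 32B-parameter model. It counts weights only and ignores the KV cache, activations, and the per-group scales that 4-bit formats also store, so real savings are slightly smaller. The `weight_memory_gb` helper is hypothetical.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


fp16_gb = weight_memory_gb(32, 16)  # 32B parameters at 16 bits each
awq4_gb = weight_memory_gb(32, 4)   # the same parameters at 4 bits each
reduction = 1 - awq4_gb / fp16_gb

print(fp16_gb, awq4_gb, f"{reduction:.0%}")  # 64.0 16.0 75%
```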

Reasoning models

For models that support chain-of-thought (DeepSeek-R1, Qwen3), use the reasoning_effort parameter:
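import os
from openai import OpenAI
# Assumption: an OpenAI-compatible SDK client; the base URL is inferred from the models endpoint on this page.
client = OpenAI(base_url="https://api.forii.in/inference/v1", api_key=os.environ["FORII_API_KEY"])

response = client.chat.completions.create(
    model="forii/deepseek-r1",
    messages=[{"role": "user", "content": "Solve: x² - 5x + 6 = 0"}],
    reasoning_effort="high",  # low | medium | high
)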
The reasoning_content field in the response contains the model’s chain-of-thought reasoning.
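
A minimal sketch of separating the reasoning from the final answer is shown below. It assumes `reasoning_content` sits on `choices[0].message` next to `content`, as in other OpenAI-compatible reasoning APIs; the page above does not state the exact location, so verify it against a real response. The `split_reasoning` helper and the stand-in response object are hypothetical.

```python
from types import SimpleNamespace


def split_reasoning(response):
    """Return (reasoning, answer) from a chat completion response.

    Assumption: reasoning_content is an attribute of choices[0].message.
    Returns None for the reasoning when the field is absent.
    """
    message = response.choices[0].message
    reasoning = getattr(message, "reasoning_content", None)
    return reasoning, message.content


# Stand-in object shaped like a chat completion, for illustration only.
# It is not real API output.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                content="x = 2 or x = 3",
                reasoning_content="Factor: (x - 2)(x - 3) = 0",
            )
        )
    ]
)

reasoning, answer = split_reasoning(fake_response)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Using `getattr` with a default keeps the helper safe for models that do not return a reasoning trace.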

Listing models programmatically

curl https://api.forii.in/inference/v1/models \
  -H "Authorization: Bearer $FORII_API_KEY"
Example response:
{
  "object": "list",
  "data": [
    {"id": "forii/deepseek-v3", "object": "model", "owned_by": "forii"},
    {"id": "forii/qwen3", "object": "model", "owned_by": "forii"},
    {"id": "forii/embed-v3", "object": "model", "owned_by": "forii"}
  ]
}
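
The same lookup can be sketched in Python using only the standard library. `fetch_models` mirrors the curl call above but is not invoked here; `model_ids` is a hypothetical helper that pulls the IDs out of a payload shaped like the example response.

```python
import json
import urllib.request

MODELS_URL = "https://api.forii.in/inference/v1/models"


def fetch_models(api_key: str) -> dict:
    """GET the model list, mirroring the curl command above (requires network access)."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def model_ids(payload: dict) -> list:
    """Extract model IDs from a payload shaped like the example response."""
    return [entry["id"] for entry in payload["data"]]


# Parse the example response from above instead of calling the API.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "forii/deepseek-v3", "object": "model", "owned_by": "forii"},
    {"id": "forii/qwen3", "object": "model", "owned_by": "forii"},
    {"id": "forii/embed-v3", "object": "model", "owned_by": "forii"}
  ]
}
""")
print(model_ids(sample))  # ['forii/deepseek-v3', 'forii/qwen3', 'forii/embed-v3']
```

With a real key, `model_ids(fetch_models(os.environ["FORII_API_KEY"]))` (after `import os`) would return whatever the endpoint currently lists.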

Coming soon

| Model | Type | Expected |
| --- | --- | --- |
| forii/saarika-v2 | Speech-to-text (22 Indic languages) | Coming soon |
| forii/bulbul-v2 | Text-to-speech (30+ voices) | Coming soon |
| forii/qwen2.5-vl-72b | Vision/multimodal | Coming soon |
| forii/rerank-v3 | Reranking | Coming soon |