Models - Forii — India's Sovereign Inference Platform

Forii serves curated frontier models on Indian infrastructure at 30% lower cost. Your data stays in India under Indian jurisdiction, with no US routing and no CLOUD Act exposure. Every model is quantized and then evaluated on quality benchmarks, including Hindi, before deployment.

Available models

Model	Family	Parameters	Context Window	₹/1K Input	₹/1K Output
`forii/deepseek-v3`	DeepSeek	671B MoE (37B active)	128K	₹0.14	₹0.42
`forii/llama-4-scout`	LLaMA	17B MoE (16 experts)	10M	₹0.28	₹0.84
`forii/gemma-3`	Gemma	27B	128K	₹0.22	₹0.66
`forii/qwen3`	Qwen	32B	128K	₹0.18	₹0.54
`forii/embed-v3`	Embedding	—	8K	₹0.003	—

Pricing is in Indian Rupees (₹) per 1,000 tokens. All costs are shown in INR on the dashboard — no USD conversion needed.
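
As a rough illustration of how the per-1K rates translate into a bill, the sketch below estimates the cost of a single request. The helper name `estimate_cost_inr` is hypothetical, and the code assumes that billing is linear in token count with no minimum charge or rounding, which this page does not confirm.

```python
from decimal import Decimal

# Rates in INR per 1,000 tokens, copied from the table above.
# Each entry is (input_rate, output_rate); the embedding model has no output rate.
PRICES_INR_PER_1K = {
    "forii/deepseek-v3": (Decimal("0.14"), Decimal("0.42")),
    "forii/llama-4-scout": (Decimal("0.28"), Decimal("0.84")),
    "forii/gemma-3": (Decimal("0.22"), Decimal("0.66")),
    "forii/qwen3": (Decimal("0.18"), Decimal("0.54")),
    "forii/embed-v3": (Decimal("0.003"), Decimal("0")),
}


def estimate_cost_inr(model: str, input_tokens: int, output_tokens: int = 0) -> Decimal:
    """Estimate one request's cost in INR (hypothetical helper, linear billing assumed)."""
    input_rate, output_rate = PRICES_INR_PER_1K[model]
    return (input_rate * input_tokens + output_rate * output_tokens) / 1000


# Example: 2,000 prompt tokens and 500 completion tokens on forii/deepseek-v3.
cost = estimate_cost_inr("forii/deepseek-v3", input_tokens=2000, output_tokens=500)
print(f"Estimated cost: ₹{cost}")
```

Decimal arithmetic is used here so that small per-token rates do not pick up floating-point error when summed over many requests.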

Model selection guide

If you need…	Use this model	Why
Best overall quality	`forii/deepseek-v3`	MoE architecture, strong on Hindi + English, best value per token
Long context (documents, code)	`forii/llama-4-scout`	10M token context window, handles massive documents
Fast, efficient responses	`forii/gemma-3`	Smaller model, lower latency, good multilingual support
Reasoning tasks	`forii/qwen3`	Built-in thinking mode, strong on math and logic
Semantic search / RAG	`forii/embed-v3`	Purpose-built for embeddings, low latency

Quantization

All chat models are served in AWQ 4-bit quantization. This provides:

Quality within 1–2% of FP16 — verified on MMLU, HumanEval, GSM8K, and HellaSwag benchmarks
~75% memory reduction — enables multi-model packing on a single GPU
Higher throughput — more tokens per second per GPU
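
The ~75% memory figure follows from the bit widths: weights stored in 4 bits take a quarter of the space of 16-bit weights. The sketch below works this out for a 32B-parameter model. It counts weights only; real AWQ checkpoints also store per-group scales and zero points, and serving needs additional memory for the KV cache, so actual savings are somewhat smaller.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


# A 32B-parameter model, the size listed for forii/qwen3.
fp16_gb = weight_memory_gb(32, 16)
awq4_gb = weight_memory_gb(32, 4)
reduction = 1 - awq4_gb / fp16_gb

print(f"FP16: {fp16_gb:.0f} GB, 4-bit: {awq4_gb:.0f} GB, reduction: {reduction:.0%}")
# prints "FP16: 64 GB, 4-bit: 16 GB, reduction: 75%"
```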

Quality is non-negotiable. Every quantized model passes our evaluation suite before deployment. Models that regress beyond quality thresholds are rejected.

Reasoning models

For models that support chain-of-thought (DeepSeek-R1, Qwen3), use the reasoning_effort parameter:

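import os
from openai import OpenAI  # assumption: OpenAI-compatible SDK; base URL inferred from the models endpoint

client = OpenAI(base_url="https://api.forii.in/inference/v1", api_key=os.environ["FORII_API_KEY"])
response = client.chat.completions.create(
    model="forii/deepseek-r1",
    messages=[{"role": "user", "content": "Solve: x² - 5x + 6 = 0"}],
    reasoning_effort="high",  # low | medium | high
)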

The reasoning_content field in the response contains the model’s chain-of-thought reasoning.
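
A minimal sketch of separating the reasoning from the final answer is shown below. It assumes that `reasoning_content` sits on the message object next to `content`, as in other OpenAI-compatible reasoning APIs; this page does not state the exact location. The helper `split_reasoning` and the sample payload are illustrative only, not real API output.

```python
from typing import Optional, Tuple


def split_reasoning(response: dict) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer) from a chat completion parsed as a dict.

    Assumes reasoning_content lives at choices[0].message; returns None
    for the reasoning when the field is absent.
    """
    message = response["choices"][0]["message"]
    return message.get("reasoning_content"), message.get("content", "")


# Hand-written sample payload for illustration, not captured from the API.
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "Factor the quadratic: (x - 2)(x - 3) = 0.",
                "content": "x = 2 or x = 3",
            }
        }
    ]
}

reasoning, answer = split_reasoning(sample_response)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Treating the field as optional keeps the same code path usable for models that do not return chain-of-thought.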

Listing models programmatically

curl https://api.forii.in/inference/v1/models \
  -H "Authorization: Bearer $FORII_API_KEY"

{
  "object": "list",
  "data": [
    {"id": "forii/deepseek-v3", "object": "model", "owned_by": "forii"},
    {"id": "forii/qwen3", "object": "model", "owned_by": "forii"},
    {"id": "forii/embed-v3", "object": "model", "owned_by": "forii"}
  ]
}
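
The same call can be made from Python with the standard library alone. The sketch below is one possible approach: `fetch_models` and `model_ids` are hypothetical helper names, and the parsing assumes the response shape shown above. The example runs the parser against a copy of that sample response, so it works without network access.

```python
import json
import urllib.request
from typing import List

MODELS_URL = "https://api.forii.in/inference/v1/models"


def fetch_models(api_key: str) -> dict:
    """GET the model list using the same header as the curl example."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def model_ids(payload: dict) -> List[str]:
    """Extract model IDs from a response shaped like the sample above."""
    return [entry["id"] for entry in payload["data"] if entry.get("object") == "model"]


# Parse a copy of the sample response; call fetch_models(...) for live data.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "forii/deepseek-v3", "object": "model", "owned_by": "forii"},
    {"id": "forii/qwen3", "object": "model", "owned_by": "forii"},
    {"id": "forii/embed-v3", "object": "model", "owned_by": "forii"}
  ]
}
""")
ids = model_ids(sample)
print(ids)
```

With a key exported as `FORII_API_KEY`, `model_ids(fetch_models(os.environ["FORII_API_KEY"]))` (after `import os`) would return the live list.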

Coming soon

Model	Type	Expected
`forii/saarika-v2`	Speech-to-text (22 Indic languages)	Coming soon
`forii/bulbul-v2`	Text-to-speech (30+ voices)	Coming soon
`forii/qwen2.5-vl-72b`	Vision/multimodal	Coming soon
`forii/rerank-v3`	Reranking	Coming soon
