FAQ - Forii — India's Sovereign Inference Platform

General

What is Forii?

Forii is India’s Sovereign Inference Platform. Run any frontier model — DeepSeek-V3, LLaMA-4-Scout, Gemma-3, Qwen3 — on Indian infrastructure with full data sovereignty, independent of US providers. 30% lower cost, INR pricing.

How is Forii different from OpenAI?

Three things: (1) Frontier models at 30% lower cost — DeepSeek-V3, LLaMA-4, Gemma-3, Qwen3, (2) Data sovereignty — Indian jurisdiction, no US CLOUD Act, no data routed through American servers, (3) INR pricing. You switch by changing base_url and api_key.

How is Forii different from Fireworks?

Fireworks is US-based with USD pricing. Forii offers the same frontier models on Indian infrastructure with data sovereignty — no US CLOUD Act exposure. INR pricing, Razorpay payments. STT/TTS for 22 Indic languages coming soon.

Is my data processed in India?

Yes. All inference runs in Indian data centers (Delhi NCR). No data leaves India, no US jurisdiction applies.

API & Compatibility

Do I need a Forii SDK?

No. Forii is OpenAI-compatible. Use the official OpenAI SDK and change two lines: base_url and api_key.

Which frameworks work with Forii?

LangChain, LlamaIndex, Vercel AI SDK, LiteLLM — any framework that supports OpenAI’s API format. See Overview → Framework compatibility.

Does Forii support streaming?

Yes. Set stream=True in your request. Works identically to OpenAI’s SSE streaming.

Does Forii support structured outputs?

Yes. Use response_format with json_object or json_schema. See Chat Completions → Structured outputs.

Does Forii support function calling?

Yes. Use the tools parameter. See Chat Completions → Function calling.

Pricing & Plans

How does pricing work?

Forii is on the Free Plan today. No credits, no payment — just sign up and start building. Per-minute rate limits apply (60 RPM, 100K prompt TPM, 10K completion TPM). See Pricing for model rates and roadmap tiers.

What payment methods do you accept?

None yet. Paid tiers and payments (UPI, cards, netbanking, wallets via Razorpay) are on the roadmap.

Can I get GST invoices?

Not yet. GST-compliant invoices ship with paid tiers. Today there is nothing to invoice.

What happens if I hit a limit?

Per-minute limits reset automatically at the start of the next minute (UTC). Wait for the reset, or spread your load more evenly. If you consistently need more, paid tiers are coming.

Models

Which models are available?

See Models for the full catalog and pricing.

Why are models quantized?

AWQ 4-bit quantization reduces memory usage by ~75% while keeping quality within 1–2% of FP16. This enables multi-model packing on a single GPU, which is how Forii achieves 30% lower COGS.

Are quantized models less capable?

No. Every model passes evaluation benchmarks (MMLU, HumanEval, GSM8K, HellaSwag) before deployment. If quality regresses beyond the threshold, the model is rejected.

What about Hindi quality?

Every model is also evaluated on MMMU-Hindi. If quantization degrades Hindi quality beyond 5%, the variant is rejected. See Models → Quantization.

Rate Limits

What are the rate limits?

The Free Plan gives you 60 RPM, 100K prompt TPM, and 10K completion TPM. See Authentication → Rate limits.

What happens when I hit a rate limit?

You receive a 429 Too Many Requests response with a Retry-After header. The OpenAI SDK retries automatically. Per-minute counters reset at the start of the next minute (UTC).

Can I increase my rate limits?

Not yet. Paid tiers with higher limits (Starter 600 RPM, Pro 6,000 RPM, Enterprise custom) are on the roadmap.

​General

​What is Forii?

​How is Forii different from OpenAI?

​How is Forii different from Fireworks?

​Is my data processed in India?

​API & Compatibility

​Do I need a Forii SDK?

​Which frameworks work with Forii?

​Does Forii support streaming?

​Does Forii support structured outputs?

​Does Forii support function calling?

​Pricing & Plans

​How does pricing work?

​What payment methods do you accept?

​Can I get GST invoices?

​What happens if I hit a limit?

​Models

​Which models are available?

​Why are models quantized?

​Are quantized models less capable?

​What about Hindi quality?

​Rate Limits

​What are the rate limits?

​What happens when I hit a rate limit?

​Can I increase my rate limits?