What is Forii?

Forii is India’s Sovereign Inference Platform: run any frontier model on Indian infrastructure, independent of US cloud providers. Your data stays in India, inference costs about 30% less than self-deployment, and you pay in rupees. DeepSeek-V3, LLaMA-4-Scout, Gemma-3, and Qwen3 are all production-ready. Change two lines of code (base_url and api_key) and deploy in minutes.

Why Forii?

30% Lower Cost

Continuous batching, INT4/AWQ quantization, and prompt caching compound to deliver frontier inference at 30% below self-deployment cost.

OpenAI-Compatible

Swap base_url and api_key — that’s it. Works with LangChain, LlamaIndex, Vercel AI SDK, and every OpenAI SDK out of the box.
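
As a rough illustration of the two-line swap, the sketch below builds the constructor arguments for an OpenAI client from the environment. The helper name client_kwargs and the use_forii toggle are hypothetical, introduced only for this example; the base URL and the FORII_API_KEY variable are the ones used in the first-request example on this page.

```python
import os

# Base URL taken from the first-request example on this page.
FORII_BASE_URL = "https://api.forii.in/inference/v1"


def client_kwargs(use_forii: bool, env=os.environ) -> dict:
    """Return keyword arguments for OpenAI(...).

    Hypothetical helper: only base_url and api_key differ between providers.
    """
    if use_forii:
        return {"base_url": FORII_BASE_URL, "api_key": env["FORII_API_KEY"]}
    # Without base_url the SDK falls back to its default endpoint.
    return {"api_key": env["OPENAI_API_KEY"]}
```

With the openai package installed, OpenAI(**client_kwargs(True)) would then give a client pointed at Forii, and the rest of an existing application should stay unchanged.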

Data Sovereignty

Indian data centers, Indian jurisdiction. No data routed through US servers, no US subpoenas, no CLOUD Act exposure. INR pricing.

Core Capabilities

Chat Completions

Text generation with streaming, structured outputs, and function calling. The endpoint every framework calls first.
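
A minimal streaming sketch, assuming Forii streams chunks in the same shape as the OpenAI Chat Completions API (a delta with optional content per chunk). The function name stream_reply is hypothetical; client is an OpenAI client configured as in the first-request example, and the model name is the one used there.

```python
def stream_reply(client, prompt, model="forii/deepseek-v3"):
    """Yield text fragments of the reply as they arrive."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one response
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

A caller could print fragments as they come in, for example: for piece in stream_reply(client, "Summarise this contract"): print(piece, end="", flush=True).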

Embeddings

Semantic search over Hindi and English documents. Power RAG pipelines with low-latency embeddings from Indian servers.
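
The retrieval step of a RAG pipeline can be sketched as follows, assuming the embeddings endpoint follows the OpenAI response shape (one embedding per input, returned in input order). The helper names cosine, embed, and rank are hypothetical; forii/embed-v3 is the embedding model listed on this page.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed(client, texts, model="forii/embed-v3"):
    """Return one embedding vector per input text."""
    resp = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in resp.data]


def rank(client, query, documents):
    """Return documents ordered from most to least similar to the query."""
    vectors = embed(client, [query] + documents)
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = sorted(
        zip(documents, doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in scored]
```

For more than a handful of documents, the document vectors would normally be computed once and stored in a vector index rather than re-embedded on every query.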

Frontier Models

DeepSeek-V3, LLaMA-4-Scout, Gemma-3, Qwen3, and forii/embed-v3. Curated, quantized, quality-verified before deployment.

Who is this for?

  • Developers building AI products for Indian users — chatbots, document processors, voice agents, RAG systems
  • Startups paying in USD for inference routed through US data centers
  • Enterprises that need data sovereignty — Indian jurisdiction, no US cloud dependency, INR pricing

Make your first request

import os

from openai import OpenAI

# Point the standard OpenAI client at Forii; the key is read from the environment.
client = OpenAI(
    base_url="https://api.forii.in/inference/v1",
    api_key=os.environ["FORII_API_KEY"],
)

response = client.chat.completions.create(
    model="forii/deepseek-v3",
    messages=[{"role": "user", "content": "नमस्ते, कैसे हो?"}],  # "Hello, how are you?"
    max_tokens=512,
)

# Print the text of the first (and only) completion choice.
print(response.choices[0].message.content)

Follow the Quick Start guide to get your API key and make your first request.