Batch Inference - Forii — India's Sovereign Inference Platform

This endpoint is not yet available. It is planned for a future release.

Process large volumes of requests asynchronously, for example benchmarking runs, document backlogs, and bulk embeddings. The request format will match OpenAI’s Batch API format.

Planned endpoints

POST /inference/v1/batch — Create a batch job
GET /inference/v1/batch/{id} — Check batch status and retrieve results
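Since the endpoint is not live yet, the following is only a minimal sketch of how a client might build the two planned URLs. The base URL `https://api.example.com` and the helper names `create_batch_url` and `batch_status_url` are hypothetical placeholders, not part of any published Forii API; only the two paths come from the list above.

```python
from urllib.parse import quote

# Hypothetical placeholder; the real Forii base URL is not stated here.
BASE_URL = "https://api.example.com"


def create_batch_url(base_url: str = BASE_URL) -> str:
    """URL for POST /inference/v1/batch (create a batch job)."""
    return f"{base_url}/inference/v1/batch"


def batch_status_url(batch_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /inference/v1/batch/{id} (status and results)."""
    # Percent-encode the id so it cannot alter the path structure.
    return f"{base_url}/inference/v1/batch/{quote(batch_id, safe='')}"
```

Percent-encoding the batch id is a defensive choice for ids that might contain reserved characters; the actual id format has not been documented.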

How it will work

Upload a JSONL file where each line is a request: {"custom_id": "...", "body": {...}}
Forii processes all requests asynchronously
Retrieve results when the batch completes
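The input format described above can be sketched as follows. This is an illustration under assumptions: only the `custom_id` and `body` keys come from the description, while the helper name `build_batch_jsonl` and the contents of each `body` (the `model` and `input` fields) are hypothetical, because the request schema has not been published.

```python
import json


def build_batch_jsonl(requests):
    """Serialize (custom_id, body) pairs as JSONL: one JSON object per line."""
    lines = []
    for custom_id, body in requests:
        record = {"custom_id": custom_id, "body": body}
        # ensure_ascii=False keeps Hindi and other non-ASCII text readable.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Hypothetical request bodies; the real body schema is not documented yet.
requests = [
    ("doc-001", {"model": "example-model", "input": "नमस्ते"}),
    ("doc-002", {"model": "example-model", "input": "Hello"}),
]

batch_jsonl = build_batch_jsonl(requests)
```

Giving every request a unique `custom_id` is what would let a client match each result back to its request once the batch completes, since asynchronous processing may not preserve input order.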

Batch pricing will be 50% lower than real-time inference, matching the industry standard.
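As a rough illustration of the planned discount, the arithmetic can be sketched like this. The function name `batch_cost` and the cost figure are hypothetical; only the 50% discount comes from the statement above, and no actual Forii prices are implied.

```python
def batch_cost(realtime_cost: float, discount: float = 0.50) -> float:
    """Estimated batch cost for a workload, given its real-time cost."""
    return realtime_cost * (1 - discount)


# A workload that would cost 1000.0 (in any currency) at real-time rates.
estimated = batch_cost(1000.0)  # 500.0
```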

India use cases

Document digitization backlogs — Process millions of Hindi/regional-language documents
Bulk embeddings — Generate embeddings for existing document collections
Model evaluation — Benchmark models on custom datasets

Chat Completions — Real-time inference
Roadmap — feature timeline