This endpoint is not yet available. It is planned for a future release.
Process large volumes of requests asynchronously, for workloads such as benchmarking, document backlogs, and bulk embeddings. The request format will match OpenAI’s batch API format.

Planned endpoints

  • POST /inference/v1/batch — Create a batch job
  • GET /inference/v1/batch/{id} — Check batch status and retrieve results

How it will work

  1. Upload a JSONL file where each line is a request: {"custom_id": "...", "body": {...}}
  2. Forii processes all requests asynchronously
  3. Retrieve results when the batch completes
Batch pricing will be 50% lower than real-time inference, matching the industry standard.
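
Since the endpoint is not yet available, the following is only a minimal sketch of how the JSONL input from step 1 might be prepared, not a definitive client. The line shape {"custom_id": "...", "body": {...}} comes from the description above; the helper names build_batch_jsonl and parse_batch_jsonl and the fields inside each body ("model", "input") are hypothetical placeholders, because the final request schema has not been published.

```python
import json


def build_batch_jsonl(requests):
    """Serialize (custom_id, body) pairs into JSONL text, one request per line.

    Hypothetical helper: only the {"custom_id", "body"} line shape is taken
    from the planned format; everything else here is an assumption.
    """
    lines = []
    for custom_id, body in requests:
        record = {"custom_id": custom_id, "body": body}
        # ensure_ascii=False keeps Hindi and other non-ASCII text readable.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


def parse_batch_jsonl(text):
    """Parse JSONL text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


if __name__ == "__main__":
    # The body fields below are placeholders, not a confirmed schema.
    sample = [
        ("doc-1", {"model": "example-model", "input": "hello"}),
        ("doc-2", {"model": "example-model", "input": "नमस्ते"}),
    ]
    print(build_batch_jsonl(sample), end="")
```

Giving every request a distinct custom_id is what would let each result be matched back to its input once the batch completes, so it is worth generating these IDs from a stable key such as a document identifier.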

India use cases

  • Document digitization backlogs — Process millions of Hindi/regional-language documents
  • Bulk embeddings — Generate embeddings for existing document collections
  • Model evaluation — Benchmark models on custom datasets