Planned endpoints
- POST /inference/v1/batch - Create a batch job
- GET /inference/v1/batch/{id} - Check batch status and retrieve results
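Since these endpoints are only planned, a client can only be sketched under assumptions. The snippet below shows one possible polling loop around the status endpoint. The `fetch` callable, the `status` field, and the terminal values `"completed"` and `"failed"` are all hypothetical and are not confirmed by this page; only the URL paths come from the list above.

```python
import time

BATCH_PATH = "/inference/v1/batch"  # path taken from the planned endpoints


def batch_status_path(batch_id):
    """Build the path of the status endpoint for one batch."""
    return f"{BATCH_PATH}/{batch_id}"


def wait_for_batch(batch_id, fetch, poll_seconds=30.0, max_polls=100,
                   sleep=time.sleep):
    """Poll the status endpoint until the batch reaches a terminal state.

    `fetch(path)` is supplied by the caller and must return a dict.
    ASSUMPTION: the response has a "status" key whose terminal values
    are "completed" or "failed"; the real schema is not yet published.
    """
    terminal = {"completed", "failed"}  # hypothetical status values
    for _ in range(max_polls):
        result = fetch(batch_status_path(batch_id))
        if result.get("status") in terminal:
            return result
        sleep(poll_seconds)
    raise TimeoutError(f"batch {batch_id} not finished after {max_polls} polls")
```

Passing `fetch` and `sleep` in as parameters keeps the loop independent of any particular HTTP library and makes it easy to exercise with a stub before the API exists.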
How it will work
- Upload a JSONL file where each line is a request: {"custom_id": "...", "body": {...}}
- Forii processes all requests asynchronously
- Retrieve results when the batch completes
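The request format above can be produced with the standard library alone. The following is a minimal sketch of building the JSONL payload, not a definitive client: the helper name `build_batch_jsonl` is hypothetical, the uniqueness check on `custom_id` is an assumption (the page does not say whether duplicates are rejected), and the contents of `body` are left to the caller because the page does not specify them.

```python
import json


def build_batch_jsonl(requests):
    """Serialize (custom_id, body) pairs as JSONL, one request per line.

    ASSUMPTION: custom_id values should be unique within a file so that
    results can be matched back to requests; this is not confirmed.
    """
    seen = set()
    lines = []
    for custom_id, body in requests:
        if custom_id in seen:
            raise ValueError(f"duplicate custom_id: {custom_id}")
        seen.add(custom_id)
        # ensure_ascii=False keeps Hindi and other non-ASCII text readable
        record = {"custom_id": custom_id, "body": body}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


# Usage: the "input" field is purely illustrative, not a documented schema.
payload = build_batch_jsonl([
    ("doc-1", {"input": "पहला दस्तावेज़"}),
    ("doc-2", {"input": "second document"}),
])
```

Each line of `payload` parses independently with `json.loads`, which is what makes JSONL convenient for very large batches: a file can be written and read as a stream without holding every request in memory.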
Batch pricing will be 50% lower than real-time inference, in line with the industry standard for batch APIs.
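As a worked illustration of the discount, the sketch below halves a real-time price. The helper name `batch_price` is hypothetical and any price passed to it is a placeholder, since actual Forii rates are not given here; only the 50% figure comes from the text.

```python
from decimal import Decimal

BATCH_DISCOUNT = Decimal("0.50")  # "50% lower", as stated above


def batch_price(realtime_price):
    """Return the batch price for a given real-time price.

    Accepts a string or Decimal to avoid binary floating-point rounding.
    """
    return Decimal(realtime_price) * (1 - BATCH_DISCOUNT)
```

For example, a workload that would cost 2.00 units at real-time rates would cost 1.00 unit as a batch under this pricing.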
India use cases
- Document digitization backlogs — Process millions of Hindi/regional-language documents
- Bulk embeddings — Generate embeddings for existing document collections
- Model evaluation — Benchmark models on custom datasets
Related
- Chat Completions — Real-time inference
- Roadmap — feature timeline