2025-06 — Initial Release
Inference API
- Chat completions (basic, streaming, structured outputs, function calling)
- Embeddings (forii/embed-v3)
- Model listing (GET /inference/v1/models)
- OpenAI SDK compatibility (Python, JavaScript, cURL)
- Reasoning effort parameter for reasoning models
- Context length exceeded behavior (truncate)
- Request metadata for cost attribution
Models
- forii/deepseek-v3 (671B MoE, 128K context, AWQ-4bit)
- forii/llama-4-scout (17B MoE, 10M context, AWQ-4bit)
- forii/gemma-3 (27B, 128K context, AWQ-4bit)
- forii/qwen3 (32B, 128K context, AWQ-4bit)
- forii/embed-v3 (8K context, FP16)
Platform
- Control panel at app.forii.in (API keys, usage, recent requests)
- Free Plan (60 RPM, 100K prompt TPM, 10K completion TPM)
- Rate limiting (RPM/TPM, per-minute reset in UTC)
Observability
- Token counts in every response
- Rate limit headers
- Error codes (OpenAI format)
- Control panel (API keys, usage by model, recent requests)