Skip to main content

2025-06 — Initial Release

Inference API

  • Chat completions (basic, streaming, structured outputs, function calling)
  • Embeddings (forii/embed-v3)
  • Model listing (GET /inference/v1/models)
  • OpenAI SDK compatibility (Python, JavaScript, cURL)
  • Reasoning effort parameter for reasoning models
  • Context length exceeded behavior (truncate)
  • Request metadata for cost attribution

Models

  • forii/deepseek-v3 (671B MoE, 128K context, AWQ-4bit)
  • forii/llama-4-scout (17B MoE, 10M context, AWQ-4bit)
  • forii/gemma-3 (27B, 128K context, AWQ-4bit)
  • forii/qwen3 (32B, 128K context, AWQ-4bit)
  • forii/embed-v3 (8K context, FP16)

Platform

  • Control panel at app.forii.in (API keys, usage, recent requests)
  • Free Plan (60 RPM, 100K prompt TPM, 10K completion TPM)
  • Rate limiting (RPM/TPM, per-minute reset in UTC)

Observability

  • Token counts in every response
  • Rate limit headers
  • Error codes (OpenAI format)
  • Control panel (API keys, usage by model, recent requests)