All errors follow the OpenAI error format:

```json
{
  "error": {
    "message": "Model 'forii/invalid-model' not found",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  }
}
```
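If you handle errors without the SDK, you can read these fields directly from the response body. Below is a minimal Python sketch that assumes the body matches the envelope above. The `ApiError` and `parse_error` names are hypothetical. `param` and `code` are treated as optional because OpenAI-style errors sometimes set them to null.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    # Fields of the OpenAI-style error envelope.
    message: str
    type: str
    param: Optional[str]
    code: Optional[str]


def parse_error(body: str) -> ApiError:
    """Parse an error response body; `param` and `code` may be null or absent."""
    err = json.loads(body)["error"]
    return ApiError(
        message=err["message"],
        type=err["type"],
        param=err.get("param"),
        code=err.get("code"),
    )


sample = json.dumps({
    "error": {
        "message": "Model 'forii/invalid-model' not found",
        "type": "invalid_request_error",
        "param": "model",
        "code": "model_not_found",
    }
})
err = parse_error(sample)
print(err.code, err.param)  # model_not_found model
```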
Error codes
| Code | Meaning | Retry? | Notes |
|---|---|---|---|
| 400 | Bad request (invalid parameters or missing fields) | No | Check your request body |
| 401 | Invalid or missing API key | No | Verify your `FORII_API_KEY` |
| 402 | Payment required (quota exceeded) | No | Free Plan limit reached; wait for the next reset |
| 404 | Model not found | No | Check the model name in Models |
| 429 | Rate limit exceeded | Yes (with backoff) | Response includes a `Retry-After` header |
| 500 | Internal server error | Yes | Retry with exponential backoff |
| 503 | Model temporarily unavailable | Yes | Retry after a brief wait |
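The OpenAI SDK automatically retries 429, 500, and 503 errors (along with other 5xx responses and connection errors) using exponential backoff. By default the Python SDK makes up to 2 retries; set `max_retries` on the client to change this. If you use the SDK, you don't need to write retry logic for these errors yourself.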
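Without the SDK (for example, with a plain HTTP client), you need your own retry policy. The sketch below encodes the Retry? column of the table and computes an exponential backoff delay with full jitter. `is_retryable`, `backoff_delay`, and the 1-second base and 30-second cap are illustrative assumptions, not Forii defaults. For 429 responses, prefer the `Retry-After` value when it is present.

```python
import random

# Status codes marked "Yes" in the Retry? column of the error codes table.
RETRYABLE_STATUSES = {429, 500, 503}


def is_retryable(status: int) -> bool:
    """Return True if the error codes table says this status may be retried."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter (illustrative defaults, not Forii values).

    attempt 0 waits up to 1s, attempt 1 up to 2s, attempt 2 up to 4s, ...
    never more than `cap` seconds.
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))


for status in (400, 429, 503):
    print(status, is_retryable(status))
# 400 False
# 429 True
# 503 True
```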
Rate limits
Request-level limits
| Plan | Requests per minute (RPM) | Prompt tokens per minute (TPM) | Completion tokens per minute (TPM) |
|---|---|---|---|
| Free | 60 | 100K | 10K |
| Starter | 600 | 1M | 100K |
| Pro | 6,000 | 10M | 1M |
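To stay under your plan's RPM from a single process, you can pace requests at a fixed interval of 60 / RPM seconds (1 s on Free, 0.1 s on Starter, 0.01 s on Pro). Below is a minimal sketch. `RpmThrottle` is a hypothetical helper. It does not account for the TPM limits, and it does not coordinate across multiple processes or machines.

```python
import time


class RpmThrottle:
    """Fixed-interval pacer that keeps one process under a requests-per-minute limit."""

    def __init__(self, rpm: int) -> None:
        self.interval = 60.0 / rpm  # seconds between consecutive requests
        self._next_allowed = 0.0    # monotonic time at which the next request may go out

    def wait(self) -> None:
        """Block until the next request may be sent, then reserve the following slot."""
        now = time.monotonic()
        if now < self._next_allowed:
            time.sleep(self._next_allowed - now)
            now = time.monotonic()
        self._next_allowed = now + self.interval


PLAN_RPM = {"Free": 60, "Starter": 600, "Pro": 6000}  # from the table above
throttle = RpmThrottle(PLAN_RPM["Starter"])
print(throttle.interval)  # 0.1
# Call throttle.wait() immediately before each API request.
```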
Every response includes rate limit headers, for example:

```text
X-Ratelimit-Limit-Requests: 600
X-Ratelimit-Remaining-Requests: 543
X-Ratelimit-Reset: 1705312200
```
| Header | Description |
|---|---|
| `X-Ratelimit-Limit-Requests` | Maximum RPM for your plan |
| `X-Ratelimit-Remaining-Requests` | Requests remaining in the current window |
| `X-Ratelimit-Reset` | Unix timestamp (seconds) when the window resets |
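Below is a small helper for reading these headers. It is a sketch that assumes the header values are plain integers, as in the example above. `rate_limit_status` is a hypothetical name. Header names are lowercased first because HTTP header names are case-insensitive.

```python
import time
from typing import Mapping, Optional


def rate_limit_status(headers: Mapping[str, str], now: Optional[float] = None) -> dict:
    """Summarize the X-Ratelimit-* headers of a response."""
    h = {k.lower(): v for k, v in headers.items()}  # normalize header-name case
    limit = int(h["x-ratelimit-limit-requests"])
    remaining = int(h["x-ratelimit-remaining-requests"])
    reset_at = int(h["x-ratelimit-reset"])  # Unix timestamp in seconds
    if now is None:
        now = time.time()
    return {
        "used": limit - remaining,
        "remaining": remaining,
        "seconds_until_reset": max(0.0, reset_at - now),
    }


headers = {
    "X-Ratelimit-Limit-Requests": "600",
    "X-Ratelimit-Remaining-Requests": "543",
    "X-Ratelimit-Reset": "1705312200",
}
print(rate_limit_status(headers, now=1705312170.0))
# {'used': 57, 'remaining': 543, 'seconds_until_reset': 30.0}
```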
Handling 429 errors
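```python
import time

import openai

# Assumes `client` is an openai.OpenAI instance already configured for Forii.
# The SDK retries 429s on its own first; this handler runs once those retries are exhausted.
for attempt in range(3):
    try:
        response = client.chat.completions.create(
            model="forii/deepseek-v3",
            messages=[{"role": "user", "content": "Hello"}],
        )
        break
    except openai.RateLimitError as e:
        # Wait as long as the server asks (Retry-After, in seconds), defaulting to 5s.
        retry_after = int(e.response.headers.get("Retry-After", 5))
        time.sleep(retry_after)
else:
    raise RuntimeError("Still rate limited after 3 attempts")
```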
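The handler above assumes `Retry-After` holds a number of seconds. HTTP also allows an HTTP-date in that header. Forii's exact format is not specified here, so a defensive parser that accepts both forms is a reasonable sketch. `retry_after_seconds` is a hypothetical helper; you could pass it `e.response.headers.get("Retry-After")` instead of making the `int(...)` call.

```python
import email.utils
import time
from typing import Optional


def retry_after_seconds(value: Optional[str], default: float = 5.0,
                        now: Optional[float] = None) -> float:
    """Convert a Retry-After value (delay-seconds or HTTP-date) to a delay in seconds.

    Falls back to `default` when the header is missing or cannot be parsed.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "30"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return default
    if now is None:
        now = time.time()
    return max(0.0, when.timestamp() - now)


print(retry_after_seconds("12"))  # 12.0
print(retry_after_seconds(None))  # 5.0
```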
The following Forii-specific response headers are not yet available; they are planned for a future release.

| Header | Description |
|---|---|
| `x-forii-prompt-tokens` | Prompt token count, readable without parsing the response body |
| `x-forii-completion-tokens` | Completion token count, readable without parsing the response body |
| `x-forii-cached-tokens` | Cached prompt tokens, for cache-hit visibility (once prompt caching ships) |
| `x-forii-ttft-ms` | Time to first token in milliseconds, the key latency metric |
| `x-forii-total-ms` | Total request time in milliseconds |
| `x-forii-model` | The model actually served, with aliases resolved |
| `x-forii-request-id` | Request ID for tracing individual requests when debugging |
| `x-forii-region` | The data center that served the request |
`x-forii-region` is an India-specific addition. Forii runs in India under Indian jurisdiction, so developers need a way to verify that their requests are served locally rather than routed to US servers subject to the CLOUD Act.
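Once `x-forii-region` ships, a client could check it against the regions it is willing to accept. Below is a minimal sketch. The header does not exist yet and its value format is undocumented, so the hypothetical `check_served_region` returns `None` when the header is absent and leaves the list of acceptable region identifiers to the caller.

```python
from typing import Collection, Mapping, Optional


def check_served_region(headers: Mapping[str, str],
                        allowed_regions: Collection[str]) -> Optional[bool]:
    """Compare the planned x-forii-region header with caller-supplied regions.

    Returns None if the header is absent (it has not shipped yet),
    True if the serving region is allowed, and False otherwise.
    """
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    region = h.get("x-forii-region")
    if region is None:
        return None
    return region in allowed_regions
```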