Every Forii response includes token counts. Every error follows the OpenAI format. Rate limit headers tell you when to back off.
Available now
Token counts in response
{
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 156,
    "total_tokens": 174,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}
The usage object is returned with every response. The cached_tokens field is included now so that prompt caching discounts can build on it once caching ships.
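As a rough illustration, the usage object above can be read with a few lines of Python. This is only a sketch: it assumes the response body has already been parsed into a dict, and `summarize_usage` is a hypothetical helper name, not part of any Forii SDK.

```python
from typing import Any, Dict


def summarize_usage(response: Dict[str, Any]) -> Dict[str, int]:
    """Extract token counts from a parsed chat completion response."""
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    prompt = int(usage.get("prompt_tokens", 0))
    completion = int(usage.get("completion_tokens", 0))
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        # Fall back to the sum if total_tokens is missing.
        "total_tokens": int(usage.get("total_tokens", prompt + completion)),
        "cached_tokens": int(details.get("cached_tokens", 0)),
    }


# The sample payload shown above.
example = {
    "usage": {
        "prompt_tokens": 18,
        "completion_tokens": 156,
        "total_tokens": 174,
        "prompt_tokens_details": {"cached_tokens": 0},
    }
}
print(summarize_usage(example))
```

Treating missing fields as zero keeps the helper usable on error responses that carry no usage object.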
Error codes
| Code | Meaning | Retry? |
|---|---|---|
| 400 | Bad request | No |
| 401 | Invalid API key | No |
| 402 | Quota exceeded | No |
| 404 | Model not found | No |
| 429 | Rate limit | Yes (backoff) |
| 500 | Internal error | Yes |
| 503 | Unavailable | Yes |
Full details: Errors & Rate Limits
X-Ratelimit-Limit-Requests: 600
X-Ratelimit-Remaining-Requests: 543
X-Ratelimit-Reset: 1705312200
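The retry column and the rate limit headers can be combined into a simple client-side policy. The sketch below is one possible approach, not Forii's recommended client. It assumes that X-Ratelimit-Reset is a Unix timestamp in seconds (the sample value is consistent with that, but the source does not say so), and the names `should_retry` and `retry_delay` are hypothetical.

```python
import random
import time
from typing import Mapping, Optional

# Status codes the error table marks as retryable.
RETRYABLE = {429, 500, 503}


def should_retry(status: int) -> bool:
    """Return True if the error table says this status may be retried."""
    return status in RETRYABLE


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    now: Optional[float] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Seconds to wait before the next attempt (attempt counts from 0)."""
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status == 429:
        reset = lowered.get("x-ratelimit-reset")
        try:
            # ASSUMPTION: the header holds a Unix timestamp in seconds.
            wait = float(reset) - now if reset is not None else 0.0
        except ValueError:
            wait = 0.0
        if wait > 0:
            return min(wait, cap)
    # Otherwise fall back to exponential backoff with full jitter.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A caller would check `should_retry(status)` first, sleep for `retry_delay(...)`, and give up after a fixed number of attempts.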
Dashboard
The control panel provides:
- API keys — create and delete keys
- Usage — token limit and token counter by model, next reset time
- Recent requests — last few API requests and responses
Coming soon
| Header | Description |
|---|---|
| x-forii-prompt-tokens | Token count verification without parsing body |
| x-forii-completion-tokens | Token count verification without parsing body |
| x-forii-cached-tokens | Cache hit visibility (when prompt caching ships) |
| x-forii-ttft-ms | Time to first token |
| x-forii-total-ms | Total request time |
| x-forii-model | Actual model served (resolves aliases) |
| x-forii-request-id | Trace individual requests |
| x-forii-region | Which data center served the request |
x-forii-region lets you verify your requests are served from India, not routed to US servers. Indian jurisdiction applies to all request data — no US CLOUD Act exposure.
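Once these headers ship, a client could collect them without touching the response body. The following is a minimal sketch under that assumption. `forii_headers` is a hypothetical helper, and the sample values are placeholders, since the source does not specify the value formats.

```python
from typing import Dict, Mapping

PREFIX = "x-forii-"


def forii_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Collect x-forii-* response headers, keyed by name without the prefix."""
    collected = {}
    for name, value in headers.items():
        lowered = name.lower()  # header names are case-insensitive
        if lowered.startswith(PREFIX):
            collected[lowered[len(PREFIX):]] = value
    return collected


# Placeholder values for illustration only.
sample = {
    "Content-Type": "application/json",
    "X-Forii-Region": "example-region",
    "x-forii-ttft-ms": "120",
}
print(forii_headers(sample))
```

A region check then reduces to comparing `forii_headers(response_headers).get("region")` against the value you expect.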
CLI observability
forii chat --verbose # TTFT, total time, tokens, cost, model, region
forii chat --save-response resp.json # Dump full response with headers
forii models list # Available models, context windows, pricing
forii usage # Token usage this period, cost estimate
forii usage --by-model # Break down per model
forii usage --by-key # Break down per API key
Request annotations
curl https://api.forii.in/inference/v1/chat/completions \
  -H "Authorization: Bearer $FORII_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-forii-annotations: team=search,project=ranker,environment=prod" \
  -d '{"model":"forii/deepseek-v3","messages":[...]}'
Attribute costs to teams, projects, and environments.
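The header value in the curl example is a comma-separated list of key=value pairs, which could be built from a dict as sketched below. The source does not document escaping rules, so this sketch simply rejects keys or values containing `,` or `=`. The function name `annotations_header` is hypothetical.

```python
from typing import Dict, Mapping


def annotations_header(annotations: Mapping[str, str]) -> Dict[str, str]:
    """Build an x-forii-annotations header from key/value pairs."""
    parts = []
    for key, value in annotations.items():
        # ASSUMPTION: no escaping scheme is documented, so refuse delimiters.
        if any(ch in key or ch in value for ch in ",="):
            raise ValueError(f"annotation {key!r} contains ',' or '='")
        parts.append(f"{key}={value}")
    return {"x-forii-annotations": ",".join(parts)}


print(annotations_header(
    {"team": "search", "project": "ranker", "environment": "prod"}
))
```

The returned dict can be merged into the request headers alongside Authorization.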
Planned
Advanced dashboard
- Latency percentiles (p50/p90/p99) by model
- Error rate trends
- Cache hit rate visualization
- Rate limit utilization
- Region breakdown
- Annotations filtering
- CSV/JSON export
Prometheus metrics endpoint
global:
  scrape_interval: 60s
scrape_configs:
  - job_name: 'forii'
    metrics_path: 'v1/accounts/{account_id}/metrics'
    authorization:
      type: Bearer
      credentials: YOUR_FORII_API_KEY
    static_configs:
      - targets: ['api.forii.in']
    scheme: https
External integrations
| Integration | How |
|---|---|
| Prometheus | Scrape metrics endpoint |
| Grafana Cloud | Direct ingestion |
| Datadog | Agent Prometheus receiver |
| OpenTelemetry | Prometheus receiver → OTel exporter |
| LangSmith / Langfuse | Callback/tracing SDK integration |