Observability Dashboard

Every Forii response includes token counts. Every error follows the OpenAI format. Rate limit headers tell you when to back off.

Available now

Token counts in response

{
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 156,
    "total_tokens": 174,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}
Every response includes token counts. The cached_tokens field is already reported so that prompt caching discounts can be applied once prompt caching ships.
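A minimal sketch of reading these fields from a parsed response body. Only the field names come from the example above; the helper name summarize_usage and the cached_share ratio are illustrative and not part of the Forii API.

```python
import json

def summarize_usage(body: dict) -> dict:
    """Pull token counts out of a chat completion response body.

    Missing fields default to 0 so the helper also works on error bodies.
    """
    usage = body.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    prompt = usage.get("prompt_tokens", 0)
    cached = details.get("cached_tokens", 0)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
        "cached_tokens": cached,
        # Share of the prompt served from cache (hypothetical metric).
        "cached_share": cached / prompt if prompt else 0.0,
    }

# The sample body shown above.
sample = json.loads(
    '{"usage": {"prompt_tokens": 18, "completion_tokens": 156,'
    ' "total_tokens": 174, "prompt_tokens_details": {"cached_tokens": 0}}}'
)
summary = summarize_usage(sample)
```

Logging this summary per request gives a running token count without waiting for the dashboard to update.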

Error codes

Code | Meaning | Retry?
400 | Bad request | No
401 | Invalid API key | No
402 | Quota exceeded | No
404 | Model not found | No
429 | Rate limit | Yes (backoff)
500 | Internal error | Yes
503 | Unavailable | Yes
Full details: Errors & Rate Limits
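The retry column can be encoded as a small client-side policy. This is a sketch, not Forii's recommended client: the set of retryable codes comes from the table, while the backoff parameters (1 second base, 30 second cap, full jitter) are assumptions chosen for illustration.

```python
import random

# Status codes the table marks as retryable.
RETRYABLE_STATUSES = {429, 500, 503}

def should_retry(status: int) -> bool:
    """Return True only for the codes the table marks as retryable."""
    return status in RETRYABLE_STATUSES

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter.

    attempt is 0 for the first retry. The delay is drawn uniformly from
    [0, min(cap, base * 2**attempt)]; base and cap are assumed values.
    """
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

A caller would check should_retry(response_status) after each failed request and sleep for backoff_delay(attempt) before trying again, giving up after a fixed number of attempts.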

Rate limit headers

X-Ratelimit-Limit-Requests: 600
X-Ratelimit-Remaining-Requests: 543
X-Ratelimit-Reset: 1705312200
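A sketch of using these headers to decide how long to wait. It assumes X-Ratelimit-Reset is a Unix timestamp in seconds, which the sample value suggests but the page does not state; the function name is hypothetical.

```python
import time
from typing import Mapping, Optional

def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return how long to wait before sending the next request.

    Returns 0.0 while requests remain in the current window. Assumes
    X-Ratelimit-Reset holds a Unix timestamp in seconds.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining-requests", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(lowered.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

Passing now explicitly keeps the function deterministic in tests; in production code the default of time.time() is used.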

Dashboard

The control panel provides:
  • API keys — create and delete keys
  • Usage — token limit and token counter by model, next reset time
  • Recent requests — last few API requests and responses

Coming soon

Response headers

Header | Description
x-forii-prompt-tokens | Token count verification without parsing body
x-forii-completion-tokens | Token count verification without parsing body
x-forii-cached-tokens | Cache hit visibility (when prompt caching ships)
x-forii-ttft-ms | Time to first token
x-forii-total-ms | Total request time
x-forii-model | Actual model served (resolves aliases)
x-forii-request-id | Trace individual requests
x-forii-region | Which data center served the request

x-forii-region lets you verify your requests are served from India, not routed to US servers. Indian jurisdiction applies to all request data, with no US CLOUD Act exposure.
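Once these headers ship, a client could collect them without touching the response body. The sketch below is speculative: the header names come from the table, but the value formats (including what a region value looks like) are not documented here, so the expected region is supplied by the caller and both function names are hypothetical.

```python
from typing import Dict, Mapping, Optional

PREFIX = "x-forii-"

def forii_metadata(headers: Mapping[str, str]) -> Dict[str, str]:
    """Collect x-forii-* headers, keyed by the name after the prefix."""
    return {
        name.lower()[len(PREFIX):]: value
        for name, value in headers.items()
        if name.lower().startswith(PREFIX)
    }

def region_matches(headers: Mapping[str, str], expected: str) -> Optional[bool]:
    """Compare the serving region against an expected value.

    Returns None when the header is absent (for example, before the
    feature ships), so callers can tell "unknown" from "mismatch".
    """
    region = forii_metadata(headers).get("region")
    return None if region is None else region == expected
```

Returning None rather than False for a missing header avoids raising false alarms while the header is still listed as coming soon.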

CLI observability

forii chat --verbose          # TTFT, total time, tokens, cost, model, region
forii chat --save-response resp.json  # Dump full response with headers
forii models list              # Available models, context windows, pricing
forii usage                    # Token usage this period, cost estimate
forii usage --by-model        # Break down per model
forii usage --by-key           # Break down per API key

Request annotations

curl https://api.forii.in/inference/v1/chat/completions \
  -H "Authorization: Bearer $FORII_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-forii-annotations: team=search,project=ranker,environment=prod" \
  -d '{"model":"forii/deepseek-v3","messages":[...]}'
Attribute costs to teams, projects, and environments.
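A sketch of building the annotations header value from a dictionary, following the comma-separated key=value shape in the curl example. The escaping rules are not documented on this page, so rejecting keys or values that contain "," or "=" is an assumption, and build_annotations is a hypothetical helper.

```python
import os

def build_annotations(tags: dict) -> str:
    """Serialize tags as comma-separated key=value pairs.

    Raises ValueError for empty keys or for keys/values containing ','
    or '=', since no escaping scheme is documented (assumption).
    """
    parts = []
    for key, value in tags.items():
        key, value = str(key), str(value)
        if not key or any(ch in key + value for ch in ",="):
            raise ValueError(f"unsupported annotation: {key!r}={value!r}")
        parts.append(f"{key}={value}")
    return ",".join(parts)

# Request headers matching the curl example above.
headers = {
    "Authorization": "Bearer " + os.environ.get("FORII_API_KEY", ""),
    "Content-Type": "application/json",
    "x-forii-annotations": build_annotations(
        {"team": "search", "project": "ranker", "environment": "prod"}
    ),
}
```

Building the value in one place keeps tag names consistent across services, which matters if costs are later grouped by those tags.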

Planned

Advanced dashboard

  • Latency percentiles (p50/p90/p99) by model
  • Error rate trends
  • Cache hit rate visualization
  • Rate limit utilization
  • Region breakdown
  • Annotations filtering
  • CSV/JSON export

Prometheus metrics endpoint

global:
  scrape_interval: 60s
scrape_configs:
  - job_name: 'forii'
    metrics_path: 'v1/accounts/{account_id}/metrics'
    authorization:
      type: Bearer
      credentials: YOUR_FORII_API_KEY
    static_configs:
      - targets: ['api.forii.in']
    scheme: https

External integrations

Integration | How
Prometheus | Scrape metrics endpoint
Grafana Cloud | Direct ingestion
Datadog | Agent Prometheus receiver
OpenTelemetry | Prometheus receiver → OTel exporter
LangSmith / Langfuse | Callback/tracing SDK integration