Observability Overview - Forii — India's Sovereign Inference Platform

Every Forii response includes token counts. Every error follows the OpenAI format. Rate limit headers tell you when to back off.

Available now

Token counts in response

{
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 156,
    "total_tokens": 174,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}

The usage object splits the total into prompt and completion tokens. cached_tokens is already reported so that prompt caching discounts can be applied once prompt caching ships.
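As a minimal sketch of reading these fields on the client side, the Python below pulls the counts out of an already-parsed response body. The function name summarize_usage and the derived uncached_prompt_tokens value are illustrative assumptions, not part of the Forii API.

```python
from typing import Any, Dict


def summarize_usage(response: Dict[str, Any]) -> Dict[str, int]:
    """Extract token counts from a parsed chat completion response."""
    usage = response.get("usage", {})
    # prompt_tokens_details may be absent or null, so fall back to an empty dict.
    details = usage.get("prompt_tokens_details") or {}
    prompt = int(usage.get("prompt_tokens", 0))
    completion = int(usage.get("completion_tokens", 0))
    cached = int(details.get("cached_tokens", 0))
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": int(usage.get("total_tokens", prompt + completion)),
        "cached_tokens": cached,
        # Derived value (an assumption of this sketch): prompt tokens not served from cache.
        "uncached_prompt_tokens": prompt - cached,
    }


example = {
    "usage": {
        "prompt_tokens": 18,
        "completion_tokens": 156,
        "total_tokens": 174,
        "prompt_tokens_details": {"cached_tokens": 0},
    }
}
summary = summarize_usage(example)
```

Falling back to defaults keeps the helper from raising if a field is missing from a response.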

Error codes

Code	Meaning	Retry?
400	Bad request	No
401	Invalid API key	No
402	Quota exceeded	No
404	Model not found	No
429	Rate limit	Yes (backoff)
500	Internal error	Yes
503	Unavailable	Yes

Full details: Errors & Rate Limits
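The Retry? column above can be turned into a small client-side policy. The sketch below retries only on 429, 500, and 503, using exponential backoff with jitter. The backoff parameters (base, cap, max_attempts) and the send callable are assumptions chosen for illustration; the table only says that 429 should back off.

```python
import random
import time
from typing import Callable, Tuple

# Status codes marked retryable in the error table above.
RETRYABLE_STATUS = frozenset({429, 500, 503})


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Random delay in [0, min(cap, base * 2**attempt)] seconds (full jitter)."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable that performs one HTTP request and
    returns (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUS or last_attempt:
            return status, body
        sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

Passing sleep as a parameter makes the policy easy to exercise in tests without real delays.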

Rate limit headers

X-Ratelimit-Limit-Requests: 600
X-Ratelimit-Remaining-Requests: 543
X-Ratelimit-Reset: 1705312200
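A client can use these headers to decide how long to wait. The sketch below assumes X-Ratelimit-Reset is a Unix timestamp in seconds, which the sample value suggests but this page does not state. The helper names are illustrative.

```python
import time
from typing import Dict, Mapping, Optional


def _lowercase_keys(headers: Mapping[str, str]) -> Dict[str, str]:
    # HTTP header names are case-insensitive, so normalize before lookup.
    return {name.lower(): value for name, value in headers.items()}


def seconds_until_reset(
    headers: Mapping[str, str], now: Optional[float] = None
) -> Optional[float]:
    """Seconds until the rate limit window resets, or None if the header is absent.

    Assumes X-Ratelimit-Reset holds Unix epoch seconds.
    """
    reset = _lowercase_keys(headers).get("x-ratelimit-reset")
    if reset is None:
        return None
    current = time.time() if now is None else now
    return max(0.0, float(reset) - current)


def remaining_fraction(headers: Mapping[str, str]) -> Optional[float]:
    """Share of the request budget still available, or None if it cannot be computed."""
    lowered = _lowercase_keys(headers)
    limit = lowered.get("x-ratelimit-limit-requests")
    remaining = lowered.get("x-ratelimit-remaining-requests")
    if limit is None or remaining is None or int(limit) == 0:
        return None
    return int(remaining) / int(limit)


sample_headers = {
    "X-Ratelimit-Limit-Requests": "600",
    "X-Ratelimit-Remaining-Requests": "543",
    "X-Ratelimit-Reset": "1705312200",
}
```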

Dashboard

The control panel provides:

API keys — create and delete keys
Usage — token limit and token counter by model, next reset time
Recent requests — last few API requests and responses

Coming soon

Response headers

Header	Description
`x-forii-prompt-tokens`	Token count verification without parsing body
`x-forii-completion-tokens`	Token count verification without parsing body
`x-forii-cached-tokens`	Cache hit visibility (when prompt caching ships)
`x-forii-ttft-ms`	Time to first token
`x-forii-total-ms`	Total request time
`x-forii-model`	Actual model served (resolves aliases)
`x-forii-request-id`	Trace individual requests
`x-forii-region`	Which data center served the request

x-forii-region lets you verify that your requests are served from India rather than routed to US servers. Indian jurisdiction applies to all request data, with no US CLOUD Act exposure.
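Once these headers ship, a client could collect them and check the serving region along the following lines. This is a sketch only: the headers are not available yet, their value formats are not documented here, and the region string "in-example-1" is a made-up placeholder rather than a real Forii region name.

```python
from typing import Dict, Iterable, Mapping


def forii_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Collect x-forii-* response headers, keyed by the part after the prefix."""
    prefix = "x-forii-"
    collected = {}
    for name, value in headers.items():
        lowered = name.lower()  # header names are case-insensitive
        if lowered.startswith(prefix):
            collected[lowered[len(prefix):]] = value
    return collected


def served_from(headers: Mapping[str, str], allowed_regions: Iterable[str]) -> bool:
    """True only if x-forii-region is present and in the caller's allow-list."""
    region = forii_headers(headers).get("region")
    return region is not None and region in set(allowed_regions)


# Hypothetical header values, for illustration only.
sample = {
    "X-Forii-Region": "in-example-1",
    "x-forii-ttft-ms": "212",
    "Content-Type": "application/json",
}
```

Treating a missing header as a failed check keeps the verification conservative.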

CLI observability

forii chat --verbose                  # TTFT, total time, tokens, cost, model, region
forii chat --save-response resp.json  # Dump full response with headers
forii models list                     # Available models, context windows, pricing
forii usage                           # Token usage this period, cost estimate
forii usage --by-model                # Break down per model
forii usage --by-key                  # Break down per API key

Request annotations

curl https://api.forii.in/inference/v1/chat/completions \
  -H "Authorization: Bearer $FORII_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-forii-annotations: team=search,project=ranker,environment=prod" \
  -d '{"model":"forii/deepseek-v3","messages":[...]}'

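Annotations are comma-separated key=value pairs sent in the x-forii-annotations header. Use them to attribute costs to teams, projects, and environments.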
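The header value in the curl example can be built from a dictionary. The sketch below follows the key=value,key=value shape shown there. Since this page does not document any escaping rules, it rejects keys or values containing a comma or an equals sign. That restriction is this sketch's assumption, not documented Forii behavior.

```python
from typing import Mapping


def build_annotations(tags: Mapping[str, str]) -> str:
    """Serialize tags into the comma-separated key=value form shown above."""
    parts = []
    for key, value in tags.items():
        # No escaping rules are documented, so refuse ambiguous characters.
        if any(ch in key or ch in value for ch in ",="):
            raise ValueError(f"annotation {key!r} contains ',' or '='")
        parts.append(f"{key}={value}")
    return ",".join(parts)


header_value = build_annotations(
    {"team": "search", "project": "ranker", "environment": "prod"}
)
request_headers = {"x-forii-annotations": header_value}
```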

Planned

Advanced dashboard

Latency percentiles (p50/p90/p99) by model
Error rate trends
Cache hit rate visualization
Rate limit utilization
Region breakdown
Annotations filtering
CSV/JSON export

Prometheus metrics endpoint

global:
  scrape_interval: 60s
scrape_configs:
  - job_name: 'forii'
    metrics_path: '/v1/accounts/{account_id}/metrics'
    authorization:
      type: Bearer
      credentials: YOUR_FORII_API_KEY
    static_configs:
      - targets: ['api.forii.in']
    scheme: https

External integrations

Integration	How
Prometheus	Scrape metrics endpoint
Grafana Cloud	Direct ingestion
Datadog	Agent Prometheus receiver
OpenTelemetry	Prometheus receiver → OTel exporter
LangSmith / Langfuse	Callback/tracing SDK integration

Related

Usage API — Programmatic access to usage data
Errors & Rate Limits — Error codes and headers
Dashboard — Usage and requests in the UI
