Deployments - Forii — India's Sovereign Inference Platform

These endpoints are not yet available. They are planned for a future release.

Deployments provide dedicated GPU capacity with autoscaling, scale-to-zero, and per-deployment configuration. They are intended for enterprises that need guaranteed throughput, custom models, or data residency guarantees.

Planned endpoints

Operation	Method	Path
Create deployment	POST	`/v1/accounts/{id}/deployments`
Get deployment	GET	`/v1/accounts/{id}/deployments/{id}`
List deployments	GET	`/v1/accounts/{id}/deployments`
Update deployment	PATCH	`/v1/accounts/{id}/deployments/{id}`
Delete deployment	DELETE	`/v1/accounts/{id}/deployments/{id}`
Scale deployment	POST	`/v1/accounts/{id}/deployments/{id}:scale`
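
The planned routes above can be sketched as a small lookup table. This is a hypothetical illustration only: the endpoints are not live, so the code builds paths and makes no network calls. The table writes `{id}` for both path segments; the sketch assumes the first is an account ID and the second a deployment ID, and the names `PLANNED_ROUTES`, `planned_route`, `account_id`, and `deployment_id` are illustrative and not part of any published client.

```python
# Sketch only: the Deployments endpoints are planned, not yet available.
# Assumption: the first {id} is an account ID, the second a deployment ID.
PLANNED_ROUTES = {
    "create": ("POST", "/v1/accounts/{account_id}/deployments"),
    "get": ("GET", "/v1/accounts/{account_id}/deployments/{deployment_id}"),
    "list": ("GET", "/v1/accounts/{account_id}/deployments"),
    "update": ("PATCH", "/v1/accounts/{account_id}/deployments/{deployment_id}"),
    "delete": ("DELETE", "/v1/accounts/{account_id}/deployments/{deployment_id}"),
    "scale": ("POST", "/v1/accounts/{account_id}/deployments/{deployment_id}:scale"),
}


def planned_route(operation, account_id, deployment_id=None):
    """Return (method, path) for a planned operation (hypothetical helper)."""
    method, template = PLANNED_ROUTES[operation]
    # Operations on a single deployment need its ID to fill the path.
    if "{deployment_id}" in template and deployment_id is None:
        raise ValueError(f"{operation!r} requires a deployment_id")
    path = template.format(account_id=account_id, deployment_id=deployment_id)
    return method, path
```

For example, `planned_route("scale", "acct_1", "dep_9")` yields the `POST` method with the path `/v1/accounts/acct_1/deployments/dep_9:scale`.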

Planned autoscaling options

Flag	Default	Description
`min_replica_count`	0	Minimum replicas; 0 allows scale to zero when idle
`max_replica_count`	1	Maximum replicas
`scale_up_window`	30s	Wait before scaling up
`scale_down_window`	10m	Wait before scaling down
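
A minimal sketch of how these options and their defaults might be modelled on the client side. The field names and defaults come from the table above; everything else is an assumption made for illustration: the class `AutoscalingOptions`, the helper `parse_window`, the validation rules, and the `h` (hours) suffix, since the table only shows `s` and `m` values.

```python
from dataclasses import dataclass

# Assumption: windows are an integer followed by a unit suffix.
# Only "s" and "m" appear in the table; "h" is added for illustration.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_window(value: str) -> int:
    """Convert a window such as '30s' or '10m' to seconds (hypothetical helper)."""
    if len(value) < 2:
        raise ValueError(f"invalid window: {value!r}")
    number, unit = value[:-1], value[-1]
    if unit not in _UNIT_SECONDS or not number.isdigit():
        raise ValueError(f"invalid window: {value!r}")
    return int(number) * _UNIT_SECONDS[unit]


@dataclass(frozen=True)
class AutoscalingOptions:
    # Defaults mirror the planned defaults listed in the table.
    min_replica_count: int = 0
    max_replica_count: int = 1
    scale_up_window: str = "30s"
    scale_down_window: str = "10m"

    def __post_init__(self):
        # Assumed validation rules; the platform may enforce different ones.
        if self.min_replica_count < 0:
            raise ValueError("min_replica_count must be >= 0")
        if self.max_replica_count < max(1, self.min_replica_count):
            raise ValueError("max_replica_count must be >= 1 and >= min_replica_count")
        parse_window(self.scale_up_window)
        parse_window(self.scale_down_window)

    @property
    def scales_to_zero(self) -> bool:
        # A minimum of 0 replicas lets the deployment scale to zero when idle.
        return self.min_replica_count == 0
```

With the defaults, `AutoscalingOptions().scales_to_zero` is true, and setting `min_replica_count=1` would keep at least one replica running.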

Serverless inference is the right choice for most developers. Deployments are for enterprises with guaranteed throughput requirements.

Related

Models: available models for deployment
Roadmap: feature timeline
