# Deployments

> Dedicated GPU deployments for enterprises — Coming Soon

<Warning>
  These endpoints are not yet available. They are planned for a future release.
</Warning>

Deployments will provide dedicated GPU capacity with autoscaling, scale-to-zero, and per-deployment configuration. They are intended for enterprises that need guaranteed throughput, custom models, or data residency guarantees.

## Planned endpoints

| Operation         | Method | Path                                                          |
| ----------------- | ------ | ------------------------------------------------------------- |
| Create deployment | POST   | `/v1/accounts/{account_id}/deployments`                       |
| Get deployment    | GET    | `/v1/accounts/{account_id}/deployments/{deployment_id}`       |
| List deployments  | GET    | `/v1/accounts/{account_id}/deployments`                       |
| Update deployment | PATCH  | `/v1/accounts/{account_id}/deployments/{deployment_id}`       |
| Delete deployment | DELETE | `/v1/accounts/{account_id}/deployments/{deployment_id}`       |
| Scale deployment  | POST   | `/v1/accounts/{account_id}/deployments/{deployment_id}:scale` |
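Because these endpoints are not yet available, the following is only a sketch of how a client might assemble the method and URL for each planned operation. It sends no requests. The host `https://api.example.com`, the `build_request` helper, and the parameter names `account_id` and `deployment_id` are assumptions made for illustration; the final API may differ.

```python
from urllib.parse import quote

# Placeholder host: the real base URL is not stated in this page.
BASE_URL = "https://api.example.com"

# Method and path template for each planned operation, taken from the table above.
# The placeholder names account_id / deployment_id are illustrative assumptions.
PLANNED_ROUTES = {
    "create": ("POST", "/v1/accounts/{account_id}/deployments"),
    "get": ("GET", "/v1/accounts/{account_id}/deployments/{deployment_id}"),
    "list": ("GET", "/v1/accounts/{account_id}/deployments"),
    "update": ("PATCH", "/v1/accounts/{account_id}/deployments/{deployment_id}"),
    "delete": ("DELETE", "/v1/accounts/{account_id}/deployments/{deployment_id}"),
    "scale": ("POST", "/v1/accounts/{account_id}/deployments/{deployment_id}:scale"),
}


def build_request(operation, account_id, deployment_id=None):
    """Return (method, url) for a planned operation without sending anything."""
    method, template = PLANNED_ROUTES[operation]
    if "{deployment_id}" in template and deployment_id is None:
        raise ValueError(f"operation {operation!r} requires a deployment_id")
    # Percent-encode the identifiers so they are safe as single path segments.
    path = template.format(
        account_id=quote(account_id, safe=""),
        deployment_id=quote(deployment_id or "", safe=""),
    )
    return method, BASE_URL + path
```

For example, `build_request("scale", "acct_123", "dep_456")` yields a `POST` to the `:scale` path of that deployment, while `build_request("list", "acct_123")` needs only the account identifier.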

## Planned autoscaling options

| Option              | Default | Description                                                    |
| ------------------- | ------- | -------------------------------------------------------------- |
| `min_replica_count` | 0       | Minimum number of replicas; `0` allows scale-to-zero when idle |
| `max_replica_count` | 1       | Maximum number of replicas                                     |
| `scale_up_window`   | 30s     | Time to wait before scaling up                                 |
| `scale_down_window` | 10m     | Time to wait before scaling down                               |
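As a rough illustration of how these options fit together, the sketch below models them as a small Python config object using the defaults from the table above. The `AutoscalingOptions` class, the `parse_window` helper, and the validation rules are hypothetical. The duration format (an integer followed by `s`, `m`, or `h`) is inferred from the `30s` and `10m` defaults and may not match the final API.

```python
from dataclasses import dataclass
from datetime import timedelta

# Assumed unit suffixes, inferred from the documented defaults "30s" and "10m".
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_window(value: str) -> timedelta:
    """Parse a window such as "30s" or "10m" into a timedelta."""
    number, unit = value[:-1], value[-1:]
    if unit not in _UNITS or not number.isdigit():
        raise ValueError(f"unsupported window format: {value!r}")
    return timedelta(**{_UNITS[unit]: int(number)})


@dataclass(frozen=True)
class AutoscalingOptions:
    """Hypothetical container for the planned options, with documented defaults."""

    min_replica_count: int = 0       # 0 allows scale-to-zero when idle
    max_replica_count: int = 1
    scale_up_window: str = "30s"
    scale_down_window: str = "10m"

    def __post_init__(self):
        # These checks are assumptions, not documented server-side rules.
        if self.min_replica_count < 0:
            raise ValueError("min_replica_count must be >= 0")
        if self.max_replica_count < self.min_replica_count:
            raise ValueError("max_replica_count must be >= min_replica_count")
        # Fail early on malformed window strings.
        parse_window(self.scale_up_window)
        parse_window(self.scale_down_window)
```

Constructing `AutoscalingOptions()` with no arguments reproduces the documented defaults; `AutoscalingOptions(min_replica_count=1, max_replica_count=4)` would describe a deployment that never scales to zero.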

<Info>
  Serverless inference is the right choice for most developers. Deployments are for enterprises with guaranteed throughput requirements.
</Info>

## Related

* [Models](/docs/concepts/models) — Available models for deployment
* [Roadmap](/docs/support/roadmap) — Feature timeline