# Speech-to-Text & Text-to-Speech

> STT and TTS endpoints for Indic languages — Coming Soon

<Warning>
  These endpoints are not yet available. They are planned for a future release.
</Warning>

India's 800M+ voice-first internet users need speech-to-text (STT) and text-to-speech (TTS) in their own languages. Forii plans to offer both, powered by Indic-optimized models.

## Speech-to-Text (STT)

```
POST https://api.forii.in/inference/v1/audio/transcriptions
```

The request format is planned to be OpenAI-compatible. The endpoint is planned to support 22+ Indic languages and 8 kHz telephony audio.
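
Because 8 kHz telephony audio is called out specifically, it can be useful to check a recording's sample rate before uploading it. The following is a minimal sketch using only Python's standard `wave` module. The helper names (`wav_sample_rate`, `is_telephony_audio`, `make_silent_wav`) are hypothetical and not part of any Forii SDK, and the check only covers WAV input.

```python
import io
import wave

# 8 kHz is the telephony sample rate mentioned in this page.
TELEPHONY_RATE_HZ = 8000


def wav_sample_rate(source) -> int:
    """Return the sample rate in Hz of a WAV file.

    `source` is a path string or a binary file object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def is_telephony_audio(source) -> bool:
    """True if the WAV file is sampled at 8 kHz."""
    return wav_sample_rate(source) == TELEPHONY_RATE_HZ


def make_silent_wav(rate_hz: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a short silent mono 16-bit WAV in memory (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    buf.seek(0)                  # rewind so the buffer can be read back
    return buf
```

For a real recording, `is_telephony_audio("call_recording.wav")` would report whether the file is already at the telephony rate; the file name here is illustrative.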

### Planned parameters

| Parameter                   | Type   | Description                                  |
| --------------------------- | ------ | -------------------------------------------- |
| `file`                      | file   | Audio file (MP3, WAV, FLAC, OGG)             |
| `model`                     | string | `forii/saarika-v2`                           |
| `language`                  | string | Language code (e.g. `hi`, `ta`, `bn`)        |
| `response_format`           | string | `json`, `verbose_json`, `text`, `srt`, `vtt` |
| `timestamp_granularities[]` | array  | `word`, `segment`                            |

### Example (planned)

```bash
curl https://api.forii.in/inference/v1/audio/transcriptions \
  -H "Authorization: Bearer $FORII_API_KEY" \
  -F file="@recording_hindi.mp3" \
  -F model="forii/saarika-v2" \
  -F language="hi" \
  -F response_format="verbose_json" \
  -F "timestamp_granularities[]=word"
```
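
The form fields in the request above can also be assembled and checked client-side before sending. The sketch below is an assumption-laden illustration, not a Forii SDK: `build_transcription_fields` is a hypothetical helper, the allowed values are copied from the planned-parameters table, and the `json` default for `response_format` is an assumption, since the page does not state one.

```python
# Allowed values copied from the planned-parameters table.
# The API is not yet released, so these may change.
RESPONSE_FORMATS = {"json", "verbose_json", "text", "srt", "vtt"}
GRANULARITIES = {"word", "segment"}


def build_transcription_fields(
    language,
    response_format="json",        # assumed default; not stated in the docs
    timestamp_granularities=(),
    model="forii/saarika-v2",
):
    """Return multipart form fields as a list of (name, value) pairs.

    The audio file itself is not included; it is sent as a separate file part.
    """
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")

    unknown = set(timestamp_granularities) - GRANULARITIES
    if unknown:
        raise ValueError(f"unsupported timestamp granularities: {sorted(unknown)}")

    fields = [
        ("model", model),
        ("language", language),
        ("response_format", response_format),
    ]
    # Array parameters repeat the same field name once per value,
    # matching the `timestamp_granularities[]` form field in the curl example.
    fields += [("timestamp_granularities[]", g) for g in timestamp_granularities]
    return fields
```

Calling `build_transcription_fields("hi", "verbose_json", ["word"])` yields the same non-file fields as the curl example. A list of pairs keeps repeated field names intact, which a plain dictionary would not.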

## Text-to-Speech (TTS)

```
POST https://api.forii.in/inference/v1/audio/speech
```

### Planned parameters

| Parameter         | Type   | Description                                |
| ----------------- | ------ | ------------------------------------------ |
| `model`           | string | `forii/bulbul-v2`                          |
| `input`           | string | Text to synthesize                         |
| `voice`           | string | Voice name (e.g. `amrita`, `pratik`)       |
| `response_format` | string | `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm` |
| `speed`           | float  | 0.25 to 4.0                                |

### Example (planned)

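```python
import os
from openai import OpenAI  # assumes the OpenAI Python SDK, which this call shape matches

# Base URL inferred from the planned endpoint above; it may change before release.
client = OpenAI(base_url="https://api.forii.in/inference/v1", api_key=os.environ["FORII_API_KEY"])

response = client.audio.speech.create(
    model="forii/bulbul-v2",
    voice="amrita",
    input="नमस्ते, आपका खाता शेष रुपये पाँच हजार है।",  # "Hello, your account balance is five thousand rupees."
)
response.stream_to_file("output_hindi.mp3")
```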
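
The planned TTS parameters carry a few constraints (a fixed set of output formats and a `speed` range of 0.25 to 4.0) that can be validated before a request is made. The following is a minimal sketch under stated assumptions: `build_speech_payload` is a hypothetical helper, and the `mp3` and `1.0` defaults are assumptions, since the page does not specify defaults.

```python
# Allowed values and range copied from the planned-parameters table.
# The API is not yet released, so these may change.
AUDIO_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(
    text,
    voice,
    response_format="mp3",   # assumed default; not stated in the docs
    speed=1.0,               # assumed default; not stated in the docs
    model="forii/bulbul-v2",
):
    """Return a dict suitable for use as a JSON request body."""
    if not text:
        raise ValueError("input text must not be empty")
    if response_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")

    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

When serializing the payload by hand, `json.dumps(payload, ensure_ascii=False)` keeps Devanagari and other Indic scripts readable in logs instead of escaping them.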

## India differentiation

Indic speech support is Forii's biggest differentiator versus Fireworks and OpenAI, neither of which offers production-grade STT/TTS for Indic languages.

* **Saarika v2**: STT for 22+ Indian languages, optimized for 8 kHz telephony audio
* **Bulbul v2**: TTS with 30+ voices across Hindi, Tamil, Telugu, Bengali, and more
* **Voice-first India**: built for IVR systems, WhatsApp bots, and accessibility features

## Related

* [Roadmap](/docs/support/roadmap) — feature timeline
