Speech-to-Text & Text-to-Speech - Forii — India's Sovereign Inference Platform

These endpoints are not yet available. They are planned for a future release.

India’s 800M+ voice-first internet users need speech-to-text (STT) and text-to-speech (TTS) in their own languages. Forii will offer both, powered by Indic-optimized models.

Speech-to-Text (STT)

POST https://api.forii.in/inference/v1/audio/transcriptions

The request format is OpenAI-compatible. The endpoint is planned to support 22+ Indic languages and 8 kHz telephony audio.

Planned parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `file` | file | Audio file (MP3, WAV, FLAC, OGG) |
| `model` | string | `forii/saarika-v2` |
| `language` | string | Language code (e.g. `hi`, `ta`, `bn`) |
| `response_format` | string | `json`, `verbose_json`, `text`, `srt`, `vtt` |
| `timestamp_granularities[]` | array | `word`, `segment` |
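
The planned parameters can be checked on the client before any audio is uploaded. The following is a minimal sketch, not Forii's implementation: the helper name `build_transcription_fields` is hypothetical, the accepted values are copied from the table above, and the rule that `timestamp_granularities[]` requires `verbose_json` is an assumption borrowed from the OpenAI API that this endpoint is described as compatible with.

```python
from pathlib import Path

# Values copied from the planned-parameters table; the final API may differ.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg"}
RESPONSE_FORMATS = {"json", "verbose_json", "text", "srt", "vtt"}
GRANULARITIES = {"word", "segment"}


def build_transcription_fields(path, language, response_format="json",
                               granularities=(), model="forii/saarika-v2"):
    """Validate the arguments and return multipart form fields as (name, value) pairs.

    Hypothetical helper: the audio file itself is attached separately by the HTTP client.
    """
    if Path(path).suffix.lower() not in AUDIO_EXTENSIONS:
        raise ValueError(f"unsupported audio type: {path}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    unknown = set(granularities) - GRANULARITIES
    if unknown:
        raise ValueError(f"unsupported granularities: {sorted(unknown)}")
    # Assumption (OpenAI behaviour): timestamps are only returned with verbose_json.
    if granularities and response_format != "verbose_json":
        raise ValueError("timestamp granularities require response_format='verbose_json'")

    fields = [
        ("model", model),
        ("language", language),
        ("response_format", response_format),
    ]
    # Array parameters are sent as one repeated form field per value.
    fields += [("timestamp_granularities[]", g) for g in granularities]
    return fields
```

Calling `build_transcription_fields("recording_hindi.mp3", "hi", "verbose_json", ["word"])` yields the same non-file fields as the curl example in this section.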

Example (planned)

curl https://api.forii.in/inference/v1/audio/transcriptions \
  -H "Authorization: Bearer $FORII_API_KEY" \
  -F file="@recording_hindi.mp3" \
  -F model="forii/saarika-v2" \
  -F language="hi" \
  -F response_format="verbose_json" \
  -F "timestamp_granularities[]=word"
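
With `response_format="verbose_json"` and word-level granularity, the response would be expected to carry per-word timings. The sketch below assumes the response mirrors OpenAI's verbose JSON shape (a top-level `text` plus a `words` list with `word`, `start`, and `end` in seconds); Forii has not confirmed this, and the sample payload is invented for illustration rather than real output. The helper names are hypothetical.

```python
import json

# Illustrative payload only: field names follow OpenAI's verbose_json shape,
# which Forii's response is assumed (not confirmed) to mirror.
SAMPLE = """
{
  "language": "hi",
  "text": "नमस्ते आपका स्वागत है",
  "words": [
    {"word": "नमस्ते", "start": 0.0, "end": 0.62},
    {"word": "आपका", "start": 0.7, "end": 1.1},
    {"word": "स्वागत", "start": 1.15, "end": 1.7},
    {"word": "है", "start": 1.75, "end": 1.95}
  ]
}
"""


def word_timings(payload):
    """Return (word, start, end) tuples from a parsed verbose_json transcription."""
    return [
        (w["word"], float(w["start"]), float(w["end"]))
        for w in payload.get("words", [])
    ]


def format_timestamp(seconds):
    """Format a time in seconds as MM:SS.mmm for display."""
    total_ms = round(seconds * 1000)
    minutes, rem = divmod(total_ms, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


timings = word_timings(json.loads(SAMPLE))
```

Each tuple in `timings` can then be rendered with `format_timestamp` to align words against the audio, for example when highlighting a transcript during call playback.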

Text-to-Speech (TTS)

POST https://api.forii.in/inference/v1/audio/speech

Planned parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | `forii/bulbul-v2` |
| `input` | string | Text to synthesize |
| `voice` | string | Voice name (e.g. `amrita`, `pratik`) |
| `response_format` | string | `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm` |
| `speed` | float | 0.25 to 4.0 |
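
A request body for this endpoint can be assembled and range-checked locally. This is a rough sketch under stated assumptions: `build_speech_payload` is a hypothetical helper, the allowed formats and the speed range come from the table above, and the defaults of `mp3` and `1.0` are borrowed from the OpenAI API rather than confirmed by Forii.

```python
import json

# Values copied from the planned-parameters table; the final API may differ.
RESPONSE_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
SPEED_MIN, SPEED_MAX = 0.25, 4.0


def build_speech_payload(text, voice, response_format="mp3", speed=1.0,
                         model="forii/bulbul-v2"):
    """Validate the arguments and return the JSON request body as UTF-8 bytes.

    Hypothetical helper; the defaults are assumptions, not documented Forii behaviour.
    """
    if not text.strip():
        raise ValueError("input text is empty")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}")

    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    # ensure_ascii=False keeps Devanagari and other Indic scripts as literal
    # characters in the body instead of \u escapes.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")
```

The returned bytes would be sent as the body of a `POST` with `Content-Type: application/json` and the same `Authorization: Bearer` header used for transcription.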

Example (planned)

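import os
from openai import OpenAI

# Assumes the OpenAI Python SDK, pointed at the base URL of the endpoint above.
client = OpenAI(base_url="https://api.forii.in/inference/v1", api_key=os.environ["FORII_API_KEY"])
response = client.audio.speech.create(
    model="forii/bulbul-v2",
    voice="amrita",
    input="नमस्ते, आपका खाता शेष रुपये पाँच हजार है।",  # "Hello, your account balance is five thousand rupees."
)
response.write_to_file("output_hindi.mp3")  # save the returned audio bytes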

India differentiation

Indic speech support is Forii’s biggest differentiator versus Fireworks and OpenAI: neither offers production-grade STT or TTS for Indic languages.

Saarika v2: STT for 22+ Indian languages, optimized for 8 kHz telephony audio
Bulbul v2: TTS with 30+ voices across Hindi, Tamil, Telugu, Bengali, and more
Voice-first India: IVR systems, WhatsApp bots, accessibility features

Related

Roadmap: feature timeline