# Vision

> Multimodal chat with image understanding — Coming Soon

<Warning>
  This endpoint is not yet available. It is planned for a future release.
</Warning>

The Vision endpoint will accept images alongside text in a chat request, for tasks such as document digitization, ID card reading, and invoice OCR. It is planned to run on vision-language models such as Qwen-VL.

## Planned usage

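```python
# Planned request shape; the endpoint is not live yet.
# Assumes `client` is an already-configured SDK client (setup not shown).
response = client.chat.completions.create(
    model="forii/qwen2.5-vl-72b",
    messages=[{
        "role": "user",
        "content": [
            # Text instruction for the model
            {"type": "text", "text": "Extract all details from this Aadhaar card"},
            # Image passed inline as a base64 data URL (truncated here)
            {"type": "image_url", "image_url": {
                "url": "data:image/jpeg;base64,/9j/4AAQ...",
                "detail": "high"
            }}
        ]
    }],
)
```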
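The request above passes the image inline as a base64 data URL. As a minimal sketch of how such a URL and message could be prepared from a local file, the helpers below use only the Python standard library. The names `image_to_data_url` and `build_vision_message` are hypothetical and not part of any SDK, and the message shape simply mirrors the planned example, so it may change before release.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_vision_message(prompt: str, data_url: str, detail: str = "high") -> dict:
    """Build one user message with a text part and an image part.

    The shape copies the planned usage example; it is an assumption
    until the endpoint is released.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url, "detail": detail}},
        ],
    }
```

The returned dictionary could then be passed as the single element of `messages` in a call like the one shown above.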

## India use cases

* **Aadhaar / PAN card extraction** — Parse ID documents into structured data
* **GST invoice parsing** — Extract line items, totals, and GSTIN from invoices
* **Handwritten form digitization** — Convert handwritten Hindi forms to structured JSON
* **Screenshot understanding** — Debug UI issues from screenshots
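For the extraction use cases above, the model's reply would typically be turned into a typed record before further processing. The sketch below shows one way this might look for the invoice case. The response schema has not been published, so the field names (`gstin`, `total`, `line_items`) and the sample reply are invented purely for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class InvoiceSummary:
    gstin: str
    total: float
    line_items: list


def parse_invoice_reply(reply_text: str) -> InvoiceSummary:
    """Parse a JSON reply into an InvoiceSummary.

    Raises KeyError if an expected (hypothetical) field is missing and
    json.JSONDecodeError if the reply is not valid JSON.
    """
    data = json.loads(reply_text)
    return InvoiceSummary(
        gstin=str(data["gstin"]),
        total=float(data["total"]),
        line_items=list(data["line_items"]),
    )


# Hypothetical reply text, invented for illustration only.
sample_reply = (
    '{"gstin": "<GSTIN>", "total": 1180.0, '
    '"line_items": [{"description": "Item A", "amount": 1000.0}]}'
)
summary = parse_invoice_reply(sample_reply)
print(summary.total, len(summary.line_items))
```

Failing loudly on a missing key is a deliberate choice here: for ID and invoice data, a silently empty field is usually worse than an explicit error.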

## Related

* [Chat Completions](/docs/api-reference/chat-completions) — Text-only inference
* [Structured Outputs](/docs/api-reference/chat-completions#structured-outputs) — Guaranteed JSON from vision responses