Saaras v3
by Sarvam AI · Released 2025
Sarvam AI's flagship speech-to-text model. Industry-leading accuracy for 22 Indian languages plus English. Handles code-mixed speech (e.g. switching between Hindi and English mid-sentence) natively. Supports real-time streaming via WebSocket and batch transcription via REST.
Powered by Sarvam AI · Proprietary ASR model
| Spec | Value |
|---|---|
| Context Window | N/A |
| Parameters | Undisclosed |
| Max Output | N/A |
| Category | Speech to Text |
Overview
Saaras v3 is Sarvam AI's flagship speech-to-text model, delivering industry-leading accuracy for 22 Indian languages plus English. It is specifically designed to handle the linguistic complexity of India — where speakers routinely switch between languages mid-sentence (code-mixing), use regional accents, and speak in noisy environments like call centers and public spaces.
The model supports two deployment modes: real-time streaming via WebSocket for live transcription (voice agents, live captioning, meeting transcription) and batch transcription via REST API for processing recorded audio files. Both modes deliver high accuracy across all 22 supported Indian languages, with particularly strong performance on code-mixed speech like Hinglish (Hindi-English) and Tanglish (Tamil-English).
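To make the streaming mode more concrete, here is a minimal sketch of how a client might cut raw audio into small frames before sending them over a live connection. The audio format (16 kHz, 16-bit mono PCM) and the 100 ms chunk size are assumptions for illustration; this page does not state what the streaming endpoint expects, and `pcm_chunks` is a hypothetical helper.

```python
from typing import Iterator


def pcm_chunks(
    pcm: bytes,
    sample_rate: int = 16000,  # assumed; telephony audio is often 8000 Hz
    sample_width: int = 2,     # bytes per sample (16-bit PCM, assumed)
    channels: int = 1,         # mono, assumed
    chunk_ms: int = 100,       # assumed frame duration
) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices of raw PCM audio."""
    samples_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * sample_width * channels
    if chunk_bytes <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(pcm), chunk_bytes):
        # The final slice may be shorter than chunk_bytes.
        yield pcm[start:start + chunk_bytes]
```

Each yielded frame would then be handed to whatever WebSocket client the application uses. For batch transcription no chunking is needed, since the whole file is uploaded in one REST request.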
Saaras v3 is production-ready for enterprise deployments, with robust handling of telephony audio quality, background noise, and multiple speakers. It is the go-to choice for Indian market applications that need accurate, real-time speech recognition across the country's diverse linguistic landscape.
Pricing
| Metric | Price |
|---|---|
| Price per hour | ₹53.00 |
1 credit = ₹1 = $0.01 USD. Prices shown are the provider's; CallMissed passes them through with a markup of roughly 35%.
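As a rough illustration of the arithmetic, the sketch below estimates the cost of transcribing a recording at the listed ₹53 per hour. It assumes usage is pro-rated by the second and that the ~35% markup is applied on top of the listed price; neither is confirmed by this page. `estimate_cost_inr` is a hypothetical helper, not part of any SDK.

```python
RATE_INR_PER_HOUR = 53.0  # listed price per hour


def estimate_cost_inr(
    duration_seconds: float,
    rate_per_hour: float = RATE_INR_PER_HOUR,
    markup: float = 0.0,  # e.g. 0.35 if the markup is added on top (assumption)
) -> float:
    """Estimate transcription cost in rupees, rounded to 2 decimal places."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    hours = duration_seconds / 3600
    return round(hours * rate_per_hour * (1 + markup), 2)
```

Under these assumptions, one hour of audio comes to ₹53.00 at the listed rate, or ₹71.55 with a 35% markup added. Since 1 credit = ₹1, the same figures can be read as credits.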
Key Highlights
- 22 Indian languages + English
- Native code-mixed speech handling (Hinglish, etc.)
- Real-time streaming via WebSocket
- Batch transcription via REST API
Benchmarks
| Benchmark | Score |
|---|---|
| Hindi WER | <8% |
| Code-Mixed WER | <12% |
| English WER | <6% |
| Languages supported | 23 (22 Indian + English) |
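For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below computes that standard definition with a word-level edit distance. It is not the provider's evaluation procedure, and it applies no text normalization (casing, punctuation, transliteration), which can change reported scores considerably.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A WER below 8% therefore means fewer than 8 word-level errors per 100 reference words.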
Technical Details
- Supports 22 Indian languages + English with native code-mixed handling
- Real-time streaming via WebSocket for live transcription
- Batch transcription via REST API for recorded audio
- Handles telephony audio quality, background noise, and multiple speakers
- Optimized for Indian accents and regional pronunciation variations
- Production-ready for call center and enterprise deployments
Strengths
- Industry-leading accuracy for 22 Indian languages
- Native code-mixed speech handling — unique capability for Indian market
- Real-time WebSocket streaming for live applications
- Robust handling of telephony audio and noisy environments
Limitations
- Focused on Indian languages — not a general-purpose multilingual STT
- Accuracy may vary across less common Indian languages
- WebSocket streaming requires persistent connection management
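One common way to handle the connection management mentioned above is to retry dropped connections with exponential backoff. The sketch below shows that pattern in isolation. The `connect` callable is a hypothetical stand-in for whatever opens the streaming session, and the delay values are illustrative assumptions rather than documented recommendations.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 0.5, cap: float = 30.0) -> Iterator[float]:
    """Yield delays base, 2*base, 4*base, ... never exceeding cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= 2


def connect_with_retry(
    connect: Callable[[], T],  # hypothetical: opens the streaming session
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, waiting longer after each failure."""
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(next(delays))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. A production client would likely also add jitter and resume audio from the point where the connection dropped.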
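Use Cases
- Voice agents
- Live captioning
- Meeting transcription
- Call center and telephony transcription
- Batch transcription of recorded audio files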
API Example
curl https://api.callmissed.com/v1/audio/transcriptions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -F file=@audio.wav \
  -F model=saaras:v3 \
  -F language=hi
Endpoint: POST /v1/audio/transcriptions · Model ID: saaras:v3
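The same request can be sketched in Python using only the standard library. The URL, header, and form fields (`file`, `model`, `language`) mirror the curl example above; the helper name `build_transcription_request` and the `application/octet-stream` part type are assumptions made for illustration. The response format is not documented on this page, so the sketch stops at building the request.

```python
import urllib.request
import uuid

API_URL = "https://api.callmissed.com/v1/audio/transcriptions"


def build_transcription_request(
    audio_bytes: bytes,
    filename: str,
    api_key: str,
    model: str = "saaras:v3",
    language: str = "hi",
) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to the curl example."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields.
    for name, value in (("model", model), ("language", language)):
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio file itself.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, read the audio file as bytes, pass the resulting request to `urllib.request.urlopen`, and inspect the raw response body before relying on any particular field in it.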