Speech to Text · indian-languages

Saaras v3

by Sarvam AI · Released 2025

Sarvam AI's flagship speech-to-text model. Industry-leading accuracy for 22 Indian languages plus English. Handles code-mixed speech (e.g. switching between Hindi and English mid-sentence) natively. Supports real-time streaming via WebSocket and batch transcription via REST.

Powered by Sarvam AI · Proprietary ASR model

Context Window: N/A
Parameters: Undisclosed
Max Output: N/A
Category: Speech to Text

Overview

Saaras v3 is Sarvam AI's flagship speech-to-text model, delivering industry-leading accuracy for 22 Indian languages plus English, with reported word error rates below 8% on Hindi, below 6% on English, and below 12% on code-mixed speech. It is designed for the linguistic complexity of India, where speakers routinely switch between languages mid-sentence (code-mixing), use regional accents, and speak in noisy environments such as call centers and public spaces.

The model supports two deployment modes: real-time streaming via WebSocket for live transcription (voice agents, live captioning, meeting transcription) and batch transcription via REST API for processing recorded audio files. Both modes deliver high accuracy across all 22 supported Indian languages, with particularly strong performance on code-mixed speech like Hinglish (Hindi-English) and Tanglish (Tamil-English).
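For the streaming mode, a client typically sends audio in small fixed-duration frames rather than as one file. The streaming endpoint and message format are not given on this page, so the sketch below covers only the client-side framing step. The function name `pcm_frames`, the 16 kHz 16-bit mono format, and the 100 ms frame size are illustrative assumptions, not documented requirements.

```python
from typing import Iterator


def pcm_frames(pcm: bytes, sample_rate: int = 16000, sample_width: int = 2,
               channels: int = 1, frame_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM audio, each covering frame_ms milliseconds.

    The defaults (16 kHz, 16-bit, mono, 100 ms) are assumptions for illustration.
    The last frame may be shorter than the others.
    """
    bytes_per_frame = sample_rate * sample_width * channels * frame_ms // 1000
    if bytes_per_frame <= 0:
        raise ValueError("frame_ms is too small for this audio format")
    for start in range(0, len(pcm), bytes_per_frame):
        yield pcm[start:start + bytes_per_frame]


# One second of 16 kHz, 16-bit mono silence splits into ten 100 ms frames.
one_second = bytes(16000 * 2)
frames = list(pcm_frames(one_second))
print(len(frames), len(frames[0]))  # 10 3200
```

Each yielded frame would then be written to the open WebSocket connection in whatever message format the streaming API expects.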

Saaras v3 is production-ready for enterprise deployments, with robust handling of telephony audio quality, background noise, and multiple speakers. It is the go-to choice for Indian market applications that need accurate, real-time speech recognition across the country's diverse linguistic landscape.

Pricing

Metric            Price
Price per hour    ₹53.0000

1 credit = ₹1 = $0.01 USD. Prices shown are the provider's rates; CallMissed passes them through with a markup of roughly 35%.
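As a rough guide to what a job might cost, the arithmetic can be sketched as follows. This assumes the ₹53 per hour figure is the provider rate before markup, that the markup is exactly 35% (the page says "~35%"), and that billing is proportional to audio duration. None of these is confirmed here, and `estimate_credits` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

PROVIDER_RATE_INR_PER_HOUR = Decimal("53")  # from the pricing table
ASSUMED_MARKUP = Decimal("0.35")            # assumption: "~35%" taken as exactly 35%


def estimate_credits(audio_seconds: int, markup: Decimal = ASSUMED_MARKUP) -> Decimal:
    """Estimate credits (1 credit = ₹1) for audio of the given length.

    Assumes pro-rata billing by duration, which this page does not confirm.
    """
    hourly = PROVIDER_RATE_INR_PER_HOUR * (Decimal(1) + markup)
    cost = hourly * Decimal(audio_seconds) / Decimal(3600)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_credits(3600))  # one hour: 71.55
print(estimate_credits(1200))  # twenty minutes: 23.85
```

Under these assumptions, the 1000 signup credits would cover roughly 14 hours of audio (1000 / 71.55).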

Key Highlights

  • 22 Indian languages + English
  • Native code-mixed speech handling (Hinglish, etc.)
  • Real-time streaming via WebSocket
  • Batch transcription via REST API

Benchmarks

Benchmark         Score
Hindi WER         <8%
Code-Mixed WER    <12%
English WER       <6%
Languages         23
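WER (word error rate) is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below shows that standard calculation using word-level edit distance. The test sets and text normalization behind the scores above are not stated on this page, so results from this function are not directly comparable with them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# A code-mixed reference with one word dropped by the recognizer: 1 error / 6 words.
wer = word_error_rate("mujhe kal meeting attend karni hai",
                      "mujhe kal meeting karni hai")
print(round(wer, 3))  # 0.167
```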

Technical Details

  • Supports 22 Indian languages + English with native code-mixed handling
  • Real-time streaming via WebSocket for live transcription
  • Batch transcription via REST API for recorded audio
  • Handles telephony audio quality, background noise, and multiple speakers
  • Optimized for Indian accents and regional pronunciation variations
  • Production-ready for call center and enterprise deployments

Strengths

  • Industry-leading accuracy for 22 Indian languages
  • Native code-mixed speech handling, a unique capability for the Indian market
  • Real-time WebSocket streaming for live applications
  • Robust handling of telephony audio and noisy environments

Limitations

  • Focused on Indian languages; not a general-purpose multilingual STT model
  • Accuracy may vary across less common Indian languages
  • WebSocket streaming requires persistent connection management
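One common way to manage a persistent connection is to retry with exponential backoff when it drops. The sketch below is a generic pattern, not part of any CallMissed or Sarvam AI SDK. `connect_with_backoff` is a hypothetical helper, and `connect` stands in for whatever function opens the WebSocket in your client library.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_backoff(connect: Callable[[], T], max_attempts: int = 5,
                         base_delay: float = 0.5, max_delay: float = 8.0,
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so many clients do not reconnect in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("max_attempts must be at least 1")


# Demo with a stand-in connection function that fails twice, then succeeds.
attempts = []


def flaky_connect() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("connection dropped")
    return "connected"


print(connect_with_backoff(flaky_connect, sleep=lambda seconds: None))  # connected
```

Passing `sleep` as a parameter keeps the helper easy to test. In production the default `time.sleep` applies, or an async equivalent if the client is asynchronous.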

Use Cases

  • Call center transcription
  • Voice agent backends
  • Meeting transcription
  • Multilingual dictation

API Example

curl https://api.callmissed.com/v1/audio/transcriptions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -F file=@audio.wav \
  -F model=saaras:v3 \
  -F language=hi

Endpoint: POST /v1/audio/transcriptions · Model ID: saaras:v3
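The same request can be built in Python with only the standard library. This is a minimal sketch mirroring the curl call above: the URL, the bearer-token header, and the `file`, `model`, and `language` fields come from that example, while `build_transcription_request` is a hypothetical helper name. The response format is not documented on this page, so the sketch stops at constructing the request.

```python
import urllib.request
import uuid

API_URL = "https://api.callmissed.com/v1/audio/transcriptions"


def build_transcription_request(audio: bytes, filename: str, api_key: str,
                                model: str = "saaras:v3",
                                language: str = "hi") -> urllib.request.Request:
    """Build a multipart/form-data POST with the same fields as the curl example."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in (("model", model), ("language", language)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f'\r\n\r\n{value}\r\n'.encode()
        )
    # Generic binary content type for the audio part (an assumption).
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream'
        f'\r\n\r\n'.encode() + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_transcription_request(b"RIFF....", "audio.wav", "cm_YOUR_KEY")
print(req.get_method(), req.full_url)

# To send it (needs a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```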
