
Bulbul v3

by Sarvam AI · Released February 5, 2026

Sarvam AI's natural-sounding text-to-speech model, offering 39 voices across 11 Indian languages at production-ready quality. It supports SSML for fine-grained control over speed, pitch, pauses, and emphasis, and it handles code-mixed text and number normalization out of the box.

Powered by Sarvam AI · Proprietary TTS model

Context Window: N/A
Parameters: Undisclosed
Max Output: N/A
Category: Text to Speech

Overview

Bulbul v3, released February 5, 2026, is Sarvam AI's production-ready text-to-speech model, offering 39 voices across 11 Indian languages. The voices are designed to sound natural and conversational rather than robotic, which makes them suitable for customer-facing applications such as IVR systems, voice agents, and telephony platforms.

The model supports SSML (Speech Synthesis Markup Language) for fine-grained control over prosody — developers can adjust speed, pitch, volume, add pauses, and emphasize specific words. It handles code-mixed text natively, correctly pronouncing Hindi-English mixed sentences without requiring language tags. Number normalization, date formatting, and currency reading are handled automatically.
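To illustrate the SSML controls described above, the sketch below builds a request body whose input is an SSML document made of standard W3C SSML elements (break, emphasis, prosody). This rests on assumptions: the page does not say whether the endpoint expects SSML in the "input" field or which elements and attribute values Bulbul v3 honors. The customer name and order ID are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a TTS request body whose input is SSML markup.
# Assumptions (not confirmed by the source): the endpoint accepts SSML in the
# "input" field, and Bulbul v3 honors these standard W3C SSML elements.
# Inputs must be plain text without XML or JSON special characters.
ssml_confirmation() {
  local name="$1" order_id="$2"
  # Single-quoted XML attributes avoid escaping inside the JSON string
  printf "<speak>Namaste %s, <break time='300ms'/>aapka order <emphasis level='strong'>%s</emphasis> confirm ho gaya hai. <prosody rate='slow' pitch='low'>Dhanyavaad.</prosody></speak>" \
    "$name" "$order_id"
}

# Hypothetical customer name and order ID
ssml="$(ssml_confirmation "Priya" "A1234")"
body=$(printf '{"model": "bulbul:v3", "input": "%s", "voice": "meera"}' "$ssml")
echo "$body"
# Send it with the same curl call as the API example, passing -d "$body"
```

Single-quoted attributes are valid XML and let the SSML sit inside a JSON string without extra escaping.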

Bulbul v3 is production-ready for telephony and call center deployments, with consistent quality across all 39 voices and 11 languages. The voices cover a range of genders, ages, and regional accents, allowing applications to match the voice to their target audience. At $0.53 per 10K characters, it is competitively priced for high-volume TTS workloads.

Pricing

Price per 10K characters: ₹53.00

1 credit = ₹1 = $0.01 USD. Prices are shown as listed by the provider; CallMissed passes them through with an approximately 35% markup.
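The listed rate makes it straightforward to estimate what a job will cost. The sketch below assumes that ₹53 per 10K characters is the rate actually billed and that billing scales linearly with input characters, with no minimums or per-request rounding. The page does not state either point.

```shell
#!/usr/bin/env bash
# Sketch: estimate Bulbul v3 spend from a character count.
# Assumes the listed rate of ₹53 per 10K characters is what gets billed,
# linearly, with no minimum or per-request rounding (not confirmed by the source).
RATE_INR_PER_10K=53

# Credits (1 credit = ₹1) for a given number of input characters
estimate_credits() {
  awk -v chars="$1" -v rate="$RATE_INR_PER_10K" \
    'BEGIN { printf "%.2f\n", chars * rate / 10000 }'
}

# USD equivalent, using 1 credit = $0.01
estimate_usd() {
  awk -v credits="$(estimate_credits "$1")" \
    'BEGIN { printf "%.2f\n", credits / 100 }'
}

estimate_credits 250000   # 1325.00
estimate_usd 250000       # 13.25
```

You can get the character count with wc -m on the input file. Under the same assumption, the 1000 free signup credits would cover roughly 188,000 characters (1000 ÷ 53 × 10,000 ≈ 188,679).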

Key Highlights

  • 39 natural voices across 11 Indian languages
  • SSML support for prosody, breaks, emphasis
  • Code-mixed text handling (Hinglish, etc.)
  • Production-ready for call centers and telephony

Benchmarks

MOS Score: 4.2/5
Voices: 39
Languages: 11
SSML Support: Full

Technical Details

  • 39 natural-sounding voices across 11 Indian languages
  • SSML support: speed, pitch, volume, pauses, emphasis, phonemes
  • Native code-mixed text handling (Hinglish, Tanglish, etc.)
  • Automatic number normalization, date formatting, currency reading
  • Production-ready for telephony and call center deployments
  • Consistent quality across all voices and languages

Strengths

  • 39 natural voices — widest selection for Indian languages
  • Full SSML support for fine-grained prosody control
  • Native code-mixed text handling without language tags
  • Production-ready quality for telephony and call centers

Limitations

  • Limited to 11 Indian languages — no global language coverage
  • Voice cloning and custom voice creation not yet supported
  • Audio output quality may vary with very long text inputs
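A common way to work around quality drift on very long inputs is to split the text at sentence boundaries and synthesize each piece separately. The awk sketch below does this. The maximum chunk size is a hypothetical tuning value, not a documented limit of the model. The sentence rule (a word ending in ., !, ? or the Devanagari danda) is a simplification.

```shell
#!/usr/bin/env bash
# Sketch: split long text into sentence-aligned chunks of at most MAX
# characters so each can be synthesized in its own request.
# MAX is a hypothetical tuning value, not a documented Bulbul v3 limit.
# A single sentence longer than MAX is emitted as its own oversized chunk.
# Note: length() counts characters in gawk under a UTF-8 locale but bytes in mawk.
chunk_text() {
  awk -v max="$1" '
    # Add the finished sentence to the current chunk, or start a new chunk
    function emit_sentence() {
      if (sent == "") return
      if (chunk == "") chunk = sent
      else if (length(chunk) + 1 + length(sent) <= max) chunk = chunk " " sent
      else { print chunk; chunk = sent }
      sent = ""
    }
    {
      for (i = 1; i <= NF; i++) {
        sent = (sent == "") ? $i : (sent " " $i)
        # A sentence ends at ., !, ? or the Devanagari danda
        if ($i ~ /([.!?]|।)$/) emit_sentence()
      }
    }
    END {
      emit_sentence()
      if (chunk != "") print chunk
    }
  '
}

# Prints three chunks: "One two. Three four!", "Five six seven?", "Eight."
printf 'One two. Three four! Five six seven? Eight.' | chunk_text 20
```

You can then send each output line as its own request and concatenate the resulting audio files in order.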

Use Cases

  • Voice agents
  • IVR systems
  • Audiobook generation
  • Accessibility applications

API Example

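# Synthesize a Hinglish sentence with the "meera" voice and save the audio
curl https://api.callmissed.com/v1/audio/speech \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "bulbul:v3", "input": "Namaste, aapka order confirm ho gaya hai.", "voice": "meera"}' \
  --output speech.mp3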

Endpoint: POST /v1/audio/speech · Model ID: bulbul:v3
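For scripting, you can wrap the request in a small function. In this sketch, CALLMISSED_API_KEY is a hypothetical environment variable name, and the escaping helper covers only backslashes and double quotes. The sample text is illustrative: it leaves a raw currency amount in place for the model's automatic normalization.

```shell
#!/usr/bin/env bash
# Sketch: reusable wrapper around the curl request in the API example.
# json_escape handles backslashes and double quotes only; newlines and other
# control characters are not escaped, so pass single-line text (or use jq).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

tts_payload() {
  # $1 = text, $2 = voice
  printf '{"model": "bulbul:v3", "input": "%s", "voice": "%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

synthesize() {
  # $1 = text, $2 = voice, $3 = output file
  # CALLMISSED_API_KEY is a hypothetical variable name holding your cm_ key
  # --fail exits non-zero on HTTP errors instead of writing the error body to $3
  curl --fail --silent --show-error https://api.callmissed.com/v1/audio/speech \
    -H "Authorization: Bearer ${CALLMISSED_API_KEY:?set CALLMISSED_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(tts_payload "$1" "$2")" \
    --output "$3"
}

# Print the body that synthesize would send
tts_payload 'Order "A1234" ka amount ₹1,499 hai.' meera
```

Usage, once the key is exported: synthesize 'Namaste, aapka order confirm ho gaya hai.' meera speech.mp3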

Try Bulbul v3 now

Get 1000 free API credits on signup. No credit card required.