Bulbul v3
by Sarvam AI · Released February 5, 2026
Sarvam AI's natural text-to-speech model. 39 voices across 11 Indian languages with production-ready quality. Supports SSML for fine-grained control over speed, pitch, pauses, and emphasis. Handles code-mixed text and number normalization out of the box.
Powered by Sarvam AI · Proprietary TTS model
Context Window
N/A
Parameters
Undisclosed
Max Output
N/A
Category
Text to Speech
Overview
Bulbul v3, released February 5, 2026, is Sarvam AI's production-ready text-to-speech model offering 39 natural-sounding voices across 11 Indian languages. The voices are designed to sound natural and conversational rather than robotic, making them suitable for customer-facing applications like IVR systems, voice agents, and telephony platforms.
The model supports SSML (Speech Synthesis Markup Language) for fine-grained control over prosody: developers can adjust speed, pitch, and volume, insert pauses, and emphasize specific words. It handles code-mixed text natively, pronouncing mixed Hindi-English sentences correctly without requiring language tags. Number normalization, date formatting, and currency reading are handled automatically.
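As a rough illustration of the prosody controls described above, the sketch below builds an SSML string in Python. The elements used (`speak`, `break`, `prosody`, `emphasis`) come from the W3C SSML specification; which of them Bulbul v3 accepts, and in which request field, is an assumption that should be checked against the provider's documentation. The helper name `build_ssml` is hypothetical.

```python
from xml.sax.saxutils import escape


def build_ssml(greeting: str, detail: str, pause_ms: int = 400) -> str:
    """Wrap two phrases in SSML: a pause, then a slowed, emphasized detail.

    Assumption: the TTS engine accepts standard W3C SSML elements.
    """
    return (
        "<speak>"
        f"{escape(greeting)}"                      # escape &, <, > in user text
        f'<break time="{pause_ms}ms"/>'            # pause between the phrases
        '<prosody rate="slow">'                    # slow down the detail
        f'<emphasis level="strong">{escape(detail)}</emphasis>'
        "</prosody>"
        "</speak>"
    )


ssml = build_ssml(
    "Namaste, aapka order confirm ho gaya hai.",
    "Delivery kal tak ho jayegi.",
)
print(ssml)
```

Escaping the text before embedding it keeps characters such as `&` from producing invalid markup.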
Bulbul v3 is production-ready for telephony and call center deployments, with consistent quality across all 39 voices and 11 languages. The voices cover a range of genders, ages, and regional accents, allowing applications to match the voice to their target audience. At ₹53 per 10K characters ($0.53 at the credit rate listed under Pricing), it is competitively priced for high-volume TTS workloads.
Pricing
| Metric | Price |
|---|---|
| Price per 10K characters | ₹53.00 |
1 credit = ₹1 = $0.01. The price shown is the provider's rate; CallMissed passes it through with a markup of roughly 35%.
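To make the arithmetic concrete, here is a minimal cost estimate based on the figures above. The function name `estimate_cost` is hypothetical. Whether the roughly 35% markup is added on top of the listed price is not stated here, so `markup` is an assumption that defaults to 0 and is left to the caller.

```python
PRICE_PER_10K_CHARS = 53.0  # credits (₹) per 10,000 characters, from the table
USD_PER_CREDIT = 0.01       # 1 credit = ₹1 = $0.01, from the pricing note


def estimate_cost(num_chars: int, markup: float = 0.0) -> tuple[float, float]:
    """Return (credits, usd) for synthesizing num_chars characters.

    markup is a fraction (0.35 for 35%); applying it on top of the
    listed price is an assumption, not something the page confirms.
    """
    credits = round(num_chars / 10_000 * PRICE_PER_10K_CHARS * (1 + markup), 2)
    usd = round(credits * USD_PER_CREDIT, 4)
    return credits, usd


print(estimate_cost(10_000))          # listed price for 10K characters
print(estimate_cost(1_000_000))       # one million characters, no markup
print(estimate_cost(10_000, 0.35))    # with an assumed 35% markup on top
```

With no markup, 10,000 characters come to 53 credits, and one million characters to 5,300 credits.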
Key Highlights
- 39 natural voices across 11 Indian languages
- SSML support for prosody, breaks, emphasis
- Code-mixed text handling (Hinglish, etc.)
- Production-ready for call centers and telephony
Benchmarks
| Benchmark | Score |
|---|---|
| MOS (Mean Opinion Score) | 4.2/5 |
| Voices | 39 |
| Languages | 11 |
| SSML Support | Full |
Technical Details
- 39 natural-sounding voices across 11 Indian languages
- SSML support: speed, pitch, volume, pauses, emphasis, phonemes
- Native code-mixed text handling (Hinglish, Tanglish, etc.)
- Automatic number normalization, date formatting, currency reading
- Production-ready for telephony and call center deployments
- Consistent quality across all voices and languages
Strengths
- 39 natural voices — widest selection for Indian languages
- Full SSML support for fine-grained prosody control
- Native code-mixed text handling without language tags
- Production-ready quality for telephony and call centers
Limitations
- Limited to 11 Indian languages — no global language coverage
- Voice cloning and custom voice creation not yet supported
- Audio output quality may vary with very long text inputs
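One common way to work around the long-input limitation is to split text into sentence-aligned chunks and synthesize each chunk separately. The sketch below is a generic approach, not a documented Bulbul v3 feature; the function name `split_into_chunks` and the 500-character default are assumptions, since no recommended input length is given here.

```python
import re


def split_into_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole sentences into chunks of up to max_chars.

    A single sentence longer than max_chars is kept whole rather than
    cut mid-sentence, so such a chunk can exceed the limit.
    """
    # Split after ., !, ? or the Devanagari danda when followed by whitespace.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate        # sentence still fits in this chunk
        else:
            chunks.append(current)     # close the chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("Namaste. Aapka order confirm ho gaya hai. Dhanyavaad!", 30))
```

The resulting audio segments would then need to be concatenated by the caller.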
Use Cases
API Example
curl https://api.callmissed.com/v1/audio/speech \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "bulbul:v3", "input": "Namaste, aapka order confirm ho gaya hai.", "voice": "meera"}' \
  --output speech.mp3
Endpoint: POST /v1/audio/speech · Model ID: bulbul:v3
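The same request can be sketched in Python using only the standard library. The endpoint, model ID, voice, and JSON fields are taken from the curl example; the helper names `build_speech_request` and `synthesize_to_file` are hypothetical, and treating the response body as raw audio bytes is an assumption based on the curl command writing its output to `speech.mp3`.

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/audio/speech"


def build_speech_request(text: str, voice: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = {"model": "bulbul:v3", "input": text, "voice": voice}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text: str, voice: str, api_key: str, path: str) -> None:
    """Send the request and save the response body to path.

    Assumption: a successful response contains the audio bytes directly.
    """
    request = build_speech_request(text, voice, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(path, "wb") as f:
        f.write(audio)
```

A call such as `synthesize_to_file("Namaste, aapka order confirm ho gaya hai.", "meera", "cm_YOUR_KEY", "speech.mp3")` would mirror the curl command; `urlopen` raises `urllib.error.HTTPError` on a non-2xx response, which the caller may want to handle.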