Sarvam AI's flagship 105B MoE model with 128K context. The largest Indian-language-optimized LLM, offering superior reasoning and generation quality across 11 Indian languages while maintaining strong English performance.

How much does Sarvam 105B cost?

Sarvam 105B costs $0.35/1M tokens for input and $0.35/1M tokens for output on CallMissed. 1 credit = ₹1 = $0.01 USD.

How do I use Sarvam 105B via API?

Send a POST request to /v1/chat/completions with model "sarvam-105b" and your API key. CallMissed uses the OpenAI-compatible format, so an existing OpenAI client only needs its base URL and model field changed.

What is the context window of Sarvam 105B?

Sarvam 105B supports a 128K token context window with up to 8K output tokens.
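
The limits above imply a simple token budget. The sketch below is illustrative only: it assumes "128K" and "8K" mean 128,000 and 8,000 tokens, and that prompt and output tokens share the same window, neither of which the page states. The function name is hypothetical.

```python
CONTEXT_WINDOW = 128_000  # assumption: "128K" read as 128,000 tokens
MAX_OUTPUT = 8_000        # assumption: "8K" read as 8,000 tokens


def max_prompt_tokens(requested_output: int) -> int:
    """Return how many prompt tokens remain after reserving output tokens.

    Assumes prompt and output tokens are counted against one shared window.
    """
    if not 0 < requested_output <= MAX_OUTPUT:
        raise ValueError(f"requested_output must be in 1..{MAX_OUTPUT}")
    return CONTEXT_WINDOW - requested_output


print(max_prompt_tokens(8_000))
```

Under these assumptions, reserving the full 8,000 output tokens leaves 120,000 tokens for the prompt.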


LLM Chat · indian-languages · flagship

Sarvam 105B

By Sarvam AI · Released 2025

Sarvam AI's flagship 105B MoE model with a 128K context window. It is the largest Indian-language-optimized LLM, with superior reasoning and generation quality across 11 Indian languages alongside strong English performance.


Powered by Sarvam AI · Mixture-of-Experts (MoE)

Context window	128K
Parameters	105B (MoE)
Max output	8K
Category	LLM Chat

Overview

Sarvam 105B is the flagship model in the Sarvam AI lineup. It scales the post-training pipeline proven on Sarvam-M (30B) to a 105-billion-parameter Mixture-of-Experts architecture. With a 128K context window it can ingest entire legal documents, financial reports, and codebases in a single pass, while retaining best-in-class Indian language understanding across 11 major languages in both native script and romanized form.

The training methodology follows the same three-stage pipeline as its smaller sibling: supervised fine-tuning on quality-scored, culturally curated prompts; RLVR (reinforcement learning with verifiable rewards) using the GRPO algorithm on instruction-following, math, and programming curricula; and inference optimization with quantization. The MoE architecture activates only a subset of experts per token, which keeps inference costs manageable despite the large total parameter count.
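
To make the RLVR stage more concrete, here is a minimal sketch of the group-relative advantage at the core of GRPO as generally described: several answers are sampled for one prompt, each gets a verifiable reward, and each reward is normalized against its own group. This is a generic illustration, not Sarvam AI's training code; the function name, the use of the population standard deviation, and the `eps` value are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one math prompt; reward is 1.0 when a checker
# accepts the final answer and 0.0 otherwise (hypothetical values).
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

Correct answers end up with a positive advantage and incorrect ones with a negative advantage. When all samples score the same, every advantage is zero and the group contributes no learning signal.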

As the largest Indian-language-optimized LLM, Sarvam 105B delivers the highest-quality outputs for enterprise use cases: complex document analysis, long-form content generation in regional languages, and government or public-sector AI deployments where accuracy and cultural sensitivity are critical. The model is deployed on DPDP Act-aligned infrastructure with India data residency.

Pricing

Metric	Price
Input, per 1M tokens	₹35.0000
Output, per 1M tokens	₹35.0000

1 credit = ₹1 = $0.01 USD. Prices shown are from the provider; CallMissed passes them through with a ~35% markup.
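
The per-request cost follows directly from the table. The sketch below assumes the listed ₹35 per 1M tokens is the rate actually billed (the page does not say whether the markup is already included) and uses the page's own conversion of 1 credit = ₹1 = $0.01. The function name is hypothetical.

```python
from decimal import Decimal

RATE_INR_PER_MILLION = Decimal("35")  # listed rate, same for input and output
USD_PER_CREDIT = Decimal("0.01")      # 1 credit = ₹1 = $0.01, per the page


def estimate_cost(input_tokens: int, output_tokens: int) -> tuple[Decimal, Decimal]:
    """Return (credits, usd) for one request at the listed rates."""
    total_tokens = Decimal(input_tokens + output_tokens)
    credits = total_tokens * RATE_INR_PER_MILLION / Decimal(1_000_000)
    return credits, credits * USD_PER_CREDIT


credits, usd = estimate_cost(100_000, 8_000)
print(credits, usd)
```

For example, a request with 100,000 input tokens and 8,000 output tokens works out to 3.78 credits, or about $0.0378. `Decimal` is used so that currency amounts are not subject to binary floating-point rounding.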

Highlights

Flagship model with the best quality for Indian languages
128K context for long documents
Superior reasoning in Hindi, Tamil, Telugu, Bengali, and other Indian languages
DPDP Act-aligned infrastructure with India data residency

Benchmarks

Benchmark	Score	Notes
MMLU	0.89	General knowledge (better than the 30B model)
MMLU-IN	0.83	Indian language knowledge
MMLU-IN-R	0.71	Romanized Indian language knowledge
HumanEval	0.90	Code generation
GSM-8K	0.96	Math reasoning
GSM-8K-IN-R	0.87	Romanized Indian math
MTBench	8.45	Conversation quality
AlpacaEval	65.3	Instruction following

Technical details

Architecture: Mixture-of-Experts (MoE) with 105B total parameters
Training pipeline: SFT → RLVR (GRPO) → inference optimization (same as Sarvam-M)
Context window: 128K tokens for long-document processing
Languages: Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, Odia, Punjabi, Assamese + English
Native script and romanized input supported for all 11 Indian languages
MoE routing activates a subset of experts per token for efficient inference
Deployed on DPDP Act-aligned infrastructure with India data residency
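
As an illustration of how MoE routing can activate only a subset of experts per token, here is a toy top-k router in plain Python. It is a generic sketch of the technique, not Sarvam 105B's actual router: the number of experts, the value of k, the logits, and the function names are all hypothetical, since the page gives none of these details.

```python
import math


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Four toy "experts" that just scale their input (hypothetical stand-ins).
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]  # hypothetical router scores for one token
print(top_k_route(logits))
print(moe_forward(1.0, experts, logits))
```

With k=2 only two of the four experts are evaluated for this token, which is why per-token compute scales with the active experts rather than with the total parameter count.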

Strengths

Highest-quality Indian language model available, with the best reasoning and generation across 11 languages
Handles code-mixed text (Hinglish, Tanglish) natively with superior accuracy
128K context enables full-document analysis for legal, financial, and government use cases
Priced as affordably as the 30B variant despite significantly higher capability
Enterprise-grade infrastructure with DPDP Act alignment and India data residency

Limitations

The MoE architecture requires more memory at deployment than dense models with a similar active parameter count
Optimized primarily for 11 Indian languages, so coverage is narrower than models that support 100+ languages
Higher latency than the smaller Sarvam 30B because of the larger model size
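
The memory point can be made concrete with back-of-the-envelope arithmetic: every expert has to be resident in memory even though only some run per token, so weight storage scales with the total 105B parameters. The sketch below covers weights only and assumes common bytes-per-parameter figures; which precision the hosted model actually uses is not stated on the page, and the function name is hypothetical.

```python
TOTAL_PARAMS = 105e9  # total parameter count stated on this page

# Assumed storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Weight storage in GB (10**9 bytes); ignores KV cache and activations."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(dtype, weight_memory_gb(TOTAL_PARAMS, dtype))
```

Under these assumptions the weights alone take roughly 210 GB in bf16, 105 GB in int8, and 52.5 GB in int4, before accounting for the KV cache of a long context.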

Use cases

Enterprise Indian language AI
Complex document analysis
Long-form content in Indian languages
Government and public sector AI

API example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "sarvam-105b", "messages": [{"role": "user", "content": "Explain quantum computing in Hindi"}]}'

Endpoint: POST /v1/chat/completions · Model ID: sarvam-105b
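
The same request can be built from Python using only the standard library. This is a sketch under assumptions: the base URL and model ID come from the example above, while the `max_tokens` field and the JSON `Content-Type` header are assumed from the OpenAI-compatible format rather than confirmed by this page. The helper name is hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.callmissed.com/v1"  # taken from the curl example


def build_chat_request(api_key: str, prompt: str,
                       max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for sarvam-105b."""
    payload = {
        "model": "sarvam-105b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # assumption: OpenAI-style field is accepted
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("cm_YOUR_KEY", "Explain quantum computing in Hindi")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any credits are spent.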

Try Sarvam 105B now

Get 1000 free API credits on signup. No credit card required.
