A 30B MoE language model built on Mistral Small, post-trained on Indian languages alongside English, with a +20% improvement on Indian language benchmarks.

How much does Sarvam 30B cost?

Sarvam 30B costs $0.35/1M tokens for input and $0.35/1M tokens for output on CallMissed. 1 credit = ₹1 = $0.01 USD.

How do I use Sarvam 30B via API?

Send a POST request to /v1/chat/completions with model "sarvam-30b" and your API key. CallMissed uses the OpenAI-compatible format, so you only need to change the base URL and the model field.

What is the context window of Sarvam 30B?

Sarvam 30B supports a 64K token context window with up to 8K output tokens.

LLM Chat · indian-languages · recommended

Sarvam 30B

By Sarvam AI · Released May 2025

Powered by Sarvam AI · Dense Transformer (24B), post-trained on Mistral Small with SFT + RLVR

Context window: 64K

Parameters: 24B (dense, based on Mistral Small)

Max output: 8K

Category: LLM Chat

Overview

Sarvam-M (the "M" stands for Mistral) is a 24B-parameter hybrid reasoning model built by Sarvam AI on top of Mistral Small 24B and released under the Apache 2.0 license. The team removed the vision encoder from the original Mistral Small to obtain a text-only foundation, then applied a rigorous three-stage post-training pipeline: supervised fine-tuning (SFT), reinforcement learning with verifiable rewards (RLVR), and inference optimization. The result, relative to base Mistral Small: +20% on Indian language benchmarks, +21.6% on math, +17.6% on programming, and a notable +86% on romanized Indian language GSM-8K. That puts it ahead of Llama-4 Scout on most benchmarks and on par with the much larger Llama-3.3 70B.

The SFT stage started from 11.5 million prompts collected across English and Indian languages. Deduplication left 7 million, which were filtered further to 5.2 million English prompts. Llama 3.3 70B classified each prompt's quality and hardness, followed by semantic deduplication using gte-Qwen2-7B embeddings and 100,000 FAISS clusters. The final curated training set contained 3.7 million high-quality samples. Completions were scored by a custom "real-value scorer", a fine-tuned Llama 3.3 70B that applies probability-weighted scoring over the digits 0-9 and reaches 85%+ accuracy across the 11 supported Indian languages. Deepseek R1 produced the highest-scoring Indic completions, averaging 8+ out of 9.
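
The probability-weighted scoring over the digits 0-9 can be read as an expected value: each digit is weighted by the probability the scorer assigns to it. The sketch below is only an illustration of that idea, not Sarvam AI's implementation. The function name `expected_score` and the input format (a mapping from digit token to log-probability) are assumptions.

```python
import math


def expected_score(digit_logprobs):
    """Return the probability-weighted score over digit tokens.

    digit_logprobs maps a digit string ("0".."9") to the log-probability the
    scorer model gave that token. This input format is a hypothetical choice.
    """
    # Subtract the max log-prob before exponentiating for numerical stability.
    max_lp = max(digit_logprobs.values())
    weights = {d: math.exp(lp - max_lp) for d, lp in digit_logprobs.items()}
    # Renormalise over the digits supplied so the weights sum to 1.
    total = sum(weights.values())
    return sum(int(d) * w / total for d, w in weights.items())


# Illustrative numbers only: the scorer puts most of its mass on 8.
example = {"7": math.log(0.2), "8": math.log(0.5), "9": math.log(0.3)}
score = expected_score(example)  # 7*0.2 + 8*0.5 + 9*0.3
```

Compared with taking only the most likely digit, the expected value keeps the scorer's uncertainty, so two completions that both round to 8 can still be ranked against each other.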

Character training was a deliberate focus. About 0.5% of completions were flagged for political bias and regenerated with Perplexity R1 1776. A further 5% were regenerated for Indian cultural relevance, yielding responses that reflect local context, idioms, and values. SFT ran in two stages: 2 epochs in non-think mode, then 2 epochs in think mode, with Slerp model merging between the two.
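
Slerp (spherical linear interpolation) blends two weight vectors along the arc between them rather than along a straight line. The following is a minimal sketch of the generic formula on flat Python lists. How the merge was actually applied (per tensor or whole model, and with which interpolation factor `t`) is not stated here, so those details are assumptions.

```python
import math


def slerp(a, b, t):
    """Spherically interpolate between weight vectors a and b, with 0 <= t <= 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding pushing the cosine outside [-1, 1].
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    if omega < 1e-6:
        # Nearly parallel vectors: plain linear interpolation is stable here.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    sin_omega = math.sin(omega)
    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


# Halfway between two orthogonal unit vectors stays on the unit circle.
merged = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

With `t = 0` the result is the first checkpoint and with `t = 1` the second; values in between trade off the two.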

The RLVR stage applied the GRPO algorithm across six task curricula: multilingual GSM8K, MATH, Big Math, Extended IFEval, Code Understanding via Synthetic-1, and Code Generation via PrimeIntellect. A seventh curriculum targeted translation quality with a chrF++ reward signal. Prompt sampling targets a pass-through rate of roughly 20%. Code tasks use partial rewards: the fraction of test cases passed plus a bonus for full completion. Translation uses a relative reward, chrF++ measured against a baseline. Learning rates were 3e-7 for most tasks and 2e-7 for harder reasoning.
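
The partial code reward and the group-relative step at the heart of GRPO can be sketched as follows. This is an illustration under stated assumptions, not the training code: the bonus value of 0.5, the function names, and the epsilon are hypothetical, and the advantage shown is the commonly used group-normalised form (reward minus group mean, divided by group standard deviation).

```python
import statistics


def code_reward(passed, total, bonus=0.5):
    """Fraction of test cases passed, plus a bonus when every test passes.

    The bonus size is a hypothetical value; the source does not give it.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    reward = passed / total
    if passed == total:
        reward += bonus
    return reward


def group_advantages(rewards, eps=1e-6):
    """Normalise rewards within one group of samples drawn for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every sample earns the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled completions for one prompt with four test cases each.
rewards = [code_reward(passed, 4) for passed in (0, 2, 4, 4)]
advantages = group_advantages(rewards)
```

Because advantages are relative to the group, a prompt the model always solves or never solves contributes almost no signal, which is consistent with sampling prompts at an intermediate pass rate.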

Inference optimization matters for production deployment. FP8 quantization is done through TensorRT-LLM, and the choice of calibration dataset affects quantized quality. Lookahead decoding yields roughly 2x throughput. There are two deployment configurations: high-concurrency at about 100 tokens/sec, and low-concurrency, latency-sensitive at about 300 tokens/sec.

The model supports 11 major Indian languages: Hindi (28% of training data), Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu (8% each). Each is covered in three forms: formal native script, code-mixed (for example Hinglish, Tanglish), and romanized transliteration. On Indic Vibe Check, Sarvam-M averages 8.12/9 across all 11 languages, against 7.58 for Llama 4 Scout and 6.93 for Llama 3.3 70B. With Wikipedia RAG, SimpleQA accuracy rises from 5% to 72%, ahead of even OpenAI o3 (49%).

The team also documented failed experiments: tokenizer extension caused a drop in knowledge, tokenizer transplant did no better than SFT alone, and RL with LLM-based rewards was non-deterministic. In the benchmark table, Sarvam-M is competitive with or better than Mistral Small, Gemma 3 27B, Llama 4 Scout, and Llama 3.3 70B. It offers think and non-think modes: think produces chain-of-thought reasoning, while non-think gives fast, direct responses.

At $0.35 per million tokens for both input and output, Sarvam-M is among the cheapest frontier-class models, which makes it accessible for government deployments, vernacular education, regional content, and multilingual customer support.

Pricing

Metric	Price
Input /1M tokens	₹35.0000
Output /1M tokens	₹35.0000

1 credit = ₹1 = $0.01 USD. Prices shown come from the provider; CallMissed passes them through with a ~35% markup.
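
To make the arithmetic concrete, here is a rough cost estimate built only from the figures listed above (₹35 per 1M tokens for input and output, 1 credit = ₹1 = $0.01). The function name `request_cost` is hypothetical, and the sketch assumes the listed price is what is billed; it does not try to model the markup separately.

```python
# Listed price from the pricing table: 35 credits (₹35) per 1M tokens.
CREDITS_PER_MILLION_TOKENS = 35.0
# Conversion stated on the page: 1 credit = $0.01 USD.
USD_PER_CREDIT = 0.01


def request_cost(input_tokens, output_tokens):
    """Return (credits, usd) for one request at the listed rate."""
    # Input and output are billed at the same rate, so the counts can be summed.
    tokens = input_tokens + output_tokens
    credits = tokens / 1_000_000 * CREDITS_PER_MILLION_TOKENS
    return credits, credits * USD_PER_CREDIT


# A request with a 2,000-token prompt and a 500-token reply.
credits, usd = request_cost(2_000, 500)
```

For this example the request comes to 0.0875 credits, or a little under a tenth of a cent.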

Key highlights

11 major Indian languages, with native script and romanized support
+86% improvement on romanized Indian language math benchmarks (GSM-8K)
Hybrid reasoning: "think" mode for chain-of-thought, "non-think" for fast responses
Ahead of Llama-4 Scout on most benchmarks despite being smaller
Character-trained to reflect Indian cultural values
FP8 quantized via TensorRT-LLM for H100 deployment

Benchmarks

Benchmark	Score	Notes
MMLU	0.87	General knowledge
MMLU-IN	0.79	Indian language knowledge (+23% vs base)
MMLU-IN-R	0.66	Romanized Indian (+35% vs base)
HumanEval	0.88	Code generation
GSM-8K	0.94	Math reasoning
GSM-8K-IN-R	0.82	Romanized Indian math (+86% vs base)
LiveCodeBench	0.44	Competitive programming (+91% vs base)
MTBench	8.14	Conversation quality
AlpacaEval	60.92	Instruction following (+21% vs base)

Technical details

Base model: Mistral Small (24B, Apache 2.0)
Training: SFT → RLVR (GRPO algorithm) → Inference optimization
SFT data: curated prompts with quality/hardness scoring, clustering, and sampling
RLVR: curriculum over instruction following, math, and programming datasets
Quantization: FP8 via TensorRT-LLM, negligible accuracy loss
Inference: Lookahead decoding for throughput gains on H100
Languages: Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, Odia, Punjabi, Assamese

Strengths

Best Indian language performance at this parameter count
Handles code-mixed text (Hinglish, Tanglish) natively
Culturally aware responses, trained on Indian context
Very inexpensive at $0.35/1M tokens
Hybrid think/non-think modes for flexible reasoning

Limitations

Slight drop (~1%) on English knowledge benchmarks (MMLU) compared with the base model
64K context window is smaller than that of frontier models
Optimized mainly for 11 Indian languages, with less coverage than broadly multilingual models

Use cases

Indian language chatbots
Multilingual customer support
Code-mixed conversations
Regional content generation
Government and public sector AI
Vernacular education platforms

API example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sarvam-30b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant fluent in Indian languages."},
      {"role": "user", "content": "Mujhe quantum computing ke baare mein Hindi mein samjhao"}
    ],
    "temperature": 0.7
  }'

Endpoint: POST /v1/chat/completions · Model ID: sarvam-30b
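
The same request can be sketched in Python using only the standard library. The URL, headers, model ID, and payload mirror the curl example; the helper names `build_request` and `chat`, the 60-second timeout, and the assumption that the reply follows the OpenAI-style `choices[0].message.content` shape are illustrative and not confirmed by the page.

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"


def build_request(prompt, api_key, model="sarvam-30b", temperature=0.7):
    """Build the POST request without sending it (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "You are a helpful assistant fluent in Indian languages."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt, api_key):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt, api_key), timeout=60) as resp:
        body = json.load(resp)
    # Assumes an OpenAI-compatible response body.
    return body["choices"][0]["message"]["content"]


# Usage, with a real key in place of the placeholder:
# print(chat("Mujhe quantum computing ke baare mein Hindi mein samjhao", "cm_YOUR_KEY"))
```

Keeping request construction separate from the network call makes it easy to inspect or log the payload before anything is sent.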

Try Sarvam 30B now

Get 1000 free API credits on signup. No credit card required.
