LLM Chat · indian-languages · recommended

Sarvam 30B

by Sarvam AI · Released May 2025

A 24B dense language model built on Mistral Small and post-trained on Indian languages alongside English, with a reported +20% improvement on Indian language benchmarks.

Powered by Sarvam AI · Dense Transformer (24B), post-trained on Mistral Small with SFT + RLVR

Context Window: 64K
Parameters: 24B (dense, based on Mistral Small)
Max Output: 8K
Category: LLM Chat

Overview

Sarvam-M (the "M" stands for Mistral) is a 24B-parameter hybrid reasoning model built by Sarvam AI on top of Mistral Small 24B, which ships under the Apache 2.0 license. The team stripped the vision encoder from the original Mistral Small to create a text-only foundation, then applied a three-stage post-training pipeline: supervised fine-tuning (SFT), reinforcement learning with verifiable rewards (RLVR), and inference optimization. Relative to the base Mistral Small, the reported gains are +20% on Indian language benchmarks, +21.6% on math, +17.6% on programming, and +86% on romanized Indian language GSM-8K. The model outperforms Llama-4 Scout on most benchmarks and rivals the much larger Llama-3.3 70B.

The SFT stage began with 11.5 million prompts collected across English and Indian languages. Deduplication left 7 million, which were further filtered to 5.2 million English prompts. Each prompt was classified for quality and hardness by Llama 3.3 70B, then embedded with gte-Qwen2-7B and grouped into 100,000 clusters via FAISS so that semantic duplicates could be removed within each cluster. The final curated training set comprised 3.7 million high-quality samples. Prompt completions were scored by a custom "real-value scorer": a fine-tuned Llama 3.3 70B that uses probability-weighted scoring across the digits 0-9 and achieves 85%+ accuracy across all 11 supported Indian languages. DeepSeek R1 generated the highest-quality Indic completions, averaging a score of 8+ out of 9.
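The probability-weighted scoring described above can be sketched as an expected value over the digit tokens. This is a minimal illustration, not Sarvam's implementation: the function name, the input format (a mapping from token to log-probability), and the example numbers are all assumptions.

```python
import math

DIGITS = "0123456789"

def expected_score(digit_logprobs):
    """Collapse a distribution over the digit tokens '0'..'9' into one real value."""
    # Keep only single-digit tokens; any other token is ignored.
    probs = {
        int(tok): math.exp(lp)
        for tok, lp in digit_logprobs.items()
        if len(tok) == 1 and tok in DIGITS
    }
    total = sum(probs.values())
    if total == 0:
        raise ValueError("no digit tokens in the distribution")
    # Renormalise over the digits, then take the probability-weighted mean.
    return sum(d * p for d, p in probs.items()) / total

# Hypothetical log-probabilities for the scorer's rating token.
example = {"8": math.log(0.6), "9": math.log(0.3), "7": math.log(0.1)}
score = expected_score(example)
```

Compared with taking only the most likely digit, the weighted mean yields a continuous score, which makes it easier to rank completions that would otherwise tie.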

Character training was a deliberate focus. Approximately 0.5% of completions were flagged for political bias and regenerated using Perplexity R1 1776 to remove slant. An additional 5% were regenerated specifically for Indian cultural relevance, producing responses that reflect local context, idioms, and values. SFT itself was conducted in two phases: 2 epochs of non-think mode training followed by 2 epochs of think mode training, with SLERP (spherical linear interpolation) model merging applied between the two phases to blend the capabilities smoothly.
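As a rough illustration of the merging step, spherical linear interpolation between two weight vectors can be written as below. This is a generic sketch of SLERP on flat lists of floats; how Sarvam applied it (per tensor, per layer, and with what interpolation factor) is not stated in the source, so `slerp`, `t`, and the toy vectors are assumptions.

```python
import math

def slerp(w_a, w_b, t):
    """Spherical linear interpolation between two non-zero flat weight vectors."""
    norm_a = math.sqrt(sum(x * x for x in w_a))
    norm_b = math.sqrt(sum(x * x for x in w_b))
    # Angle between the two vectors, clamped for numerical safety.
    cos_theta = sum(a * b for a, b in zip(w_a, w_b)) / (norm_a * norm_b)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    if theta < 1e-6:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(w_a, w_b)]
    s = math.sin(theta)
    c_a = math.sin((1 - t) * theta) / s
    c_b = math.sin(t * theta) / s
    return [c_a * a + c_b * b for a, b in zip(w_a, w_b)]

# Toy example: halfway between two orthogonal unit vectors.
merged = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

With `t = 0` the result is the first checkpoint and with `t = 1` the second; intermediate values follow the arc between them rather than the straight line used by simple weight averaging.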

The RLVR stage used the GRPO algorithm across six distinct task curricula: multilingual GSM8K (math reasoning across languages), MATH (competition-level math), Big Math (extended math corpus), Extended IFEval (instruction following), Code Understanding via Synthetic-1, and Code Generation via PrimeIntellect. A seventh curriculum targeted translation quality using chrF++ as the reward signal. Prompt sampling was calibrated to target approximately a 20% pass-through rate to keep the learning signal informative. Code tasks used partial rewards — the fraction of test cases passed plus a bonus for full completion — while translation used a relative reward comparing chrF++ scores against a baseline. Learning rates were set at 3e-7 for most tasks and reduced to 2e-7 for harder reasoning tasks.
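The reward shapes described above might look roughly like the following. The bonus size, the use of a simple difference for the relative chrF++ reward, and all function names are assumptions made for illustration; only the general form (fraction of tests passed plus a completion bonus, a baseline-relative translation reward, and GRPO's within-group normalisation) comes from the text.

```python
from statistics import mean, pstdev

def code_reward(passed, total, full_bonus=0.5):
    """Fraction of test cases passed, plus a bonus when every test passes."""
    if total <= 0:
        raise ValueError("total must be positive")
    reward = passed / total
    if passed == total:
        reward += full_bonus  # bonus size is an assumed value
    return reward

def translation_reward(chrf_candidate, chrf_baseline):
    """Relative reward: candidate chrF++ minus baseline chrF++ (assumed form)."""
    return chrf_candidate - chrf_baseline

def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardise rewards within one prompt's sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled completions for one coding prompt with four test cases each.
rewards = [code_reward(p, 4) for p in (0, 2, 4, 4)]
advantages = group_advantages(rewards)
```

Because advantages are computed relative to the group, a prompt that every sample solves (or every sample fails) contributes almost no gradient signal, which is one reason to sample prompts at a moderate pass rate.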

Inference optimization was critical for production deployment. The team applied FP8 quantization via TensorRT-LLM, noting that the choice of calibration dataset significantly impacts quantized model quality. Lookahead decoding was implemented for approximately 2x throughput improvement. Two deployment configurations are offered: a high-concurrency setup delivering around 100 tokens per second, and a low-concurrency setup achieving approximately 300 tokens per second for latency-sensitive applications.

The model supports 11 major Indian languages in three forms: formal native script, code-mixed (e.g., Hinglish, Tanglish), and romanized transliteration. In the training mix, Hindi accounts for 28% of the data, while Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu account for 8% each. On the Indic Vibe Check benchmark, Sarvam-M averages 8.12 out of 9 across all 11 languages, compared to 7.58 for Llama 4 Scout and 6.93 for Llama 3.3 70B. When augmented with Wikipedia RAG, SimpleQA accuracy rises from 5% to 72% correct, ahead of OpenAI o3 at 49%.

The team also documented several failed experiments for transparency: tokenizer extension caused a knowledge drop, tokenizer transplant did not outperform SFT alone, and RL with LLM-based rewards proved non-deterministic and unreliable. The full benchmark table shows Sarvam-M competitive with or exceeding Mistral Small, Gemma 3 27B, Llama 4 Scout, and Llama 3.3 70B across English and Indian language tasks. Both think and non-think modes are supported — think mode enables chain-of-thought reasoning while non-think mode provides fast, direct responses.

At $0.35 per million tokens for both input and output, Sarvam-M is among the most affordable frontier-class models available, making it accessible for government deployments, vernacular education platforms, regional content generation, and multilingual customer support across the Indian subcontinent.

Pricing

Metric               Price
Input /1M tokens     ₹35.0000
Output /1M tokens    ₹35.0000

1 credit = ₹1 = $0.01 USD. Prices shown are the provider's; CallMissed passes them through with a ~35% markup.
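A rough cost estimate can be derived from the figures above. The sketch below assumes the ~35% markup is applied on top of the listed per-million-token prices and that billing is linear in tokens; the page does not spell out either point, and `estimate_credits` is a hypothetical helper.

```python
def estimate_credits(input_tokens, output_tokens,
                     input_price=35.0, output_price=35.0, markup=0.35):
    """Estimate credits billed for one request (1 credit = ₹1 per the page)."""
    # Listed provider prices are per 1M tokens.
    provider_cost = (input_tokens * input_price
                     + output_tokens * output_price) / 1_000_000
    # Assumption: the markup is added on top of the provider price.
    return provider_cost * (1 + markup)

# Example: a 2,000-token prompt with a full 8K-token response.
credits = estimate_credits(input_tokens=2_000, output_tokens=8_000)
```

Under these assumptions the example request costs about 0.47 credits.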

Key Highlights

  • 11 major Indian languages with native script and romanized support
  • +86% improvement on romanized Indian language math benchmarks (GSM-8K)
  • Hybrid reasoning: "think" mode for chain-of-thought, "non-think" for fast responses
  • Outperforms Llama-4 Scout on most benchmarks despite being smaller
  • Character-trained to reflect Indian cultural values
  • FP8 quantized for efficient H100 deployment via TensorRT-LLM

Benchmarks

Benchmark        Score
MMLU             0.87
MMLU-IN          0.79
MMLU-IN-R        0.66
HumanEval        0.88
GSM-8K           0.94
GSM-8K-IN-R      0.82
LiveCodeBench    0.44
MTBench          8.14
AlpacaEval       60.92

Technical Details

  • Base model: Mistral Small (24B, Apache 2.0)
  • Training: SFT → RLVR (GRPO algorithm) → Inference optimization
  • SFT data: Curated prompts with quality/hardness scoring, clustering, and sampling
  • RLVR: Curriculum across instruction following, math, and programming datasets
  • Quantization: FP8 via TensorRT-LLM with negligible accuracy loss
  • Inference: Lookahead decoding for throughput gains on H100
  • Languages: Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, Odia, Punjabi, Assamese

Strengths

  • Best-in-class Indian language performance at this parameter count
  • Handles code-mixed text (Hinglish, Tanglish) natively
  • Culturally aware responses trained on Indian context
  • Extremely affordable at $0.35/1M tokens
  • Hybrid think/non-think modes for flexible reasoning

Limitations

  • Slight drop (~1%) on English knowledge benchmarks (MMLU) vs base model
  • 64K context window is smaller than frontier models
  • Primarily optimized for 11 Indian languages — less coverage than multilingual models

Use Cases

  • Indian language chatbots
  • Multilingual customer support
  • Code-mixed conversations
  • Regional content generation
  • Government and public sector AI
  • Vernacular education platforms

API Example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sarvam-30b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant fluent in Indian languages."},
      {"role": "user", "content": "Mujhe quantum computing ke baare mein Hindi mein samjhao"}
    ],
    "temperature": 0.7
  }'

Endpoint: POST /v1/chat/completions · Model ID: sarvam-30b
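The same request can be issued from Python using only the standard library. The endpoint, headers, model ID, and payload mirror the curl call above; the response parsing assumes an OpenAI-style `choices[0].message.content` shape, which this page does not confirm, and the helper names are illustrative.

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"

def build_request(api_key, user_message, temperature=0.7):
    """Build the POST request without sending it."""
    payload = {
        "model": "sarvam-30b",
        "messages": [
            {"role": "system",
             "content": "You are a helpful assistant fluent in Indian languages."},
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def chat(api_key, user_message):
    """Send the request and return the assistant's reply text."""
    req = build_request(api_key, user_message)
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    # Assumed OpenAI-compatible response layout.
    return body["choices"][0]["message"]["content"]
```

Calling `chat("cm_YOUR_KEY", "Mujhe quantum computing ke baare mein Hindi mein samjhao")` would send the same prompt as the curl example.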
