India's Public AI Stack: BharatGPT, Sarvam, Krutrim

CallMissed · May 8, 2026



India's "public AI stack" in 2026 is no longer a slogan. There is now a real budget line, a real compute pool, and a real catalog of foundation models that were trained from scratch on Indian languages. The shape it has taken — public-private, mission-funded, ecosystem-led — is meaningfully different from how the US, EU, or China have approached the same problem.

The IndiaAI Mission, in one paragraph

Approved in 2024 and operational from 2025 onward, the IndiaAI Mission is a ₹10,372 crore (roughly US$1.25B) multi-year program with seven pillars: compute capacity, foundation models, datasets, applications, talent, startup financing, and safe/trusted AI (Indian government program summary, 2026). The thing builders care about most is the foundation-model pillar and the GPU pool that backs it.
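As a quick check on the rupee-to-dollar figure, the conversion can be sketched as below. The exchange rate of 83 rupees per US dollar is an assumption chosen for illustration, not a figure from the program summary; a different rate shifts the dollar amount accordingly.

```python
# Convert the IndiaAI Mission budget from crore rupees to US dollars.
CRORE = 10_000_000          # 1 crore = 10^7

budget_crore = 10_372       # figure quoted in the article
inr_per_usd = 83.0          # ASSUMED exchange rate, for illustration only

budget_inr = budget_crore * CRORE
budget_usd = budget_inr / inr_per_usd

print(f"INR {budget_inr:,}")
print(f"USD {budget_usd / 1e9:.2f} billion")
```

At the assumed rate this lands on roughly US$1.25 billion, which matches the headline number.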

The Ministry of Electronics and Information Technology (MeitY) selected an initial cohort of startups to build indigenous foundation models with subsidized compute access. Sarvam AI, SoketAI, Gan AI, and Gnani AI were the founding selections in early 2025, with Sarvam announced first in April 2025 (Wikipedia, "Sarvam AI"). Additional cohorts have been added since.

Sarvam AI

Sarvam, headquartered in Bengaluru, is the highest-profile member of the IndiaAI cohort. In February 2026 it announced two foundation models trained from scratch on Indic-first datasets:

Sarvam-30B — a 30B-parameter dense model.

Sarvam-105B — a 105B-parameter Mixture-of-Experts model targeting advanced reasoning and multilingual tasks (per Sarvam's announcement, summarized on Wikipedia).

Sarvam also ships smaller production-friendly models (the Sarvam-1 / Sarvam-2 family) plus speech models for Indian-language STT and TTS. Their distinguishing bet is treating Indic languages as first-class citizens during pre-training, rather than as a fine-tune layer on top of a primarily-English base. [Inference: this is consistent with their public statements but the exact training-data ratios are not all public.]

Krutrim

Krutrim, founded by Bhavish Aggarwal (also of Ola), launched in 2023 and operates separately from the IndiaAI Mission cohort. The flagship Krutrim model handles 22 Indian languages and was trained on more than 2 trillion tokens (Rest of World, 2026). Krutrim has emphasized building the underlying infrastructure stack (compute, cloud, model serving) alongside the model itself, and has positioned itself as a vertically integrated alternative.

BharatGPT and BharatGen

BharatGPT is the conversational-AI product line developed primarily by CoRover (the AI assistant vendor behind IRCTC's "AskDisha"). Through the IRCTC deployment, BharatGPT is already handling millions of queries per month in Hindi and more than 11 other Indian languages, which makes it one of the largest production Indic NLP workloads in the country (Organiser, 2026). [Unverified at the exact volume: these are public claims, not externally audited.]

BharatGen is the parallel academic-led foundation-model initiative anchored at IIT Bombay and TIH-Foundation for IoT and IoE, focused on building large multimodal models for Indian languages with public datasets and open-research norms.

Bhashini

Bhashini is the Government of India's National Language Translation Mission: a public dataset, model, and API platform for translation, ASR, and TTS across Indian languages. It is the layer most non-AI software companies actually integrate, because the API is free at modest tiers and the datasets are publicly available. Bhashini sits below the foundation-model layer and above the application layer, and a lot of the indigenous AI activity in 2026 is some variation on "fine-tune a foundation model on Bhashini-derived data and ship a vertical product."

Compute: the IndiaAI Compute Portal

The compute pillar has been the most operationally important piece of the mission. The IndiaAI Compute Portal aggregates GPU capacity from empanelled cloud providers, and selected startups, researchers, and government projects can apply for subsidized hours. The targeted scale is in the 10,000+ GPU range as of 2026, dominated by H100 and H200-class hardware. [Unverified — the portal's published capacity has been moving up through 2026; check the official IndiaAI announcements for current numbers.]

What the stack actually does well

Indic-language coverage. Models trained on Indic data from scratch handle code-mixed Hindi/English, Tamil, Bengali, Marathi, and Telugu meaningfully better than retrofitted English-base models. This is the strongest comparative advantage of the indigenous stack.

Government-scale deployments. IRCTC, MyGov, and several state-government chatbots are now running on the indigenous stack. The volume is real even if the technical novelty is modest.

Lower-cost inference. Frugal-AI engineering — quantization, distillation, lightweight architectures — is a recurring theme. The narrative is "good enough at a fraction of the cost," which fits the Indian market's affordability constraint.
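To make the quantization point concrete, here is a minimal sketch of symmetric int8 quantization in plain Python. It is a generic textbook illustration, not the method used by Sarvam, Krutrim, or any other vendor named here; the function names and the sample weights are made up for the example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.05, 0.4, -0.33]   # hypothetical float weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Worst-case rounding error is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of the four bytes a 32-bit float needs, at the cost of a small rounding error. That trade is the core of the "good enough at a fraction of the cost" argument.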

What it has not yet matched

Two honest gaps:

Frontier reasoning. None of the indigenous models match GPT-4-class or Claude-class reasoning on hard benchmarks in 2026; the top of those leaderboards is still held by the global labs. [Inference]

Developer ecosystem. The number of third-party tutorials, fine-tunes, agent integrations, and downstream community projects around Sarvam or Krutrim is still small relative to Llama or Qwen. This is a chicken-and-egg problem that the IndiaAI Mission is explicitly trying to solve via the talent and applications pillars.

What this means for builders

If you are building for an Indian-market user base in 2026:

Test the indigenous models on your actual workload. On code-mixed and low-resource Indic prompts, their quality is often better than what retrofitted English-base models deliver.

Use Bhashini for non-AI-core translation tasks. It is free, sovereign, and "good enough" for a large class of use cases.

Stack pragmatically. An increasingly common production shape is "Sarvam or Krutrim for Indic-heavy turns, a frontier global model for English reasoning, route based on language detection." Nothing obliges you to pick a single vendor.
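The routing shape described above can be sketched with nothing more than the standard library. This is a rough illustration under stated assumptions: the labels "indic-model" and "global-model" are hypothetical placeholders rather than real endpoints, the 0.3 threshold is arbitrary, and detection is by Unicode script only, so romanized Hindi ("Hinglish" typed in Latin letters) is not caught and would need a proper language-identification model.

```python
import unicodedata

# Script-name prefixes as they appear in unicodedata.name() for Indic letters.
INDIC_SCRIPTS = ("DEVANAGARI", "BENGALI", "TAMIL", "TELUGU", "GUJARATI",
                 "GURMUKHI", "KANNADA", "MALAYALAM", "ORIYA")

def indic_ratio(text):
    """Fraction of alphabetic characters that belong to an Indic script."""
    indic = other = 0
    for ch in text:
        if not ch.isalpha():          # skips digits, spaces, vowel signs
            continue
        if unicodedata.name(ch, "").startswith(INDIC_SCRIPTS):
            indic += 1
        else:
            other += 1
    total = indic + other
    return indic / total if total else 0.0

def pick_model(text, threshold=0.3):
    """Return a placeholder label for the backend this turn should go to."""
    return "indic-model" if indic_ratio(text) >= threshold else "global-model"

print(pick_model("मुझे दिल्ली की ट्रेन चाहिए"))
print(pick_model("Book a train to Delhi"))
print(pick_model("mujhe Delhi ki train chahiye"))  # romanized Hindi: script check misses it
```

The third call shows the limitation: the sentence is Hindi, but because it is written in Latin letters the script check sends it to the global model. In practice the threshold and the detector should be tuned on real traffic.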

The shorter version: India's public AI stack in 2026 is not theoretical anymore. It is a real set of models, a real compute pool, and a real set of customers. The honest thing to say is that the global frontier is still elsewhere, but the gap on Indic-language production workloads has closed materially, and the funding ramp suggests it will keep closing.

Frequently Asked Questions

What is the IndiaAI Mission's budget?

₹10,372 crore (approximately US$1.25 billion) over the multi-year mission period, covering compute, foundation models, datasets, applications, talent, startup financing, and safe/trusted AI (Indian government program summary, 2026).

How does Sarvam-105B differ from Sarvam-30B?

Sarvam-105B uses a Mixture-of-Experts architecture aimed at higher reasoning quality, while Sarvam-30B is a dense model. Both were announced in February 2026 and were trained from scratch on Indic-first datasets.
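For readers unfamiliar with the dense versus Mixture-of-Experts distinction, the toy sketch below shows the general idea of top-k expert routing. It is a generic illustration only: Sarvam's actual router, expert count, and top-k setting are not described in the sources cited here, and the experts and router scores below are invented for the example.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, experts, router_logits, top_k=2):
    """Run only the top_k highest-scoring experts and mix their outputs."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)          # renormalize over chosen experts
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen

# Four toy "experts"; a real model uses feed-forward sub-networks here.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 1.5]   # hypothetical scores from a learned router

output, chosen = moe_layer(3.0, experts, router_logits)
print(chosen, output)
```

A dense layer would evaluate every sub-network for every token. In the MoE sketch only two of the four experts run, which is why an MoE model's total parameter count can be much larger than the number of parameters active per token.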

Can I use Krutrim or Sarvam without being in India?

Yes. Both expose API access internationally where regulatory rules allow, though their primary product-market fit is Indian-language and Indian-market workloads. On English-only frontier benchmarks, their capability is generally below that of models from the largest US frontier labs as of 2026.