What is Mistral Small 4?

Mistral AI's unified hybrid model combining instruct, reasoning (Magistral), and coding (Devstral) capabilities. 119B total parameters with 6.5B active. Features a reasoning_effort parameter, multimodal text+image input, and an industry-first unified architecture replacing three separate models.

How much does Mistral Small 4 cost?

Mistral Small 4 costs $0.20 per 1M input tokens and $0.80 per 1M output tokens on CallMissed. 1 credit = ₹1 = $0.01 USD.

How do I use Mistral Small 4 via API?

Send a POST request to /v1/chat/completions with the model set to "mistralai/mistral-small-2603" and your API key in the Authorization header. CallMissed uses the OpenAI-compatible format, so an existing OpenAI client only needs its base URL and model field changed.

What is the context window of Mistral Small 4?

Mistral Small 4 supports a 128K token context window with up to 16K output tokens.
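The two limits above suggest a simple pre-flight check before sending a request. The sketch below is illustrative only: it assumes "128K" and "16K" mean 128,000 and 16,000 tokens, and that the prompt and the output share the context window. The function name `fits_context` is hypothetical and not part of any CallMissed or Mistral SDK.

```python
# Assumed limits taken from the listing above. Reading "K" as 1,000 is an
# assumption; some providers mean 1,024.
CONTEXT_WINDOW = 128_000
MAX_OUTPUT = 16_000


def fits_context(prompt_tokens: int, max_tokens: int,
                 context_window: int = CONTEXT_WINDOW,
                 max_output: int = MAX_OUTPUT) -> bool:
    """Return True if the request stays within both assumed limits."""
    if prompt_tokens < 0 or max_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # The output cap applies on its own, and prompt + output must also
    # fit inside the shared context window.
    return (max_tokens <= max_output
            and prompt_tokens + max_tokens <= context_window)
```

Token counts depend on the tokenizer, so treat the result as a rough guard rather than a guarantee.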

LLM Chat · fast · affordable

Mistral Small 4

By Mistral · Released March 16, 2026

Mistral AI's unified hybrid: instruct, reasoning (Magistral), and coding (Devstral). 119B total parameters, 6.5B active. reasoning_effort parameter, text+image input, and an industry-first unified architecture.

Powered by Mistral · Hybrid MoE (119B total / 6.5B active)

Context window: 128K
Parameters: 119B total / 6.5B active (MoE)
Max output: 16K
Category: LLM Chat

Overview

Mistral Small 4, released March 16, 2026, is the first Mistral model to unify previously separate capability lines, namely instruct (Mistral), reasoning (Magistral), multimodal (Pixtral), and coding (Devstral), in a single Mixture-of-Experts (MoE) architecture with 119B total parameters and only 6.5B active per token (8B including embedding and output layers). This unification is described as an industry first: it removes the need to route between specialized models and simplifies production deployments.

The architecture features 128 experts with 4 active per token, a 256K context window, and a reasoning_effort parameter that lets developers control reasoning depth: "none" for fast direct responses and "high" for deep chain-of-thought reasoning. It accepts multimodal text+image input and is released under the Apache 2.0 license, which permits commercial use and makes it one of the most capable open models available on those terms.
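As a rough illustration of how the reasoning_effort setting might be chosen per request, the sketch below builds two request bodies. It assumes the gateway accepts `reasoning_effort` as a top-level field of the chat-completions body, which this page does not confirm. The helper name `build_payload` is hypothetical, and only the two values named above are accepted.

```python
MODEL_ID = "mistralai/mistral-small-2603"
# Only these two values are named in the text above; others may exist.
KNOWN_EFFORTS = {"none", "high"}


def build_payload(prompt: str, reasoning_effort: str = "none") -> dict:
    """Build a chat-completions body carrying a reasoning_effort field.

    Assumption: reasoning_effort is a top-level field of the request body.
    Check the provider documentation before relying on this.
    """
    if reasoning_effort not in KNOWN_EFFORTS:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}")
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": reasoning_effort,
    }


# Fast, direct answer for a simple task.
quick = build_payload("Summarize this ticket in one sentence.", "none")
# Deeper chain-of-thought reasoning for a harder task.
deep = build_payload("Find the bug in this recursive function.", "high")
```

A caller could pick the value per task type, using "none" for latency-sensitive paths and "high" where answer quality matters more than speed.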

Output efficiency is a defining characteristic. On AA LCR (Artificial Analysis Long Context Reasoning), Mistral Small 4 scores 0.72 with only 1.6K characters of output, while Qwen models need 5.8-6.1K characters for comparable performance, so Mistral Small 4 reaches a similar score with roughly one-quarter of the output length. On LiveCodeBench, it outperforms GPT-OSS-120B while producing 20% less output. Shorter outputs directly affect cost and scalability in production, because fewer output tokens mean lower API bills and faster responses.
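The "roughly one-quarter" figure can be checked from the character counts quoted above. This is a small worked calculation, not a benchmark: the variable names are illustrative, and the ratio is in characters, so the ratio in tokens may differ depending on each model's tokenizer.

```python
# Output lengths quoted above for AA LCR, in characters.
MISTRAL_CHARS = 1_600
QWEN_CHARS_RANGE = (5_800, 6_100)


def length_ratio(ours: int, theirs: int) -> float:
    """Fraction of the other model's output length that ours uses."""
    return ours / theirs


# One ratio per end of the quoted Qwen range.
ratios = [length_ratio(MISTRAL_CHARS, q) for q in QWEN_CHARS_RANGE]

for q, r in zip(QWEN_CHARS_RANGE, ratios):
    print(f"{MISTRAL_CHARS} / {q} characters = {r:.2f}")
```

Both ends of the range land between 0.26 and 0.28, which is consistent with the one-quarter claim.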

Mistral Small 4 achieves a 40% latency reduction and a 3x throughput improvement over Mistral Small 3, and is competitive with GPT-OSS-120B on benchmarks. Mistral AI is a founding member of the NVIDIA Nemotron Coalition, and the model is available from day 0 as an NVIDIA NIM for optimized containerized inference on NVIDIA infrastructure. It can also be customized with NVIDIA NeMo for domain-specific fine-tuning.

For self-hosting, Mistral Small 4 requires at minimum 4x HGX H100, 2x HGX H200, or 1x DGX B200, and is supported on vLLM, llama.cpp, SGLang, and Transformers. The model is also available through La Plateforme (Mistral's API), major cloud model catalogs, and CallMissed's unified gateway.

At $0.20 per million input tokens and $0.80 per million output tokens, Mistral Small 4 is among the most affordable frontier-class models available. It combines instruct, reasoning, coding, and multimodal capabilities in one model, activates only 6.5B of its 119B parameters per token, ships under Apache 2.0, and is available as an NVIDIA NIM. Together these make it a compelling choice for production deployments that would otherwise need to route between specialized systems.

Pricing

Metric	Price
Input /1M tokens	₹20.0000
Output /1M tokens	₹80.0000

1 credit = ₹1 = $0.01 USD. Prices are shown as listed by the provider; CallMissed passes them through with a markup of about 35%.
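To make the pricing concrete, here is a minimal cost estimator using the listed per-million-token prices and the stated credit conversion. The function name `estimate_cost` is hypothetical. The page does not say whether the listed prices already include the markup, so the sketch applies no markup and uses the listed figures as-is.

```python
INPUT_USD_PER_M = 0.20   # listed input price per 1M tokens
OUTPUT_USD_PER_M = 0.80  # listed output price per 1M tokens
USD_PER_CREDIT = 0.01    # 1 credit = ₹1 = $0.01, as stated on this page


def estimate_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (usd, credits) for one request at the listed prices."""
    usd = (input_tokens * INPUT_USD_PER_M
           + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
    return usd, usd / USD_PER_CREDIT


# Example: a 2,000-token prompt with a 500-token answer.
usd, credits = estimate_cost(2_000, 500)
print(f"${usd:.4f} = {credits:.2f} credits")
```

At these prices the example request costs $0.0008, or 0.08 credits. Output tokens cost four times as much as input tokens, so response length tends to dominate the bill for short prompts.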

Key points

Instruct, reasoning, and coding in one model
119B total, 6.5B active: highly efficient
reasoning_effort parameter for controlling reasoning depth
Multimodal: text and image input

Benchmarks

Benchmark	Score	Notes
MMLU-Pro	78.2%	Professional knowledge
HumanEval	86.8%	Code generation
MATH-500	84.5%	Competition math
AA LCR	0.72	1.6K characters vs. 5.8-6.1K for Qwen
Throughput	3x Small 3	3x the throughput of Mistral Small 3
Latency	-40%	40% lower latency than Mistral Small 3

Technical details

Architecture: 119B MoE, 128 experts, 4 active (6.5B active, 8B including embeddings)
Magistral + Pixtral + Devstral unified
reasoning_effort: "none" for fast responses, "high" for deep reasoning
Context: 256K tokens
Apache 2.0: full commercial freedom
40% lower latency and 3x throughput compared with Mistral Small 3
Competitive with GPT-OSS-120B while producing less output
Minimum hardware: 4x HGX H100, 2x HGX H200, or 1x DGX B200
Supported on vLLM, llama.cpp, SGLang, and Transformers
Available through the Mistral API and the CallMissed unified gateway
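The sparsity figures in the list above can be turned into fractions with a few lines of arithmetic. This is only a worked calculation on the numbers quoted on this page; the variable names are illustrative.

```python
TOTAL_PARAMS_B = 119.0      # total parameters, in billions
ACTIVE_PARAMS_B = 6.5       # active per token, excluding embedding/output layers
ACTIVE_WITH_EMBED_B = 8.0   # active per token, including those layers
NUM_EXPERTS = 128
ACTIVE_EXPERTS = 4

# Share of experts that run for each token.
expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS
# Share of all parameters that run for each token.
param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
param_fraction_with_embed = ACTIVE_WITH_EMBED_B / TOTAL_PARAMS_B

print(f"experts active per token: {expert_fraction:.2%}")
print(f"parameters active per token: {param_fraction:.2%}")
print(f"including embeddings: {param_fraction_with_embed:.2%}")
```

About 3% of the experts and roughly 5.5% of the parameters are used for each token. The parameter share is higher than the expert share, presumably because layers outside the experts (such as attention) run for every token.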

Strengths

Instruct + reasoning + coding in one model, so no model routing is needed
Only 6.5B of 119B parameters active: highly efficient
Apache 2.0: full commercial freedom
reasoning_effort for trading compute against quality
Very affordable: $0.20/$0.80 per 1M tokens

Limitations

Lower absolute capability than larger frontier models (GPT-5.4, Opus 4.6)
6.5B active parameters: limited depth on the most complex reasoning
New unified architecture: shorter production track record

Use cases

Code generation
Reasoning tasks
Multimodal analysis
Cost-efficient deployment

API example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/mistral-small-2603", "messages": [{"role": "user", "content": "Write a Rust function with error handling"}]}'

Endpoint: POST /v1/chat/completions · Model ID: mistralai/mistral-small-2603
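For callers who prefer Python, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: `build_request` and `ask` are hypothetical helper names, and the response parsing assumes the gateway returns the usual OpenAI chat-completions shape (`choices[0].message.content`), which this page implies but does not show.

```python
import json
import urllib.request

BASE_URL = "https://api.callmissed.com/v1"
MODEL_ID = "mistralai/mistral-small-2603"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(prompt: str, api_key: str) -> str:
    """Send the request and return the first choice's text.

    Assumes an OpenAI-style response body.
    """
    request = build_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=60) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Calling `ask("Write a Rust function with error handling", "cm_YOUR_KEY")` with a real key would perform the same call as the curl command. Keeping request construction separate from sending makes the payload easy to inspect or test without network access.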

Try Mistral Small 4 now

Get 1000 free API credits on signup. No credit card required.