How much does Mistral Small 4 cost?

Mistral Small 4 costs $0.20 per 1M input tokens and $0.80 per 1M output tokens on CallMissed. 1 credit = ₹1 = $0.01 USD.

How do I use Mistral Small 4 via API?

Send a POST request to /v1/chat/completions with model "mistralai/mistral-small-2603" and your API key. CallMissed uses the OpenAI-compatible format, so an existing OpenAI-style client only needs a different base URL and model field.

What is the context window of Mistral Small 4?

Mistral Small 4 supports a 128K token context window with up to 16K output tokens.
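
As a rough illustration of how these two limits interact, the sketch below clamps a requested output budget. It reads "128K" and "16K" as 128,000 and 16,000 tokens and assumes the prompt and the output share one context window; neither detail is stated on this page, and `clamp_max_tokens` is a hypothetical helper, not part of any CallMissed SDK.

```python
# Limits quoted in the answer above; the exact token counts are assumptions.
CONTEXT_WINDOW_TOKENS = 128_000  # "128K" read as 128,000
MAX_OUTPUT_TOKENS = 16_000       # "16K" read as 16,000


def clamp_max_tokens(prompt_tokens: int, requested_output: int) -> int:
    """Return the largest output budget that fits alongside the prompt.

    Assumes prompt and output tokens count against the same context window.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    room_left = CONTEXT_WINDOW_TOKENS - prompt_tokens
    if room_left <= 0:
        raise ValueError("prompt alone fills the context window")
    # The tightest of the three limits wins.
    return min(requested_output, MAX_OUTPUT_TOKENS, room_left)


print(clamp_max_tokens(prompt_tokens=120_000, requested_output=16_000))
```

A long prompt can therefore reduce the usable output budget well below the 16K ceiling.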

LLM Chat · fast · affordable

Mistral Small 4

by Mistral · Released March 16, 2026

Mistral AI's unified hybrid model combining instruct, reasoning (Magistral), and coding (Devstral) capabilities. 119B total parameters with 6.5B active. Features a reasoning_effort parameter, multimodal text+image input, and an industry-first unified architecture replacing three separate models.

Spec	Value
Context Window	128K
Parameters	119B total / 6.5B active (MoE)
Max Output	16K

Overview

Mistral Small 4, released March 16, 2026, is the first Mistral model to fold three previously separate capability lines, reasoning (Magistral), multimodal (Pixtral), and coding (Devstral), into its general instruct model. The result is a single 119B total parameter MoE architecture with only 6.5B active parameters per token (8B including embedding and output layers). This unification is billed as an industry first; in practice it removes the need to route between specialized models and simplifies production deployments.

The architecture features 128 experts with 4 active per token, a 256K context window, and a reasoning_effort parameter that lets developers control reasoning depth: "none" for fast direct responses and "high" for deep chain-of-thought reasoning. It supports multimodal text+image input and is released under the Apache 2.0 license, which permits commercial use and makes it one of the more capable open models available.
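
To make the reasoning_effort switch concrete, here is a minimal sketch of two request bodies. The parameter name and the values "none" and "high" come from the text above; placing the field at the top level of the chat-completions body is an assumption, and `build_chat_body` is a hypothetical helper.

```python
import json

MODEL_ID = "mistralai/mistral-small-2603"  # model ID listed on this page
ALLOWED_EFFORTS = {"none", "high"}          # only the two values named above


def build_chat_body(prompt: str, reasoning_effort: str = "none") -> dict:
    """Build a chat-completions body with an explicit reasoning depth."""
    if reasoning_effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}")
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: the gateway reads this as a top-level request field.
        "reasoning_effort": reasoning_effort,
    }


fast = build_chat_body("Summarize this changelog in one line.")
deep = build_chat_body("Prove that the sum of two odd numbers is even.", "high")
print(json.dumps(deep, indent=2))
```

Keeping the default at "none" and opting into "high" per request is one way to reserve the slower, longer reasoning path for prompts that need it.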

Output efficiency is a defining characteristic. On AA LCR (Artificial Analysis's Long Context Reasoning benchmark), Mistral Small 4 scores 0.72 with only 1.6K characters of output, while Qwen models need 5.8-6.1K characters to achieve comparable performance, so Mistral Small 4 reaches a comparable score with roughly one-quarter of the output length. On LiveCodeBench, it outperforms GPT-OSS-120B while using 20% less output. Shorter outputs directly affect cost and scalability in production: fewer output tokens mean lower API bills and faster responses.
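
The output-length comparison can be checked with a few lines of arithmetic. This sketch uses only the character counts quoted above; note that the figures are characters, so treating them as a proxy for tokens is an approximation.

```python
# Output lengths quoted above for AA LCR, in characters.
mistral_chars = 1_600
qwen_chars_low, qwen_chars_high = 5_800, 6_100

# Fraction of the Qwen output length that Mistral Small 4 needs.
ratio_vs_low = mistral_chars / qwen_chars_low
ratio_vs_high = mistral_chars / qwen_chars_high

print(f"vs 5.8K chars: {ratio_vs_low:.3f}")
print(f"vs 6.1K chars: {ratio_vs_high:.3f}")
```

Both ratios land between 0.26 and 0.28, which is where the "roughly one-quarter" figure comes from.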

Mistral Small 4 achieves a 40% latency reduction and 3x throughput improvement over Mistral Small 3 and is competitive with GPT-OSS-120B on benchmarks. Mistral AI is a founding member of the NVIDIA Nemotron Coalition, and the model is available day-0 as an NVIDIA NIM for optimized containerized inference on NVIDIA infrastructure. It can also be customized with NVIDIA NeMo for domain-specific fine-tuning.

For self-hosting, Mistral Small 4 requires a minimum of 4x HGX H100, 2x HGX H200, or 1x DGX B200, and is supported on vLLM, llama.cpp, SGLang, and Transformers. The model is also available through La Plateforme (Mistral's API), major cloud model catalogs, as well as through CallMissed's unified gateway.

At $0.20 per million input tokens and $0.80 per million output tokens, Mistral Small 4 is among the most affordable frontier-class models available. Unified instruct, reasoning, coding, and multimodal capabilities, sparse activation (6.5B active of 119B total), Apache 2.0 licensing, and NVIDIA NIM availability make it a compelling choice for production deployments that want a single model instead of routing between specialized systems.

Pricing

Metric	Price
Input /1M tokens	₹20.0000
Output /1M tokens	₹80.0000

1 credit = ₹1 = $0.01 USD. Prices shown from provider; CallMissed passes through with ~35% markup.
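
For budgeting, the listed rates translate into a per-request estimate as sketched below. It uses the table's prices as given (₹20 and ₹80 per 1M tokens, 1 credit = ₹1 = $0.01) and does not apply the ~35% markup separately, since the page does not say whether the listed prices already include it. `estimate_cost` is a hypothetical helper.

```python
# Listed prices in credits per 1M tokens (1 credit = ₹1 = $0.01 USD).
INPUT_CREDITS_PER_MILLION = 20
OUTPUT_CREDITS_PER_MILLION = 80
CREDITS_PER_USD = 100


def estimate_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (credits, usd) for one request at the listed rates."""
    credits = (
        input_tokens * INPUT_CREDITS_PER_MILLION
        + output_tokens * OUTPUT_CREDITS_PER_MILLION
    ) / 1_000_000
    return credits, credits / CREDITS_PER_USD


credits, usd = estimate_cost(input_tokens=2_000, output_tokens=500)
print(f"{credits:.4f} credits (about ${usd:.6f})")
```

Because output tokens cost four times as much as input tokens, a 500-token reply costs the same as a 2,000-token prompt in this example.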

Key Highlights

Unifies instruct, reasoning, and coding in one model
119B total params, only 6.5B active — ultra-efficient
Configurable reasoning_effort parameter
Multimodal: text and image input
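
For the text-and-image highlight, a request message might look like the sketch below. It follows the general OpenAI-style content-parts layout with a base64 data URL; whether the CallMissed gateway accepts this exact shape for this model is an assumption, and `image_message` is a hypothetical helper.

```python
import base64

MODEL_ID = "mistralai/mistral-small-2603"  # model ID listed on this page


def image_message(question: str, image_bytes: bytes,
                  mime_type: str = "image/png") -> dict:
    """Build one user message that pairs a question with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            # Assumption: images are passed as OpenAI-style image_url parts.
            {"type": "image_url",
             "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


# Placeholder bytes for illustration only; read a real image file in practice.
body = {
    "model": MODEL_ID,
    "messages": [image_message("What is in this picture?", b"not-a-real-image")],
}
print(body["messages"][0]["content"][0])
```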

Benchmarks

Benchmark	Score	Notes
MMLU-Pro	78.2%	Professional knowledge
HumanEval	86.8%	Code generation
MATH-500	84.5%	Competition mathematics
AA LCR	0.72	1.6K chars vs Qwen needing 5.8-6.1K chars
Throughput	3x	vs Mistral Small 3
Latency	-40%	vs Mistral Small 3

Technical Details

Architecture: 119B total MoE with 128 experts, 4 active per token (6.5B active, 8B incl. embedding/output)
Unifies Magistral (reasoning) + Pixtral (multimodal) + Devstral (coding)
reasoning_effort parameter: "none" for fast, "high" for deep reasoning
Context window: 256K tokens
Apache 2.0 license — full commercial freedom
40% latency reduction, 3x throughput vs Mistral Small 3
Competitive with GPT-OSS-120B on benchmarks with shorter outputs
Minimum hardware: 4x HGX H100, 2x HGX H200, or 1x DGX B200
Available on vLLM, llama.cpp, SGLang, Transformers
Available via Mistral API and CallMissed unified gateway
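
The sparsity figures in the list above can be related with simple arithmetic. This sketch only divides the numbers already given; the closing remark about shared layers is a general property of MoE models, not a detail confirmed for this one.

```python
# Figures from the Technical Details list, in billions of parameters.
TOTAL_PARAMS_B = 119.0
ACTIVE_PARAMS_B = 6.5
ACTIVE_WITH_EMBED_B = 8.0  # including embedding and output layers
EXPERTS_TOTAL = 128
EXPERTS_ACTIVE = 4

expert_fraction = EXPERTS_ACTIVE / EXPERTS_TOTAL             # 4 / 128
param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B            # 6.5 / 119
param_fraction_embed = ACTIVE_WITH_EMBED_B / TOTAL_PARAMS_B  # 8 / 119

print(f"experts active per token:   {expert_fraction:.3%}")
print(f"parameters active:          {param_fraction:.3%}")
print(f"with embedding and output:  {param_fraction_embed:.3%}")
```

Only 4 of 128 experts (3.125%) run per token, yet about 5.5% of parameters are active. In MoE models generally, layers outside the experts run for every token, which would account for a gap like this.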

Strengths

Unifies instruct, reasoning, and coding — no need for model routing
Only 6.5B active params from 119B total — extremely efficient
Apache 2.0 license with full commercial freedom
reasoning_effort parameter for flexible compute-quality trade-off
Ultra-affordable at $0.20/$0.80 per 1M tokens

Limitations

Lower absolute capability than larger frontier models (GPT-5.4, Opus 4.6)
6.5B active parameters limits depth on the most complex reasoning tasks
Newer unified architecture with less production track record

Use Cases

Code generation
Reasoning tasks
Multimodal analysis
Cost-efficient deployment

API Example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/mistral-small-2603", "messages": [{"role": "user", "content": "Write a Rust function with error handling"}]}'

Endpoint: POST /v1/chat/completions · Model ID: mistralai/mistral-small-2603
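
The same call can be sketched in Python with only the standard library. The URL, model ID, and key prefix come from the example above; the response shape read in `send` (`choices[0].message.content`) is assumed from the OpenAI-compatible format, and both function names are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"
MODEL_ID = "mistralai/mistral-small-2603"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the first reply's text.

    Assumes an OpenAI-style response body.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

A call would then look like `send(build_request("cm_YOUR_KEY", "Write a Rust function with error handling"))`. Splitting construction from sending keeps the request easy to inspect or log before any network traffic happens.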

Try Mistral Small 4 now

Get 1000 free API credits on signup. No credit card required.