How much does Mistral Small 3.1 cost?

Mistral Small 3.1 costs $0.35/1M tokens for input and $0.56/1M tokens for output on CallMissed. 1 credit = ₹1 = $0.01 USD.

How do I use Mistral Small 3.1 via API?

Send a POST request to /v1/chat/completions with the model set to "mistral-small-3.1" and your API key in the Authorization header. CallMissed uses the OpenAI-compatible format, so you only need to change the base URL and the model field.
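As a rough illustration, the same request can be sketched in Python with only the standard library. The base URL is the one used in the curl example on this page; the helper names `build_request` and `send` are made up for this sketch, and the key is a placeholder.

```python
import json
import urllib.request

# Base URL as it appears in the curl example on this page.
BASE_URL = "https://api.callmissed.com/v1"
MODEL_ID = "mistral-small-3.1"


def build_request(api_key: str, prompt: str, temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_request("cm_YOUR_KEY", "Hello"))` would perform the actual network call; building the request separately keeps the payload easy to inspect before anything is sent.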

What is the context window of Mistral Small 3.1?

Mistral Small 3.1 supports a 128K token context window with up to 8K output tokens.
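A small budget check can make these limits concrete. The sketch below assumes that prompt and output tokens share the same 128,000-token window and that "8K" means 8,000 tokens; both are assumptions to confirm against the provider's documentation. The characters-per-token ratio is a rough heuristic, not the model's tokenizer.

```python
CONTEXT_WINDOW = 128_000  # tokens, per this page
MAX_OUTPUT = 8_000        # tokens; "8K" on this page, exact value assumed


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the 4-chars-per-token ratio is an assumption."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(prompt_tokens: int, max_output_tokens: int = MAX_OUTPUT) -> bool:
    """Return True if the prompt plus the requested output fits in the window.

    Assumes input and output tokens count against the same context window.
    """
    if max_output_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW
```

Under these assumptions, a prompt of up to 120,000 tokens still leaves room for a full-length response.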

LLM Chat · free-tier · open-source · vision

Mistral Small 3.1

by Mistral AI · Released March 2025

A 24B parameter open-source model with 128K context, vision understanding, and function calling. Outperforms GPT-4o Mini and Gemma 3 while running at 150 tokens/sec. Free on CallMissed.

Context Window

128K

Parameters

24B (dense)

Max Output

8K

Overview

Mistral Small 3.1 (2503) builds upon Mistral Small 3 by adding state-of-the-art vision understanding and enhancing long context capabilities up to 128K tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks while remaining efficient enough to run on a single GPU.

The model outperforms comparable models like Gemma 3 and GPT-4o Mini across a range of benchmarks, while delivering inference speeds of 150 tokens per second. It supports function calling, structured outputs, and JSON mode — making it suitable for agentic workflows and tool-use scenarios.
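To show what function calling and JSON mode might look like in a request, here is a sketch of two payloads in the OpenAI chat completions shape. The field names `tools`, `tool_choice`, and `response_format` follow the OpenAI convention; whether CallMissed accepts each of them for this model is an assumption to verify against its docs. The `get_weather` tool is hypothetical.

```python
def build_tool_request(user_message: str) -> dict:
    """Payload that offers the model one callable tool (OpenAI-style schema)."""
    get_weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "mistral-small-3.1",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [get_weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


def build_json_mode_request(user_message: str) -> dict:
    """Payload that asks for a JSON object as the reply (OpenAI-style JSON mode)."""
    return {
        "model": "mistral-small-3.1",
        "messages": [
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": user_message},
        ],
        "response_format": {"type": "json_object"},
    }
```

Either dictionary would be serialized with `json.dumps` and sent as the body of a POST to /v1/chat/completions.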

Mistral Small 3.1 is released under the Apache 2.0 license, so it is open-source and can be used commercially, modified, and self-hosted under that license's terms. On CallMissed, it is served through the CallMissed gateway and is available on the free tier, at no cost beyond the credits consumed.

Key improvements over Mistral Small 3 include multimodal vision understanding (the model can process images alongside text), extended context from 32K to 128K tokens, and improved performance on long-document comprehension tasks. The model is optimized for efficient local inference, supporting use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments.
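For image input, OpenAI-compatible APIs typically accept a message whose content is a list of text and `image_url` parts. The sketch below builds such a message from raw image bytes as a base64 data URL; that CallMissed accepts this exact shape for Mistral Small 3.1 is an assumption, and `build_vision_message` is an illustrative name.

```python
import base64


def build_vision_message(question: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message that pairs a text question with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                # Data URL so the image travels inside the request body.
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The resulting dictionary goes into the `messages` list of a chat completion request in place of a plain-text user message.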

Pricing

Metric	Price
Input (per 1M tokens)	₹35.00
Output (per 1M tokens)	₹56.00

1 credit = ₹1 = $0.01 USD. Prices are based on the provider's rates, which CallMissed passes through with a markup of about 35%.
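The per-request cost follows directly from these rates. The sketch below uses the listed prices (35 credits per 1M input tokens, 56 credits per 1M output tokens) and the page's conversion of 1 credit = $0.01; the function names are illustrative.

```python
INPUT_CREDITS_PER_MILLION = 35.0   # listed input price (1 credit = ₹1)
OUTPUT_CREDITS_PER_MILLION = 56.0  # listed output price
USD_PER_CREDIT = 0.01              # conversion stated on this page


def request_cost_credits(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in credits, given its token counts."""
    return (
        input_tokens * INPUT_CREDITS_PER_MILLION
        + output_tokens * OUTPUT_CREDITS_PER_MILLION
    ) / 1_000_000


def credits_to_usd(credits: float) -> float:
    """Convert credits to USD using the page's stated conversion."""
    return credits * USD_PER_CREDIT
```

For example, a request with 10,000 input tokens and 2,000 output tokens costs 0.35 + 0.112 = 0.462 credits, or about $0.0046.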

Key Highlights

Free on CallMissed — available on the free tier
Outperforms GPT-4o Mini and Gemma 3 on most benchmarks
128K context window for long documents
Vision understanding — processes images alongside text
Apache 2.0 open-source license
150 tokens/sec inference speed

Benchmarks

Benchmark	Score	Notes
MMLU	81.0%	General knowledge
HumanEval	84.8%	Code generation
MATH	69.3%	Mathematics
GPQA	40.7%	Graduate-level science
IFEval	77.8%	Instruction following
Output Speed	150 t/s	Inference throughput
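The throughput figure gives a quick way to estimate how long a response takes to generate. This is a back-of-the-envelope sketch: it assumes the quoted 150 tokens per second holds steadily and ignores network latency, queueing, and time to first token.

```python
def estimated_generation_seconds(output_tokens: int, tokens_per_second: float = 150.0) -> float:
    """Estimate pure generation time from the quoted output speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second
```

At the quoted speed, a 1,500-token answer takes about 10 seconds to generate, and an 8,000-token answer a little under a minute.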

Technical Details

Architecture: Dense Transformer, 24B parameters
Context window: 128,000 tokens (extended from 32K in Mistral Small 3)
Vision: multimodal — processes text and image inputs
Function calling and structured outputs supported
License: Apache 2.0 (fully open-source, commercial use allowed)
Hosted on the CallMissed gateway — free tier eligible
Optimized for single-GPU deployment
Knowledge cutoff: Early 2025

Strengths

Free on CallMissed — no paid plan required
Open-source (Apache 2.0) — can be self-hosted
Strong performance relative to model size — beats GPT-4o Mini
Vision + text multimodal capabilities
128K context for long documents
Fast inference at 150 tokens/sec

Limitations

Smaller than frontier models — less capable on the hardest reasoning tasks
Vision capabilities are newer and less tested than dedicated vision models
No extended thinking / chain-of-thought reasoning mode

Use Cases

Conversational agents
Function calling and tool use
Long-document comprehension
Image understanding
Privacy-sensitive deployments

API Example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-small-3.1",
    "messages": [{"role": "user", "content": "Explain the difference between async and sync programming in Python"}],
    "temperature": 0.7
  }'

Endpoint: POST /v1/chat/completions · Model ID: mistral-small-3.1
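Once a response comes back, the reply text and token usage can be pulled out of the JSON body. The sketch below assumes the response follows the OpenAI chat completions shape (`choices[0].message.content` and a `usage` object), which is implied by the OpenAI-compatible format but not shown on this page. `SAMPLE_RESPONSE` is hand-written for illustration and is not real output.

```python
from typing import Tuple

# Hand-written sample in the OpenAI response shape; not real API output.
SAMPLE_RESPONSE = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}


def extract_reply(response: dict) -> str:
    """Return the assistant's text from the first choice."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


def extract_usage(response: dict) -> Tuple[int, int]:
    """Return (prompt_tokens, completion_tokens), defaulting to 0 if absent."""
    usage = response.get("usage") or {}
    return usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0)
```

The token counts returned by `extract_usage` are the figures the per-token prices apply to, so they are worth logging for cost tracking.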

Try Mistral Small 3.1 now

Get 1000 free API credits on signup. No credit card required.