Mistral Small 3.1
by Mistral AI · Released March 2025
A 24B parameter open-source model with 128K context, vision understanding, and function calling. Outperforms GPT-4o Mini and Gemma 3 while running at 150 tokens/sec. Free on CallMissed.
Powered by Mistral AI · Dense Transformer (24B)
| Spec | Value |
|---|---|
| Context Window | 128K |
| Parameters | 24B (dense) |
| Max Output | 8K |
| Category | LLM Chat |
Overview
Mistral Small 3.1 (2503) builds upon Mistral Small 3 by adding state-of-the-art vision understanding and enhancing long context capabilities up to 128K tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks while remaining efficient enough to run on a single GPU.
The model outperforms comparable models like Gemma 3 and GPT-4o Mini across a range of benchmarks, while delivering inference speeds of 150 tokens per second. It supports function calling, structured outputs, and JSON mode — making it suitable for agentic workflows and tool-use scenarios.
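The function-calling support described above can be sketched as a request payload. This is a minimal sketch, assuming the CallMissed endpoint accepts the OpenAI-style `tools` field; this page confirms only the `/v1/chat/completions` path and the `mistral-small-3.1` model ID. The `get_weather` tool and its schema are hypothetical and exist only for illustration.

```python
import json


def build_tool_request(user_prompt: str) -> dict:
    """Build a chat-completions payload that offers the model one tool.

    Assumption: the endpoint accepts the OpenAI-style `tools` field.
    `get_weather` is a hypothetical tool, not part of any real API.
    """
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "mistral-small-3.1",
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [weather_tool],
    }


payload = build_tool_request("What is the weather in Pune?")
print(json.dumps(payload, indent=2))
```

If the model decides to call the tool, the application would run the matching function itself and send the result back in a follow-up message. The exact response shape is not documented on this page, so it is left out of the sketch.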
Mistral Small 3.1 is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution as long as the license's notice requirements are met. On CallMissed, it runs on Cloudflare Workers AI infrastructure and is available on the free tier, so usage costs nothing beyond the credits it consumes.
Key improvements over Mistral Small 3 include multimodal vision understanding (the model can process images alongside text), extended context from 32K to 128K tokens, and improved performance on long-document comprehension tasks. The model is optimized for efficient local inference, supporting use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments.
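Sending an image alongside text might look like the message below. This is a sketch under a stated assumption: that the endpoint accepts OpenAI-style content parts with an `image_url` entry carrying a base64 data URL. The page does not document the image input format, so the field names here are assumptions, and the bytes in the usage line are a placeholder rather than a real image.

```python
import base64


def build_vision_message(question: str, image_bytes: bytes,
                         mime_type: str = "image/png") -> dict:
    """Pair a text question with an inline image in one user message.

    Assumption: the API accepts OpenAI-style content parts
    ("text" and "image_url"); this is not confirmed by the page.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# Placeholder bytes; in practice, read them from an image file.
message = build_vision_message("What does this chart show?", b"placeholder-image-bytes")
```

The resulting dictionary would go into the `messages` array of a chat-completions request in place of a plain-text message.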
Pricing
| Metric | Price |
|---|---|
| Input /1M tokens | ₹35.0000 |
| Output /1M tokens | ₹56.0000 |
1 credit = ₹1 = $0.01. Prices shown are the provider's; CallMissed passes them through with a markup of about 35%.
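A rough cost estimate follows directly from the table. The sketch below assumes the listed prices are what a request is charged; the page does not say whether the markup is already included, so treat the result as indicative only. The function name is illustrative.

```python
from decimal import Decimal

# Prices from the table above, in rupees per 1M tokens.
INPUT_PRICE_PER_MILLION = Decimal("35")
OUTPUT_PRICE_PER_MILLION = Decimal("56")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_rupees(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in rupees (1 credit = 1 rupee).

    Assumption: the listed prices are the amounts actually charged.
    """
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_MILLION
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION)
    return total / TOKENS_PER_MILLION


cost = estimate_cost_rupees(input_tokens=10_000, output_tokens=2_000)
print(cost)
```

For 10,000 input tokens and 2,000 output tokens this works out to 0.35 + 0.112 = ₹0.462, or just under half a credit.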
Key Highlights
- Free on CallMissed — available on the free tier
- Outperforms GPT-4o Mini and Gemma 3 on most benchmarks
- 128K context window for long documents
- Vision understanding — processes images alongside text
- Apache 2.0 open-source license
- 150 tokens/sec inference speed
Benchmarks
| Benchmark | Score |
|---|---|
| MMLU | 81.0% |
| HumanEval | 84.8% |
| MATH | 69.3% |
| GPQA | 40.7% |
| IFEval | 77.8% |
| Output Speed | 150 t/s |
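The output speed figure translates into a simple generation-time estimate. This sketch assumes steady throughput at the quoted 150 tokens per second and ignores network latency and time to first token, neither of which the page reports.

```python
OUTPUT_TOKENS_PER_SECOND = 150.0  # figure quoted in the table above


def estimate_generation_seconds(output_tokens: int,
                                tokens_per_second: float = OUTPUT_TOKENS_PER_SECOND) -> float:
    """Estimate how long generating `output_tokens` takes at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


seconds = estimate_generation_seconds(600)
print(f"{seconds:.1f} s")
```

A 600-token reply would take about 4 seconds at that rate; real latency depends on load and prompt length.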
Technical Details
- Architecture: Dense Transformer, 24B parameters
- Context window: 128,000 tokens (extended from 32K in Mistral Small 3)
- Vision: multimodal — processes text and image inputs
- Function calling and structured outputs supported
- License: Apache 2.0 (fully open-source, commercial use allowed)
- Hosted on Cloudflare Workers AI — free tier eligible
- Optimized for single-GPU deployment
- Knowledge cutoff: Early 2025
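The context and output limits can be combined into a quick budget check before sending a long document. This sketch rests on two assumptions not stated on the page: that the prompt and the completion share the same 128,000-token window, and that "8K" max output means 8,000 tokens. Token counts are taken as inputs because the page does not name a tokenizer.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # from the list above
MAX_OUTPUT_TOKENS = 8_000        # "8K" on this page; the exact value is an assumption


def max_prompt_tokens(reserved_output_tokens: int = MAX_OUTPUT_TOKENS) -> int:
    """Return how many prompt tokens remain after reserving room for output.

    Assumption: prompt and completion count against the same window.
    """
    if not 0 <= reserved_output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output_tokens is outside the allowed output range")
    return CONTEXT_WINDOW_TOKENS - reserved_output_tokens


def fits_in_context(prompt_tokens: int,
                    reserved_output_tokens: int = MAX_OUTPUT_TOKENS) -> bool:
    """Check whether a prompt of the given size leaves room for the reply."""
    return prompt_tokens <= max_prompt_tokens(reserved_output_tokens)


print(max_prompt_tokens())
print(fits_in_context(125_000))
```

Under these assumptions, reserving the full output budget leaves 120,000 tokens for the prompt, so a 125,000-token document would need to be trimmed or split.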
Strengths
- Free on CallMissed — no paid plan required
- Open-source (Apache 2.0) — can be self-hosted
- Strong performance relative to model size — beats GPT-4o Mini
- Vision + text multimodal capabilities
- 128K context for long documents
- Fast inference at 150 tokens/sec
Limitations
- Smaller than frontier models — less capable on the hardest reasoning tasks
- Vision capabilities are newer and less tested than those of dedicated vision models
- No extended thinking / chain-of-thought reasoning mode
API Example
curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-small-3.1",
    "messages": [{"role": "user", "content": "Explain the difference between async and sync programming in Python"}],
    "temperature": 0.7
  }'
Endpoint: POST /v1/chat/completions · Model ID: mistral-small-3.1
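The same request can be made from Python using only the standard library. This is a minimal sketch mirroring the curl call above; the helper names `build_request` and `send` are illustrative, and because the response format is not documented on this page, the reply is returned as raw parsed JSON rather than unpacked.

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"


def build_request(api_key: str, prompt: str,
                  temperature: float = 0.7) -> urllib.request.Request:
    """Build the POST request shown in the curl example."""
    body = {
        "model": "mistral-small-3.1",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON reply (shape not documented here)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_request(
    "cm_YOUR_KEY",
    "Explain the difference between async and sync programming in Python",
)
# reply = send(request)  # performs the network call; requires a real API key
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.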