LLM Chat · free-tier · open-source · vision

Mistral Small 3.1

by Mistral AI · Released March 2025

A 24B-parameter open-source model with a 128K context window, vision understanding, and function calling. Outperforms GPT-4o Mini and Gemma 3 on most benchmarks while generating about 150 tokens/sec. Free on CallMissed.

Powered by Mistral AI · Dense Transformer (24B)

Context Window: 128K
Parameters: 24B (dense)
Max Output: 8K
Category: LLM Chat

Overview

Mistral Small 3.1 (2503) builds upon Mistral Small 3 by adding state-of-the-art vision understanding and enhancing long context capabilities up to 128K tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks while remaining efficient enough to run on a single GPU.

The model outperforms comparable models like Gemma 3 and GPT-4o Mini across a range of benchmarks, while delivering inference speeds of 150 tokens per second. It supports function calling, structured outputs, and JSON mode — making it suitable for agentic workflows and tool-use scenarios.
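As a rough illustration of the function-calling support mentioned above, the sketch below builds a request body that offers the model a single tool. The `tools` field and its JSON Schema layout follow the widely used OpenAI-style convention; whether CallMissed accepts exactly this shape is an assumption, and `get_weather` is a hypothetical tool invented for the example.

```python
import json


def build_tool_call_payload(question: str) -> dict:
    """Build a chat request body that offers the model one callable tool."""
    # Hypothetical tool, described in the OpenAI-style JSON Schema format.
    # Assumption: the CallMissed endpoint accepts a "tools" field like this.
    get_weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "mistral-small-3.1",
        "messages": [{"role": "user", "content": question}],
        "tools": [get_weather_tool],
    }


payload = build_tool_call_payload("What is the weather in Pune?")
print(json.dumps(payload, indent=2))
```

If the model decides to use the tool, the application is expected to run the function itself and send the result back in a follow-up message; the exact response fields depend on the API and are not shown here.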

Mistral Small 3.1 is released under the Apache 2.0 license, which permits commercial use, modification, and self-hosting, subject to the license's attribution and notice terms. On CallMissed, it runs on Cloudflare Workers AI infrastructure and is available on the free tier, with no cost beyond the credits consumed.

Key improvements over Mistral Small 3 include multimodal vision understanding (the model can process images alongside text), extended context from 32K to 128K tokens, and improved performance on long-document comprehension tasks. The model is optimized for efficient local inference, supporting use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments.
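To make the image-plus-text input concrete, here is a minimal sketch of a user message that carries both a prompt and an image. It assumes the endpoint accepts OpenAI-style content parts with a base64 data URL under `image_url`; this page does not document the image format, so treat the field names as hypothetical.

```python
import base64


def build_vision_message(prompt: str, image_bytes: bytes,
                         mime_type: str = "image/png") -> dict:
    """Return one user message that pairs a text prompt with an inline image."""
    # Encode the raw image bytes so they can travel inside a JSON body.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumption: images are passed as a data URL in an "image_url" part.
            {"type": "image_url",
             "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


# Usage: read an image from disk and place the message in the "messages" list.
# with open("chart.png", "rb") as f:
#     message = build_vision_message("Describe this chart.", f.read())
```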

Pricing

Metric               Price
Input / 1M tokens    ₹35.0000
Output / 1M tokens   ₹56.0000

1 credit = ₹1 = $0.01 USD. Prices are shown as quoted by the provider; CallMissed passes them through with a markup of about 35%.
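The per-token rates can be turned into a quick credit estimate. The sketch below uses the listed input and output prices and the 1 credit = ₹1 conversion. The page does not say whether the listed rates already include the markup, so `markup_rate` is left as an optional parameter that defaults to zero; that default is an assumption, not a statement of how billing works.

```python
INPUT_PRICE_INR_PER_MILLION = 35.0   # listed input price per 1M tokens
OUTPUT_PRICE_INR_PER_MILLION = 56.0  # listed output price per 1M tokens
INR_PER_CREDIT = 1.0                 # 1 credit = ₹1, as stated on this page


def estimate_credits(input_tokens: int, output_tokens: int,
                     markup_rate: float = 0.0) -> float:
    """Estimate the credits consumed by one request.

    markup_rate is a fraction (0.35 for 35%). Assumption: it is applied on
    top of the listed rates; pass 0.0 if the rates already include it.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_INR_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_INR_PER_MILLION
    return (input_cost + output_cost) * (1.0 + markup_rate) / INR_PER_CREDIT


# A 20,000-token prompt with a 1,000-token reply at the listed rates.
print(f"{estimate_credits(20_000, 1_000):.3f} credits")  # prints "0.756 credits"
```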

Key Highlights

  • Free on CallMissed — available on the free tier
  • Outperforms GPT-4o Mini and Gemma 3 on most benchmarks
  • 128K context window for long documents
  • Vision understanding — processes images alongside text
  • Apache 2.0 open-source license
  • 150 tokens/sec inference speed

Benchmarks

Benchmark       Score
MMLU            81.0%
HumanEval       84.8%
MATH            69.3%
GPQA            40.7%
IFEval          77.8%
Output Speed    150 t/s

Technical Details

  • Architecture: Dense Transformer, 24B parameters
  • Context window: 128,000 tokens (extended from 32K in Mistral Small 3)
  • Vision: multimodal — processes text and image inputs
  • Function calling and structured outputs supported
  • License: Apache 2.0 (fully open-source, commercial use allowed)
  • Hosted on Cloudflare Workers AI — free tier eligible
  • Optimized for single-GPU deployment
  • Knowledge cutoff: Early 2025
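When working with long documents, it helps to check that a prompt leaves room for the reply. The sketch below uses the 128,000-token context window and the 8K maximum output listed on this page. It assumes that prompt and output share the same window and that "8K" means 8,000 tokens; neither detail is confirmed here, and token counts must come from a tokenizer of your choice.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # from the technical details on this page
MAX_OUTPUT_TOKENS = 8_000        # assumption: "8K" read as 8,000 tokens


def max_prompt_tokens(reserved_output_tokens: int = MAX_OUTPUT_TOKENS) -> int:
    """Return how many prompt tokens fit once output space is reserved."""
    if not 0 < reserved_output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output_tokens must be between 1 and the output cap")
    # Assumption: prompt and output tokens share a single context window.
    return CONTEXT_WINDOW_TOKENS - reserved_output_tokens


def fits_in_context(prompt_tokens: int,
                    reserved_output_tokens: int = MAX_OUTPUT_TOKENS) -> bool:
    """Check whether a prompt of the given size leaves room for the reply."""
    return prompt_tokens <= max_prompt_tokens(reserved_output_tokens)


print(max_prompt_tokens())        # prints 120000
print(fits_in_context(125_000))   # prints False
```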

Strengths

  • Free on CallMissed — no paid plan required
  • Open-source (Apache 2.0) — can be self-hosted
  • Strong performance relative to model size — beats GPT-4o Mini
  • Vision + text multimodal capabilities
  • 128K context for long documents
  • Fast inference at 150 tokens/sec

Limitations

  • Smaller than frontier models — less capable on the hardest reasoning tasks
  • Vision capabilities are newer and less tested than dedicated vision models
  • No extended thinking / chain-of-thought reasoning mode

Use Cases

  • Conversational agents
  • Function calling and tool use
  • Long-document comprehension
  • Image understanding
  • Privacy-sensitive deployments

API Example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-small-3.1",
    "messages": [{"role": "user", "content": "Explain the difference between async and sync programming in Python"}],
    "temperature": 0.7
  }'

Endpoint: POST /v1/chat/completions · Model ID: mistral-small-3.1
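The same request can be issued from Python. This is a minimal sketch using only the standard library; it mirrors the URL, headers, and body of the curl example. The helper names `build_request` and `send` are invented for illustration, and the response is returned as parsed JSON without assuming any particular field layout.

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"


def build_request(api_key: str, prompt: str,
                  temperature: float = 0.7) -> urllib.request.Request:
    """Build the POST request shown in the curl example, without sending it."""
    body = {
        "model": "mistral-small-3.1",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# request = build_request("cm_YOUR_KEY", "Explain async vs sync in Python")
# print(send(request))
```

Keeping request construction separate from sending makes the body easy to inspect or log before any credits are spent.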

Try Mistral Small 3.1 now

Get 1000 free API credits on signup. No credit card required.