How much does gpt-5-mini cost?

gpt-5-mini costs $0.25/1M tokens for input and $2/1M tokens for output on CallMissed. 1 credit = ₹1 = $0.01 USD.

How do I use gpt-5-mini via API?

Send a POST request to /v1/chat/completions with model "gpt-5-mini" and your API key. CallMissed uses the OpenAI-compatible format, so you only need to change the base URL and model field.

What is the context window of gpt-5-mini?

gpt-5-mini supports a 400K token context window with up to 128K output tokens.

LLM Chat · reasoning

gpt-5-mini

by OpenAI · Released 2025

OpenAI GPT-5 mini: fast, affordable reasoning model. 400K context, text-only output.

Context Window

400K

Parameters

Not disclosed

Max Output

128K

Overview

GPT-5 mini is OpenAI's cost-efficient reasoning model — explicitly documented as a "reasoning" tier with high internal deliberation, 400,000 tokens of context, and up to 128,000 tokens of output (platform.openai.com/docs/models/gpt-5-mini). The API model id is `gpt-5-mini`. On CallMissed you invoke it via `/v1/chat/completions` with the same JSON schema as other OpenAI chat models, but you should read OpenAI's reasoning guide: sampling parameters like temperature may be constrained, and billing includes reasoning tokens counted toward output.

OpenAI positions GPT-5 mini for well-defined, high-volume tasks that still benefit from chain-of-thought quality (classification with nuance, multi-step planning, structured extraction, moderation with explanation, and agent routing) without paying full GPT-5 flagship prices. It accepts text and image input but returns text only. The model card lists fine-tuning as unsupported; image generation tools in the Responses API are also unavailable for this model family.

Pricing on CallMissed is $0.25 per million input tokens and $2.00 per million output tokens, with cached input at $0.025 per million — among the cheapest ways to get genuine reasoning on the platform. That makes GPT-5 mini attractive for batch evaluation pipelines, guardrail models, background planners, and internal copilots where latency in the hundreds of milliseconds is acceptable. Be mindful of `max_tokens` / `max_completion_tokens`: reasoning models can consume output budget on internal thinking; if the limit is too low you may see incomplete responses.
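The per-million rates above translate into a per-request estimate with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes cached tokens are a subset of the input tokens and that reasoning tokens are already included in the output count.

```python
# Rates quoted in the paragraph above, in USD per one million tokens.
INPUT_USD_PER_M = 0.25
CACHED_INPUT_USD_PER_M = 0.025
OUTPUT_USD_PER_M = 2.00
USD_PER_CREDIT = 0.01  # from this page: 1 credit = $0.01


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption: cached_input_tokens is the part of input_tokens served
    from cache, and output_tokens already includes reasoning tokens.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * INPUT_USD_PER_M
        + cached_input_tokens * CACHED_INPUT_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total / 1_000_000


def usd_to_credits(usd: float) -> float:
    """Convert a USD amount to credits at the rate stated on this page."""
    return usd / USD_PER_CREDIT
```

For a batch job, summing `estimate_cost_usd` over the logged usage of each call gives a rough spend figure to compare against your credit balance.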

Compared to GPT-4.1, GPT-5 mini trades explicit sampling control and ultra-long 1M context for stronger deliberation on tricky instructions. Compared to GPT-4o, it is text-first and reasoning-native — pick GPT-4o when you need vision-heavy interactive chat at moderate cost, and GPT-5 mini when step-by-step logic dominates. Many production systems use GPT-5 mini as a "second opinion" model or as the brain in tool loops while a smaller model handles formatting.

Integration guidance: always set a generous output cap for tasks that require explanation; log finish reasons. Use system prompts to constrain format (JSON, bullet lists) because reasoning models follow instructions well but may over-explain if not guided. Streaming is supported — wire it for UX even if total latency exceeds non-reasoning models. On CallMissed you use the clean id `gpt-5-mini` in requests — no provider prefix.
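Logging finish reasons can be as small as the helper below. It is a sketch that assumes the response follows the OpenAI chat completions shape (`choices[0].message.content` and `choices[0].finish_reason`, with `"length"` meaning the output cap was hit); the function name is hypothetical.

```python
def check_completion(response: dict) -> tuple:
    """Return (text, truncated) for the first choice of a chat completion.

    Assumes the OpenAI-compatible response shape; `truncated` is True
    when the model stopped because it ran out of output budget.
    """
    choice = response["choices"][0]
    text = choice["message"].get("content") or ""
    truncated = choice.get("finish_reason") == "length"
    return text, truncated
```

When `truncated` is true and `text` is empty, the output budget was probably spent on reasoning, so retrying with a larger cap is usually more useful than retrying unchanged.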

Limitations: not the absolute frontier on the hardest research tasks (OpenAI reserves that for larger GPT-5 variants), text-only output path for chat completions, and behavior differs from classic GPT-4 sampling — test temperature-sensitive prompts again when migrating. For audio or realtime speech, use `gpt-realtime` or the STT/TTS models instead.

Reasoning token economics: OpenAI reasoning models bill internal chain-of-thought toward output usage. A request that looks like "500 completion tokens" to the user may include additional reasoning tokens in usage breakdowns. Size budgets for agents accordingly — if your pipeline caps spend per step, monitor `usage` fields after each call.
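To see how much of the output budget went to reasoning, you can split the `usage` object. This sketch assumes CallMissed passes through OpenAI's `completion_tokens_details.reasoning_tokens` field unchanged, which is worth verifying on a real response; the helper names are hypothetical.

```python
def split_output_tokens(usage: dict) -> dict:
    """Break billed output tokens into reasoning and visible parts.

    Assumes the OpenAI usage shape, where completion_tokens already
    includes reasoning tokens. Missing fields are treated as zero.
    """
    completion = usage.get("completion_tokens", 0)
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return {
        "billed_output": completion,
        "reasoning": reasoning,
        "visible": completion - reasoning,
    }


def over_budget(usage: dict, max_output_tokens_per_step: int) -> bool:
    """True when a single step billed more output tokens than the cap."""
    return usage.get("completion_tokens", 0) > max_output_tokens_per_step
```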

Evaluation playbook: before promoting GPT-5 mini to production, run a held-out set comparing it to GPT-4.1 on your exact prompts. Reasoning models often win on ambiguous policy interpretation, multi-constraint scheduling, and fraud review, while tying or losing on simple templated tasks where reasoning is overkill.
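One simple way to summarize such a comparison is a per-example win/tie/loss tally. The sketch below assumes you already have one numeric score per held-out example for each model, from whatever grader you use; the function name and the `tolerance` parameter are hypothetical.

```python
from collections import Counter


def tally_outcomes(candidate_scores, baseline_scores, tolerance=0.0):
    """Count examples where the candidate wins, ties, or loses.

    Scores are paired by position; differences within `tolerance`
    count as ties.
    """
    if len(candidate_scores) != len(baseline_scores):
        raise ValueError("score lists must have the same length")
    outcomes = Counter({"win": 0, "tie": 0, "loss": 0})
    for cand, base in zip(candidate_scores, baseline_scores):
        if abs(cand - base) <= tolerance:
            outcomes["tie"] += 1
        elif cand > base:
            outcomes["win"] += 1
        else:
            outcomes["loss"] += 1
    return outcomes
```

Tallying by task category rather than over the whole set makes it easier to see where the reasoning model earns its cost and where a cheaper model is enough.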

Structured output patterns: even without native JSON mode on every snapshot, system prompts like "Return only valid JSON matching this schema" work well. Validate with pydantic/zod on your server; never trust raw JSON for SQL execution without parameterization.
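A minimal server-side check might look like the sketch below. It uses only the standard library in place of pydantic, and the two-field schema, the `results` table, and the function names are all hypothetical placeholders for your own.

```python
import json
import sqlite3

# Hypothetical schema: the fields your system prompt asks the model to return.
REQUIRED_FIELDS = {"label": str, "reason": str}


def parse_model_json(raw: str) -> dict:
    """Parse model output and check it against REQUIRED_FIELDS.

    Raises ValueError on invalid JSON (json.JSONDecodeError is a
    subclass of ValueError), a non-object payload, or a bad field.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"wrong type for field: {name}")
    return data


def store_result(conn: sqlite3.Connection, data: dict) -> None:
    """Insert validated values using placeholders, never string formatting."""
    conn.execute(
        "INSERT INTO results (label, reason) VALUES (?, ?)",
        (data["label"], data["reason"]),
    )
```

Because the values are bound through `?` placeholders, a model response containing SQL fragments is stored as plain text rather than executed.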

Hosting notes: GPT-5 mini tracks OpenAI snapshot naming (`gpt-5-mini-2025-08-07` etc.). CallMissed maps the unversioned id to the current production deployment. Breaking changes are rare but possible on snapshot bumps — pin integration tests to behavioral assertions, not exact wording.

Multi-model orchestration: a common pattern is GPT-5 mini as planner (decides tool sequence) plus GPT-4o mini-class formatter (not on this page) or deterministic code for final user text. Another pattern uses GPT-5 mini only on escalations after a cheaper model confidence check fails.
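The escalation pattern can be sketched as a small wrapper. Everything here is hypothetical: `cheap_model` stands for any callable returning a `(text, confidence)` pair, `reasoning_model` for a callable that sends the prompt to `gpt-5-mini`, and the 0.8 threshold is an arbitrary starting point to tune on your own data.

```python
def answer_with_escalation(prompt, cheap_model, reasoning_model,
                           min_confidence=0.8):
    """Try the cheap model first; escalate when its confidence is too low.

    Returns (text, route), where route is "cheap" or "escalated".
    """
    text, confidence = cheap_model(prompt)
    if confidence >= min_confidence:
        return text, "cheap"
    return reasoning_model(prompt), "escalated"
```

Logging the returned route lets you track the escalation rate, which drives how much of the traffic is billed at the reasoning model's output price.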

Latency expectations: first-token latency exceeds non-reasoning models; communicate "thinking" states in UI when streaming. For batch jobs, concurrency limits may apply — stagger large nightly runs.
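For batch jobs, one way to stay under a concurrency limit is to cap the number of in-flight requests with a semaphore. The sketch below assumes an async `worker` coroutine that performs one API call; the limit of 4 is a placeholder, since this page does not state the actual limits.

```python
import asyncio


async def run_batch(items, worker, max_concurrency=4):
    """Run worker(item) for every item with at most max_concurrency in flight.

    Results are returned in the same order as `items`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        # The semaphore blocks here once max_concurrency calls are active.
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))
```

Call it with `asyncio.run(run_batch(prompts, call_model))`, where `call_model` is your own coroutine wrapping the HTTP request.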

Safety: reasoning models can over-analyze jailbreak attempts — still apply input filtering and output moderation for user-generated content platforms.

Snapshot pinning: document which CallMissed deployment snapshot you validated against in your internal runbooks (`gpt-5-mini` unversioned id tracks our current production alias). Re-run eval suites when the catalog changelog notes snapshot bumps. For support tickets, always include request id, model id, and approximate prompt token count — reasoning models fail in subtle ways when output caps truncate mid-thought.

Pricing

Metric	Price
Input /1M tokens	₹25.0000
Output /1M tokens	₹200.0000

1 credit = ₹1 = $0.01 USD. Prices shown are the provider's; CallMissed passes them through with a ~35% markup.

Key Highlights

Affordable reasoning
400K context
Streaming + tools

Benchmarks

Benchmark	Score	Notes
GPQA	0.71	Graduate science

Technical Details

Model id: gpt-5-mini
Reasoning model — fixed temperature/top_p

Strengths

Low cost
Large context

Limitations

Text-only output
Fixed sampling params

Use Cases

High-volume reasoning
Classification
Agents

API Example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5-mini", "messages": [{"role": "user", "content": "Plan this task"}]}'

Endpoint: POST /v1/chat/completions · Model ID: gpt-5-mini
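The same request can be built in Python with only the standard library. This is a sketch rather than an official client: the function names are hypothetical, the URL and key prefix are taken from the curl example, and the `max_completion_tokens` value of 2048 is an arbitrary placeholder (this page mentions the parameter but does not give a recommended value).

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"  # from the curl example


def build_request(api_key: str, prompt: str,
                  max_completion_tokens: int = 2048) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for gpt-5-mini."""
    payload = {
        "model": "gpt-5-mini",
        "messages": [{"role": "user", "content": prompt}],
        # Placeholder cap; reasoning tokens count against it.
        "max_completion_tokens": max_completion_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Usage would be `send(build_request("cm_YOUR_KEY", "Plan this task"))`; keeping the build step separate from the network call makes the payload easy to inspect and test.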

Try gpt-5-mini now

Get 1000 free API credits on signup. No credit card required.