GPT-5.4 Mini
by OpenAI · Released March 2026
A smaller, faster, and more affordable variant of GPT-5.4. Retains the 1M context window and most capabilities at a fraction of the cost. Ideal for high-volume applications where speed and cost matter.
Powered by OpenAI · Transformer (proprietary, distilled)
Context Window
1M
Parameters
Undisclosed
Max Output
16K
Category
LLM Chat
Overview
GPT-5.4 Mini is a distilled variant of GPT-5.4, designed for high-volume production workloads where speed and cost are critical. Despite being significantly smaller, it retains the 1M token context window, so it can process very large documents and codebases at a fraction of the cost of its larger siblings.
The model is optimized for fast inference, making it suitable for real-time chat applications, content summarization, classification tasks, and any workflow where low latency matters. At $1.00 per 1M input tokens and $6.00 per 1M output tokens, its output tokens cost about one-sixth as much as GPT-5.4's, making it the go-to choice for cost-sensitive deployments that still need strong general capabilities.
GPT-5.4 Mini maintains good performance on standard benchmarks while trading some capability on the most complex reasoning tasks. It excels at straightforward tasks like summarization, extraction, classification, and conversational AI where the full power of GPT-5.4 or Pro is unnecessary.
Pricing
| Metric | Price |
|---|---|
| Input (per 1M tokens) | ₹100.00 |
| Output (per 1M tokens) | ₹600.00 |
1 credit = ₹1 = $0.01 USD. Prices shown are the provider's prices; CallMissed passes them through with a markup of about 35%.
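As a rough illustration of the arithmetic behind these rates, the sketch below estimates the cost of a single request in credits and USD. The function name `estimate_cost` and the `markup` parameter are hypothetical. The page does not say whether the listed prices already include the gateway markup, so the markup defaults to zero and is left for the caller to set.

```python
from dataclasses import dataclass

# Rates taken from the pricing table, in credits per 1M tokens (1 credit = ₹1).
INPUT_CREDITS_PER_M = 100.0
OUTPUT_CREDITS_PER_M = 600.0
USD_PER_CREDIT = 0.01


@dataclass
class CostEstimate:
    credits: float
    usd: float


def estimate_cost(input_tokens: int, output_tokens: int, markup: float = 0.0) -> CostEstimate:
    """Estimate the cost of one request (hypothetical helper).

    `markup` is a fraction, e.g. 0.35 for 35%. It defaults to 0 because it is
    an assumption whether the listed prices already include the markup.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    base = (
        input_tokens * INPUT_CREDITS_PER_M + output_tokens * OUTPUT_CREDITS_PER_M
    ) / 1_000_000
    credits = base * (1.0 + markup)
    return CostEstimate(credits=credits, usd=credits * USD_PER_CREDIT)


if __name__ == "__main__":
    # A 200K-token document summarized into 4K tokens of output.
    est = estimate_cost(200_000, 4_000)
    print(f"{est.credits:.2f} credits (~${est.usd:.4f})")
```

For the example call, input accounts for 20 credits and output for 2.4 credits, so long-context summarization jobs are dominated by input cost even though output tokens are priced six times higher.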
Key Highlights
- 6x cheaper than GPT-5.4 on output tokens
- 1M token context window retained
- Fast inference for real-time applications
- Strong performance on standard benchmarks
Benchmarks
| Benchmark | Score |
|---|---|
| MMLU-Pro | 80.1% |
| HumanEval | 88.5% |
| MATH-500 | 88.7% |
| GPQA Diamond | 68.2% |
| SWE-bench Verified | 58.3% |
Technical Details
- Distilled from GPT-5.4 — retains core capabilities at smaller size
- Context window: 1,000,000 tokens retained from full GPT-5.4
- Optimized for fast inference and low latency
- 6x cheaper output tokens compared to GPT-5.4
- Supports structured outputs, function calling, and JSON mode
- Post-trained with RLHF for instruction following
- Available via OpenAI API and CallMissed unified gateway
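To make the JSON mode item above concrete, here is a minimal sketch of a request body that asks for a JSON object, plus a small parser for the reply. It assumes the gateway accepts the OpenAI-style `response_format` field; this page does not confirm that, and the helper names `build_json_mode_payload` and `parse_json_reply` are hypothetical.

```python
import json

MODEL_ID = "openai/gpt-5.4-mini"  # model ID as listed on this page


def build_json_mode_payload(text: str) -> dict:
    """Build a chat-completions body that requests a JSON object.

    Assumption: the endpoint honors the OpenAI-style `response_format` field.
    """
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "system",
                "content": (
                    "Extract the title and a one-sentence summary. "
                    "Reply with a JSON object with keys 'title' and 'summary'."
                ),
            },
            {"role": "user", "content": text},
        ],
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(content: str) -> dict:
    """Parse the assistant message content, failing loudly if it is not a JSON object."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


if __name__ == "__main__":
    print(json.dumps(build_json_mode_payload("Article text goes here"), indent=2))
```

Validating the reply on the client side is still worthwhile, since a JSON mode setting typically constrains syntax, not which keys are present.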
Strengths
- Output tokens 6x cheaper than GPT-5.4 while retaining the 1M context window
- Fast inference optimized for real-time and high-volume workloads
- Strong general-purpose performance for straightforward tasks
- Good balance of cost, speed, and capability for production deployments
Limitations
- Reduced performance on complex reasoning compared to GPT-5.4 and Pro
- Less capable at multi-step agentic tasks requiring deep planning
- Proprietary — no self-hosting or fine-tuning options
API Example
curl https://api.callmissed.com/v1/chat/completions \
-H "Authorization: Bearer cm_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "openai/gpt-5.4-mini", "messages": [{"role": "user", "content": "Summarize this article"}]}'
Endpoint: POST /v1/chat/completions · Model ID: openai/gpt-5.4-mini
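The same call can be sketched in Python using only the standard library. The URL, model ID, and bearer-token header come from the curl example; the helper names `build_request` and `send` are hypothetical, and the response is returned as parsed JSON without assuming its shape, since this page does not document it.

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"
MODEL_ID = "openai/gpt-5.4-mini"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps(
        {"model": MODEL_ID, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON body.

    Raises urllib.error.HTTPError on non-2xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_request("cm_YOUR_KEY", "Summarize this article")
    # Inspect the request; call send(req) with a real key to execute it.
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the payload easy to inspect and test without network access.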