GPT-5.4 Nano
by OpenAI · Released March 2026
The smallest and most affordable model in the GPT-5.4 family. Designed for ultra-high-volume, latency-sensitive workloads. Retains the 1M context window with the lowest per-token cost.
Powered by OpenAI · Transformer (proprietary, distilled)
Context Window
1M
Parameters
Undisclosed
Max Output
16K
Category
LLM Chat
Overview
GPT-5.4 Nano is the smallest and most cost-effective model in the GPT-5.4 family, designed for ultra-high-volume, latency-sensitive workloads where both response time and per-request cost matter. At $0.27 per million input tokens and $1.70 per million output tokens, it is the cheapest OpenAI model while still retaining the 1M-token context window.
Despite its small size, GPT-5.4 Nano performs well on tasks such as entity extraction, text classification, routing, and lightweight conversational AI. It is purpose-built for embedding directly into products at very large scale: millions of API calls per day for features like auto-complete, content moderation, or intent detection.
The model trades complex reasoning and deep analysis capability for raw speed and cost efficiency. It is not the right choice for multi-step coding tasks or nuanced research, but for the vast majority of production AI features that need fast, reliable, and cheap inference, GPT-5.4 Nano is the optimal pick in the OpenAI lineup.
Pricing
| Metric | Price |
|---|---|
| Input /1M tokens | ₹27.0000 |
| Output /1M tokens | ₹170.0000 |
1 credit = ₹1 = $0.01 USD. Prices shown from provider; CallMissed passes through with ~35% markup.
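To make the pricing concrete, here is a minimal sketch of a per-request cost estimate using the table values and the stated credit conversion. The function name `estimate_cost` and the sample workload are illustrative assumptions. The sketch uses the listed prices as they appear and does not try to model the markup, since the page does not say whether the table already includes it.

```python
# Prices copied from the pricing table, in rupees (= credits) per 1M tokens.
INPUT_PRICE_PER_M = 27.0
OUTPUT_PRICE_PER_M = 170.0
USD_PER_CREDIT = 0.01  # from "1 credit = ₹1 = $0.01 USD"


def estimate_cost(input_tokens: int, output_tokens: int) -> dict:
    """Return the estimated cost of one request in credits and USD.

    Assumption: billing is linear in token count with no minimum charge.
    """
    credits = (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )
    return {"credits": credits, "usd": credits * USD_PER_CREDIT}


if __name__ == "__main__":
    # Hypothetical workload: 1M classification calls per day,
    # each with 500 input tokens and 20 output tokens.
    per_call = estimate_cost(500, 20)
    daily_credits = per_call["credits"] * 1_000_000
    print(f"per call: {per_call['credits']:.4f} credits")
    print(f"per day:  {daily_credits:,.0f} credits")
```

Under these assumed numbers a single call costs about 0.0169 credits, so a million such calls per day comes to roughly 16,900 credits (about $169 at the stated conversion).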
Key Highlights
- Lowest cost in the GPT-5.4 family
- 1M token context window
- Ultra-low latency for real-time use
- Ideal for embedding in products at scale
Benchmarks
| Benchmark | Score |
|---|---|
| MMLU-Pro | 72.4% |
| HumanEval | 82.1% |
| MATH-500 | 80.5% |
| GPQA Diamond | 58.7% |
Technical Details
- Smallest model in the GPT-5.4 family — optimized for cost and speed
- Context window: 1,000,000 tokens retained despite small model size
- Ultra-low latency inference for real-time applications
- Pricing: $0.27/M input, $1.70/M output — cheapest OpenAI model
- Supports structured outputs, function calling, and JSON mode
- Distilled from larger GPT-5.4 models
- Ideal for embedding in high-volume product features
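The JSON mode support listed above could be used for entity extraction roughly as follows. This is a sketch under a stated assumption: the `response_format` field mirrors the OpenAI Chat Completions convention, and this page does not confirm that CallMissed accepts it under that name. The helper names and the `people`/`orgs`/`places` keys are hypothetical.

```python
import json

EXPECTED_KEYS = ("people", "orgs", "places")  # hypothetical output schema


def build_extraction_request(text: str) -> dict:
    """Build a chat-completions payload that asks for a JSON object back."""
    return {
        "model": "openai/gpt-5.4-nano",
        "messages": [
            {
                "role": "system",
                "content": "Return a JSON object with keys 'people', 'orgs', 'places'.",
            },
            {"role": "user", "content": text},
        ],
        # Assumed field name, borrowed from the OpenAI-style API.
        "response_format": {"type": "json_object"},
    }


def parse_entities(raw_content: str) -> dict:
    """Parse the model's reply, falling back to empty lists on bad JSON."""
    empty = {key: [] for key in EXPECTED_KEYS}
    try:
        data = json.loads(raw_content)
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    # Keep only the expected keys; fill in any that are missing.
    return {key: data.get(key, []) for key in EXPECTED_KEYS}
```

Validating the reply on the client side is a deliberate choice here: a small model may occasionally return malformed or partial output, and a defensive parser keeps a high-volume pipeline from failing on a single bad response.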
Strengths
- Cheapest model in the GPT-5.4 family at $0.27/M input tokens
- Ultra-low latency makes it ideal for real-time product features
- Retains 1M context window despite minimal model size
- Excellent for high-volume tasks like classification, extraction, and routing
Limitations
- Significantly reduced reasoning capability compared to GPT-5.4 and Pro
- Not suitable for complex coding, research, or multi-step planning tasks
- Proprietary — no self-hosting or fine-tuning options
- May produce lower quality outputs on nuanced or ambiguous prompts
Use Cases
API Example
curl https://api.callmissed.com/v1/chat/completions \
-H "Authorization: Bearer cm_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "openai/gpt-5.4-nano", "messages": [{"role": "user", "content": "Extract the key entities from this text"}]}'
Endpoint: POST /v1/chat/completions · Model ID: openai/gpt-5.4-nano
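The same request can be sketched in Python using only the standard library. The URL, model ID, and bearer-token header come from the curl example; the `Content-Type` header, the `CALLMISSED_API_KEY` environment variable name, and the helper names are assumptions, and the response is printed as raw JSON because its shape is not documented on this page.

```python
import json
import os
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"
MODEL_ID = "openai/gpt-5.4-nano"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single user message."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed to be required
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    key = os.environ.get("CALLMISSED_API_KEY")  # hypothetical variable name
    if key:
        reply = send(build_request("Extract the key entities from this text", key))
        print(json.dumps(reply, indent=2))
```

Separating `build_request` from `send` keeps the payload construction testable without network access, which is useful when the call sits on a hot path in a product.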