GPT-5.4 Pro
by OpenAI · Released March 2026
OpenAI's most capable model. GPT-5.4 Pro features a 1M token context window, native computer use, tool search, and sets new records on professional benchmarks. Optimized for deep reasoning, complex coding, and long-horizon agentic workflows.
Powered by OpenAI · Transformer (proprietary)
| Spec | Value |
|---|---|
| Context Window | 1M |
| Parameters | Undisclosed |
| Max Output | 128K |
| Category | LLM Chat |
Overview
GPT-5.4 Pro is OpenAI's most capable model and the flagship of the GPT-5.4 family, which unifies frontier reasoning, coding, and computer use in a single system. Its context window is 272K tokens by default and 1M tokens in Codex experimental mode, and it can emit up to 128K output tokens, enough to generate an entire codebase in one pass. Native computer use lets it interact with desktops through screenshots, control the mouse and keyboard, and write Playwright code for browser automation. Tool search, an agentic capability, loads tool definitions on demand instead of all at once, saving tens of thousands of tokens per request.
On professional benchmarks, GPT-5.4 Pro tops the GDPval-AA leaderboard at 1667 Elo, ahead of Claude Sonnet 4.6 at 1633 and Opus 4.6 at 1606. It achieves 87.3% on spreadsheet modeling (vs 68.4% for GPT-5.2), and human raters preferred GPT-5.4 presentations over GPT-5.2 presentations 68% of the time. On Humanity's Last Exam it scored 52.1%, making it the first model to break the 50% threshold. It reached 47.6% on FrontierMath (vs 40.3% for GPT-5.2), 93.7% on ARC-AGI-1, and 83.3% on ARC-AGI-2 in Pro mode (vs 73.3% for standard GPT-5.4).
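To put the Elo gaps above in perspective, the sketch below converts rating differences into expected head-to-head scores. It assumes the leaderboard follows the standard logistic Elo model with a 400-point scale, which the text does not confirm; the function name `elo_expected_score` is illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    Assumption: the leaderboard uses the conventional 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings quoted in the text above.
ratings = {"GPT-5.4 Pro": 1667, "Claude Sonnet 4.6": 1633, "Opus 4.6": 1606}

for name, rating in ratings.items():
    if name != "GPT-5.4 Pro":
        p = elo_expected_score(ratings["GPT-5.4 Pro"], rating)
        print(f"GPT-5.4 Pro vs {name}: expected score {p:.3f}")
```

Under that assumption, a lead of a few dozen Elo points corresponds to an expected score modestly above 0.5 rather than a decisive margin.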
Computer use and agentic benchmarks are where GPT-5.4 Pro stands apart. On OSWorld-Verified it scores 75.0%, above the human baseline of 72.4% and the previous top model, Kimi K2.5, at 63.3%. This is the first time an AI model has surpassed human-level performance on this benchmark. On web browsing, it scores 89.3% on BrowseComp (Pro variant; standard GPT-5.4 scores 82.7%), 67.3% on WebArena-Verified, and 92.8% on Online-Mind2Web. It also scores 54.6% on Toolathlon, an autonomous tool-use benchmark.
The model makes 33% fewer false claims and produces 18% fewer responses containing any errors than its predecessors. It is also more steerable: it outlines a plan before continuing and accepts mid-response adjustments, giving users more control over the direction of the output. Tool search loads tool definitions on demand rather than including every definition in each context window, which saves tens of thousands of tokens per request.
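The on-demand loading idea can be illustrated with a small, self-contained sketch. It is not OpenAI's implementation: `ToolRegistry`, its methods, the keyword matching, and the 4-characters-per-token estimate are all hypothetical stand-ins chosen to show why sending only the matching definitions is cheaper than sending every definition.

```python
import json
from dataclasses import dataclass, field


def rough_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a real tokenizer."""
    return max(1, len(text) // 4)


@dataclass
class ToolRegistry:
    """Hypothetical registry that keeps full tool definitions out of the prompt."""
    definitions: dict = field(default_factory=dict)

    def register(self, name: str, summary: str, schema: dict) -> None:
        self.definitions[name] = {"name": name, "description": summary, "parameters": schema}

    def index(self) -> list:
        """Lightweight listing: names and one-line summaries only."""
        return [{"name": d["name"], "description": d["description"]}
                for d in self.definitions.values()]

    def search(self, query: str) -> list:
        """Return full definitions whose name or summary mentions any query word."""
        words = query.lower().split()
        return [d for d in self.definitions.values()
                if any(w in f"{d['name']} {d['description']}".lower() for w in words)]


registry = ToolRegistry()
big_schema = {"type": "object",
              "properties": {f"arg_{j}": {"type": "string", "description": "placeholder"}
                             for j in range(10)}}
for i in range(50):
    registry.register(f"tool_{i}", f"Hypothetical tool number {i}", big_schema)
registry.register("send_invoice", "Send an invoice to a customer by email", big_schema)

# Eager: every full definition goes into the context on every request.
eager = rough_tokens(json.dumps(list(registry.definitions.values())))
# On demand: a short index plus only the definitions that match the task.
on_demand = (rough_tokens(json.dumps(registry.index()))
             + rough_tokens(json.dumps(registry.search("invoice"))))

print(f"eager: ~{eager} tokens, on demand: ~{on_demand} tokens")
```

Whether the real feature keeps a lightweight index in context is not stated in the text; the sketch includes one so the comparison stays conservative.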
Native computer use operates via screenshots, mouse and keyboard control, and Playwright browser automation, enabling the model to interact with desktop software, fill out web forms, navigate multi-tab workflows, and execute complex GUI-based tasks autonomously. This makes GPT-5.4 Pro uniquely suited for enterprise automation scenarios that require interacting with legacy software, web applications, and desktop tools.
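The screenshot, decide, act cycle described above can be sketched as a plain loop. Everything here is hypothetical: `Action`, `run_episode`, `FakeDesktop`, and `scripted_policy` are illustrative names, the desktop is simulated in memory, and the policy is a fixed script standing in for the model's decisions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_episode(take_screenshot: Callable[[], bytes],
                choose_action: Callable[[bytes, List[Action]], Action],
                execute: Callable[[Action], None],
                max_steps: int = 10) -> List[Action]:
    """Observe the screen, ask the policy for an action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = choose_action(screenshot, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history


class FakeDesktop:
    """In-memory stand-in for a real GUI: one text field that must be clicked first."""
    def __init__(self) -> None:
        self.focused = False
        self.field = ""

    def screenshot(self) -> bytes:
        return f"focused={self.focused};field={self.field}".encode()

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            self.focused = True
        elif action.kind == "type" and self.focused:
            self.field += action.text


def scripted_policy(screenshot: bytes, history: List[Action]) -> Action:
    """Stand-in for the model: click the field, type a word, then stop."""
    script = [Action("click", x=120, y=80), Action("type", text="hello"), Action("done")]
    return script[len(history)]


desktop = FakeDesktop()
steps = run_episode(desktop.screenshot, scripted_policy, desktop.execute)
print(len(steps), desktop.field)
```

In a real deployment the `execute` callback would drive an actual input layer or a browser automation library, and `max_steps` acts as a simple guard against runaway episodes.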
Safety evaluations include a chain-of-thought controllability study showing that models cannot effectively hide their reasoning, with controllability rates between 0.1% and 15.4%. OpenAI expanded its cyber safety stack and reduced refusals compared to GPT-5.2, making the model more helpful on legitimate edge-case queries without compromising on genuinely harmful requests.
At $30 per 1M input tokens and $180 per 1M output tokens, GPT-5.4 Pro is priced for teams that need the best available performance on complex reasoning, long-horizon agentic workflows, and professional-grade analysis, or whose tasks depend on exceeding human-level performance in computer use and web browsing. For most production workloads, the standard GPT-5.4 at $2.50 input and $15 output per 1M tokens offers the same architecture at a fraction of the cost.
Pricing
| Metric | Price |
|---|---|
| Input / 1M tokens | ₹3,000 |
| Output / 1M tokens | ₹18,000 |
1 credit = ₹1 = $0.01 USD. Prices shown are the provider's; CallMissed passes them through with a markup of about 35%.
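As a rough guide to what a request costs, the sketch below applies the per-token rates quoted on this page and converts the result to credits at $0.01 per credit. The function names and the example token counts are illustrative, and the roughly 35% markup is deliberately not modeled because the page does not say whether the listed prices already include it.

```python
# Rates quoted on this page, in USD per 1M tokens.
PRO_INPUT_USD_PER_M = 30.0
PRO_OUTPUT_USD_PER_M = 180.0
STANDARD_INPUT_USD_PER_M = 2.50
STANDARD_OUTPUT_USD_PER_M = 15.0
USD_PER_CREDIT = 0.01  # from "1 credit = $0.01 USD"


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_rate: float = PRO_INPUT_USD_PER_M,
                      output_rate: float = PRO_OUTPUT_USD_PER_M) -> float:
    """Cost of one request in USD, before any gateway markup."""
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate


def usd_to_credits(usd: float) -> float:
    """Convert USD to credits at the page's stated rate."""
    return usd / USD_PER_CREDIT


# Hypothetical request: 200K tokens of input, 20K tokens of output.
pro = estimate_cost_usd(200_000, 20_000)
standard = estimate_cost_usd(200_000, 20_000,
                             STANDARD_INPUT_USD_PER_M, STANDARD_OUTPUT_USD_PER_M)
print(f"GPT-5.4 Pro: ${pro:.2f} ({usd_to_credits(pro):.0f} credits)")
print(f"GPT-5.4:     ${standard:.2f} ({usd_to_credits(standard):.0f} credits)")
```

For this example the Pro request comes to $9.60 (960 credits) against $0.80 for standard GPT-5.4, a 12x difference that matches the ratio of the listed rates.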
Key Highlights
- 1M token context window for massive codebases and documents
- Native computer use — can operate desktop software
- Tool search for finding and using the right tools autonomously
- Top scores on professional and agentic benchmarks (GDPval, OSWorld-Verified, BrowseComp)
Benchmarks
| Benchmark | Score |
|---|---|
| GDPval | 83% |
| Spreadsheet Modeling | 87.3% |
| SWE-bench Pro | 57.7% |
| Terminal-Bench 2.0 | 75.0% |
| OSWorld-Verified | 75.0% |
| BrowseComp | 89.3% |
| WebArena-Verified | 67.3% |
| Online-Mind2Web | 92.8% |
| FrontierMath | 47.6% |
| Humanity's Last Exam | 52.1% |
| ARC-AGI-1 | 93.7% |
| ARC-AGI-2 | 83.3% |
| Toolathlon | 54.6% |
Technical Details
- Context window: 1,000,000 tokens (272K standard, 1M in Codex experimental)
- Max output: 128K tokens for generating entire codebases in one pass
- Native computer use: interacts with desktop via screenshots, controls mouse/keyboard, writes Playwright code for browser automation
- Tool search: loads tool definitions on demand, saving tens of thousands of tokens per request
- 33% fewer false claims and 18% fewer responses with any errors vs predecessors
- Steerability: outlines plan before continuing, allows mid-response adjustments
- Proprietary Transformer architecture with undisclosed parameter count
- Post-trained with RLHF and extensive red-teaming for safety
- Supports structured outputs, function calling, and JSON mode
- Available via OpenAI API and through CallMissed unified gateway
Strengths
- Most capable model from OpenAI: #1 on the GDPval-AA leaderboard at 1667 Elo
- OSWorld-Verified 75.0% exceeds human performance (72.4%)
- Native computer use enables GUI automation and desktop software operation
- 1M context window handles massive codebases and document collections
- 33% fewer false claims and 18% fewer error-containing responses
- Tool search loads tool definitions on demand, cutting token use in agentic workflows
Limitations
- Premium pricing at $30 input and $180 output per 1M tokens, designed for high-value tasks
- Higher latency due to model size — not ideal for real-time chat
- Proprietary and closed-source — no self-hosting option
- Overkill for simple tasks where smaller models suffice
Use Cases
API Example
curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.4-pro", "messages": [{"role": "user", "content": "Analyze this codebase and suggest architectural improvements"}]}'
Endpoint: POST /v1/chat/completions · Model ID: openai/gpt-5.4-pro
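The same request can be made from Python using only the standard library. This is a minimal sketch: the endpoint, model ID, and bearer-token header come from the curl example above, while the JSON content type and the function names `build_request` and `send` are assumptions. The response is returned as parsed JSON without assuming its shape.

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"
MODEL_ID = "openai/gpt-5.4-pro"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; typical for JSON APIs
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_request("cm_YOUR_KEY", "Analyze this codebase and suggest architectural improvements")
# result = send(request)  # uncomment with a real key; this makes a billable network call
```

Separating request construction from sending makes the payload easy to inspect or log before any credits are spent.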
Try GPT-5.4 Pro now
Get 1000 free API credits on signup. No credit card required.