Claude Opus 4.6
by Anthropic · Released February 5, 2026
Anthropic's most capable model. Claude Opus 4.6 features a 1M token context window, 128K max output tokens, extended thinking, and a 14.5-hour task completion horizon. Excels at financial analysis, complex code debugging, multi-step planning, and autonomous task execution.
- Provider: Anthropic
- Architecture: Transformer (proprietary)
- Context Window: 1M tokens
- Parameters: Undisclosed
- Max Output: 128K tokens
- Category: LLM Chat
Overview
Claude Opus 4.6 is Anthropic's most capable model and the first Opus-class system to feature a 1-million-token context window (available in beta). It doubles the maximum output to 128K tokens (up from 64K) and introduces premium pricing for contexts exceeding 200K tokens ($10 input / $37.50 output per million tokens). The model plans more carefully, sustains agentic tasks longer, operates more reliably in larger codebases, and delivers significantly better code review and debugging than its predecessors.
Opus 4.6 reasons more deeply than previous Claude models, revisiting its reasoning before settling on an answer. On simpler tasks this extra deliberation adds cost and latency, so Anthropic recommends lowering the effort level to medium for routine queries. The model supports adaptive thinking with four configurable effort levels (low, medium, high, and max, with high as the default) and uses contextual cues to judge how much reasoning a given prompt requires. Context compaction (in beta) automatically summarizes older context as a conversation approaches the token threshold, keeping it coherent without manual truncation.
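Compaction itself runs on Anthropic's side, but the idea is easy to picture client-side. The sketch below is an illustration only, not Anthropic's implementation: once the history's token count exceeds a threshold, every message except the most recent few is folded into a single summary message. The `count_tokens` and `summarize` callables, the `Message` type, and the choice of a `"user"` role for the summary are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str
    content: str


def compact(
    history: list[Message],
    threshold: int,
    keep_recent: int,
    count_tokens: Callable[[str], int],
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Client-side illustration of compaction: once the history exceeds
    `threshold` tokens, fold all but the last `keep_recent` messages into
    one summary message."""
    total = sum(count_tokens(m.content) for m in history)
    if total <= threshold:
        return history  # under budget: nothing to do
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    if not older:
        return history  # nothing old enough to summarize
    summary = Message("user", "Summary of earlier conversation: " + summarize(older))
    # The summary is not re-measured here; a real system would check that it fits.
    return [summary] + recent
```

In practice `summarize` would itself be a model call, and `count_tokens` would use a real tokenizer rather than a word count.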
Agent teams in Claude Code (research preview) allow multiple agents to work in parallel and coordinate autonomously, unlocking complex multi-repo workflows. In one demonstration, Opus 4.6 autonomously closed 13 issues and assigned 12 to the right team members in a single day, managing an approximately 50-person organization across 6 repositories. Partners described the model as handling a multi-million-line codebase migration "like a senior engineer."
Benchmark results point to a qualitative shift in capability. On MRCR v2 (8-needle) at 1M context, Opus 4.6 scores 76% versus 18.5% for Sonnet 4.5, a dramatic improvement in long-context utilization. It achieves the highest score on Terminal-Bench 2.0 and leads all frontier models on Humanity's Last Exam. On GDPval-AA, it outperforms GPT-5.2 by approximately 144 Elo points and Opus 4.5 by 190 points; the 144-point margin over GPT-5.2 translates to winning roughly 70% of head-to-head comparisons. On BrowseComp, which measures how well a model locates hard-to-find information online, it scores higher than any other model, and a multi-agent harness pushes accuracy to 86.8%.
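The "roughly 70%" figure is consistent with the standard Elo expected-score formula, E = 1 / (1 + 10^(-Δ/400)), where Δ is the rating gap. The sketch below assumes GDPval-AA uses this conventional 400-point logistic scale, which this page does not state. Note that the expected score counts a tie as half a win, so it is only an approximation of a raw win rate. The 190-point figure for Opus 4.5 is derived here with the same formula, not reported by the source.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated model under the standard logistic
    Elo formula (400-point scale). Ties count as half a win in this measure."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


for label, gap in [("vs. GPT-5.2", 144), ("vs. Opus 4.5", 190)]:
    print(f"Opus 4.6 {label}: {elo_expected_score(gap):.1%}")
# prints:
# Opus 4.6 vs. GPT-5.2: 69.6%
# Opus 4.6 vs. Opus 4.5: 74.9%
```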
In legal and cybersecurity domains, Opus 4.6 scores 90.2% on BigLaw Bench, with 40% perfect scores and 84% of responses scoring above 0.8. In cybersecurity, Opus 4.6 produced the best result in 38 of 40 investigations in a blind ranking against Claude 4.5 models, with each model running up to 9 subagents and more than 100 tool calls per investigation.
Safety is a core focus. Opus 4.6 has the lowest rate of over-refusals of any recent Claude model and underwent the most comprehensive safety evaluations Anthropic has ever conducted, including 6 new cybersecurity probes. Misaligned behavior rates remain low across all tested scenarios. Reduced refusals mean the model is more helpful on legitimate edge-case queries without compromising on genuinely harmful requests.
Partner adoption has been strong. Teams at Notion, Devin, Cognition, Windsurf, Lovable, Box, Figma, and v0 have integrated Opus 4.6 into their products, citing its sustained agentic performance and reliability in production. Claude in Excel has received improvements, and Claude in PowerPoint is available as a research preview. US-only inference is offered at 1.1x standard pricing for organizations with data residency requirements.
At standard pricing of $7/$35 per million tokens (with premium rates above 200K context), Opus 4.6 is positioned for enterprise teams that need the deepest reasoning, longest autonomous task horizons, and most reliable agentic performance available — particularly for financial analysis, legal review, complex code debugging, and multi-step autonomous workflows.
Pricing
| Metric | Price (₹ / credits) | Price (USD) |
|---|---|---|
| Input / 1M tokens | ₹700 | $7.00 |
| Output / 1M tokens | ₹3,500 | $35.00 |
1 credit = ₹1 = $0.01 USD. Prices shown from provider; CallMissed passes through with ~35% markup.
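A per-request cost estimate follows directly from these rates. The sketch below uses the ₹700 / ₹3,500 per-1M-token prices and the stated 1 credit = ₹1 = $0.01 conversion. It deliberately refuses requests with more than 200K input tokens, because the premium long-context rates charged through the gateway are not listed here (treating "context" as input tokens is itself an assumption). The `multiplier` parameter is a hypothetical knob, for example 1.1 for US-only inference, assuming that surcharge applies on top of the gateway rates.

```python
# Gateway rates from the pricing table, in rupees (1 credit = ₹1) per 1M tokens.
INPUT_INR_PER_M = 700
OUTPUT_INR_PER_M = 3500
USD_PER_CREDIT = 0.01             # "1 credit = ₹1 = $0.01 USD"
STANDARD_CONTEXT_LIMIT = 200_000  # assumed: premium rates apply above this many input tokens


def estimate_cost(input_tokens: int, output_tokens: int, multiplier: float = 1.0) -> tuple[float, float]:
    """Return (credits, usd) for one request at the standard rates.

    `multiplier` is hypothetical, e.g. 1.1 for US-only inference, assuming that
    surcharge is applied on top of the gateway rates."""
    if input_tokens > STANDARD_CONTEXT_LIMIT:
        raise ValueError("premium >200K-context rates are not listed in the pricing table")
    credits = (input_tokens * INPUT_INR_PER_M + output_tokens * OUTPUT_INR_PER_M) / 1_000_000
    credits *= multiplier
    return credits, credits * USD_PER_CREDIT


credits, usd = estimate_cost(100_000, 10_000)
print(f"{credits:.2f} credits (${usd:.2f})")  # 105.00 credits ($1.05)
```

Output tokens dominate quickly at these rates: 10K output tokens cost half as much as 100K input tokens, and thinking tokens are typically billed as output.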
Key Highlights
- 1M token context window with 128K max output
- 14.5-hour autonomous task completion horizon
- #1 on Finance Agent benchmark
- Extended thinking for deep reasoning chains
Benchmarks
| Benchmark | Score |
|---|---|
| SWE-bench Verified | 80.8% |
| OSWorld-Verified | 72.7% |
| Terminal-Bench 2.0 | 65.4% |
| Humanity's Last Exam | #1 |
| BigLaw Bench | 90.2% |
| MRCR (1M) | 76% |
| Finance Agent v1.1 | 60.1% |
| GDPval-AA | 1606 Elo |
Technical Details
- Context window: 1,000,000 tokens with 128K max output (doubled from 64K)
- Adaptive thinking: 4 configurable effort levels for reasoning depth control
- Interleaved thinking: reasons between tool calls for better agentic performance
- Context compaction: auto-summarizes long conversations to stay within limits
- 14.5-hour autonomous task completion horizon for long-running workflows
- Post-trained with Constitutional AI (CAI) and RLHF
- Supports tool use, structured outputs, and computer use
- Available via Anthropic API and CallMissed unified gateway
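Tool use and interleaved thinking come together in an agent loop: the model requests a tool call, the client runs it and returns the result, and the model reasons again before its next step. The sketch below builds such a request and handles one round of tool calls. It assumes the gateway accepts the OpenAI-style `tools` / `tool_calls` schema on `/v1/chat/completions`, which this page does not confirm. The `get_quarterly_filing` tool is hypothetical.

```python
import json
from typing import Callable

# Hypothetical tool definition in the OpenAI-style "tools" schema (assumed, not confirmed, for this gateway).
FILING_TOOL = {
    "type": "function",
    "function": {
        "name": "get_quarterly_filing",  # hypothetical tool name
        "description": "Fetch the text of a company's quarterly filing.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string"},
                "quarter": {"type": "string"},
            },
            "required": ["ticker", "quarter"],
        },
    },
}


def build_tool_request(prompt: str) -> dict:
    """Request body offering one tool; the model ID is taken from this page."""
    return {
        "model": "anthropic/claude-opus-4.6",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [FILING_TOOL],
    }


def run_tool_calls(assistant_message: dict, handlers: dict[str, Callable[..., str]]) -> list[dict]:
    """Execute each tool call in an assistant message and return the "tool"
    messages to append before sending the next request in the loop."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        output = handlers[fn["name"]](**args)
        results.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    return results
```

A full loop would append the assistant message and these tool messages to `messages`, resend, and stop once the reply contains no `tool_calls`; any interleaved thinking happens on the model side between those round trips.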
Strengths
- #1 on Finance Agent, Terminal-Bench, and Humanity's Last Exam benchmarks
- 14.5-hour task horizon enables truly autonomous long-running workflows
- Adaptive thinking lets developers control reasoning depth vs. cost
- Interleaved thinking between tool calls dramatically improves agentic accuracy
- 128K max output for generating complete codebases and detailed reports
Limitations
- Premium pricing at $7/$35 per 1M tokens — expensive for high-volume use
- Higher latency with extended thinking enabled, especially at max effort
- Proprietary and closed-source — no self-hosting option
- 1M context with heavy tool use can lead to high per-request costs
API Example
curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "anthropic/claude-opus-4.6", "messages": [{"role": "user", "content": "Analyze this financial report and identify risks"}]}'

Endpoint: POST /v1/chat/completions · Model ID: anthropic/claude-opus-4.6
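For callers working in Python, the sketch below mirrors the curl request using only the standard library. Two parts are assumptions rather than documented behavior: the optional `reasoning_effort` field (an OpenAI-style name that the gateway may or may not map to Opus 4.6's effort levels) and the OpenAI-style shape of the response. `CALLMISSED_API_KEY` is a hypothetical environment variable name.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://api.callmissed.com/v1/chat/completions"
MODEL_ID = "anthropic/claude-opus-4.6"


def build_request(api_key: str, prompt: str, effort: Optional[str] = None) -> urllib.request.Request:
    """Mirror the curl example: a JSON POST with a bearer token."""
    body = {"model": MODEL_ID, "messages": [{"role": "user", "content": prompt}]}
    if effort is not None:
        # ASSUMPTION: the gateway maps an OpenAI-style "reasoning_effort" field
        # ("low" | "medium" | "high" | "max") to Opus 4.6's effort setting.
        body["reasoning_effort"] = effort
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    api_key = os.environ.get("CALLMISSED_API_KEY")  # hypothetical variable name
    if api_key:  # only send a real request when a key is configured
        req = build_request(api_key, "Analyze this financial report and identify risks", effort="medium")
        with urllib.request.urlopen(req) as resp:
            reply = json.loads(resp.read())
        # Assumes an OpenAI-style response body.
        print(reply["choices"][0]["message"]["content"])
```

Leaving `effort` unset sends exactly the fields from the curl example, so the request stays valid even if the gateway ignores or rejects extra fields.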