How much does Claude Haiku 4.5 cost?

Claude Haiku 4.5 costs $1/1M tokens for input and $5/1M tokens for output on CallMissed. 1 credit = ₹1 = $0.01 USD.

How do I use Claude Haiku 4.5 via API?

Send a POST request to /v1/chat/completions with the model set to "anthropic/claude-haiku-4.5" and your API key in the Authorization header. CallMissed uses the OpenAI-compatible format, so an existing OpenAI client only needs a different base URL and model field.
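
As a rough Python counterpart, the request can be assembled with the standard library alone. This is a minimal sketch: the helper name `build_chat_request` is illustrative, the host is the one used in this page's curl example, and `cm_YOUR_KEY` is a placeholder.

```python
import json
import urllib.request

BASE_URL = "https://api.callmissed.com"  # host as used in this page's curl example
MODEL_ID = "anthropic/claude-haiku-4.5"


def build_chat_request(api_key: str, prompt: str, max_tokens: int = 2048) -> urllib.request.Request:
    """Assemble (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("cm_YOUR_KEY", "Hello")
# To send it, pass `req` to urllib.request.urlopen and decode the JSON body.
```

Keeping construction separate from sending makes it easy to inspect the payload before any credits are spent.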

What is the context window of Claude Haiku 4.5?

Claude Haiku 4.5 supports a 200K token context window with up to 64K output tokens.

LLM Chat · fast · affordable · vision

Claude Haiku 4.5

by Anthropic · Released October 2025

Anthropic's fastest and most affordable Claude model. 200K context, 64K max output, vision, extended thinking, and computer use — at a fraction of the cost of Sonnet or Opus.

Context window: 200K
Parameters: Undisclosed
Max output: 64K

Overview

Claude Haiku 4.5 is Anthropic's lightweight frontier model, designed for high-volume production workloads where speed and cost matter. Despite being the smallest model in the Claude 4 family, it delivers performance on par with Claude Sonnet 4 (which held the title of best coding model just five months before Haiku 4.5's release) and outperforms it in several areas.

The model supports a 200,000-token context window with up to 64,000 output tokens — a massive jump from Haiku 3.5's 8,192 output limit. It processes both text and images, supports extended thinking (the first Haiku model to do so), computer use for GUI automation, and context awareness for maintaining state across multi-turn conversations.
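
The two limits can be checked before a request is sent. The sketch below uses hypothetical helper names and assumes that prompt tokens and output tokens share the 200,000-token window. Token counts are taken as inputs because this page does not describe a tokenizer.

```python
CONTEXT_WINDOW = 200_000  # tokens, from the figures above
MAX_OUTPUT = 64_000       # tokens, from the figures above


def fits_budget(prompt_tokens: int, max_tokens: int) -> bool:
    """Return True if a request respects both limits.

    Assumption: prompt and output tokens share the context window.
    """
    if max_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + max_tokens <= CONTEXT_WINDOW


def largest_allowed_output(prompt_tokens: int) -> int:
    """Largest max_tokens value that still fits for a given prompt size."""
    return max(0, min(MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens))
```

Under that assumption, a 150,000-token prompt leaves room for at most 50,000 output tokens, even though the model's output cap is higher.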

On SWE-bench Verified, Claude Haiku 4.5 scores 73.3%, making it one of the world's best coding models at any price point. It achieves 97 tokens per second on Artificial Analysis benchmarks, making it significantly faster than Sonnet or Opus. The model excels at code generation, classification, content moderation, real-time chat, and any task where low latency and high throughput are critical.

At $1.00/M input and $5.00/M output, Haiku 4.5 is 4x cheaper than Sonnet 4.6 on both input and output, while delivering comparable quality on most tasks. It supports prompt caching (cache reads at $0.10/M, cache creation at $1.25/M), so a cached system prompt is read at one tenth of the normal input price. For teams that need Claude-quality responses at scale without the premium pricing, Haiku 4.5 is the recommended choice.
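
To see what the caching prices mean for a repeated system prompt, the arithmetic can be sketched as follows. The function name is hypothetical, and the model is simplified: it assumes one cache write on the first call and cache reads on every later call, ignoring cache expiry.

```python
INPUT_PER_M = 1.00        # USD per 1M uncached input tokens
CACHE_WRITE_PER_M = 1.25  # USD per 1M tokens written to the cache
CACHE_READ_PER_M = 0.10   # USD per 1M tokens read from the cache


def system_prompt_cost(prompt_tokens: int, requests: int, cached: bool) -> float:
    """USD spent on a shared system prompt across `requests` calls."""
    if requests <= 0:
        return 0.0
    if not cached:
        return requests * prompt_tokens * INPUT_PER_M / 1_000_000
    # Assumption: one cache write on the first call, reads on all later calls.
    write = prompt_tokens * CACHE_WRITE_PER_M / 1_000_000
    reads = (requests - 1) * prompt_tokens * CACHE_READ_PER_M / 1_000_000
    return write + reads


uncached = system_prompt_cost(10_000, 100, cached=False)
with_cache = system_prompt_cost(10_000, 100, cached=True)
print(f"uncached: ${uncached:.4f}, cached: ${with_cache:.4f}")
```

For a 10,000-token system prompt reused over 100 calls, this works out to about $1.00 uncached versus about $0.11 with caching.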

Pricing

Metric	Price
Input /1M tokens	₹100.0000
Output /1M tokens	₹500.0000

1 credit = ₹1 = $0.01 USD. Prices shown are the provider's; CallMissed passes them through with a ~35% markup.
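
A quick way to estimate a bill from the table above is sketched below. The function names are illustrative. The rates are the listed ₹100 and ₹500 per 1M tokens, taken at face value with no further markup applied, and the conversion uses the stated 1 credit = ₹1 = $0.01.

```python
INPUT_CREDITS_PER_M = 100   # credits (₹) per 1M input tokens, from the table
OUTPUT_CREDITS_PER_M = 500  # credits (₹) per 1M output tokens, from the table


def estimate_credits(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in credits (1 credit = ₹1) for a given token volume."""
    total = input_tokens * INPUT_CREDITS_PER_M + output_tokens * OUTPUT_CREDITS_PER_M
    return total / 1_000_000


def credits_to_usd(credits: float) -> float:
    """Convert credits to USD at the stated rate of $0.01 per credit."""
    return credits / 100


credits = estimate_credits(input_tokens=2_000_000, output_tokens=500_000)
usd = credits_to_usd(credits)
```

With 2M input tokens and 0.5M output tokens, this gives 450 credits, or $4.50.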

Key Highlights

73.3% on SWE-bench Verified — world-class coding at Haiku pricing
97 tokens/sec — fastest Claude model
First Haiku with extended thinking and computer use
200K context, 64K max output
4x cheaper than Sonnet 4.6
Prompt caching: cache reads at $0.10/M tokens

Benchmarks

Benchmark	Score	Notes
SWE-bench Verified	73.3%	Real-world software engineering
MMLU-Pro	78.2%	Professional knowledge
HumanEval	88.1%	Code generation
MATH-500	83.4%	Competition mathematics
GPQA Diamond	62.1%	Graduate-level science
Output Speed	97 t/s	Artificial Analysis benchmark
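
The output speed figure translates into a rough generation time. The sketch below is only a back-of-the-envelope estimate with a hypothetical function name. It ignores time to first token and network latency, and assumes the quoted 97 t/s holds for the whole response.

```python
OUTPUT_TOKENS_PER_SEC = 97.0  # figure quoted in the table above


def estimated_generation_seconds(output_tokens: int,
                                 tokens_per_sec: float = OUTPUT_TOKENS_PER_SEC) -> float:
    """Approximate streaming time for a response of `output_tokens` tokens."""
    return output_tokens / tokens_per_sec


seconds_for_2048 = estimated_generation_seconds(2048)
```

At that rate a 2,048-token response would take about 21 seconds to generate in full.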

Technical Details

Context window: 200,000 tokens
Max output: 64,000 tokens (roughly 8x Haiku 3.5's 8,192)
Vision: processes text and image inputs
Extended thinking: chain-of-thought reasoning (first Haiku to support this)
Computer use: GUI automation via screenshots and mouse/keyboard control
Prompt caching: cache reads $0.10/M, cache creation $1.25/M
Knowledge cutoff: February 2025
Supports function calling, structured outputs, and JSON mode
Available via Anthropic API and CallMissed unified gateway

Strengths

World-class coding at the lowest Claude price point
Fastest Claude model at 97 tokens/sec
Extended thinking enables complex reasoning at Haiku cost
4x cheaper than Sonnet 4.6 with comparable quality on most tasks
Prompt caching makes repeated system prompts extremely affordable

Limitations

Lower capability ceiling than Sonnet 4.6 or Opus 4.6 on the hardest reasoning tasks
Proprietary — no self-hosting option
200K context is smaller than the 1M offered by GPT-5.4 and Opus 4.6

Use Cases

High-volume chat and support
Code generation and review
Content moderation
Classification and extraction
Real-time applications

API Example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-haiku-4.5",
    "messages": [{"role": "user", "content": "Write a Python function to parse CSV files with error handling"}],
    "max_tokens": 2048
  }'

Endpoint: POST /v1/chat/completions · Model ID: anthropic/claude-haiku-4.5
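
Reading the reply can be sketched in Python as below. It assumes the response follows the usual OpenAI chat-completions shape (`choices[0].message.content` plus a `usage` object), which this page implies but does not show. The sample body is hypothetical and was not captured from CallMissed.

```python
import json

# Hypothetical response body in the OpenAI chat-completions shape.
sample_body = json.dumps({
    "choices": [
        {
            "message": {"role": "assistant", "content": "def parse_csv(path): ..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 18, "completion_tokens": 7},
})


def extract_reply(body: str) -> tuple[str, int, int]:
    """Return (reply text, prompt tokens, completion tokens) from a JSON body."""
    data = json.loads(body)
    text = data["choices"][0]["message"]["content"]
    usage = data.get("usage", {})  # treated as optional in this sketch
    return text, usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0)


text, prompt_tokens, completion_tokens = extract_reply(sample_body)
```

The token counts from `usage`, when present, are the numbers to feed into any cost estimate.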

Try Claude Haiku 4.5 now

Get 1000 free API credits on signup. No credit card required.
