How much does Kimi K2.5 Fast cost?

Kimi K2.5 Fast costs $0.81/1M tokens for input and $4.05/1M tokens for output on CallMissed. 1 credit = ₹1 = $0.01 USD.

How do I use Kimi K2.5 Fast via API?

Send a POST request to /v1/chat/completions with model "kimi-k2.5-fast" and your API key. CallMissed uses the OpenAI-compatible format, so you only need to change the base URL and the model field.

What is the context window of Kimi K2.5 Fast?

Kimi K2.5 Fast supports a 128K token context window with up to 16K output tokens.
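
As a rough illustration of these limits, the sketch below checks whether a request fits before it is sent. It rests on assumptions the page does not confirm: that "128K" and "16K" mean 128,000 and 16,000 tokens (they may instead be 131,072 and 16,384), and that prompt and output tokens share the same context window. The names `fits_budget`, `CONTEXT_WINDOW_TOKENS`, and `MAX_OUTPUT_TOKENS` are hypothetical.

```python
# Assumption: "128K" is read as 128,000 tokens; the real limit may be 131,072.
CONTEXT_WINDOW_TOKENS = 128_000
# Assumption: "16K" is read as 16,000 tokens; the real limit may be 16,384.
MAX_OUTPUT_TOKENS = 16_000


def fits_budget(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Return True if the request stays within the stated limits.

    Assumes prompt and output tokens count against the same context window.
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    print(fits_budget(prompt_tokens=100_000, max_output_tokens=16_000))
    print(fits_budget(prompt_tokens=120_000, max_output_tokens=16_000))
```

Token counts depend on the model's tokenizer, so treat any client-side count as an estimate and leave some headroom.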

Kimi K2.5 Fast

by Moonshot · Released January 2026

The fast inference variant of Kimi K2.5. Same 1T parameter architecture but optimized for low-latency responses. Ideal for real-time applications where speed is critical.

Context Window: 128K
Parameters: 1T total / 32B active (MoE)
Max Output: 16K

Overview

Kimi K2.5 Fast is the speed-optimized variant of Moonshot AI's flagship K2.5 model. It shares the same mixture-of-experts (MoE) architecture, with 1 trillion total parameters and 32B active parameters per token, but is optimized for low-latency inference. It reaches a throughput of 414 tokens per second, making it one of the fastest large-scale models available.

The optimization focuses on inference efficiency without significantly compromising output quality. The model retains the core capabilities of K2.5 (native multimodality, strong coding performance, and flexible operating modes) while delivering responses fast enough for real-time chat applications, voice agent backends, and interactive coding assistants.

Kimi K2.5 Fast is the recommended choice when you need K2.5-level capability but latency is a primary concern. It handles real-time conversational AI, interactive coding sessions, and voice-driven applications where users expect near-instant responses.

Pricing

Metric	Price (₹)	Price (USD)
Input, per 1M tokens	₹81.00	$0.81
Output, per 1M tokens	₹405.00	$4.05

1 credit = ₹1 = $0.01 USD. Prices are based on the provider's rates; CallMissed passes them through with a markup of roughly 35%.
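
To make the per-token arithmetic concrete, here is a minimal sketch that estimates the cost of a single request from the listed prices. It takes the listed input and output prices as-is and uses the page's stated conversion of 1 credit = ₹1 = $0.01 USD. The function names are hypothetical, and actual billing may round differently.

```python
# Listed prices on this page, in credits (1 credit = ₹1) per 1M tokens.
INPUT_CREDITS_PER_MILLION = 81.0
OUTPUT_CREDITS_PER_MILLION = 405.0
# Conversion stated on this page: 1 credit = ₹1 = $0.01 USD.
USD_PER_CREDIT = 0.01


def estimate_credits(input_tokens: int, output_tokens: int) -> float:
    """Estimate the credits consumed by one request."""
    total = (
        input_tokens * INPUT_CREDITS_PER_MILLION
        + output_tokens * OUTPUT_CREDITS_PER_MILLION
    )
    return total / 1_000_000


def credits_to_usd(credits: float) -> float:
    """Convert credits to USD using the page's stated rate."""
    return credits * USD_PER_CREDIT


if __name__ == "__main__":
    credits = estimate_credits(input_tokens=2_000, output_tokens=500)
    print(f"{credits:.4f} credits, about ${credits_to_usd(credits):.6f}")
```

For example, a request with 2,000 input tokens and 500 output tokens comes to (2,000 × 81 + 500 × 405) / 1,000,000 = 0.3645 credits. Output tokens cost five times as much as input tokens, so long generations dominate the bill.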

Key Highlights

Optimized for low-latency inference
Same architecture as K2.5 standard
414 tokens/second throughput
Ideal for real-time chat applications

Benchmarks

Benchmark	Score	Notes
SWE-bench	75.2%	Software engineering (slightly below K2.5 standard)
LiveCodeBench	83.8%	Live competitive programming
HumanEval	90.7%	Code generation
Throughput	414 tok/s	High-throughput inference
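
The throughput figure can be turned into a rough generation-time estimate. The sketch below assumes a steady 414 tokens per second and treats time to first token as a separate, caller-supplied value, since this page does not state one. Real latency also depends on network conditions, prompt length, and load. The name `generation_seconds` is hypothetical.

```python
# Throughput figure quoted on this page, in output tokens per second.
THROUGHPUT_TOKENS_PER_SECOND = 414.0


def generation_seconds(output_tokens: int, time_to_first_token: float = 0.0) -> float:
    """Estimate the seconds needed to stream `output_tokens` tokens.

    `time_to_first_token` is an assumption supplied by the caller;
    the page gives no figure for it.
    """
    if output_tokens < 0 or time_to_first_token < 0:
        raise ValueError("inputs must be non-negative")
    return time_to_first_token + output_tokens / THROUGHPUT_TOKENS_PER_SECOND


if __name__ == "__main__":
    for tokens in (100, 1_000, 16_000):
        print(f"{tokens} tokens: about {generation_seconds(tokens):.2f} s")
```

At this rate a 100-token reply streams in roughly a quarter of a second, while a 16,000-token output would take close to 39 seconds.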

Technical Details

Same 1T total / 32B active MoE architecture as K2.5 standard
Optimized for low-latency inference: 414 tokens/second
Inference optimizations include quantization and speculative decoding
Retains native multimodal capabilities from K2.5
Context window: 128K tokens
Open-weight model — same weights as K2.5 with optimized serving
Available via Moonshot API and CallMissed unified gateway

Strengths

414 tok/s throughput — among the fastest large-scale models
Same architecture as K2.5 with minimal quality trade-off
Ideal for real-time and voice-driven applications
Open-weight — can be self-hosted with optimized serving

Limitations

Slight quality reduction compared to K2.5 standard on complex tasks
Same pricing as K2.5 standard — speed optimization, not cost optimization
128K context is smaller than 1M-context competitors

Use Cases

Real-time chat
Voice agent backends
Low-latency applications
Interactive coding

API Example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "kimi-k2.5-fast", "messages": [{"role": "user", "content": "Quick answer: what is the capital of France?"}]}'

Endpoint: POST /v1/chat/completions · Model ID: kimi-k2.5-fast
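
The same request can be sketched in Python using only the standard library. The URL, model ID, and `cm_` key prefix come from the curl example; the helper names `build_request` and `ask` are hypothetical. The response parsing assumes the OpenAI-style shape (`choices[0].message.content`), which this page implies by calling the API OpenAI-compatible but does not show.

```python
import json
import urllib.request

BASE_URL = "https://api.callmissed.com/v1"  # taken from the curl example
MODEL_ID = "kimi-k2.5-fast"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the chat completions POST request without sending it."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(api_key: str, prompt: str, timeout: float = 30.0) -> str:
    """Send the request and return the first reply's text.

    Assumes an OpenAI-style response body: choices[0].message.content.
    """
    request = build_request(api_key, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("cm_YOUR_KEY", "Quick answer: what is the capital of France?")` mirrors the curl command. Keeping `build_request` separate from `ask` makes it possible to inspect the request without touching the network.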

Try Kimi K2.5 Fast now

Get 1000 free API credits on signup. No credit card required.
