Kimi K2.5 Fast
by Moonshot · Released January 2026
The fast-inference variant of Kimi K2.5. It uses the same 1T-parameter architecture, optimized for low-latency responses, and is ideal for real-time applications where speed is critical.
Powered by Moonshot · Sparse Mixture-of-Experts (1T total / 32B active, optimized)
| Spec | Value |
|---|---|
| Context Window | 128K |
| Parameters | 1T total / 32B active (MoE) |
| Max Output | 16K |
| Category | LLM Chat |
Overview
Kimi K2.5 Fast is the speed-optimized variant of Moonshot AI's flagship K2.5 model. It shares the same Mixture-of-Experts (MoE) architecture, with 1 trillion total parameters and 32B active per token, but is tuned for low-latency inference. Served through Clarifai, it reaches a throughput of 414 tokens per second, which places it among the fastest large-scale models available.
The optimization targets inference efficiency without significantly compromising output quality. The model retains the core capabilities of K2.5 (native multimodality, strong coding performance, and flexible operating modes) while responding fast enough for real-time chat applications, voice agent backends, and interactive coding assistants.
Kimi K2.5 Fast is the recommended choice when you need K2.5-level capability but latency is a primary concern. It handles real-time conversational AI, interactive coding sessions, and voice-driven applications where users expect near-instant responses.
Pricing
| Metric | Price |
|---|---|
| Input /1M tokens | ₹52.0000 |
| Output /1M tokens | ₹230.0000 |
1 credit = ₹1 = $0.01 USD. Prices shown are the provider's; CallMissed passes them through with a ~35% markup.
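As a rough illustration of how the per-million-token prices translate into per-request cost, the sketch below multiplies token counts by the listed rates. The function name is hypothetical, and it assumes the listed prices are the rates actually billed; the page does not say whether the ~35% markup is already included, so treat the result as an estimate.

```python
# Listed prices from the Pricing table, in rupees per 1M tokens.
INPUT_INR_PER_MILLION = 52.0
OUTPUT_INR_PER_MILLION = 230.0


def estimate_cost_inr(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in rupees (1 credit = 1 rupee).

    Assumption: the listed prices are the final billed rates. If the ~35%
    markup is applied on top of them, scale the result accordingly.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_INR_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_INR_PER_MILLION
    return input_cost + output_cost
```

For example, a request with 2,000 input tokens and 500 output tokens works out to about ₹0.219 at the listed rates.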
Key Highlights
- Optimized for low-latency inference
- Same architecture as K2.5 standard
- 414 tokens/second throughput via Clarifai
- Ideal for real-time chat applications
Benchmarks
| Benchmark | Result |
|---|---|
| SWE-bench | 75.2% |
| LiveCodeBench | 83.8% |
| HumanEval | 90.7% |
| Throughput | 414 tok/s |
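To put the throughput figure in perspective, the following sketch converts an output length into approximate generation time. It is a back-of-the-envelope estimate only: it assumes the 414 tok/s rate is sustained for the whole response and ignores time to first token and network overhead, neither of which is stated here. The function name is illustrative.

```python
THROUGHPUT_TOK_PER_S = 414.0  # throughput quoted for serving via Clarifai


def generation_seconds(output_tokens: int,
                       tok_per_s: float = THROUGHPUT_TOK_PER_S) -> float:
    """Approximate time to stream `output_tokens` at a steady token rate.

    Assumption: constant throughput, no time-to-first-token or network delay.
    """
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return output_tokens / tok_per_s
```

Under these assumptions, a 200-token reply takes roughly half a second, while a response of about 16,000 tokens takes roughly 39 seconds.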
Technical Details
- Same 1T total / 32B active MoE architecture as K2.5 standard
- Optimized for low-latency inference: 414 tokens/second via Clarifai
- Inference optimizations include quantization and speculative decoding
- Retains native multimodal capabilities from K2.5
- Context window: 128K tokens
- Open-weight model — same weights as K2.5 with optimized serving
- Available via Moonshot API and CallMissed unified gateway
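The context window and maximum output figures can be combined into a simple prompt budget check, sketched below. Several things here are assumptions rather than documented behavior: that "128K" and "16K" mean 128,000 and 16,000 tokens, that generated tokens count against the same context window as the prompt, and the helper names themselves. Token counts must come from whatever tokenizer the serving provider uses.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # "128K", assumed to mean 128,000 tokens
MAX_OUTPUT_TOKENS = 16_000       # "16K", assumed to mean 16,000 tokens


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Largest prompt that still leaves `reserved_output` tokens for the reply.

    Assumption: prompt and output share one context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 0 and the max output")
    return CONTEXT_WINDOW_TOKENS - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """Return True if the prompt fits alongside the reserved output budget."""
    return prompt_tokens <= max_prompt_tokens(reserved_output)
```

Reserving less than the full output limit for short replies leaves more room for the prompt, which can matter when the 128K window is the binding constraint.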
Strengths
- 414 tok/s throughput — among the fastest large-scale models
- Same architecture as K2.5 with minimal quality trade-off
- Ideal for real-time and voice-driven applications
- Open-weight — can be self-hosted with optimized serving
Limitations
- Slight quality reduction compared to K2.5 standard on complex tasks
- Same pricing as K2.5 standard — speed optimization, not cost optimization
- 128K context is smaller than 1M-context competitors
API Example
curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "kimi-k2.5-fast", "messages": [{"role": "user", "content": "Quick answer: what is the capital of France?"}]}'
Endpoint: POST /v1/chat/completions · Model ID: kimi-k2.5-fast
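The same request can be sketched in Python using only the standard library. The URL, model ID, and payload shape come from the curl example; the explicit JSON Content-Type header, the helper names, and the timeout are assumptions. The response schema is not described here, so the sketch returns the decoded JSON without assuming any fields.

```python
import json
import urllib.request
from typing import Any

API_URL = "https://api.callmissed.com/v1/chat/completions"
MODEL_ID = "kimi-k2.5-fast"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the POST request from the curl example without sending it."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed to be expected
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and return the decoded JSON body unchanged."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_request("cm_YOUR_KEY", "Quick answer: what is the capital of France?"))` would issue the request. Keeping construction separate from sending makes the payload easy to inspect before any credits are spent.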
Try Kimi K2.5 Fast now
Get 1000 free API credits on signup. No credit card required.