Qwen 3.5 Flash
by Qwen · Released February 2026
The fastest and most affordable model in the Qwen 3.5 family. Features native multimodality, 262K native context window (extendable to ~1M), and support for 201 languages. Optimized for speed and high-volume use cases.
Powered by Qwen · Sparse Mixture-of-Experts
| Specification | Value |
|---|---|
| Context Window | 128K |
| Parameters | Undisclosed (MoE) |
| Max Output | 8K |
| Category | LLM Chat |
Overview
Qwen 3.5 Flash is the fastest and most affordable model in the Qwen 3.5 family, designed for high-volume production workloads where cost and speed matter most. At $0.09 per million input tokens and $0.35 per million output tokens, it is among the cheapest capable models available, which suits applications that process millions of requests per day.
The model features a 262K native context window (extendable to approximately 1M with techniques like YaRN), native multimodal support across text, image, video, and audio, and coverage of 201 languages. Despite its focus on speed and cost, it scores 74.2% on MMLU-Pro, 83.5% on HumanEval, and 82.1% on MATH-500, which makes it suitable for a wide range of production tasks.
Qwen 3.5 Flash excels at high-volume translation, quick classification, lightweight conversational AI, and any cost-sensitive deployment where the full power of Qwen 3.5 Plus is unnecessary. Its 201-language support makes it particularly valuable for global applications that need to handle diverse linguistic inputs at scale.
Pricing
| Metric | Price |
|---|---|
| Input /1M tokens | ₹9.0000 |
| Output /1M tokens | ₹35.0000 |
1 credit = ₹1 = $0.01 USD. Prices shown are from the provider; CallMissed passes them through with a ~35% markup.
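To make the pricing arithmetic concrete, here is a minimal sketch that estimates the cost of a workload from the listed per-million-token prices. The function name `estimate_cost` and the `markup` parameter are illustrative, not part of any CallMissed API. The page does not say whether the ~35% markup is already included in the listed prices, so the markup is left as an explicit option rather than assumed.

```python
# Listed prices from the pricing table, in credits (1 credit = ₹1) per 1M tokens.
INPUT_PRICE_PER_M = 9.0
OUTPUT_PRICE_PER_M = 35.0
USD_PER_CREDIT = 0.01  # stated conversion: 1 credit = $0.01 USD


def estimate_cost(input_tokens, output_tokens, markup=0.0):
    """Return (credits, usd) for a workload.

    `markup` is a hypothetical knob: pass 0.35 if the ~35% gateway markup
    is charged on top of the listed prices, or leave it at 0.0 if the
    listed prices are what you are billed.
    """
    base = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    base += (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    credits = base * (1.0 + markup)
    return credits, credits * USD_PER_CREDIT


if __name__ == "__main__":
    # Example workload: 2M input tokens and 500K output tokens.
    for label, markup in (("listed price", 0.0), ("with 35% markup", 0.35)):
        credits, usd = estimate_cost(2_000_000, 500_000, markup=markup)
        print(f"{label}: {credits:.3f} credits (${usd:.5f})")
```

Because output tokens cost roughly four times as much as input tokens, workloads with long generations are dominated by the output term even when prompts are large.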
Key Highlights
- Ultra-low cost: $0.09/1M input tokens
- Native multimodal support
- 201 language coverage
- Fast inference optimized for high volume
Benchmarks
| Benchmark | Score |
|---|---|
| MMLU-Pro | 74.2% |
| HumanEval | 83.5% |
| MATH-500 | 82.1% |
| GPQA Diamond | 56.8% |
Technical Details
- Fastest model in the Qwen 3.5 family — optimized for speed
- 262K native context window (extendable to ~1M)
- Native multimodal: text, image, video, and audio input
- 201 language support — same linguistic coverage as Qwen 3.5 Plus
- Ultra-low pricing: $0.09/M input, $0.35/M output
- MoE architecture for efficient inference
- Available via Alibaba Cloud API and CallMissed unified gateway
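The context and output limits listed on this page can be turned into a simple pre-flight check before sending a long prompt. The sketch below is illustrative only: the helper names are hypothetical, "K" is treated as 1,000 tokens, and it assumes the output budget counts against the same window as the prompt, which the page does not confirm. The window is a parameter because the page lists both 128K (spec card) and 262K (native window); use whichever limit your endpoint actually enforces. Token counts must come from your own tokenizer.

```python
def max_prompt_tokens(max_output_tokens=8_000, context_window=262_000):
    """Largest prompt that still leaves room for the reserved output budget."""
    return max(0, context_window - max_output_tokens)


def fits_in_context(prompt_tokens, max_output_tokens=8_000, context_window=262_000):
    """True if the prompt plus the reserved output budget fits in the window."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= context_window


if __name__ == "__main__":
    # Compare the two window sizes mentioned on this page.
    for window in (128_000, 262_000):
        print(window, max_prompt_tokens(context_window=window))
```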
Strengths
- Ultra-low cost at $0.09/M input — among the cheapest capable models
- 201 language support for global applications
- Native multimodal capabilities despite low price point
- Fast inference optimized for high-volume production workloads
Limitations
- Reduced reasoning depth compared to Qwen 3.5 Plus
- Not suitable for complex multi-step reasoning or deep analysis
- Less established ecosystem compared to OpenAI and Anthropic models
API Example
curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen/qwen3.5-flash", "messages": [{"role": "user", "content": "Translate this to Japanese"}]}'
Endpoint: POST /v1/chat/completions · Model ID: qwen/qwen3.5-flash
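For callers who prefer Python to curl, the same request can be sketched with the standard library alone. The URL, model ID, and `Bearer` header come from the curl example. The helper names `build_request` and `send` are hypothetical, and the `Content-Type: application/json` header is an assumption based on the JSON body.

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"
MODEL_ID = "qwen/qwen3.5-flash"


def build_request(api_key, prompt):
    """Build (but do not send) a chat completion request for one user message."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumed: the endpoint expects a JSON body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON body (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `send(build_request("cm_YOUR_KEY", "Translate this to Japanese"))`. The response schema is not documented on this page; if the gateway follows the common chat-completions convention, the reply text would sit under `choices[0].message.content`, but treat that as an assumption until verified.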