Gemini 3 Flash Preview
by Google · Released 2026
Google's fast and affordable model. Lightning-fast inference with a 1M token context window. Optimized for speed and high-volume developer use cases while maintaining strong reasoning capabilities.
Gemini 3 Flash Preview
Powered by Google · Transformer (proprietary)
Context Window
1M
Parameters
Undisclosed
Max Output
16K
Category
LLM Chat
Overview
Gemini 3 Flash Preview is Google's speed-optimized model, designed for developers who need fast inference at scale without sacrificing the 1M token context window. It delivers lightning-fast responses while maintaining strong reasoning capabilities across text, image, and other modalities.
The model is built for high-volume production workloads where latency and cost are primary concerns. At $0.70/M input and $4.00/M output, it offers an excellent balance of capability and affordability — significantly cheaper than Gemini 3.1 Pro while retaining multimodal input support and the full 1M context window. This makes it ideal for real-time chat applications, quick summarization, and cost-sensitive deployments.
As a Flash-tier model, it trades some depth of reasoning for raw speed. It handles straightforward tasks like summarization, classification, translation, and conversational AI with high quality, while more complex multi-step reasoning tasks may benefit from the Pro variant.
Pricing
| Metric | Price |
|---|---|
| Input /1M tokens | ₹50.0000 |
| Output /1M tokens | ₹300.0000 |
1 credit = ₹1 = $0.01 USD. Prices shown from provider; CallMissed passes through with ~35% markup.
Key Highlights
- Lightning-fast inference speed
- 1M token context window
- Highly capable for its price point
- Multimodal input support
- Routed direct to Google AI Studio — pass-through pricing
Benchmarks
| Benchmark | Score |
|---|---|
| MMLU-Pro | 78.5% |
| HumanEval | 86.2% |
| MATH-500 | 85.4% |
| GPQA Diamond | 62.1% |
Technical Details
- Context window: 1,000,000 tokens — same as Gemini 3.1 Pro
- Optimized for lightning-fast inference and low latency
- Native multimodal: text, image, and other input types
- Supports function calling, structured outputs, and search grounding
- Pass-through Google AI Studio pricing — $0.50/$3 per 1M tokens
- Routed direct to Google AI Studio (no OpenRouter hop, no markup)
Strengths
- Lightning-fast inference — among the fastest frontier-tier models
- 1M context window at a fraction of Pro pricing
- Multimodal input support for text, images, and more
- Excellent cost-to-capability ratio for production workloads
Limitations
- Reduced reasoning depth compared to Gemini 3.1 Pro
- Preview model — may change before general availability
- Less capable on complex multi-step agentic tasks
Use Cases
API Example
curl https://api.callmissed.com/v1/chat/completions \
-H "Authorization: Bearer cm_YOUR_KEY" \
-d '{"model": "google/gemini-3-flash-preview", "messages": [{"role": "user", "content": "Quickly summarize these meeting notes"}]}'Endpoint: POST /v1/chat/completions · Model ID: google/gemini-3-flash-preview
Try Gemini 3 Flash Preview now
Get 1000 free API credits on signup. No credit card required.