LLM Chatfastaffordablepass-through pricing

Gemini 3 Flash Preview

by Google · Released 2026

Google's fast and affordable model. Lightning-fast inference with a 1M token context window. Optimized for speed and high-volume developer use cases while maintaining strong reasoning capabilities.

LLM Chat

Gemini 3 Flash Preview

Powered by Google · Transformer (proprietary)

Context Window

1M

Parameters

Undisclosed

Max Output

16K

Category

LLM Chat

Overview

Gemini 3 Flash Preview is Google's speed-optimized model, designed for developers who need fast inference at scale without sacrificing the 1M token context window. It delivers lightning-fast responses while maintaining strong reasoning capabilities across text, image, and other modalities.

The model is built for high-volume production workloads where latency and cost are primary concerns. At $0.70/M input and $4.00/M output, it offers an excellent balance of capability and affordability — significantly cheaper than Gemini 3.1 Pro while retaining multimodal input support and the full 1M context window. This makes it ideal for real-time chat applications, quick summarization, and cost-sensitive deployments.

As a Flash-tier model, it trades some depth of reasoning for raw speed. It handles straightforward tasks like summarization, classification, translation, and conversational AI with high quality, while more complex multi-step reasoning tasks may benefit from the Pro variant.

Pricing

MetricPrice
Input /1M tokens₹50.0000
Output /1M tokens₹300.0000

1 credit = ₹1 = $0.01 USD. Prices shown from provider; CallMissed passes through with ~35% markup.

Key Highlights

  • Lightning-fast inference speed
  • 1M token context window
  • Highly capable for its price point
  • Multimodal input support
  • Routed direct to Google AI Studio — pass-through pricing

Benchmarks

BenchmarkScore
MMLU-Pro78.5%
HumanEval86.2%
MATH-50085.4%
GPQA Diamond62.1%

Technical Details

  • Context window: 1,000,000 tokens — same as Gemini 3.1 Pro
  • Optimized for lightning-fast inference and low latency
  • Native multimodal: text, image, and other input types
  • Supports function calling, structured outputs, and search grounding
  • Pass-through Google AI Studio pricing — $0.50/$3 per 1M tokens
  • Routed direct to Google AI Studio (no OpenRouter hop, no markup)

Strengths

  • Lightning-fast inference — among the fastest frontier-tier models
  • 1M context window at a fraction of Pro pricing
  • Multimodal input support for text, images, and more
  • Excellent cost-to-capability ratio for production workloads

Limitations

  • Reduced reasoning depth compared to Gemini 3.1 Pro
  • Preview model — may change before general availability
  • Less capable on complex multi-step agentic tasks

Use Cases

Real-time chatQuick summarizationHigh-volume processingCost-sensitive applications

API Example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -d '{"model": "google/gemini-3-flash-preview", "messages": [{"role": "user", "content": "Quickly summarize these meeting notes"}]}'

Endpoint: POST /v1/chat/completions · Model ID: google/gemini-3-flash-preview

Try Gemini 3 Flash Preview now

Get 1000 free API credits on signup. No credit card required.