LLM Chat · reasoning · multimodal · tools · pass-through pricing

Gemini 3.5 Flash

by Google · Released 2026

Google's latest fast frontier model. A 1M-token-context multimodal model (text, image, video, audio) with native reasoning, tool calling, and prompt caching — the speed-tier upgrade to the Gemini 3 Flash line.

Powered by Google · Transformer (proprietary)

Context Window: 1M
Parameters: Undisclosed
Max Output: 65K
Category: LLM Chat

Overview

Gemini 3.5 Flash is Google's newest speed-optimized frontier model, succeeding Gemini 3 Flash. It keeps the full 1M-token input context window (65K output) and adds stronger reasoning ("thinking" is on by default), native multimodal input across text, image, video, and audio, reliable function calling, and prompt caching for repeated context.

It targets high-volume production workloads that need frontier-class quality at flash-tier latency: real-time chat, document and meeting summarization, multimodal understanding, agentic tool loops, and retrieval-augmented generation over very long contexts. The thinking budget lets you trade latency for depth on harder prompts while staying fast on simple ones.

On CallMissed it is fully OpenAI-compatible on `/v1/chat/completions` with streaming, tools, and caching. Pricing is pass-through from Google's published Global-tier rate — $1.50 per 1M input tokens and $9.00 per 1M output tokens, with cached input at $0.15 — so you pay the maker's rate with no markup.
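As a rough illustration of the rates quoted above, the following sketch estimates the cost of one request in USD. It assumes that the $0.15 cached rate is also per 1M tokens, that cached tokens are billed at that rate instead of the normal input rate, and that there are no other charges such as cache storage; none of that is confirmed by the page. The function name `estimate_cost_usd` is hypothetical.

```python
# Rates quoted in the text above, in USD per 1M tokens.
INPUT_USD_PER_M = 1.50
CACHED_INPUT_USD_PER_M = 0.15  # assumed to be per 1M tokens as well
OUTPUT_USD_PER_M = 9.00


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate one request's cost.

    cached_tokens is the portion of input_tokens served from the prompt
    cache (assumed billing model, not confirmed by the source).
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_USD_PER_M
        + cached_tokens * CACHED_INPUT_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000
    return round(total, 6)
```

Under these assumptions, a request with 200,000 input tokens (150,000 of them cached) and 2,000 output tokens comes to about $0.1155, versus about $0.318 with no cache hits.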

Pricing

Metric | Price
Input / 1M tokens | ₹150.0000
Output / 1M tokens | ₹900.0000

1 credit = ₹1 = $0.01 USD, so ₹150 and ₹900 correspond to $1.50 and $9.00 per 1M tokens. Prices are passed through from the provider's published rate with no markup.

Key Highlights

  • Latest Gemini Flash — frontier quality at flash latency
  • 1M token context window (65K output)
  • Native multimodal: text, image, video, audio
  • Reasoning on by default + prompt caching
  • Routed direct — pass-through Google pricing

Benchmarks

Benchmark | Score
Context | 1M
Reasoning | Yes
Multimodal | Yes

Technical Details

  • Context window: 1,048,576 input / 65,536 output tokens
  • Native multimodal input: text, image, video, audio
  • Function calling, structured outputs, and prompt caching
  • Reasoning ("thinking") enabled by default
  • Pass-through pricing: $1.50 input / $9.00 output per 1M tokens (Global tier)
  • Routed direct (no markup)
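A minimal sketch of how the token limits listed above might be checked before sending a request. It assumes the input and output limits are enforced independently and that the caller already has a token count from some tokenizer; the page does not say whether reasoning tokens count toward the output limit, so that is not modeled. The function name `plan_request` and the returned keys are hypothetical.

```python
MAX_INPUT_TOKENS = 1_048_576  # input limit from the list above
MAX_OUTPUT_TOKENS = 65_536    # output limit from the list above


def plan_request(prompt_tokens: int, desired_output_tokens: int) -> dict:
    """Report whether the prompt fits and clamp the output cap to the limit."""
    if prompt_tokens < 0 or desired_output_tokens < 1:
        raise ValueError("token counts must be positive")
    return {
        # True when the prompt is within the input window.
        "fits": prompt_tokens <= MAX_INPUT_TOKENS,
        # Output cap to request, never above the model's output limit.
        "max_tokens": min(desired_output_tokens, MAX_OUTPUT_TOKENS),
        # How many tokens would need to be trimmed or chunked away.
        "overflow_tokens": max(0, prompt_tokens - MAX_INPUT_TOKENS),
    }
```

A caller could use `overflow_tokens` to decide whether to truncate, chunk, or summarize the input before retrying.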

Strengths

  • Frontier quality with flash-tier speed and 1M context
  • Multimodal across text, image, video, and audio
  • Reasoning + caching for complex, repeated-context workloads
  • Pass-through pricing — pay Google's rate, no markup

Limitations

  • Pricier than Gemini 3 Flash Preview / 3.1 Flash Lite
  • Reasoning mode raises time-to-first-token on hard prompts
  • Flash tier — the Pro line still leads on the hardest reasoning

Use Cases

  • Real-time chat
  • Long-context analysis
  • Multimodal understanding
  • Agentic tool loops
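To make the "agentic tool loop" use case concrete, here is a minimal sketch of such a loop. It assumes the response and tool-call message shapes follow the OpenAI chat completions convention (`choices[0].message.tool_calls`, replies with `role: "tool"`), which seems plausible given the OpenAI-compatible endpoint but is not spelled out on this page. `run_tool_loop`, the `send` callable, the `tools` registry, and `max_turns` are all hypothetical names for illustration; `send` stands in for whatever function posts the messages to the API.

```python
import json
from typing import Callable


def run_tool_loop(send: Callable[[list], dict], messages: list,
                  tools: dict, max_turns: int = 5) -> str:
    """Call `send` repeatedly until the model answers without requesting tools."""
    for _ in range(max_turns):
        message = send(messages)["choices"][0]["message"]
        messages.append(message)  # keep the assistant turn in the history
        tool_calls = message.get("tool_calls") or []
        if not tool_calls:
            return message.get("content") or ""
        for call in tool_calls:
            fn = call["function"]
            # Arguments arrive as a JSON string in the assumed format.
            args = json.loads(fn["arguments"] or "{}")
            result = tools[fn["name"]](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("tool loop did not finish within max_turns")
```

The `max_turns` cap is a design choice to keep a misbehaving loop from running, and billing, indefinitely.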

API Example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemini-3.5-flash", "messages": [{"role": "user", "content": "Summarize this 200-page report"}]}'

Endpoint: POST /v1/chat/completions · Model ID: google/gemini-3.5-flash
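The same call can be sketched in Python using only the standard library. The URL, model ID, and key placeholder come from the example above; the helper names `build_request` and `send_request` are hypothetical, and reading the reply from `choices[0].message.content` assumes the OpenAI-style response shape.

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"  # from the curl example
MODEL_ID = "google/gemini-3.5-flash"                        # model ID listed on this page


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat completions request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(req: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant text (assumed response shape)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Usage would look like `send_request(build_request("cm_YOUR_KEY", "Summarize this 200-page report"))`. Keeping construction separate from sending makes the payload easy to inspect or log before any tokens are billed.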
