LLM Chat · reasoning · multimodal · tools · pass-through pricing

Gemini 3.5 Flash

By Google · Released 2026

Google's latest fast frontier model. A 1M-token-context multimodal model (text, image, video, audio) with native reasoning, tool calling, and prompt caching — the speed-tier upgrade to the Gemini 3 Flash line.

Powered by Google · Transformer (proprietary)

Context window: 1M

Parameters: Undisclosed

Max output: 65K

Category: LLM Chat

Overview

Gemini 3.5 Flash is Google's newest speed-optimized frontier model, succeeding Gemini 3 Flash. It keeps the full 1M-token input context window (65K output) and adds stronger reasoning ("thinking" is on by default), native multimodal input across text, image, video, and audio, reliable function calling, and prompt caching for repeated context.
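A request can be checked against these limits before it is sent. The sketch below uses the exact figures listed further down this page (1,048,576 input and 65,536 output tokens). The function name is hypothetical, and the token counts are assumed to come from a tokenizer or a token-counting call that is not shown here.

```python
# Limits as published on this page: 1,048,576 input / 65,536 output tokens.
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536


def fits_limits(input_tokens: int, max_output_tokens: int) -> bool:
    """Return True if both counts are within the published limits.

    Hypothetical helper: the caller supplies token counts obtained elsewhere.
    """
    if input_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens <= MAX_INPUT_TOKENS
        and max_output_tokens <= MAX_OUTPUT_TOKENS
    )
```

Input and output are checked separately because the page quotes them as two distinct limits.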

It targets high-volume production workloads that need frontier-class quality at flash-tier latency: real-time chat, document and meeting summarization, multimodal understanding, agentic tool loops, and retrieval-augmented generation over very long contexts. The thinking budget lets you trade latency for depth on harder prompts while staying fast on simple ones.

On CallMissed it is fully OpenAI-compatible on `/v1/chat/completions` with streaming, tools, and caching. Pricing is pass-through from Google's published Global-tier rate — $1.50 per 1M input tokens and $9.00 per 1M output tokens, with cached input at $0.15 — so you pay the maker's rate with no markup.
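Since the endpoint is described as OpenAI-compatible with streaming, a streamed reply can presumably be read the way OpenAI-style streams are: server-sent-event lines prefixed with `data:`, a `[DONE]` sentinel at the end, and text carried in `choices[].delta.content`. The sketch below assumes that convention holds here. The function name and the sample lines are illustrative, not captured from the API.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # sentinel that marks the end of the stream
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # "delta" may be missing or null on some chunks.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines for illustration only.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

In real use the `lines` argument would be the decoded lines of the HTTP response body for a request sent with `"stream": true`.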

Pricing

| Metric | Price |
| --- | --- |
| Input /1M tokens | ₹150.0000 |
| Output /1M tokens | ₹900.0000 |

1 credit = ₹1 = $0.01 USD. Prices shown are from the provider; CallMissed passes them through with a ~35% markup.
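As a worked example of the arithmetic, the sketch below turns token counts into credits using the table prices (150 credits per 1M input tokens, 900 per 1M output tokens) and the stated rate of 1 credit = $0.01. It assumes the table prices are the amounts actually billed; it ignores the cached-input rate and does not model any markup separately. The function and constant names are hypothetical.

```python
from decimal import Decimal

# Table prices, read as credits because 1 credit = ₹1 on this page.
INPUT_CREDITS_PER_M = Decimal("150")
OUTPUT_CREDITS_PER_M = Decimal("900")
USD_PER_CREDIT = Decimal("0.01")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_credits(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the credits consumed by one request (uncached input only)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_CREDITS_PER_M
        + Decimal(output_tokens) * OUTPUT_CREDITS_PER_M
    )
    return total / TOKENS_PER_M


# Example: a 200,000-token document summarized into 2,000 tokens.
credits = estimate_credits(200_000, 2_000)
usd = credits * USD_PER_CREDIT
print(f"{credits} credits, about ${usd}")
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift.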

Highlights

  • Latest Gemini Flash — frontier quality at flash latency
  • 1M token context window (65K output)
  • Native multimodal: text, image, video, audio
  • Reasoning on by default + prompt caching
  • Routed direct — pass-through Google pricing

Benchmarks

| Benchmark | Score |
| --- | --- |
| Context | 1M |
| Reasoning | Yes |
| Multimodal | Yes |

Technical details

  • Context window: 1,048,576 input / 65,536 output tokens
  • Native multimodal input: text, image, video, audio
  • Function calling, structured outputs, and prompt caching
  • Reasoning ("thinking") enabled by default
  • Pass-through pricing — $1.50 / $9.00 per 1M tokens (Global)
  • Routed direct (no markup)
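For the function-calling item above, a request body might look like the following sketch. It assumes the endpoint accepts the OpenAI Chat Completions `tools` format, which this page implies but does not spell out. The `get_weather` tool, its schema, and the helper name are hypothetical.

```python
import json


def build_tool_request(prompt: str) -> dict:
    """Build a chat request body that offers one callable tool to the model."""
    return {
        "model": "google/gemini-3.5-flash",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool name
                    "description": "Look up the current weather for a city.",
                    # JSON Schema describing the arguments the model may pass.
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }


# Serialize to the JSON string that would be sent as the POST body.
body = json.dumps(build_tool_request("What is the weather in Pune?"))
```

In the OpenAI convention, a model that decides to call the tool returns a `tool_calls` entry in the assistant message; the caller runs the function and sends the result back in a follow-up message with role `tool`.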

Strengths

  • Frontier quality with flash-tier speed and 1M context
  • Multimodal across text, image, video, and audio
  • Reasoning + caching for complex, repeated-context workloads
  • Pass-through pricing — pay Google's rate, no markup

Limitations

  • Pricier than Gemini 3 Flash Preview / 3.1 Flash Lite
  • Reasoning mode raises time-to-first-token on hard prompts
  • Flash tier — the Pro line still leads on the hardest reasoning

Use cases

Real-time chat · Long-context analysis · Multimodal understanding · Agentic tool loops

API example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemini-3.5-flash", "messages": [{"role": "user", "content": "Summarize this 200-page report"}]}'

Endpoint: POST /v1/chat/completions · Model ID: google/gemini-3.5-flash
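The same call can be sketched in Python using only the standard library. The URL, model ID, and key prefix are taken from the curl example; the helper names are hypothetical, and the response shape (`choices[0].message.content`) is assumed to follow the OpenAI convention.

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"
MODEL_ID = "google/gemini-3.5-flash"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single-turn chat."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# Usage (performs a network call, so it is left commented out):
# reply = send(build_chat_request("cm_YOUR_KEY", "Summarize this 200-page report"))
# print(reply["choices"][0]["message"]["content"])
```

Building the request separately from sending it keeps the payload easy to inspect or log before any credits are spent.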

Try Gemini 3.5 Flash now

Get 1000 free API credits on signup. No credit card required.