Whisper Large v3 Turbo
by OpenAI · Released 2024
OpenAI Whisper Large v3 Turbo — 99-language ASR with auto-detect. Supports both transcription and translation modes. Best accuracy/cost ratio for global multilingual speech.
Powered by OpenAI · Encoder-decoder Transformer (pruned decoder, 4 layers instead of Large v3's 32)
| Spec | Value |
|---|---|
| Context Window | N/A |
| Parameters | 809M |
| Max Output | N/A |
| Category | Speech to Text |
Overview
Whisper Large v3 Turbo is OpenAI's open-weight ASR model, optimized for fast inference while retaining the multilingual breadth of the Large v3 family. It supports 99 languages out of the box, automatically detects the spoken language when none is specified, and can either transcribe (output text in the source language) or translate (output English regardless of input). It is a cost-efficient way to add global multilingual speech recognition to a product.
Deployed via Cloudflare Workers AI, it accepts base64-encoded audio in standard formats (MP3, WAV, FLAC) and returns structured JSON with the transcription text plus optional VTT-formatted segments for subtitle workflows. The Turbo variant uses a smaller decoder than Large v3, achieving ~8× faster inference with only minor accuracy loss on most languages.
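A minimal sketch of preparing such a request body in Python. The `audio` field name and the helper `build_payload` are illustrative assumptions, not something this page confirms; only the base64 encoding and the two task modes come from the description above.

```python
import base64
import json


def build_payload(audio_bytes: bytes, task: str = "transcribe") -> str:
    """Return a JSON request body with the audio base64-encoded."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    # Base64 output is pure ASCII, so it can be embedded in JSON as a string.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    # "audio" is an assumed field name; check the provider schema before use.
    return json.dumps({"audio": encoded, "task": task})
```

Typical use would be `build_payload(open("audio.mp3", "rb").read())`, with the resulting string sent as the POST body.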
At $0.06 per audio hour, it is roughly 9× cheaper than Sarvam Saaras for use cases that don't need Indian-language code-mixing. Pair it with one of our LLMs for end-to-end speech-to-insight workflows: meeting summarization, podcast indexing, accessibility captions, or compliance archiving.
Pricing
| Metric | Price |
|---|---|
| Price per audio hour | ₹6.0000 |
1 credit = ₹1 = $0.01 USD. Prices shown are the provider's; CallMissed passes them through with a ~35% markup.
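As a worked example of the conversion, here is a small sketch that estimates the cost of a recording. It assumes the listed ₹6 per audio hour is the rate actually billed and that billing is proportional to duration; neither is confirmed on this page, and `estimate_cost` is a hypothetical helper.

```python
PRICE_INR_PER_HOUR = 6.0  # from the pricing table
USD_PER_CREDIT = 0.01     # 1 credit = ₹1 = $0.01, per the pricing note


def estimate_cost(duration_seconds: float) -> dict:
    """Estimate cost in credits (equal to rupees) and US dollars."""
    hours = duration_seconds / 3600
    credits = hours * PRICE_INR_PER_HOUR  # 1 credit = ₹1, so credits == INR
    return {
        "credits": round(credits, 4),
        "usd": round(credits * USD_PER_CREDIT, 6),
    }
```

Under these assumptions a 30-minute file costs 3 credits and a one-hour file costs 6.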
Key Highlights
- 99 languages with automatic language detection
- Transcribe + translate modes
- VTT subtitle output for downstream tooling
- ~8× faster inference than Whisper Large v3
Benchmarks
| Metric | Value |
|---|---|
| Languages | 99 |
| Speed | ~8× faster than Whisper Large v3 |
| Hourly cost | $0.06 per audio hour |
Technical Details
- Runs on Cloudflare Workers AI (`@cf/openai/whisper-large-v3-turbo`)
- Accepts base64-encoded MP3, WAV, or FLAC; maximum of roughly 30 minutes of audio per request
- Returns `transcription_info.text` + `segments[].vtt`
- `task=transcribe` (default) or `task=translate`
- Optional: `vad_filter`, `initial_prompt`, `beam_size`, `hallucination_silence_threshold`
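Because a single request is capped at roughly 30 minutes, longer recordings have to be split on the client side. The sketch below only plans the split points. It assumes a hard 1800-second limit and a 2-second overlap so that words at a boundary are not cut; both values and the name `plan_chunks` are illustrative. Cutting the audio itself (for example with ffmpeg) is left out.

```python
def plan_chunks(total_seconds: float,
                max_seconds: float = 1800.0,
                overlap_seconds: float = 2.0) -> list:
    """Return (start, end) windows that cover the whole recording."""
    if overlap_seconds < 0 or max_seconds <= overlap_seconds:
        raise ValueError("max_seconds must exceed a non-negative overlap")
    windows = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:
            break
        # Step back slightly so consecutive windows share a short overlap.
        start = end - overlap_seconds
    return windows
```

Each window can then be transcribed separately and the texts joined, with duplicated words in the overlap removed by the caller.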
Strengths
- Best multilingual coverage (99 languages)
- Auto language detection — no need for ISO tags
- Built-in translation to English
- ~9× cheaper than Sarvam Saaras for non-Indian languages
Limitations
- Less accurate than Saaras on Indian languages and code-mixed speech
- Batch only on this surface — for streaming use Nova-3 or Flux
- Can hallucinate text during long silences unless `vad_filter` is enabled
API Example
curl https://api.callmissed.com/v1/audio/transcriptions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -F file=@audio.mp3 \
  -F model=whisper-large-v3-turbo \
  -F language=en
Endpoint: POST /v1/audio/transcriptions · Model ID: whisper-large-v3-turbo
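For callers who prefer Python, the same multipart request can be sketched with the standard library alone. The function below only builds the request and does not send it. The helper name `build_request` is hypothetical, and omitting `language` is assumed to leave language detection automatic, as described in the overview.

```python
import urllib.request
import uuid

API_URL = "https://api.callmissed.com/v1/audio/transcriptions"


def build_request(audio_bytes: bytes, filename: str, api_key: str,
                  model: str = "whisper-large-v3-turbo",
                  language: str = None) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    fields = {"model": model}
    if language:
        fields["language"] = language
    parts = []
    # Plain text form fields, mirroring the -F name=value flags in curl.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The audio file part, mirroring -F file=@audio.mp3.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; the response format is not assumed here.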