How much does whisper cost?

whisper costs $0.40 per audio hour (₹40) on CallMissed. 1 credit = ₹1 = $0.01 USD.

How do I use whisper via API?

Send a POST request to /v1/audio/transcriptions with model "whisper" and your API key. CallMissed uses the OpenAI-compatible format, so existing OpenAI client code only needs the base URL and model field changed.

What is the context window of whisper?

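Context window and output-token limits do not apply to whisper: it is a speech-to-text model, so both values are listed as N/A. The practical limit is upload size, up to 25 MB per request.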

Speech to Text · multilingual

whisper

by OpenAI · Released 2024

OpenAI Whisper — 99 languages, transcribe + translate. Model id `whisper`.

Speech to Text

whisper

Context Window

N/A

Parameters

Whisper

Max Output

N/A

Overview

The `whisper` model id on CallMissed maps to OpenAI Whisper speech recognition, the same Whisper family OpenAI popularized for robust multilingual ASR. It is exposed here under the deployment name `whisper` rather than OpenAI's cloud id `whisper-1`. Use it on `/v1/audio/transcriptions` with `model=whisper` as a multipart form upload, in the same request shape as OpenAI's audio transcription API.

Whisper is a general-purpose speech recognition model supporting 99 languages with automatic language detection, transcription in the source language, and optional translation to English via the translations endpoint on OpenAI's platform. File uploads are supported up to 25 MB per request, in formats including mp3, mp4, mpeg, mpga, m4a, wav, and webm. On CallMissed, pricing is $0.40 per audio hour (₹40), which is higher than `whisper-large-v3-turbo` but comes with enterprise-grade routing and consistent OpenAI weights.

Pick `whisper` when you need batch transcription of meetings, podcasts, or call recordings where streaming partials are unnecessary, especially if you already standardize on OpenAI-compatible audio endpoints. It complements rather than replaces `whisper-large-v3-turbo` on our platform: `whisper-large-v3-turbo` is cheaper for bulk multilingual work, while `whisper` fits teams whose compliance path requires consistent OpenAI weights.

Integration: POST audio as `file`, set `model=whisper`, optionally pass `language` ISO code to skip autodetect, and choose `response_format` (`json`, `text`, `srt`, `vtt` where supported). Whisper supports verbose JSON with segment timestamps on OpenAI — useful for subtitle pipelines. Translation mode (audio to English text) is Whisper-family specific; the newer `gpt-4o-transcribe` models do not mirror every Whisper feature — consult docs before migrating translation workflows.
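The form fields described above can be sketched in Python. This is a minimal illustration, not an official client: the helper names `build_transcription_fields` and `transcribe` are hypothetical, the upload uses the third-party `requests` library, and `verbose_json` is assumed to be accepted here the same way it is on OpenAI's API.

```python
from typing import Dict, Optional

API_URL = "https://api.callmissed.com/v1/audio/transcriptions"
# Formats named in the text; "verbose_json" is assumed to follow OpenAI's API.
ALLOWED_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}


def build_transcription_fields(language: Optional[str] = None,
                               response_format: str = "json") -> Dict[str, str]:
    """Build the non-file multipart fields for a transcription request."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    fields = {"model": "whisper", "response_format": response_format}
    if language:
        # Only sent when known; omitting the field keeps autodetect.
        fields["language"] = language
    return fields


def transcribe(path: str, api_key: str, **options) -> str:
    """Upload one audio file and return the raw response body (hypothetical wrapper)."""
    import requests  # third-party; imported lazily so the helper above has no dependency

    with open(path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            data=build_transcription_fields(**options),
            files={"file": audio},
            timeout=300,
        )
    response.raise_for_status()
    return response.text
```

A call such as `transcribe("call.mp3", "cm_YOUR_KEY", language="en", response_format="srt")` would then send the same fields as the curl examples on this page.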

Limitations: batch-only on CallMissed voice surfaces (live voice agents prefer streaming STT like `saaras:v3`, `nova-3`, or `gpt-4o-transcribe`), no speaker diarization on the Whisper path (use `gpt-4o-transcribe-diarize` instead), and file size caps per request. Long recordings must be split client-side. For lowest cost on English-heavy archives, benchmark against `whisper-large-v3-turbo`. For lowest latency partials, use `gpt-4o-mini-transcribe`.
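Because long recordings must be split client-side, it can help to plan the cut points before touching the audio. The sketch below is one possible approach: the function names are hypothetical, the 600-second default is illustrative, and the size estimate assumes constant-bitrate audio and treats the 25 MB cap as 25 × 1024 × 1024 bytes.

```python
from typing import List, Tuple

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: the 25 MB cap counted in MiB


def max_chunk_seconds(bitrate_kbps: float, limit_bytes: int = MAX_UPLOAD_BYTES) -> float:
    """Longest constant-bitrate chunk (in seconds) that stays under the size cap."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return limit_bytes / bytes_per_second


def plan_chunks(total_seconds: float, chunk_seconds: float = 600.0) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover the whole recording."""
    if total_seconds < 0 or chunk_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and chunk_seconds > 0")
    chunks = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks
```

Keeping the start offset of every chunk matters later: it is what lets segment timestamps from separate requests be placed back on one timeline.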

Batch pipeline design: split hour-long recordings into 10–15 minute chunks under 25 MB, transcribe in parallel workers, stitch timestamps by offset. Whisper returns segment timestamps in verbose JSON — use them to rebuild a unified timeline.
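Stitching by offset can be sketched as follows. The example assumes each verbose JSON response carries a list of segments with `start`, `end`, and `text` keys, as OpenAI's verbose format does; `stitch_segments` is a hypothetical helper name, and the caller is assumed to know the start offset of every chunk.

```python
from typing import Dict, List, Tuple


def stitch_segments(chunks: List[Tuple[float, List[Dict]]]) -> List[Dict]:
    """Shift each chunk's segment times by the chunk's start offset and merge.

    `chunks` is a list of (offset_seconds, segments) pairs, one per uploaded chunk.
    """
    timeline = []
    # Sort by offset so results from parallel workers can arrive in any order.
    for offset, segments in sorted(chunks, key=lambda chunk: chunk[0]):
        for seg in segments:
            timeline.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"].strip(),
            })
    return timeline
```

Words cut in half at a chunk boundary are not repaired by this step; cutting at silences or overlapping chunks slightly are common mitigations.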

Language strategies: omit `language` for autodetect on multilingual call centers; set `language=en` when you know the channel to reduce errors. Whisper supports many languages but quality varies — benchmark Hindi vs `saaras:v3` on CallMissed for Indic telephony.

Translation path: Whisper-family translation to English differs from gpt-4o-transcribe capabilities — if you rely on `/audio/translations`, confirm Whisper still meets needs before migrating STT models.

Subtitle generation: map segments to SRT with start/end from verbose JSON; watch punctuation differences vs human captioners.
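A minimal sketch of the segment-to-SRT mapping, assuming segments are dictionaries with `start` and `end` in seconds plus a `text` field (the shape of OpenAI-style verbose JSON). The function names are hypothetical.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

This keeps one cue per segment; splitting long segments into shorter reading-length cues would be a separate step.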

Whisper variants: CallMissed also offers `whisper-large-v3-turbo` at lower $/hour — choose `whisper` when your compliance path requires consistent OpenAI weights; choose turbo for cost-at-scale archives.

Error handling: corrupt audio files return 400 — validate containers upstream. Clipping and silence confuse VAD-less batch ASR less than streaming STT but still hurt WER.
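Some rejections can be avoided with a cheap check before upload. The sketch below only inspects the file name and size; it does not detect a corrupt container, which needs a real probe of the media. `precheck_audio` is a hypothetical helper, the extension list is the one given in the overview, and the 25 MB cap is assumed to mean 25 × 1024 × 1024 bytes.

```python
import os
from typing import List

SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: the 25 MB cap counted in MiB


def precheck_audio(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems found; an empty list means the file looks uploadable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_UPLOAD_BYTES}-byte cap")
    return problems
```

In a pipeline this would be called as `precheck_audio(path, os.path.getsize(path))`, with failures routed to conversion or chunking instead of being sent to the API.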

Post-processing: apply spell-check domain dictionaries (medical, SKU names) after ASR — models rarely know your internal codenames.
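One simple form of this post-processing is a phrase-level replacement table. The sketch below is an illustration under stated assumptions: `apply_domain_terms` is a hypothetical helper, the corrections map is something you maintain yourself, and matching is case-insensitive on whole words only.

```python
import re
from typing import Dict


def apply_domain_terms(text: str, corrections: Dict[str, str]) -> str:
    """Replace known mis-transcriptions with the preferred domain spelling."""
    if not corrections:
        return text
    # Longest keys first so multi-word phrases win over their shorter prefixes.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(key) for key in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {key.lower(): value for key, value in corrections.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)
```

For example, mapping `"met formin"` to `"metformin"` or a spoken product name to its SKU code fixes recurring errors without retraining anything; fuzzy matching would be a heavier alternative.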

Historical context: OpenAI open-sourced Whisper in 2022, revolutionizing offline ASR quality; CallMissed hosts the production `whisper` deployment for API access without running GPUs yourself. The model remains a baseline batch ASR choice years later despite streaming successors.

curl walkthrough: `curl -X POST https://api.callmissed.com/v1/audio/transcriptions -H "Authorization: Bearer cm_KEY" -F model=whisper -F file=@call.mp3` returns the transcript as JSON. Add `-F response_format=verbose_json` when you need segment timestamps for editing tools.

Media prep: normalize loudness (EBU R128) before upload; whisper struggles on clipped audio. Convert odd codecs with ffmpeg to 16 kHz mono WAV for problematic files.
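The media prep step can be scripted around ffmpeg. The sketch below builds a command using ffmpeg's `loudnorm` filter (its EBU R128 loudness normalization) with default targets, then downmixes to mono at 16 kHz. The helper names are hypothetical and ffmpeg is assumed to be installed and on the PATH.

```python
import subprocess
from typing import List


def ffmpeg_normalize_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command: loudness-normalize, mono, 16 kHz, written to dst."""
    return [
        "ffmpeg",
        "-y",              # overwrite dst if it already exists
        "-i", src,
        "-af", "loudnorm", # EBU R128 loudness normalization, default targets
        "-ac", "1",        # mono
        "-ar", "16000",    # 16 kHz sample rate
        dst,
    ]


def normalize(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_normalize_command(src, dst), check=True)
```

Use a `.wav` destination to get the WAV container mentioned above. Normalization cannot restore audio that is already clipped, so it is best applied to the cleanest source available.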

Industry verticals: podcast networks batch overnight; legal firms transcribe depositions; healthcare must HIPAA-wrap storage even if ASR is accurate — CallMissed transit encryption does not make your bucket compliant by itself.

Competitive matrix: vs `gpt-4o-transcribe` — Whisper lacks streaming partials on some integrations but supports translation endpoint; vs `saaras:v3` — Sarvam wins many Indic telephony sets; vs Nova — Nova wins live English call centers with diarization options.

Upgrade triggers: move off Whisper when you need streaming captions, speaker labels, or the lower WER of GPT-4o-class audio models. Stay on Whisper when batch transcription with translation covers your needs; for the lowest batch price, compare against `whisper-large-v3-turbo`.

Batch sizing: parallelize with worker count equal to min(API rate limit headroom, CPU cores) — Whisper is network-bound on upload more than compute-bound on client side.
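The sizing rule above can be sketched with a thread pool, which suits upload-bound work. The function names are hypothetical, `rate_limit_headroom` is a number you supply from your own account limits, and `transcribe_one` stands in for whatever function uploads a single file.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def pick_worker_count(rate_limit_headroom: int, cpu_cores: Optional[int] = None) -> int:
    """Workers = min(rate-limit headroom, CPU cores), but never fewer than one."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return max(1, min(rate_limit_headroom, cores))


def transcribe_all(paths: List[str],
                   transcribe_one: Callable[[str], str],
                   rate_limit_headroom: int) -> List[str]:
    """Run transcribe_one over all paths in parallel, returning results in input order."""
    workers = pick_worker_count(rate_limit_headroom)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `paths` regardless of completion order.
        return list(pool.map(transcribe_one, paths))
```

Threads rather than processes are used because the work is waiting on the network, not computing locally.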

Pricing

Metric	Price
Price /hour	₹40.0000

1 credit = ₹1 = $0.01 USD. Prices shown are based on provider pricing; CallMissed passes them through with a markup of roughly 35%.
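As a worked example of the arithmetic, the sketch below converts audio duration into credits and USD using the figures on this page (₹40 per audio hour, 1 credit = ₹1 = $0.01). It assumes billing is prorated linearly by duration; actual rounding or minimum charges are not stated here, and `estimate_cost` is a hypothetical helper.

```python
from typing import Dict

PRICE_CREDITS_PER_HOUR = 40.0  # from the pricing table: 40 credits per audio hour
USD_PER_CREDIT = 0.01          # from the conversion stated on this page


def estimate_cost(audio_seconds: float) -> Dict[str, float]:
    """Estimate credits and USD for a given audio duration (linear proration assumed)."""
    hours = audio_seconds / 3600.0
    credits = hours * PRICE_CREDITS_PER_HOUR
    return {"hours": hours, "credits": credits, "usd": credits * USD_PER_CREDIT}
```

Under these assumptions a 90-minute recording comes to 60 credits, or about $0.60.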

Key Highlights

99 languages
Transcribe + translate
Batch + VAD

Benchmarks

Benchmark	Score	Notes
Languages	99	Auto-detect

Technical Details

Model id: whisper
POST /v1/audio/transcriptions

Strengths

Broad language coverage

Limitations

Batch only on this surface

Use Cases

Transcription
Translation
Subtitles

API Example

curl https://api.callmissed.com/v1/audio/transcriptions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -F file=@audio.mp3 -F model=whisper

Endpoint: POST /v1/audio/transcriptions · Model ID: whisper

Try whisper now

Get 1000 free API credits on signup. No credit card required.