Speech to Textmultilingual

whisper

by OpenAI · Released 2024

OpenAI Whisper on Azure — 99 languages, transcribe + translate. Model id `whisper`.

Speech to Text

whisper

Powered by OpenAI · Encoder-decoder

Context Window

N/A

Parameters

Whisper

Max Output

N/A

Category

Speech to Text

Overview

The `whisper` model id on CallMissed maps to OpenAI Whisper speech recognition hosted on Azure OpenAI — the same Whisper family OpenAI popularized for robust multilingual ASR, exposed here under the deployment name `whisper` rather than OpenAI's cloud id `whisper-1`. Use it on `/v1/audio/transcriptions` with `model=whisper` (multipart form upload) identical to OpenAI's audio transcription API shape.

Whisper is a general-purpose speech recognition model supporting dozens of languages with automatic language detection, transcription to the source language, and optional translation to English via the translations endpoint on OpenAI's platform. Azure documents file uploads up to 25 MB per request and formats including mp3, mp4, mpeg, mpga, m4a, wav, and webm (learn.microsoft.com/azure/foundry/openai/whisper-quickstart). On CallMissed, pricing is $0.40 per audio hour — higher than Cloudflare's `whisper-large-v3-turbo` but with Azure enterprise routing and consistent OpenAI weights.

Pick `whisper` when you need batch transcription of meetings, podcasts, or call recordings where streaming partials are unnecessary, especially if you already standardize on OpenAI-compatible audio endpoints. It complements rather than replaces `whisper-large-v3-turbo` on our platform: the Cloudflare model is cheaper for bulk multilingual work, while Azure Whisper fits teams standardized on OpenAI/Azure compliance postures.

Integration: POST audio as `file`, set `model=whisper`, optionally pass `language` ISO code to skip autodetect, and choose `response_format` (`json`, `text`, `srt`, `vtt` where supported). Whisper supports verbose JSON with segment timestamps on OpenAI — useful for subtitle pipelines. Translation mode (audio to English text) is Whisper-family specific; the newer `gpt-4o-transcribe` models do not mirror every Whisper feature — consult docs before migrating translation workflows.

Limitations: batch-only on CallMissed voice surfaces (live voice agents prefer streaming STT like `saaras:v3`, `nova-3`, or `gpt-4o-transcribe`), no speaker diarization on the Whisper path (use `gpt-4o-transcribe-diarize` instead), and file size caps per request. Long recordings must be split client-side. For lowest cost on English-heavy archives, benchmark against `whisper-large-v3-turbo`. For lowest latency partials, use `gpt-4o-mini-transcribe`.

Batch pipeline design: split hour-long recordings into 10–15 minute chunks under 25 MB, transcribe in parallel workers, stitch timestamps by offset. Whisper returns segment timestamps in verbose JSON — use them to rebuild a unified timeline.

Language strategies: omit `language` for autodetect on multilingual call centers; set `language=en` when you know the channel to reduce errors. Whisper supports many languages but quality varies — benchmark Hindi vs `saaras:v3` on CallMissed for Indic telephony.

Translation path: Whisper-family translation to English differs from gpt-4o-transcribe capabilities — if you rely on `/audio/translations`, confirm Whisper still meets needs before migrating STT models.

Subtitle generation: map segments to SRT with start/end from verbose JSON; watch punctuation differences vs human captioners.

Azure vs Cloudflare Whisper: CallMissed also offers `whisper-large-v3-turbo` at lower $/hour — choose Azure `whisper` when compliance path requires Azure OpenAI; choose turbo for cost-at-scale archives.

Error handling: corrupt audio files return 400 — validate containers upstream. Clipping and silence confuse VAD-less batch ASR less than streaming STT but still hurt WER.

Post-processing: apply spell-check domain dictionaries (medical, SKU names) after ASR — models rarely know your internal codenames.

Historical context: OpenAI open-sourced Whisper in 2022, revolutionizing offline ASR quality; Azure OpenAI and CallMissed host the production `whisper` deployment for API access without running GPUs yourself. The model remains a baseline batch ASR choice years later despite streaming successors.

curl walkthrough: `curl -X POST https://api.callmissed.com/v1/audio/transcriptions -H "Authorization: Bearer cm_KEY" -F model=whisper -F file=@call.mp3` returns JSON text. Add `-F response_format=verbose_json` when you need segment timestamps for editing tools.

Media prep: normalize loudness (EBU R128) before upload; whisper struggles on clipped audio. Convert odd codecs with ffmpeg to 16 kHz mono WAV for problematic files.

Industry verticals: podcast networks batch overnight; legal firms transcribe depositions; healthcare must HIPAA-wrap storage even if ASR is accurate — CallMissed transit encryption does not make your bucket compliant by itself.

Competitive matrix: vs `gpt-4o-transcribe` — Whisper lacks streaming partials on some integrations but supports translation endpoint; vs `saaras:v3` — Sarvam wins many Indic telephony sets; vs Deepgram Nova — Nova wins live English call centers with diarization options.

Upgrade triggers: move off Whisper when you need streaming captions, speaker labels, or lowest WER on GPT-4o-class audio models — stay on Whisper when you need cheapest Azure-path batch with translation.

Batch sizing: parallelize with worker count equal to min(API rate limit headroom, CPU cores) — Whisper is network-bound on upload more than compute-bound on client side.

Pricing

MetricPrice
Price /hour₹40.0000

1 credit = ₹1 = $0.01 USD. Prices shown from provider; CallMissed passes through with ~35% markup.

Key Highlights

  • 99 languages
  • Transcribe + translate
  • Batch + VAD

Benchmarks

BenchmarkScore
Languages99

Technical Details

  • Model id: whisper
  • POST /v1/audio/transcriptions

Strengths

  • Broad language coverage

Limitations

  • Batch only on this surface

Use Cases

TranscriptionTranslationSubtitles

API Example

curl https://api.callmissed.com/v1/audio/transcriptions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -F file=@audio.mp3 -F model=whisper

Endpoint: POST /v1/audio/transcriptions · Model ID: whisper

Try whisper now

Get 1000 free API credits on signup. No credit card required.