Speech to Text · production · streaming

Nova 3

by Deepgram · Released 2025

Deepgram Nova 3 — production-grade STT with diarization, smart formatting, and domain modes (general / medical / finance). 11 languages with optional auto-detect.

Powered by Deepgram · Proprietary E2E ASR with diarization

Context Window: N/A
Parameters: Undisclosed
Max Output: N/A
Category: Speech to Text

Overview

Nova 3 is Deepgram's newest production STT model, offering high-accuracy transcription with rich post-processing. It supports speaker diarization (numbering each speaker), automatic punctuation, smart formatting (numbers, dates, currencies, addresses), profanity filtering, and topic / sentiment extraction. Specialized domain modes ("medical" or "finance") tune the acoustic and language models for industry-specific vocabulary.
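To make the diarization output concrete, here is a minimal sketch that collapses word-level speaker labels into speaker turns. It assumes the transcript JSON carries a list of word objects with `speaker` and `punctuated_word` (or `word`) keys, modeled on Deepgram's native response. Whether this endpoint returns the same shape is an assumption, and `group_speaker_turns` is a hypothetical helper, not part of any SDK.

```python
from itertools import groupby


def group_speaker_turns(words):
    """Collapse consecutive words from the same speaker into (speaker, text) turns.

    `words` is assumed to be a list of dicts with an integer `speaker` and a
    `punctuated_word` string (falling back to `word`). This shape is an
    assumption, not a documented contract of this endpoint.
    """
    turns = []
    for speaker, run in groupby(words, key=lambda w: w.get("speaker", 0)):
        text = " ".join(w.get("punctuated_word") or w["word"] for w in run)
        turns.append((speaker, text))
    return turns


# Hand-written sample data, not real API output.
sample_words = [
    {"word": "hello", "punctuated_word": "Hello,", "speaker": 0},
    {"word": "doctor", "punctuated_word": "doctor.", "speaker": 0},
    {"word": "good", "punctuated_word": "Good", "speaker": 1},
    {"word": "morning", "punctuated_word": "morning.", "speaker": 1},
]

for speaker, text in group_speaker_turns(sample_words):
    print(f"Speaker {speaker}: {text}")
```

Grouping on consecutive runs rather than on speaker ID alone preserves the order of the conversation, which is usually what call-center or scribing transcripts need.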

Deployed via Cloudflare Workers AI, Nova 3 supports both batch (REST) and real-time (WebSocket) modes; this API surface uses batch. For streaming voice agents, the WebSocket mode delivers interim results, VAD events, and end-of-utterance signals at under 300 ms latency.

At $0.50 per audio hour for batch and $0.92 per hour for streaming, it sits between Whisper (cheaper, English-leaning) and Saaras (Indian languages only). Pick Nova 3 when you need diarization, smart formatting, or industry-tuned recognition.

Pricing

Metric        Price
Price / hour  ₹50.0000

1 credit = ₹1 = $0.01 USD. Prices shown are the provider's; CallMissed passes them through with a ~35% markup.
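As a worked example of the credit arithmetic, the sketch below estimates the credits consumed by a recording of a given length. It assumes billing is proportional to audio duration at the listed ₹50 per hour. Rounding rules, minimum charges, and whether the listed price already includes the markup are not stated here, so treat the result as an estimate. `estimate_credits` is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_HOUR_INR = Decimal("50")  # listed price per audio hour
USD_PER_CREDIT = Decimal("0.01")    # 1 credit = ₹1 = $0.01, per the note above


def estimate_credits(audio_seconds):
    """Estimate credits for `audio_seconds` of audio, assuming linear billing."""
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (hours * PRICE_PER_HOUR_INR).quantize(Decimal("0.01"))


credits = estimate_credits(90 * 60)  # a 90-minute recording
print("credits:", credits)
print("USD:", credits * USD_PER_CREDIT)
```

Under these assumptions a 90-minute recording comes to 75 credits (1.5 hours × ₹50).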

Key Highlights

  • Speaker diarization out of the box
  • Smart formatting: numbers, dates, currencies
  • Domain modes: general / medical / finance
  • Batch + real-time WebSocket support

Benchmarks

Benchmark          Score
Languages          11
Streaming latency  <300ms
Domains            3

Technical Details

  • Runs on Cloudflare Workers AI (`@cf/deepgram/nova-3`)
  • Batch: POST audio/mpeg → JSON transcript
  • Streaming: WebSocket with interim_results, vad_events, utterance_end_ms
  • Optional: diarize, punctuate, smart_format, profanity_filter, sentiment, topics
  • Domain modes: mode=general | medical | finance
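The option names listed above could be assembled into a request query string roughly as follows. This is a sketch under assumptions: the page does not say whether options travel as query parameters or form fields, the `utterance_end_ms` value is illustrative, and `build_options` is a hypothetical helper.

```python
from urllib.parse import urlencode

ALLOWED_MODES = {"general", "medical", "finance"}
ALLOWED_FLAGS = {
    "diarize", "punctuate", "smart_format",
    "profanity_filter", "sentiment", "topics",
}


def build_options(mode="general", streaming=False, **flags):
    """Build a query string from the documented option names.

    Passing options as a query string is an assumption; adapt this to
    whatever transport the endpoint actually expects.
    """
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    unknown = set(flags) - ALLOWED_FLAGS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")

    params = {"mode": mode}
    for name, enabled in flags.items():
        params[name] = "true" if enabled else "false"
    if streaming:
        # Streaming-only options from the list above.
        params["interim_results"] = "true"
        params["vad_events"] = "true"
        params["utterance_end_ms"] = "1000"  # illustrative value
    return urlencode(params)


print(build_options(mode="medical", diarize=True, smart_format=True))
```

Validating names up front catches typos such as `smart_formatting` before a request is sent, rather than relying on the server to reject or silently ignore them.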

Strengths

  • Built-in diarization — no separate speaker model needed
  • Smart formatting saves post-processing pipelines
  • Industry-tuned domain modes
  • Real-time WebSocket for sub-second voice agents

Limitations

  • Only 11 languages vs Whisper's 99
  • Streaming mode requires WebSocket plumbing (not exposed via REST)
  • Higher per-hour cost than Whisper for general use

Use Cases

  • Call center analytics
  • Medical scribing
  • Financial compliance recording
  • Voice agents

API Example

curl https://api.callmissed.com/v1/audio/transcriptions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -F file=@call.mp3 \
  -F model=nova-3 \
  -F language=en-US

Endpoint: POST /v1/audio/transcriptions · Model ID: nova-3
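For callers who prefer Python to curl, the same request can be sketched with only the standard library. The block below builds, but does not send, a multipart/form-data POST with the same three fields as the curl call. The `audio/mpeg` part type follows the Technical Details list; `build_transcription_request` is a hypothetical helper, and the response format is not documented here, so parsing is left out.

```python
import urllib.request
import uuid

API_URL = "https://api.callmissed.com/v1/audio/transcriptions"


def build_transcription_request(api_key, audio_bytes, filename="call.mp3",
                                model="nova-3", language="en-US"):
    """Build (without sending) a multipart POST mirroring the curl example."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields: model and language.
    for name, value in (("model", model), ("language", language)):
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    # The audio file part.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: audio/mpeg\r\n\r\n").encode()
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(
        API_URL, data=b"".join(parts), headers=headers, method="POST"
    )


# Placeholder bytes stand in for a real MP3 file.
req = build_transcription_request("cm_YOUR_KEY", b"\x00placeholder-audio")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then decode the JSON response body.
```

Separating request construction from sending keeps the example inspectable offline; in practice a client library such as `requests` would handle the multipart encoding.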
