Nova 3
by Deepgram · Released 2025
Deepgram Nova 3: production-grade speech-to-text (STT) with diarization, smart formatting, and domain modes (general / medical / finance). Supports 11 languages, with optional automatic language detection.
Powered by Deepgram · Proprietary E2E ASR with diarization
| Spec | Value |
|---|---|
| Context Window | N/A |
| Parameters | Undisclosed |
| Max Output | N/A |
| Category | Speech to Text |
Overview
Nova 3 is Deepgram's newest production STT model, offering high-accuracy transcription with rich post-processing. It supports speaker diarization (labeling each speaker with a number), automatic punctuation, smart formatting (numbers, dates, currencies, addresses), profanity filtering, and topic and sentiment extraction. Specialized domain modes ("medical" or "finance") tune the acoustic and language models for industry-specific vocabulary.
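With diarization enabled, each recognized word carries a numeric speaker label, and a common post-processing step is collapsing consecutive words from the same speaker into turns. The sketch below assumes the words have already been extracted into tab-separated `speaker<TAB>word` lines (for example with `jq`, if the response follows Deepgram's native `results.channels[0].alternatives[0].words` layout with `speaker` and `word` fields). The function name `speaker_turns` and that input format are illustrative assumptions, not part of the CallMissed API.

```shell
# Hypothetical helper: collapse diarized words into speaker turns.
# Input (stdin): one "speaker<TAB>word" line per word, in time order.
# Output: one "Speaker N: ..." line per run of consecutive words by the same speaker.
speaker_turns() {
  awk 'BEGIN { FS = "\t" }
    # Open a new turn on the first word or whenever the speaker label changes.
    NR == 1 || $1 != prev {
      if (NR > 1) printf "\n"
      printf "Speaker %s:", $1
      prev = $1
    }
    # Append the current word to the open turn.
    { printf " %s", $2 }
    # Terminate the last turn (prints nothing for empty input).
    END { if (NR > 0) printf "\n" }
  '
}

printf '0\tHello\n0\tthere.\n1\tHi!\n0\tBye.\n' | speaker_turns
# prints:
# Speaker 0: Hello there.
# Speaker 1: Hi!
# Speaker 0: Bye.
```

The `NR == 1` check is there so the very first speaker always opens a turn, regardless of how awk compares its label against the initially empty `prev`.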
Nova 3 is deployed on Cloudflare Workers AI and supports both batch (REST) and real-time (WebSocket) modes; the CallMissed endpoint described here uses batch. For streaming voice agents, WebSocket mode delivers interim results, voice activity detection (VAD) events, and end-of-utterance signals with under 300 ms latency.
At $0.50 per audio hour for batch and $0.92 per audio hour for streaming, it sits between Whisper (cheaper, English-leaning) and Saaras (Indian languages only). Pick Nova 3 when you need diarization, smart formatting, or industry-tuned recognition.
Pricing
| Metric | Price |
|---|---|
| Price per audio hour (batch) | ₹50.0000 |
1 credit = ₹1 = $0.01 USD. Prices are shown as quoted by the provider; CallMissed passes them through with a ~35% markup.
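The per-hour rate converts directly into a per-file estimate. A minimal sketch, assuming usage is pro-rated by the second at whatever per-hour rate you pass in: 50 credits (₹50) for batch per the table, or 92 credits for streaming if the $0.92/hour figure is converted at 1 credit = $0.01. Whether the ~35% markup is already included in the listed rate is not stated, so the rate is left as a parameter; the function name `estimate_credits` is hypothetical.

```shell
# Hypothetical helper: estimate credits for an audio file of a given length.
# Assumes per-second pro-rating at a per-hour rate expressed in credits (1 credit = ₹1).
#   $1 = audio duration in seconds
#   $2 = rate in credits per audio hour (e.g. 50 for batch)
estimate_credits() {
  awk -v secs="$1" -v rate="$2" 'BEGIN { printf "%.4f\n", secs * rate / 3600 }'
}

# A 90-minute recording at the batch rate: 5400 s * 50 / 3600 = 75 credits ($0.75).
estimate_credits 5400 50
```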
Key Highlights
- Speaker diarization out of the box
- Smart formatting: numbers, dates, currencies
- Domain modes: general / medical / finance
- Batch + real-time WebSocket support
Benchmarks
| Benchmark | Score |
|---|---|
| Languages | 11 |
| Streaming latency | <300ms |
| Domains | 3 |
Technical Details
- Runs on Cloudflare Workers AI (`@cf/deepgram/nova-3`)
- Batch: POST an `audio/mpeg` body → JSON transcript
- Streaming: WebSocket with `interim_results`, `vad_events`, `utterance_end_ms`
- Optional: `diarize`, `punctuate`, `smart_format`, `profanity_filter`, `sentiment`, `topics`
- Domain modes: `mode=general | medical | finance`
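When calling Workers AI directly rather than through CallMissed, the batch options listed here would accompany the raw audio body. The sketch below assumes they are passed as URL query parameters on Cloudflare's `/ai/run/@cf/deepgram/nova-3` REST route, with the audio posted as `Content-Type: audio/mpeg`; both are assumptions to verify against Cloudflare's model schema. `nova3_url`, `CF_ACCOUNT_ID`, and `CF_API_TOKEN` are illustrative names.

```shell
# Hypothetical helper: build a Workers AI REST URL for Nova 3 with options
# appended as query parameters (an assumption; verify against Cloudflare's schema).
#   $1    = Cloudflare account ID
#   $2... = key=value options, e.g. diarize=true smart_format=true mode=medical
# Values are not URL-encoded, so pass only simple tokens.
nova3_url() {
  url="https://api.cloudflare.com/client/v4/accounts/$1/ai/run/@cf/deepgram/nova-3"
  shift
  sep='?'
  for opt in "$@"; do
    url="${url}${sep}${opt}"
    sep='&'
  done
  printf '%s\n' "$url"
}

# Example request (not executed here); needs a real account ID and API token:
#   curl -X POST "$(nova3_url "$CF_ACCOUNT_ID" diarize=true smart_format=true mode=medical)" \
#     -H "Authorization: Bearer $CF_API_TOKEN" \
#     -H "Content-Type: audio/mpeg" \
#     --data-binary @call.mp3
```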
Strengths
- Built-in diarization — no separate speaker model needed
- Smart formatting saves post-processing pipelines
- Industry-tuned domain modes
- Real-time WebSocket mode for voice agents that need sub-second responses
Limitations
- Only 11 languages vs Whisper's 99
- Streaming mode requires WebSocket plumbing (not exposed via REST)
- Higher per-hour cost than Whisper for general use
API Example
```shell
curl https://api.callmissed.com/v1/audio/transcriptions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -F file=@call.mp3 \
  -F model=nova-3 \
  -F language=en-US
```
Endpoint: POST /v1/audio/transcriptions · Model ID: nova-3
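To run the batch endpoint over several recordings, the single request can be wrapped in a loop. A minimal sketch, assuming `CALLMISSED_API_KEY` holds your `cm_` key and that each JSON response should be saved next to its source file; the function name `transcribe_dir` and the `DRY_RUN` switch are illustrative additions, not CallMissed features.

```shell
# Hypothetical helper: transcribe every .mp3 in a directory via the batch endpoint.
# Each response is written to <file>.json. Set DRY_RUN=1 to print what would be
# sent instead of calling the API.
transcribe_dir() {
  dir="$1"
  for f in "$dir"/*.mp3; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'would transcribe %s -> %s.json\n' "$f" "$f"
      continue
    fi
    # Fails fast (exits the shell) if the API key variable is unset or empty.
    curl -sS --fail https://api.callmissed.com/v1/audio/transcriptions \
      -H "Authorization: Bearer ${CALLMISSED_API_KEY:?set CALLMISSED_API_KEY}" \
      -F "file=@$f" \
      -F model=nova-3 \
      -F language=en-US \
      -o "$f.json"
  done
}
```

Usage: `DRY_RUN=1 transcribe_dir ./calls` to preview, then `transcribe_dir ./calls` to send the requests.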