22 Indian languages natively trained
Hindi, Tamil, Telugu, Bengali, Marathi, Kannada, Malayalam, Gujarati, Punjabi, Odia, Assamese, Urdu, and 10 more, plus English (Indian, US, and UK accents). Trained on Indian speech, not translated from English.
The fastest, cheapest, most accurate speech-to-text API for 22 Indian languages. Streaming WebSocket under 400ms, OpenAI-compatible batch, $0.004 per minute.
OpenAI-compatible batch. Native WebSocket streaming. Pick your path.
Sign up at app.callmissed.com and copy your API key (it starts with cm_). 1,000 free API credits included.
POST a WAV, MP3, FLAC, OGG, or M4A file — or stream raw PCM over WebSocket for live audio.
JSON response with timestamps, speaker diarization, word-level confidence, and automatic language detection.
Pipe into an LLM for Q&A, summarize meetings, caption videos, or feed your ASR-dependent workflow.
If you've tried Google STT or Whisper on Hindi audio, you know what's wrong. This fixes it.
WebSocket streaming (<400ms partial transcripts) for real-time apps, or HTTP batch for files. Same API shape, same model quality.
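Streaming means sending raw PCM in small pieces rather than as one file. The sketch below shows that client-side chunking. It assumes 16 kHz, 16-bit mono audio, which matches the sample rate in the Node streaming example. The 100 ms chunk size is an illustrative choice, not a documented requirement, and `pcm_chunks` is a hypothetical helper name.

```python
import io
import wave
from typing import Iterator


def pcm_chunks(wav_bytes: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw PCM chunks of chunk_ms each from an in-memory WAV file.

    Assumes 16 kHz, 16-bit (2-byte), mono audio; the last chunk may be shorter.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        fmt = (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())
        if fmt != (16000, 2, 1):
            raise ValueError(f"expected 16 kHz, 16-bit mono PCM, got {fmt}")
        frames_per_chunk = wav.getframerate() * chunk_ms // 1000  # 1,600 frames at 100 ms
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk  # hand this to the WebSocket client's send call


# Usage: build one second of silence in memory, then chunk it.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

chunks = list(pcm_chunks(buf.getvalue()))
print(len(chunks), len(chunks[0]))  # 10 3200
```

At 16 kHz with 2-byte samples, 100 ms comes to 1,600 frames, or 3,200 bytes per chunk.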
Customers switch between Hindi and English mid-sentence. Our models decode both, tag each word with its source language, and never drop accuracy on mixed speech.
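Per-word language tags make it easy to split a mixed utterance back into its Hindi and English spans. The sketch below assumes each word arrives with a `lang` field. That field name is hypothetical, so check it against the actual response schema.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    text: str
    lang: str  # hypothetical per-word language tag, e.g. "hi" or "en"


def language_runs(words: list[Word]) -> list[tuple[str, str]]:
    """Merge consecutive words that share a language tag into (lang, text) spans."""
    return [
        (lang, " ".join(w.text for w in group))
        for lang, group in groupby(words, key=lambda w: w.lang)
    ]


def language_share(words: list[Word]) -> dict[str, float]:
    """Fraction of words carrying each language tag."""
    counts = Counter(w.lang for w in words)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


utterance = [
    Word("मेरा", "hi"), Word("order", "en"), Word("status", "en"),
    Word("check", "en"), Word("करो", "hi"),
]
print(language_runs(utterance))
print(language_share(utterance))
```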
Every word comes with start and end times and a confidence score. Multi-speaker audio gets speaker_0/speaker_1 labels automatically.
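Word timestamps plus speaker labels are enough to rebuild a readable, turn-by-turn transcript. The sketch below assumes each word carries `text`, `start`, `end` (in seconds), and `speaker` fields. These field names are hypothetical, so map them to the real response schema.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: str  # e.g. "speaker_0"


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def speaker_turns(words: list[TimedWord]) -> list[Turn]:
    """Group consecutive words from the same speaker into one turn."""
    turns: list[Turn] = []
    for w in words:
        if turns and turns[-1].speaker == w.speaker:
            turns[-1].end = w.end
            turns[-1].text += " " + w.text
        else:
            turns.append(Turn(w.speaker, w.start, w.end, w.text))
    return turns


words = [
    TimedWord("Namaste", 0.0, 0.5, "speaker_0"),
    TimedWord("ji", 0.5, 0.7, "speaker_0"),
    TimedWord("Haan", 1.1, 1.4, "speaker_1"),
    TimedWord("boliye", 1.4, 1.9, "speaker_1"),
]
for t in speaker_turns(words):
    print(f"[{t.start:05.2f}-{t.end:05.2f}] {t.speaker}: {t.text}")
```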
Boost recognition of your brand names, product SKUs, medical terms, or regional place names by passing a custom vocab list at request time.
Audio is stored in AWS ap-south-1 (Mumbai) and auto-deleted after 30 days. Optional PII redaction (phone numbers, Aadhaar, card numbers) is available on request, applied at the word level.
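Redaction itself runs server-side when you enable it. To illustrate what word-level masking means, here is a client-side heuristic; it is not CallMissed's implementation. It masks runs of consecutive numeric words that add up to 10 or more digits. That covers phone, Aadhaar, and card numbers when they are spoken in digit groups. The threshold and the `[REDACTED]` token are assumptions.

```python
import re

NUMERIC_WORD = re.compile(r"\+?\d+")  # e.g. "98765", "+91"


def redact_numbers(words: list[str], min_digits: int = 10, mask: str = "[REDACTED]") -> list[str]:
    """Mask runs of consecutive numeric words whose combined digit count reaches min_digits."""
    out = list(words)
    i = 0
    while i < len(words):
        if not NUMERIC_WORD.fullmatch(words[i]):
            i += 1
            continue
        j = i
        while j < len(words) and NUMERIC_WORD.fullmatch(words[j]):
            j += 1  # extend the run of numeric words
        digits = sum(ch.isdigit() for w in words[i:j] for ch in w)
        if digits >= min_digits:
            out[i:j] = [mask] * (j - i)  # one mask per word keeps word timestamps aligned
        i = j
    return out


print(redact_numbers("mera aadhaar 1234 5678 9012 hai aur pin 560001".split()))
```

A 6-digit PIN code is left alone, while a 12-digit Aadhaar spoken as three groups of four is masked. Shorter sequences such as OTPs would need their own rule.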
Real deployments, real data rates, real cost savings.
Your call center records 10,000 calls a day. Human QA samples maybe 200 of those. Use the STT API to transcribe all 10,000 — then run compliance checks, keyword spotting, and sentiment analysis downstream.
Result
100% call QA at 1% of manual cost.
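Downstream compliance checks can start as simple phrase matching on each transcript. The sketch below uses hypothetical phrase lists. It flags required disclosures the agent never said and banned phrases they did say. Matching runs on lowercased whitespace tokens, so it works for Devanagari as well as romanized text, but it will not catch paraphrases.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase and split on whitespace and common punctuation, including the Devanagari danda."""
    return re.findall(r"[^\s.,!?;:।]+", text.lower())


def contains_phrase(words: list[str], phrase: str) -> bool:
    target = tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def compliance_check(transcript: str, required: list[str], banned: list[str]) -> dict[str, list[str]]:
    words = tokens(transcript)
    return {
        "missing_required": [p for p in required if not contains_phrase(words, p)],
        "banned_found": [p for p in banned if contains_phrase(words, p)],
    }


# Hypothetical phrase lists for one agent call.
report = compliance_check(
    "Hello, this call is recorded for quality purposes. Aapka loan guaranteed approve hoga.",
    required=["call is recorded", "cooling-off period"],
    banned=["guaranteed"],
)
print(report)
```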
Teams run meetings in Hindi, Tamil, Marathi, or Hinglish. Stream the meeting audio into the STT API, then pipe the transcript to an LLM for summary, action items, and decision log.
Result
Meeting notes in 30 seconds, in the language spoken.
Streaming platforms and broadcasters need captions in 10+ Indian languages. The STT API generates timed captions with word-level timestamps, ready for SRT/WebVTT export.
Result
10x cheaper than human captioning services.
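SRT export needs only cue boundaries and timecodes, and both can be derived from word timestamps. The minimal sketch below assumes words arrive as `(text, start, end)` in seconds. The 7-word and 3-second cue limits are illustrative defaults, not platform requirements.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    start: float  # seconds
    end: float    # seconds


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timecode, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def to_srt(words: list[TimedWord], max_words: int = 7, max_duration: float = 3.0) -> str:
    """Group words into cues capped by word count and duration, then render SRT."""
    cues: list[list[TimedWord]] = []
    current: list[TimedWord] = []
    for w in words:
        if current and (len(current) >= max_words or w.end - current[0].start > max_duration):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)
    blocks = [
        f"{i}\n{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}\n"
        + " ".join(w.text for w in cue) + "\n"
        for i, cue in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)  # a blank line separates cues


words = [TimedWord("नमस्ते", 0.0, 0.5), TimedWord("दोस्तों", 0.5, 1.1), TimedWord("आज", 1.5, 1.8)]
print(to_srt(words, max_words=2))
```

WebVTT differs mainly in two ways: it starts with a `WEBVTT` header line, and it uses a period instead of a comma before the milliseconds.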
Students speak into the app, STT transcribes their speech with per-word confidence, and your app compares expected vs. actual phonemes to give pronunciation feedback, in 22 Indian languages.
Result
Built-in pronunciation coach without external vendors.
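The flow above compares phonemes. As a simpler word-level stand-in, this sketch aligns the expected sentence against the transcribed words using Python's `difflib.SequenceMatcher`. It marks substituted or missing words and flags correctly transcribed words whose confidence falls below an assumed 0.6 threshold. The `(word, confidence)` input shape is hypothetical.

```python
from difflib import SequenceMatcher


def pronunciation_feedback(
    expected: str, heard: list[tuple[str, float]], min_conf: float = 0.6
) -> list[tuple[str, str]]:
    """Compare the target sentence with (word, confidence) pairs from STT."""
    target = expected.lower().split()
    spoken = [w.lower() for w, _ in heard]
    feedback: list[tuple[str, str]] = []
    matcher = SequenceMatcher(a=target, b=spoken, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Right word, but low confidence suggests an unclear pronunciation.
            for k in range(i2 - i1):
                if heard[j1 + k][1] < min_conf:
                    feedback.append((target[i1 + k], "unclear"))
        elif tag == "replace":
            feedback.extend((word, "mispronounced") for word in target[i1:i2])
        elif tag == "delete":
            feedback.extend((word, "missed") for word in target[i1:i2])
        # "insert" (extra spoken words) is ignored in this sketch.
    return feedback


print(pronunciation_feedback(
    "main school jaata hoon",
    [("main", 0.95), ("school", 0.41), ("jaati", 0.90), ("hoon", 0.92)],
))
```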
Enumerators in rural India speak survey responses directly into your mobile app (no typing). STT transcribes their speech in the local language, your app extracts structured fields, and the data syncs when connectivity returns.
Result
Data collection in the respondent's language, 3x faster.
Live captions for conferences, webinars, or government broadcasts. Stream audio into the WebSocket API, render partial transcripts in <400ms — so deaf and hard-of-hearing attendees can follow along in real time.
Result
WCAG-compliant live captions without a human captioner.
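For live captions, partial transcripts should overwrite the current line, while final transcripts are committed. The sketch below shows that buffer. It assumes each partial carries the full hypothesis for the current utterance, so it replaces the previous partial rather than appending to it. `LiveCaption` and the character limit are hypothetical.

```python
class LiveCaption:
    """Caption line = committed final text + the latest partial hypothesis."""

    def __init__(self, max_chars: int = 84) -> None:
        self.final_text = ""
        self.partial_text = ""
        self.max_chars = max_chars  # hypothetical on-screen limit

    def on_partial(self, text: str) -> str:
        self.partial_text = text  # assumed: each partial replaces the previous one
        return self.render()

    def on_final(self, text: str) -> str:
        self.final_text = f"{self.final_text} {text}".strip()
        self.partial_text = ""  # the final supersedes any pending partial
        return self.render()

    def render(self) -> str:
        line = f"{self.final_text} {self.partial_text}".strip()
        return line[-self.max_chars:]  # keep only the most recent text on screen


cap = LiveCaption(max_chars=20)
for event, text in [("partial", "welcome"), ("partial", "welcome every"),
                    ("final", "welcome everyone"), ("partial", "to the")]:
    shown = cap.on_partial(text) if event == "partial" else cap.on_final(text)
    print(repr(shown))
```

Trimming by characters can cut a word in half. A production renderer would wrap on word boundaries instead.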
On Indian-language accuracy, we beat every major provider, at a quarter of Google STT's listed price and a sixth of AWS Transcribe's.
| Feature | CallMissed | Google STT | AWS Transcribe | Whisper API | Deepgram |
|---|---|---|---|---|---|
22 Indian languages | |||||
Hinglish + code-switching | |||||
Streaming WebSocket <400ms | |||||
Word-level timestamps | |||||
Speaker diarization | |||||
Custom vocabulary | |||||
OpenAI-compatible API shape | |||||
India data residency | |||||
Price / minute | $0.004 | $0.016 | $0.024 | $0.006 | $0.0043 |
Comparison based on publicly listed features as of 2026. Check each vendor's site for the latest.
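To make the price row concrete, this sketch prices the call-center scenario (10,000 calls a day) at each listed per-minute rate. The 4-minute average call length and the 30-day month are assumptions. `Decimal` keeps the currency arithmetic exact.

```python
from decimal import Decimal

# Listed per-minute prices from the comparison table (USD).
PRICE_PER_MINUTE = {
    "CallMissed": Decimal("0.004"),
    "Google STT": Decimal("0.016"),
    "AWS Transcribe": Decimal("0.024"),
    "Whisper API": Decimal("0.006"),
    "Deepgram": Decimal("0.0043"),
}


def monthly_cost(calls_per_day: int, avg_minutes_per_call: Decimal, days: int = 30) -> dict[str, Decimal]:
    """Monthly transcription cost per vendor, rounded to cents."""
    minutes = calls_per_day * avg_minutes_per_call * days
    return {
        vendor: (price * minutes).quantize(Decimal("0.01"))
        for vendor, price in PRICE_PER_MINUTE.items()
    }


# Assumed: 10,000 calls/day averaging 4 minutes, 30-day month.
for vendor, cost in monthly_cost(10_000, Decimal("4")).items():
    print(f"{vendor:<15} ${cost:,}")
```

Under those assumptions, the month comes to 1.2 million minutes. That costs $4,800 at $0.004 per minute, versus $19,200 at Google STT's listed $0.016.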
The CallMissed STT API follows the OpenAI audio transcription shape. If you've ever called client.audio.transcriptions.create() with the OpenAI SDK, this is a drop-in: point the client's base_url at CallMissed and use your cm_ key.
from openai import OpenAI
client = OpenAI(
    base_url="https://api.callmissed.com/v1",
    api_key="cm_your_key",
)
with open("call_recording.mp3", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="saaras:v3",  # Sarvam: Indian languages + code-mixed speech.
        # Also: "whisper-large-v3-turbo" (99 langs)
        # Also: "nova-3" (Deepgram, diarize + smart-format)
        file=audio,
        language="hi-IN",  # Sarvam BCP-47; "unknown" for auto-detect
        response_format="verbose_json",
    )
print(result.text)
Python: file transcription with the OpenAI SDK
import { CallMissed } from "callmissed";
const cm = new CallMissed({ apiKey: process.env.CM_KEY });
const stream = cm.audio.stt.stream({
  model: "saarika-v2",
  language: "hi",
  sampleRate: 16000,
});
stream.on("partial", (t) => console.log("partial:", t.text));
stream.on("final", (t) => console.log("final:", t.text, "speaker:", t.speaker));
// Pipe raw 16 kHz PCM from your mic (e.g. via getUserMedia in the browser).
// micStream is your own audio source; it is not defined in this snippet.
micStream.on("data", (chunk) => stream.send(chunk));
Node/JS: streaming transcription from a microphone
A speech-to-text (STT) API is a REST or WebSocket endpoint that converts audio (microphone, phone call, recorded file) into text transcripts. You send the audio bytes, it returns words with timestamps, confidence scores, and optional speaker labels. CallMissed's STT API is specifically trained on Indian languages and accents, making it the most accurate option for Hindi, Tamil, Telugu, Bengali, Marathi, and 17 more.
Sign up free, grab an API key, and paste our snippet. 1,000 API credits included: about 2,500 minutes of transcription.