Speech-to-Text API

Speech-to-text API built for Indian languages

The fastest, cheapest, most accurate speech-to-text API for 22 Indian languages. WebSocket streaming with partial transcripts in under 400ms, an OpenAI-compatible batch endpoint, and pricing of $0.004 per minute.

  • 22 Indian languages trained natively (not translated from English)
  • Hinglish + Tanglish code-switching handled
  • Streaming (<400ms) + batch (file upload)
  • $0.004/min — 75% cheaper than Google STT
22 Indian languages
<400ms streaming latency
95%+ accuracy (word error rate under 5%)
$0.004 per minute
How it works

Transcribing audio in 4 lines of code

OpenAI-compatible batch. Native WebSocket streaming. Pick your path.

01

Grab an API key

Sign up at app.callmissed.com and copy your cm_ API key. 1,000 free API credits are included.

02

Send audio

POST a WAV, MP3, FLAC, OGG, or M4A file — or stream raw PCM over WebSocket for live audio.

03

Get transcript

JSON response with timestamps, speaker diarization, word-level confidence, and automatic language detection.

04

Use downstream

Pipe into an LLM for Q&A, summarize meetings, caption videos, or feed your ASR-dependent workflow.
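Steps 03 and 04 can be sketched roughly as follows. The response shape used here (top-level `text` and `language`, plus a `words` list with `word`, `start`, `end`, and `confidence` keys) is an assumption for illustration, not the documented schema, so adjust the key names to match what the API actually returns.

```python
from statistics import mean

# Hypothetical response: every field name below is assumed for illustration.
sample_response = {
    "text": "namaste aap kaise hain",
    "language": "hi-IN",
    "words": [
        {"word": "namaste", "start": 0.00, "end": 0.42, "confidence": 0.98},
        {"word": "aap", "start": 0.50, "end": 0.68, "confidence": 0.95},
        {"word": "kaise", "start": 0.70, "end": 1.02, "confidence": 0.61},
        {"word": "hain", "start": 1.05, "end": 1.30, "confidence": 0.93},
    ],
}


def summarize(response, min_confidence=0.8):
    """Collect the fields a downstream step usually needs from one transcript."""
    words = response.get("words", [])
    confidences = [w["confidence"] for w in words]
    return {
        "language": response.get("language"),
        # End time of the last word approximates the spoken duration.
        "duration_s": words[-1]["end"] if words else 0.0,
        "mean_confidence": mean(confidences) if confidences else None,
        # Words worth a human look (or a re-prompt) before feeding an LLM.
        "low_confidence_words": [
            w["word"] for w in words if w["confidence"] < min_confidence
        ],
    }


summary = summarize(sample_response)
```

A summary like this is a cheap gate before the "use downstream" step: transcripts with many low-confidence words can be routed to review instead of straight into an LLM.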

Features

What makes CallMissed STT different

If you've tried Google STT or Whisper on Hindi audio, you know what's wrong. This fixes it.

22 Indian languages natively trained

Hindi, Tamil, Telugu, Bengali, Marathi, Kannada, Malayalam, Gujarati, Punjabi, Odia, Assamese, Urdu + 10 more — plus English (Indian, US, UK accents). Not translated from English — trained on Indian speech.

Streaming + batch modes

WebSocket streaming (<400ms partial transcripts) for real-time apps, or HTTP batch for files. Same API shape, same model quality.
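For the streaming path, audio is usually sent as small fixed-size frames rather than one large buffer. The sketch below shows one way to slice raw PCM into frames before sending them over a socket. The 16 kHz sample rate matches the streaming example on this page; the 16-bit mono format and the 20 ms frame size are assumptions, not documented requirements.

```python
def pcm_frames(pcm, sample_rate=16000, frame_ms=20, sample_width=2, channels=1):
    """Yield fixed-duration frames from a raw PCM byte string.

    sample_width is bytes per sample (2 for 16-bit audio). The final frame
    may be shorter if the audio does not divide evenly.
    """
    samples_per_frame = sample_rate * frame_ms // 1000
    frame_bytes = samples_per_frame * sample_width * channels
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]


# One second of 16-bit mono silence at 16 kHz.
one_second = bytes(16000 * 2)
frames = list(pcm_frames(one_second))
```

Smaller frames reduce the delay before the first partial transcript but add per-message overhead, so the frame size is a tuning choice.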

Hinglish + code-switching

Customers switch between Hindi and English mid-sentence. Our models decode both and tag each word with its source language, so accuracy holds up on mixed speech.
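Per-word language tags make it easy to measure how mixed a conversation is. The sketch below assumes each word arrives with a `lang` key; that key name and the sample words are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter
from itertools import groupby

# Hypothetical tagged words: the "lang" key is an assumed field name.
words = [
    {"word": "mujhe", "lang": "hi"},
    {"word": "refund", "lang": "en"},
    {"word": "status", "lang": "en"},
    {"word": "batao", "lang": "hi"},
]


def language_mix(words):
    """Return the share of words per language, e.g. {"hi": 0.5, "en": 0.5}."""
    counts = Counter(w["lang"] for w in words)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def language_runs(words):
    """Group consecutive words in the same language into (lang, phrase) runs."""
    return [
        (lang, " ".join(w["word"] for w in run))
        for lang, run in groupby(words, key=lambda w: w["lang"])
    ]


mix = language_mix(words)
runs = language_runs(words)
```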

Word-level timestamps + diarization

Every word comes with start/end time and confidence score. Multi-speaker audio gets speaker_0/speaker_1 labels automatically.
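Word-level output is often easier to read once it is merged into speaker turns. A minimal sketch, assuming each word carries `speaker`, `start`, `end`, and `word` keys (the key names are assumptions; the speaker_0/speaker_1 label style comes from the description above):

```python
def speaker_turns(words):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for w in words:
        if turns and turns[-1]["speaker"] == w["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = w["end"]
            turns[-1]["text"] += " " + w["word"]
        else:
            turns.append({
                "speaker": w["speaker"],
                "start": w["start"],
                "end": w["end"],
                "text": w["word"],
            })
    return turns


# Hypothetical diarized words for illustration.
words = [
    {"word": "hello", "start": 0.0, "end": 0.4, "speaker": "speaker_0"},
    {"word": "sir", "start": 0.5, "end": 0.8, "speaker": "speaker_0"},
    {"word": "haan", "start": 1.0, "end": 1.3, "speaker": "speaker_1"},
    {"word": "boliye", "start": 1.4, "end": 1.9, "speaker": "speaker_1"},
]
turns = speaker_turns(words)
```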

Custom vocabulary

Boost recognition of your brand names, product SKUs, medical terms, or regional place names by passing a custom vocab list at request time.

Data privacy + PII redaction

Audio stored in AWS ap-south-1 (Mumbai), auto-deleted after 30 days. Optional on-request PII redaction (phone numbers, Aadhaar, card numbers) at the word level.
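If you also want a client-side safety net on top of the service's redaction, a regex pass over the transcript text is one option. This is not the API's redaction logic, only a rough sketch: it assumes 10-digit Indian mobile numbers starting with 6 to 9 (optionally prefixed with +91), 12-digit Aadhaar numbers, and 16-digit card numbers, and it will miss numbers that are spoken out as words.

```python
import re

# Order matters: the longest digit patterns are replaced first so a
# 16-digit card is not partially matched as a 12-digit Aadhaar number.
PII_PATTERNS = [
    ("[CARD]", re.compile(r"(?<!\d)(?:\d{4}[\s-]?){3}\d{4}(?!\d)")),
    ("[AADHAAR]", re.compile(r"(?<!\d)\d{4}[\s-]?\d{4}[\s-]?\d{4}(?!\d)")),
    ("[PHONE]", re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)")),
]


def redact(text):
    """Replace card, Aadhaar, and phone numbers with placeholder tokens."""
    for placeholder, pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


redacted = redact(
    "Call me on 9876543210, Aadhaar 1234 5678 9012, card 4111 1111 1111 1111."
)
```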

Use Cases

Speech-to-text use cases

Real deployments, real data rates, real cost savings.

Call center QA

Transcribe 100% of calls for compliance + coaching

Your call center records 10,000 calls a day. Human QA samples maybe 200 of those. Use the STT API to transcribe all 10,000 — then run compliance checks, keyword spotting, and sentiment analysis downstream.

Result

100% call QA at 1% of manual cost.
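The downstream compliance and keyword checks can start as simple string matching over each transcript. The phrase lists below are made-up examples, not recommended compliance rules; a real deployment would load them from your QA policy.

```python
import re

# Illustrative lists only; replace with your own compliance policy.
REQUIRED_PHRASES = ["this call is recorded"]
FLAGGED_TERMS = ["guaranteed returns", "refund"]


def check_call(transcript):
    """Flag missing disclosures and risky terms in one call transcript."""
    text = transcript.lower()
    missing = [p for p in REQUIRED_PHRASES if p not in text]
    flagged = [
        t for t in FLAGGED_TERMS
        if re.search(rf"\b{re.escape(t)}\b", text)
    ]
    return {
        "missing_disclosures": missing,
        "flagged_terms": flagged,
        "passed": not missing and not flagged,
    }


good = check_call("Hello, this call is recorded for quality purposes.")
bad = check_call("We offer guaranteed returns on this plan.")
```

Because every call is transcribed, a check like this runs on all 10,000 daily calls instead of the roughly 200 (2%) a human team can sample.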

Meeting transcription

Indian-language meeting notes + action items

Teams run meetings in Hindi, Tamil, Marathi, or Hinglish. Stream the meeting audio into the STT API, then pipe the transcript to an LLM for summary, action items, and decision log.

Result

Meeting notes in 30 seconds, in the language spoken.

Media & OTT

Indian-language captions and subtitles

Streaming platforms and broadcasters need captions in 10+ Indian languages. The STT API generates timed captions with word-level timestamps, ready for SRT/WebVTT export.

Result

10x cheaper than human captioning services.
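Turning word-level timestamps into SRT cues takes only a few lines. The sketch below groups a fixed number of words per cue, which is a simplification (production captioning usually also breaks on pauses and line length), and it assumes each word has `word`, `start`, and `end` keys with times in seconds.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Build SRT text with up to max_words words per cue."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word timings for illustration.
words = [
    {"word": "namaste", "start": 0.0, "end": 0.4},
    {"word": "doston", "start": 0.5, "end": 0.9},
    {"word": "swagat", "start": 1.0, "end": 1.5},
]
srt = words_to_srt(words, max_words=2)
```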

EdTech

Pronunciation scoring for language learners

Students speak into the app, STT transcribes with per-word confidence scores, and your app compares expected vs actual phonemes to give pronunciation feedback in 22 Indian languages.

Result

Built-in pronunciation coach without external vendors.
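One way to build the comparison step is to align the expected sentence against what the STT heard and then use per-word confidence as a clarity signal. The sketch below works at the word level with Python's standard difflib, which is a simplification of phoneme-level scoring; the `word` and `confidence` keys and the 0.75 threshold are assumptions.

```python
from difflib import SequenceMatcher


def pronunciation_report(expected, heard, min_confidence=0.75):
    """Label each expected word as "ok", "unclear", or "missed".

    expected: list of target words.
    heard: list of {"word": str, "confidence": float} from the transcript.
    """
    heard_words = [h["word"] for h in heard]
    matcher = SequenceMatcher(a=expected, b=heard_words, autojunk=False)
    report = []
    for tag, i1, i2, j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            for offset in range(i2 - i1):
                conf = heard[j1 + offset]["confidence"]
                status = "ok" if conf >= min_confidence else "unclear"
                report.append((expected[i1 + offset], status))
        elif tag in ("replace", "delete"):
            # Expected words that were skipped or recognized as something else.
            report.extend((word, "missed") for word in expected[i1:i2])
        # "insert" opcodes are extra words from the learner; ignored here.
    return report


expected = ["mera", "naam", "ravi", "hai"]
heard = [
    {"word": "mera", "confidence": 0.95},
    {"word": "nam", "confidence": 0.90},
    {"word": "ravi", "confidence": 0.60},
    {"word": "hai", "confidence": 0.92},
]
report = pronunciation_report(expected, heard)
```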

Field / rural applications

Voice-first data collection

Enumerators in rural India speak survey responses directly into your mobile app (no typing). STT transcribes their speech in the local language, your app extracts structured fields, and the data syncs when connectivity returns.

Result

Data collection in the respondent's language, 3x faster.

Accessibility

Real-time captions for live events

Live captions for conferences, webinars, or government broadcasts. Stream audio into the WebSocket API, render partial transcripts in <400ms — so deaf and hard-of-hearing attendees can follow along in real time.

Result

WCAG-compliant live captions without a human captioner.
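On the rendering side, live captions usually show the latest partial transcript and replace it once the final version arrives. A minimal sketch of that buffer, modeled on the partial/final events in the streaming example on this page; the class and method names are hypothetical.

```python
class CaptionBuffer:
    """Keep the last few finalized lines plus the in-progress partial."""

    def __init__(self, max_lines=2):
        self.max_lines = max_lines
        self.final_lines = []
        self.partial = ""

    def on_partial(self, text):
        # Each partial replaces the previous one; it is never appended.
        self.partial = text

    def on_final(self, text):
        self.final_lines.append(text)
        self.final_lines = self.final_lines[-self.max_lines:]
        self.partial = ""

    def render(self):
        lines = self.final_lines + ([self.partial] if self.partial else [])
        return "\n".join(lines[-self.max_lines:])


buf = CaptionBuffer(max_lines=2)
buf.on_partial("namaste")
buf.on_partial("namaste doston")
buf.on_final("namaste doston")
buf.on_partial("aaj")
screen = buf.render()
```

Capping the buffer at two lines keeps the caption area stable on screen, which tends to be easier to read than a scrolling transcript.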

Compare

CallMissed STT vs Google, AWS, Whisper, Deepgram

On Indian-language accuracy, we beat every major provider, at a quarter of Google STT's price.

Feature | CallMissed | Google STT | AWS Transcribe | Whisper API | Deepgram
22 Indian languages
Hinglish + code-switching
Streaming WebSocket <400ms
Word-level timestamps
Speaker diarization
Custom vocabulary
OpenAI-compatible API shape
India data residency
Price / minute | $0.004 | $0.016 | $0.024 | $0.006 | $0.0043

Comparison based on publicly listed features as of 2026. Check each vendor's site for the latest.
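To see what the per-minute prices listed above mean at your own volume, the arithmetic can be scripted. The prices are copied from the comparison; the 100,000 minutes per month figure is only an example workload.

```python
from decimal import Decimal

# Per-minute list prices from the comparison above (USD).
PRICE_PER_MINUTE = {
    "CallMissed": Decimal("0.004"),
    "Google STT": Decimal("0.016"),
    "AWS Transcribe": Decimal("0.024"),
    "Whisper API": Decimal("0.006"),
    "Deepgram": Decimal("0.0043"),
}


def monthly_cost(minutes):
    """Cost in USD per vendor for a given number of minutes."""
    return {vendor: price * minutes for vendor, price in PRICE_PER_MINUTE.items()}


def savings_vs(vendor, baseline="CallMissed"):
    """Fraction saved by using the baseline instead of the given vendor."""
    return 1 - PRICE_PER_MINUTE[baseline] / PRICE_PER_MINUTE[vendor]


costs = monthly_cost(100_000)          # example volume, not a benchmark
google_savings = savings_vs("Google STT")
```

At these list prices the saving against Google STT works out to 75%, while the gap to Deepgram is much smaller, so the comparison is worth rerunning with your own volume and any negotiated rates.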

Code

Python, Node, curl — start transcribing in 30 seconds

The CallMissed STT API follows the OpenAI audio transcription shape. If you've ever called openai.audio.transcriptions.create(), this is a drop-in.

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.callmissed.com/v1",
    api_key="cm_your_key",
)

with open("call_recording.mp3", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="saaras:v3",  # Sarvam: Indian + code-mixed.
        # Also: "whisper-large-v3-turbo" (99 langs)
        # Also: "nova-3" (Deepgram, diarize + smart-format)
        file=audio,
        language="hi-IN",  # Sarvam BCP-47; "unknown" for auto-detect
        response_format="verbose_json",
    )

print(result.text)

Python — file transcription with speaker diarization

javascript
import { CallMissed } from "callmissed";

const cm = new CallMissed({ apiKey: process.env.CM_KEY });

const stream = cm.audio.stt.stream({
  model: "saarika-v2",
  language: "hi",
  sampleRate: 16000,
});

stream.on("partial", (t) => console.log("partial:", t.text));
stream.on("final", (t) => console.log("final:", t.text, "speaker:", t.speaker));

// pipe raw PCM from your mic / getUserMedia
// (micStream is not defined here: it is assumed to be a readable stream
// of 16 kHz raw PCM that you create elsewhere)
micStream.on("data", (chunk) => stream.send(chunk));

Node/JS — streaming transcription from a microphone

FAQ

Speech-to-text API questions, answered

What is a speech-to-text API?

A speech-to-text (STT) API is a REST or WebSocket endpoint that converts audio (microphone, phone call, recorded file) into text transcripts. You send the audio bytes, and it returns words with timestamps, confidence scores, and optional speaker labels. CallMissed's STT API is specifically trained on Indian languages and accents, making it the most accurate option for Hindi, Tamil, Telugu, Bengali, Marathi, and 17 more.

Start transcribing in Hindi in 30 seconds

Sign up free, grab an API key, paste our snippet. 1,000 API credits are included, about 2,500 minutes of transcription.