What is gpt-realtime?

gpt-realtime is OpenAI's speech-to-speech model. It is available only through the voice agent WebSocket, not through chat completions.

How much does gpt-realtime cost?

gpt-realtime costs $4/1M tokens for input and $16/1M tokens for output on CallMissed. 1 credit = ₹1 = $0.01 USD.

How do I use gpt-realtime via API?

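Create a voice session with a POST request to /v1/voice/sessions, passing your API key and setting llm_model to "gpt-realtime". Audio then flows over the voice agent WebSocket (or the LiveKit/WebRTC flow documented for voice agents). CallMissed's OpenAI-compatible chat completions format does not apply to this model.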

What is the context window of gpt-realtime?

gpt-realtime supports a 32K token context window. The CallMissed catalog lists no maximum output length; OpenAI documents up to 4,096 output tokens for the realtime family.

LLM Chat · realtime · voice

gpt-realtime

By OpenAI · Released 2025

OpenAI gpt-realtime: speech-to-speech model (voice agent WebSocket only, not chat completions).

Powered by OpenAI · Realtime multimodal

Context window: 32K
Parameters: Not disclosed
Max output: N/A
Category: LLM Chat

Overview

`gpt-realtime` is OpenAI's production realtime speech-to-speech model: one model listens, reasons, and speaks with low enough latency for live conversation (platform.openai.com/docs/models/gpt-realtime). It is not available on `/v1/chat/completions`. On CallMissed you use it through the voice agent pipeline — create a session via `/v1/voice/sessions` (or the LiveKit/WebRTC flow documented for voice agents) with `llm_model` set to `gpt-realtime`.

OpenAI documents 32,000 tokens of context and up to 4,096 tokens of output for the realtime family, with modalities including text, audio, and image input and text+audio output. Transport is WebRTC, WebSocket, or SIP in OpenAI's native offering; CallMissed integrates via our voice agent worker that bridges LiveKit rooms to the realtime deployment. Function calling is supported; structured outputs are not listed on the model card.

Pricing is token-based with distinct audio rates — OpenAI lists text at $4/$16 per million tokens and audio at $32/$64 per million tokens for gpt-realtime (see model page for current breakdown). CallMissed bills according to our published realtime rates on the catalog. Budget for continuous audio streams: minutes of conversation accumulate audio tokens quickly compared to text-only chat.
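The token arithmetic can be sketched as below. This is a minimal illustration, assuming the quoted pairs mean input/output (text $4 in, $16 out; audio $32 in, $64 out); the meter names and the example token counts are hypothetical, and CallMissed's own catalog rates may differ.

```python
# Rates in USD per million tokens, taken from the OpenAI figures quoted above.
# Assumption: the first number of each pair is input, the second is output.
RATES_USD_PER_MILLION = {
    "text_in": 4.0,
    "text_out": 16.0,
    "audio_in": 32.0,
    "audio_out": 64.0,
}


def token_cost_usd(usage: dict) -> float:
    """Return the USD cost for a usage dict keyed like RATES_USD_PER_MILLION."""
    total = 0.0
    for meter, tokens in usage.items():
        total += tokens * RATES_USD_PER_MILLION[meter] / 1_000_000
    return total


# Hypothetical token counts for one short call (not measured data).
example_usage = {
    "text_in": 2_000,
    "text_out": 500,
    "audio_in": 10_000,
    "audio_out": 8_000,
}

if __name__ == "__main__":
    print(f"Estimated cost: ${token_cost_usd(example_usage):.4f}")
```

In this made-up example the audio meters account for almost all of the total, which is why continuous audio needs its own budget line.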

Use gpt-realtime when you want a single unified model for phone bots, voice assistants, interview coaches, and hands-free workflows without chaining separate STT, LLM, and TTS providers. Latency is the product goal — you trade the flexibility of mixing Whisper + GPT-4.1 + TTS for operational simplicity. Voices include OpenAI's realtime voice set (alloy, echo, shimmer, ash, ballad, coral, sage, verse, marin, cedar per our voice agent allowlist).

Current platform note: realtime voice models bill against active call minutes (about $0.375/min for gpt-realtime). They run with automatic fallback to the cascaded STT→LLM→TTS pipeline if the speech-to-speech model is briefly unavailable, so sessions still connect. Once the model is enabled, test barge-in, interruption, and endpointing behavior with your microphone pipeline.
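For per-minute budgeting, a rough sketch follows. It assumes the approximate $0.375 per active call minute figure above and the page's stated conversion of 1 credit = $0.01; the call volume and average duration are hypothetical inputs you would replace with your own numbers.

```python
USD_PER_ACTIVE_MINUTE = 0.375  # approximate rate quoted in the platform note
CREDITS_PER_USD = 100          # 1 credit = $0.01, per the pricing section


def monthly_budget(calls_per_month: int, avg_minutes_per_call: float) -> dict:
    """Estimate monthly active minutes, USD spend, and credits consumed."""
    minutes = calls_per_month * avg_minutes_per_call
    usd = minutes * USD_PER_ACTIVE_MINUTE
    return {"minutes": minutes, "usd": usd, "credits": usd * CREDITS_PER_USD}


if __name__ == "__main__":
    # Hypothetical workload: 1,000 calls averaging 4 active minutes each.
    print(monthly_budget(calls_per_month=1_000, avg_minutes_per_call=4.0))
```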

Limitations: WebSocket/voice only (no text chat completions endpoint), higher cost than batch STT+LLM+TTS stacks for some workloads, and dependency on client-side audio capture quality. For batch transcription after the fact, use `whisper` or `gpt-4o-transcribe` instead. For the cheapest speech output without realtime reasoning, consider `gpt-4o-mini-tts` with a text LLM.

Session lifecycle on CallMissed: create a voice session specifying STT/TTS/realtime LLM ids, join the LiveKit room with the returned token, stream microphone audio, receive synthesized speech back. The realtime model replaces separate STT+LLM+TTS chaining for duplex conversation.

Audio token budgeting: OpenAI splits text vs audio token meters on the model card — continuous speech consumes audio tokens rapidly. Pilot with recorded calls to estimate monthly spend before enabling toll-free numbers.

Function calling in voice: define tools sparingly — latency grows with schema size. Prefer two or three high-value tools (lookup order, book appointment) over dozens of rarely used functions.
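A small tool set might look like the sketch below. The tool names, descriptions, and fields are hypothetical, written in the JSON Schema style commonly used for function tools. This page does not say how CallMissed's session API accepts tool definitions, so treat the shape as an assumption. The helper only enforces the "two or three tools" guidance and reports the serialized schema size as a crude proxy for prompt overhead.

```python
import json

MAX_TOOLS = 3  # "two or three high-value tools", per the guidance above

# Hypothetical tool definitions; names and parameters are illustrative only.
TOOLS = [
    {
        "type": "function",
        "name": "lookup_order",
        "description": "Look up the status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
    {
        "type": "function",
        "name": "book_appointment",
        "description": "Book an appointment at an ISO 8601 date and time.",
        "parameters": {
            "type": "object",
            "properties": {"start_time": {"type": "string"}},
            "required": ["start_time"],
        },
    },
]


def validate_tools(tools: list) -> int:
    """Reject oversized or duplicate tool sets; return schema size in characters."""
    if len(tools) > MAX_TOOLS:
        raise ValueError(f"too many tools: {len(tools)} > {MAX_TOOLS}")
    names = [tool["name"] for tool in tools]
    if len(set(names)) != len(names):
        raise ValueError("duplicate tool names")
    return len(json.dumps(tools))


if __name__ == "__main__":
    print("schema size (chars):", validate_tools(TOOLS))
```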

Voice persona selection: match voice to brand (marin/cedar for calm support, alloy for neutral). Test barge-in — users interrupting the bot is normal; realtime models handle turn boundaries better than chained pipelines when configured correctly.

Maintenance status: when the catalog shows maintenance, realtime quota may be exhausted in the region. Fall back to `saaras:v3` + `gpt-4.1` + `bulbul:v3` for Indic, or Nova + text LLM + Aura for English telephony.
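One way to encode that fallback choice is sketched here. The `Pipeline` type, the `"maintenance"` status string, and the language labels are hypothetical. The Indic model ids come from the text above, while the English ids are left as placeholders because this page names the Nova and Aura models without giving their ids.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pipeline:
    stt: Optional[str]  # None when the realtime model handles speech input itself
    llm: str
    tts: Optional[str]  # None when the realtime model produces speech itself


def select_pipeline(catalog_status: str, language: str) -> Pipeline:
    """Pick the realtime model, or a cascaded fallback during maintenance."""
    if catalog_status != "maintenance":  # hypothetical status value
        return Pipeline(stt=None, llm="gpt-realtime", tts=None)
    if language == "indic":
        return Pipeline(stt="saaras:v3", llm="gpt-4.1", tts="bulbul:v3")
    # Placeholders: substitute the real ids from the catalog.
    return Pipeline(stt="<nova-stt-id>", llm="<text-llm-id>", tts="<aura-tts-id>")


if __name__ == "__main__":
    print(select_pipeline("maintenance", "indic"))
```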

Monitoring: track session duration, time-to-first-audio-byte, tool error rate, and credit burn per minute of talk time.
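These metrics could be captured per session roughly as follows. The `SessionMetrics` record and its field names are hypothetical; the page does not describe which events or fields CallMissed exposes, so the values are assumed to come from your own instrumentation.

```python
from dataclasses import dataclass


@dataclass
class SessionMetrics:
    duration_s: float           # total session duration in seconds
    talk_s: float               # seconds in which either side was speaking
    first_audio_delay_s: float  # time to first audio byte from the agent
    tool_calls: int
    tool_errors: int
    credits_spent: float

    def tool_error_rate(self) -> float:
        """Fraction of tool calls that failed (0.0 when no tools were called)."""
        return self.tool_errors / self.tool_calls if self.tool_calls else 0.0

    def credits_per_talk_minute(self) -> float:
        """Credit burn normalized by minutes of actual talk time."""
        return self.credits_spent / (self.talk_s / 60) if self.talk_s else 0.0


if __name__ == "__main__":
    # Hypothetical two-minute session with 90 seconds of talk time.
    m = SessionMetrics(120.0, 90.0, 0.8, tool_calls=4, tool_errors=1, credits_spent=75.0)
    print(m.tool_error_rate(), m.credits_per_talk_minute())
```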

Regulatory: record retention and consent banners are your responsibility. Realtime streams may contain PCI or PHI data if an agent reads such details aloud.

Architecture diagram in words: User microphone → LiveKit room → CallMissed voice agent worker → the realtime API → synthesized audio back through LiveKit → user speaker. Text side channels (transcripts, tool calls) may still flow to your backend via webhooks you implement around the session.

Hardware recommendations: wired headsets for demo booths; acoustic echo cancellation on mobile; avoid Bluetooth latency for latency-sensitive demos. Server-side, run agent workers close to users geographically when possible — WebRTC media prefers proximity.

Testing checklist: silence handling, interruption mid-sentence, background TV noise, code-switching between languages, a tool call while speaking, session reconnect after a network drop, and a graceful message on credit exhaustion.

Sales engineering note: when prospects compare to "ChatGPT voice mode," clarify that `gpt-realtime` is the API-accessible realtime class powering similar experiences — your product wraps it with LiveKit transport and CallMissed billing.

Documentation cross-links: OpenAI Realtime API guide, CallMissed voice session API reference, LiveKit client SDK docs for web/iOS/Android.

Pricing communication: quote per-minute estimates with assumptions (talk ratio, tool usage); audio token math is opaque to non-technical buyers — simplify for proposals, detail in engineering spreadsheets.

Pricing

Metric	Price
Input /1M tokens	₹400.0000
Output /1M tokens	₹1600.0000

1 credit = ₹1 = $0.01 USD. Prices are shown as listed by the provider; CallMissed passes them through with a ~35% markup.

Key points

Speech-to-speech
Low latency
Single-model voice pipeline
$0.375/min

Technical details

Model id: gpt-realtime
Voice-agent WebSocket only
~$0.375 per active call minute

Strengths

Unified speech pipeline
Low latency

Limitations

Not available on chat completions

Use cases

Voice agent
Phone bot
Live conversation

API example

# Create a voice session with llm_model=gpt-realtime via POST /v1/voice/sessions

Endpoint: WebSocket /v1/voice/sessions · Model ID: gpt-realtime
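The session-creation call could be sketched as below. Only the path `/v1/voice/sessions` and the `llm_model` field come from this page. The base URL is a placeholder, and the Bearer authorization header and the `voice` field are assumptions to confirm against the CallMissed voice session API reference. The sketch builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.callmissed.example"  # placeholder, not the real host


def build_session_request(api_key: str, voice: str = "marin") -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a voice session."""
    body = {
        "llm_model": "gpt-realtime",  # field name taken from this page
        "voice": voice,               # hypothetical field name
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/voice/sessions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_session_request("YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To send the request you would pass it to `urllib.request.urlopen`, then use the token from the response to join the LiveKit room. The response field names are not documented on this page.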
