LLM Chat · realtime · voice

gpt-realtime

by OpenAI · Released 2025

OpenAI gpt-realtime is a speech-to-speech model, available only through the voice agent WebSocket and not through chat completions.

Powered by OpenAI · Realtime multimodal

Context Window: 32K
Parameters: Not disclosed
Max Output: N/A
Category: LLM Chat

Overview

`gpt-realtime` is OpenAI's production realtime speech-to-speech model: one model listens, reasons, and speaks with low enough latency for live conversation (platform.openai.com/docs/models/gpt-realtime). It is not available on `/v1/chat/completions`. On CallMissed you use it through the voice agent pipeline: create a session via `/v1/voice/sessions` (or the LiveKit/WebRTC flow documented for voice agents) with `llm_model` set to `gpt-realtime`.

OpenAI documents 32,000 tokens of context and up to 4,096 tokens of output for the realtime family, with modalities including text, audio, and image input and text+audio output. Transport is WebRTC, WebSocket, or SIP in OpenAI's native offering; CallMissed integrates via our voice agent worker that bridges LiveKit rooms to the Azure-hosted realtime deployment. Function calling is supported; structured outputs are not listed on the model card.

Pricing is token-based, with separate rates for text and audio. OpenAI lists text at $4 input / $16 output per million tokens and audio at $32 input / $64 output per million tokens for gpt-realtime (see the model page for the current breakdown). CallMissed bills according to the realtime rates published in our catalog. Budget for continuous audio streams: minutes of conversation accumulate audio tokens far faster than text-only chat accumulates text tokens.

Use gpt-realtime when you want a single unified model for phone bots, voice assistants, interview coaches, and hands-free workflows without chaining separate STT, LLM, and TTS providers. Latency is the product goal: you trade the flexibility of mixing Whisper + GPT-4.1 + TTS for operational simplicity. Available voices are OpenAI's realtime voice set (alloy, echo, shimmer, ash, ballad, coral, sage, verse, marin, and cedar, per our voice agent allowlist).

Current platform note: CallMissed marks gpt-realtime as maintenance when Azure realtime quota is unavailable in a region, so check the model catalog `status` field before launching production voice agents. When the model is enabled, test barge-in, interruption, and endpointing behavior with your microphone pipeline.

Limitations: WebSocket/voice only (no text chat completions endpoint), higher cost than batch STT+LLM+TTS stacks for some workloads, and dependency on client-side audio capture quality. For batch transcription after the fact, use `whisper` or `gpt-4o-transcribe` instead. For cheapest speech output without realtime reasoning, consider `gpt-4o-mini-tts` with a text LLM.

Session lifecycle on CallMissed: create a voice session specifying STT/TTS/realtime LLM ids, join the LiveKit room with the returned token, stream microphone audio, receive synthesized speech back. The realtime model replaces separate STT+LLM+TTS chaining for duplex conversation.
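The first lifecycle step, creating the session, can be sketched as follows. This is a minimal illustration and not the definitive client: only the `/v1/voice/sessions` path and the `llm_model` field come from this page, while the base URL, the `voice` field name, and the bearer-token header are assumptions made for the example. The request is built but not sent, and the later steps (joining the LiveKit room, streaming audio) belong to the LiveKit client SDK and are not shown.

```python
import json
import urllib.request

# ASSUMPTION: placeholder base URL; substitute the real API host.
BASE_URL = "https://api.example.com"


def build_session_request(api_key: str, voice: str = "marin") -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a voice session."""
    payload = {
        "llm_model": "gpt-realtime",  # field name and value from this page
        "voice": voice,               # ASSUMPTION: hypothetical field name
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/voice/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # ASSUMPTION: auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_session_request("YOUR_API_KEY")
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Passing the request to `urllib.request.urlopen` would send it; the response shape (for example, where the LiveKit token appears) should be taken from the voice session API reference.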

Audio token budgeting: OpenAI meters text and audio tokens separately on the model card, and continuous speech consumes audio tokens rapidly. Pilot with recorded calls to estimate monthly spend before enabling toll-free numbers.
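A rough per-call estimate can be sketched as below. The audio rates are the OpenAI list prices quoted on this page; the tokens-per-minute figures are assumptions used as placeholders and should be replaced with numbers measured from pilot calls. The sketch ignores text tokens, tool calls, any conversation context that is re-processed on later turns, and the CallMissed markup, so treat the result as a lower bound.

```python
# Audio rates from this page (USD per 1M tokens, OpenAI list price).
AUDIO_IN_USD_PER_M = 32.0
AUDIO_OUT_USD_PER_M = 64.0

# ASSUMPTION: audio tokens produced per minute of speech. Placeholder values;
# measure the real figures from recorded pilot calls.
IN_TOKENS_PER_MIN = 600
OUT_TOKENS_PER_MIN = 1200


def audio_cost_usd(call_minutes: float, agent_talk_ratio: float) -> float:
    """Estimate the audio-token cost of one call.

    agent_talk_ratio is the fraction of the call during which the agent speaks;
    the remainder is treated as user speech.
    """
    if not 0.0 <= agent_talk_ratio <= 1.0:
        raise ValueError("agent_talk_ratio must be between 0 and 1")
    user_minutes = call_minutes * (1.0 - agent_talk_ratio)
    agent_minutes = call_minutes * agent_talk_ratio
    input_cost = user_minutes * IN_TOKENS_PER_MIN * AUDIO_IN_USD_PER_M / 1_000_000
    output_cost = agent_minutes * OUT_TOKENS_PER_MIN * AUDIO_OUT_USD_PER_M / 1_000_000
    return input_cost + output_cost


# Example: a 5-minute call where the agent speaks half the time.
print(f"${audio_cost_usd(5, 0.5):.4f}")
```

Multiplying the per-call figure by expected monthly call volume gives a first spend estimate to compare against pilot invoices.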

Function calling in voice: define tools sparingly, because latency grows with schema size. Prefer two or three high-value tools (look up an order, book an appointment) over dozens of rarely used functions.
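As an illustration of a small tool set, the sketch below defines the two example tools and checks them against a simple budget. The flat `{"type": "function", ...}` shape follows OpenAI's Realtime API tool format; how CallMissed accepts tool definitions is not stated on this page, and the tool names, parameter names, and both thresholds are hypothetical.

```python
import json

# Flat tool shape as used by OpenAI's Realtime API.
# ASSUMPTION: tool names, parameters, and how CallMissed receives them.
TOOLS = [
    {
        "type": "function",
        "name": "lookup_order",
        "description": "Look up an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
    {
        "type": "function",
        "name": "book_appointment",
        "description": "Book an appointment at an ISO 8601 start time.",
        "parameters": {
            "type": "object",
            "properties": {"start_time": {"type": "string"}},
            "required": ["start_time"],
        },
    },
]


def tool_budget_warnings(tools, max_tools=3, max_schema_chars=2000):
    """Return warnings when the tool set looks too large; thresholds are illustrative."""
    warnings = []
    if len(tools) > max_tools:
        warnings.append(f"{len(tools)} tools defined; consider at most {max_tools}")
    schema_chars = len(json.dumps(tools))
    if schema_chars > max_schema_chars:
        warnings.append(f"tool schema is {schema_chars} chars; limit is {max_schema_chars}")
    return warnings


print(tool_budget_warnings(TOOLS))
```

Running a check like this in CI keeps the tool list from growing unnoticed as features are added.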

Voice persona selection: match the voice to the brand (marin or cedar for calm support, alloy for neutral). Test barge-in, since users interrupting the bot is normal; realtime models handle turn boundaries better than chained pipelines when configured correctly.
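A configuration layer might validate the chosen voice before a session is created. The sketch below uses the voice names listed on this page; the function name and the choice of `alloy` as the fallback are assumptions for illustration.

```python
# Voice names from the voice agent allowlist described on this page.
ALLOWED_VOICES = frozenset(
    {"alloy", "echo", "shimmer", "ash", "ballad", "coral", "sage", "verse", "marin", "cedar"}
)


def resolve_voice(requested: str, default: str = "alloy") -> str:
    """Return the requested voice if it is allowlisted, otherwise the default."""
    name = requested.strip().lower()
    return name if name in ALLOWED_VOICES else default


print(resolve_voice(" Marin "))
print(resolve_voice("unknown-voice"))
```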

Maintenance status: when the catalog shows maintenance, Azure realtime quota may be exhausted in that region. Fall back to `saaras:v3` + `gpt-4.1` + `bulbul:v3` on CallMissed for Indic languages, or to Deepgram + a text LLM + Aura for English telephony.
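The fallback decision can be sketched as a small selection function. The catalog entry is modelled here as a plain dict with a `status` key, which is an assumption about the catalog response shape; `maintenance` is the only status value named on this page. The Indic model ids come from the text above, while the English fallback ids are placeholders because this page names vendors rather than model ids.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pipeline:
    stt: Optional[str]  # None when the realtime model handles speech input itself
    llm: str
    tts: Optional[str]  # None when the realtime model produces speech itself


REALTIME = Pipeline(stt=None, llm="gpt-realtime", tts=None)
INDIC_FALLBACK = Pipeline(stt="saaras:v3", llm="gpt-4.1", tts="bulbul:v3")
# ASSUMPTION: placeholder ids; look up the actual Deepgram, text LLM, and Aura ids.
ENGLISH_FALLBACK = Pipeline(stt="<deepgram-stt-id>", llm="<text-llm-id>", tts="<aura-tts-id>")


def choose_pipeline(catalog_entry: dict, indic: bool) -> Pipeline:
    """Pick the realtime model unless the catalog reports maintenance."""
    if catalog_entry.get("status") != "maintenance":
        return REALTIME
    return INDIC_FALLBACK if indic else ENGLISH_FALLBACK


print(choose_pipeline({"status": "maintenance"}, indic=True))
```

A stricter variant could require an explicit healthy status before choosing the realtime model, once the full set of status values is known.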

Monitoring: track session duration, time-to-first-audio-byte, tool error rate, and credit burn per minute of talk time.
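One way to aggregate these metrics from per-session records is sketched below. The record fields are hypothetical and would need to be mapped from whatever your session logs or webhooks actually emit.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class SessionStats:
    # ASSUMPTION: hypothetical record shape built from your own session logs.
    duration_s: float           # total session duration in seconds
    talk_time_s: float          # seconds of speech (user plus agent)
    first_audio_delay_s: float  # time to first audio byte in seconds
    tool_calls: int
    tool_errors: int
    credits_used: float


def summarize(sessions: list) -> dict:
    """Aggregate the four monitoring metrics across a non-empty list of sessions."""
    total_calls = sum(s.tool_calls for s in sessions)
    total_errors = sum(s.tool_errors for s in sessions)
    talk_minutes = sum(s.talk_time_s for s in sessions) / 60.0
    total_credits = sum(s.credits_used for s in sessions)
    return {
        "avg_duration_s": mean(s.duration_s for s in sessions),
        "median_first_audio_delay_s": median(s.first_audio_delay_s for s in sessions),
        "tool_error_rate": total_errors / total_calls if total_calls else 0.0,
        "credits_per_talk_minute": total_credits / talk_minutes if talk_minutes else 0.0,
    }


sample = [
    SessionStats(120, 60, 0.4, 4, 1, 30),
    SessionStats(180, 120, 0.6, 0, 0, 60),
]
print(summarize(sample))
```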

Regulatory: record retention and consent banners are your responsibility. Realtime streams may contain PCI or PHI data if agents read it aloud.

Architecture diagram in words: User microphone → LiveKit room → CallMissed voice agent worker → Azure OpenAI Realtime API → synthesized audio back through LiveKit → user speaker. Text side channels (transcripts, tool calls) may still flow to your backend via webhooks you implement around the session.

Hardware recommendations: use wired headsets for demo booths, enable acoustic echo cancellation on mobile, and avoid Bluetooth audio in latency-sensitive demos. Server-side, run agent workers geographically close to users when possible, because WebRTC media benefits from proximity.

Testing checklist: silence handling, mid-sentence interruption, background TV noise, code-switching between languages, a tool call while the agent is speaking, session reconnect after a network drop, and a graceful message on credit exhaustion.

Sales engineering note: when prospects compare to "ChatGPT voice mode," clarify that `gpt-realtime` is the API-accessible realtime model class that powers similar experiences, and that your product wraps it with LiveKit transport and CallMissed billing.

Documentation cross-links: OpenAI Realtime API guide, Azure OpenAI realtime deployments, CallMissed voice session API reference, LiveKit client SDK docs for web/iOS/Android.

Pricing communication: quote per-minute estimates with stated assumptions (talk ratio, tool usage). Audio token math is opaque to non-technical buyers, so simplify it for proposals and keep the detail in engineering spreadsheets.

Pricing

Metric: Price
Input /1M tokens: ₹4000.0000
Output /1M tokens: ₹8000.0000

1 credit = ₹1 = $0.01 USD. Prices shown are the provider's; CallMissed passes them through with a markup of about 35%.
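To turn token counts into credits, the table and the conversion rule above can be combined as in this sketch. It assumes the listed per-million prices are applied as shown; whether the markup is already included in those figures is not stated here, so the sketch does not apply it.

```python
# Catalog prices from the table above (INR per 1M tokens).
INPUT_INR_PER_M = 4000.0
OUTPUT_INR_PER_M = 8000.0

# Conversion rule from this page: 1 credit = 1 INR = 0.01 USD.
INR_PER_CREDIT = 1.0
USD_PER_CREDIT = 0.01


def credits_for(input_tokens: int, output_tokens: int) -> float:
    """Convert token usage to credits at the listed rates (markup not applied)."""
    inr = (
        input_tokens * INPUT_INR_PER_M / 1_000_000
        + output_tokens * OUTPUT_INR_PER_M / 1_000_000
    )
    return inr / INR_PER_CREDIT


credits = credits_for(10_000, 5_000)
print(f"{credits:.2f} credits, about ${credits * USD_PER_CREDIT:.2f}")
```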

Key Highlights

  • Speech-to-speech
  • Low latency
  • Single-model voice pipeline

Technical Details

  • Model id: gpt-realtime
  • Voice-agent WebSocket only

Strengths

  • Unified speech pipeline
  • Low latency

Limitations

  • Not available on chat completions
  • Currently in maintenance while quota is pending

Use Cases

  • Voice agents
  • Phone bots
  • Live conversation

API Example

# Create a voice session with llm_model=gpt-realtime via POST /v1/voice/sessions

Endpoint: WebSocket /v1/voice/sessions · Model ID: gpt-realtime
