LLM Chat · realtime · voice

gpt-realtime-mini

by OpenAI · Released 2025

OpenAI gpt-realtime-mini — cheaper speech-to-speech realtime model for voice agents.

Powered by OpenAI · Realtime multimodal

Context Window

32K

Parameters

Not disclosed

Max Output

N/A

Category

LLM Chat

Overview

`gpt-realtime-mini` is OpenAI's cost-efficient realtime speech-to-speech model — the same unified audio-in/audio-out product pattern as `gpt-realtime`, tuned for lower cost (platform.openai.com/docs/models/gpt-realtime-mini). On CallMissed, select it as `llm_model=gpt-realtime-mini` when creating a voice session. Like its larger sibling, it does not appear on `/v1/chat/completions`; it is voice-pipeline only.

According to OpenAI's model card, the model accepts text, image, and audio inputs and produces text and audio outputs, with a 32,000-token context window and up to 4,096 output tokens. OpenAI's pricing lists substantially lower text token rates than full `gpt-realtime` ($0.60 input / $2.40 output per million text tokens on the model page; verify audio pricing separately). That makes realtime-mini attractive for consumer-facing assistants, internal help lines, and high-volume pilots where full realtime quality is unnecessary.

Choose gpt-realtime-mini when latency must stay conversational but absolute frontier reasoning is overkill — appointment booking, FAQ bots, guided troubleshooting with short turns, and language practice. You still benefit from one model handling end-of-turn detection server-side (`turn_detection="realtime_llm"` in our agent) rather than tuning VAD + STT + LLM + TTS yourself.

Voice selection uses the realtime voice allowlist (alloy, echo, shimmer, ash, ballad, coral, sage, verse, marin, cedar). Test on real devices: mobile networks add jitter that text-only APIs never expose. Monitor credits using voice session metrics — audio token accounting differs from chat.
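As a minimal sketch of enforcing the allowlist above before a session is created, the helper below normalizes a requested voice and falls back to a default. The function name and the choice of `alloy` as the default are assumptions for illustration, not part of the CallMissed API.

```python
# Voices named in the realtime allowlist above.
REALTIME_VOICES = frozenset({
    "alloy", "echo", "shimmer", "ash", "ballad",
    "coral", "sage", "verse", "marin", "cedar",
})


def normalize_voice(name: str, default: str = "alloy") -> str:
    """Return a lowercase allowlisted voice, or `default` if the name is unknown.

    ASSUMPTION: falling back silently is acceptable; a real service might
    prefer to reject the request instead.
    """
    candidate = name.strip().lower()
    return candidate if candidate in REALTIME_VOICES else default
```

Validating on the server keeps an unsupported voice from failing only after the audio connection is already open.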

Platform status may show maintenance when Azure realtime deployments lack quota; consult `/api/v1/models` before shipping. Fallback pattern: use `saaras:v3` + `gpt-4.1` + `bulbul:v3` pipeline models for Indian-language or batch-friendly voice when realtime is down.
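The fallback pattern above can be sketched as a small selection function. The shape of the catalog entries (`id` and `status` keys, and the `"available"` value) is an assumption made for illustration; check the actual `/api/v1/models` response before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceStack:
    mode: str                 # "realtime" or "pipeline"
    models: tuple             # model ids used by this stack


REALTIME = VoiceStack("realtime", ("gpt-realtime-mini",))
# Pipeline models named in the fallback pattern above.
PIPELINE = VoiceStack("pipeline", ("saaras:v3", "gpt-4.1", "bulbul:v3"))


def choose_stack(catalog: list) -> VoiceStack:
    """Pick realtime when the catalog marks it available, else the pipeline.

    ASSUMPTION: each catalog entry is a dict like
    {"id": "gpt-realtime-mini", "status": "available"}.
    """
    for entry in catalog:
        if entry.get("id") == "gpt-realtime-mini":
            return REALTIME if entry.get("status") == "available" else PIPELINE
    # Model missing from the catalog: treat it as unavailable.
    return PIPELINE
```

Treating a missing entry the same as maintenance keeps the default path on the pipeline, which is the safer failure direction.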

Limitations: it is not a text API model, its preview/GA lifecycle is subject to OpenAI/Azure changes, and there is a quality gap versus full `gpt-realtime` on complex reasoning mid-call. For transcription-only or TTS-only tasks, the dedicated `gpt-4o-transcribe` / `gpt-4o-mini-tts` endpoints are simpler and cheaper.

Pilot economics: realtime-mini is designed so startups can run voice pilots without flagship realtime bills. To estimate spend, model a 5-minute average call, multiply by expected monthly calls, and include the audio token markup; then compare the result against a text chat fallback.
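The pilot estimate above can be sketched as a small calculation. Every rate here is a placeholder assumption to be replaced with figures measured from your own voice session metrics; none of the numbers are published prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotAssumptions:
    avg_call_minutes: float
    monthly_calls: int
    credits_per_voice_minute: float   # ASSUMPTION: measured, markup included
    credits_per_text_session: float   # ASSUMPTION: cost of a text chat fallback


def monthly_costs(p: PilotAssumptions) -> dict:
    """Compare monthly voice spend against a text chat fallback, in credits."""
    voice = p.avg_call_minutes * p.monthly_calls * p.credits_per_voice_minute
    text = p.monthly_calls * p.credits_per_text_session
    ratio = voice / text if text else float("inf")
    return {"voice": voice, "text": text, "voice_to_text_ratio": ratio}


# Hypothetical inputs: 5-minute calls, 1,000 calls per month.
example = monthly_costs(PilotAssumptions(5, 1000, 2.0, 0.5))
```

The ratio is often more useful than the absolute figure, since it shows how much extra the voice experience has to earn back in completion or retention.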

Use case fit: ideal for structured dialogs (collect name, address, confirm appointment); less suited for open-ended therapy or legal advice, where full realtime quality matters.

Client implementation: WebRTC through LiveKit requires stable NAT traversal — test on corporate VPNs and mobile LTE. Provide reconnect UX; realtime sessions drop on network blips.
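One common way to drive the reconnect UX mentioned above is exponential backoff with full jitter. The sketch below only computes the wait times; the attempt count, base delay, and cap are illustrative assumptions, and the actual reconnect call depends on your client SDK.

```python
import random


def backoff_delays(max_attempts=5, base=0.5, cap=8.0, rng=None):
    """Yield one randomized wait (in seconds) per reconnect attempt.

    The ceiling doubles each attempt up to `cap`; the actual delay is drawn
    uniformly from [0, ceiling] so many clients do not retry in lockstep.
    """
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, ceiling)
```

A client would sleep for each yielded delay, attempt to rejoin, and show a "reconnecting" state until an attempt succeeds or the generator is exhausted.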

Quality ladder: start on realtime-mini; if CSAT or task completion falls below threshold, A/B test full `gpt-realtime` on a fraction of traffic.
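A minimal sketch of routing a fraction of traffic to the flagship model, assuming a stable per-user or per-session key is available. Hashing the key keeps each caller in the same arm across sessions; the function name and the 10% default are illustrative.

```python
import hashlib


def assign_model(session_key: str, flagship_fraction: float = 0.1) -> str:
    """Deterministically map a key to gpt-realtime or gpt-realtime-mini."""
    digest = hashlib.sha256(session_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")        # 0 .. 2**64 - 1
    threshold = int(flagship_fraction * 2 ** 64)      # share of the hash space
    return "gpt-realtime" if bucket < threshold else "gpt-realtime-mini"
```

Because assignment depends only on the key, CSAT and task-completion numbers can be joined back to the arm without storing the assignment separately.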

Pairing with text backends: some teams use realtime-mini for the voice UX but POST transcripts to `gpt-4.1` asynchronously for CRM notes, which decouples the latency-sensitive path from long-running analysis.

Quota awareness: realtime-mini runs on the same Azure realtime infrastructure as the flagship, so maintenance flags apply to both. Have a runbook.

Accessibility: voice-first interfaces help visually impaired users — ensure keyboard alternatives remain available for compliance (WCAG).

Go-to-market pattern: launch a "voice beta" tier on realtime-mini with usage caps, gather CSAT, upgrade power users to full realtime. Marketing sites can embed a demo widget — protect with CAPTCHA and per-IP rate limits because voice is abuse-prone.
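The per-IP rate limit suggested above could be implemented as a token bucket. This is an in-memory sketch for a single process; the capacity and refill rate are placeholder assumptions, and a multi-instance deployment would need shared storage instead.

```python
import time


class PerIpLimiter:
    """Token bucket per IP: `capacity` burst, refilled at `refill_per_second`."""

    def __init__(self, capacity=3, refill_per_second=0.05, clock=time.monotonic):
        self.capacity = capacity
        self.refill = refill_per_second
        self.clock = clock
        self._state = {}  # ip -> (tokens_remaining, last_seen_timestamp)

    def allow(self, ip: str) -> bool:
        """Return True and spend one token if the IP may start a session."""
        now = self.clock()
        tokens, last = self._state.get(ip, (float(self.capacity), now))
        # Refill for the time elapsed since the last request, up to capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.refill)
        if tokens >= 1.0:
            self._state[ip] = (tokens - 1.0, now)
            return True
        self._state[ip] = (tokens, now)
        return False
```

Calling `allow(ip)` before minting a session token rejects bursts from one address while leaving other visitors unaffected.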

Engineering stack: a Next.js front-end with LiveKit React components, with session tokens minted server-side only. Never expose master API keys in browsers. Store minimal PII from voice sessions; consider making transcript storage opt-in.

Competitive positioning vs chained STT+LLM+TTS: realtime-mini reduces glue code and aligns end-of-turn detection with the speech model, often lowering perceived latency even if raw ASR+LLM+TTS benchmarks look similar on paper.

Failure modes: Azure maintenance flag — display banner in app; offer text chat fallback. Audio device permission denied — guide users with UI copy. Echo loops — enforce headphones in docs for desktop support tools.

Analytics: funnel from "start voice session" → "completed task" → "user returned within 7 days". Voice UX without analytics is flying blind.
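As a rough illustration of the funnel above, the helper below turns three counts into step-to-step conversion rates. The field names are assumptions; the counts would come from whatever analytics store you already use.

```python
def funnel_rates(started: int, completed: int, returned_7d: int) -> dict:
    """Conversion between consecutive funnel steps; 0.0 when a step is empty."""

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return {
        "completion_rate": ratio(completed, started),    # started -> completed
        "return_rate": ratio(returned_7d, completed),    # completed -> returned
    }
```

Tracking the two rates separately shows whether a drop comes from the call itself or from users not coming back afterwards.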

Contractual SLA: unless CallMissed explicitly guarantees realtime availability in your enterprise agreement, do not promise 99.99% voice uptime to your customers — reference catalog maintenance status and maintain fallback modalities.
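To make the uptime figure above concrete, this short calculation converts an availability target into the downtime it permits. The 30-day period is an assumption; use whatever measurement window your agreement defines.

```python
def allowed_downtime_minutes(uptime_target: float, days: int = 30) -> float:
    """Minutes of downtime an availability target permits over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1.0 - uptime_target)
```

Over 30 days, a 99.99% target leaves roughly 4.3 minutes of total downtime, while 99.9% leaves about 43 minutes, so a single maintenance window can consume the stricter budget.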

Token estimation worksheet: assume 150 words per minute spoken, convert to audio tokens using OpenAI's published rates, and multiply by peak simultaneous sessions. Finance teams use this to set prepaid credit bundles for voice SKUs. Document the chosen fallback models in the same runbook so on-call engineers can switch tenants during outages without product redeploys. For developer onboarding, clone the CallMissed voice agent sample with `llm_model=gpt-realtime-mini` first: it is the lowest-cost way to validate WebRTC plumbing before promoting staging traffic to full `gpt-realtime`.
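The worksheet described above might look like the following sketch. Only the 150 words-per-minute figure comes from the text; the audio-tokens-per-minute rate and the credit price are deliberately left as required inputs, because they must be taken from OpenAI's published rates and your own pricing rather than guessed here.

```python
WORDS_PER_MINUTE = 150  # speaking rate assumed in the worksheet above


def speech_minutes(words: int) -> float:
    """Minutes of speech implied by a word count at the assumed rate."""
    return words / WORDS_PER_MINUTE


def peak_audio_tokens_per_minute(tokens_per_speech_minute: float,
                                 peak_sessions: int) -> float:
    """Audio tokens consumed per minute at peak concurrency.

    ASSUMPTION: `tokens_per_speech_minute` is supplied by the caller from
    the provider's published audio token rates.
    """
    return tokens_per_speech_minute * peak_sessions


def credits_for_tokens(total_tokens: float, credits_per_million: float) -> float:
    """Credits needed for a token volume at a given per-million rate."""
    return total_tokens / 1_000_000 * credits_per_million
```

For example, a 750-word script corresponds to 5 minutes of speech; the remaining steps scale that by the token rate, concurrency, and price you plug in.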

Pricing

| Metric | Price |
| --- | --- |
| Input /1M tokens | ₹1500.0000 |
| Output /1M tokens | ₹3000.0000 |

1 credit = ₹1 = $0.01 USD. Prices shown are from the provider; CallMissed passes them through with a ~35% markup.

Key Highlights

  • Lower cost realtime
  • Speech-to-speech

Technical Details

  • Model id: gpt-realtime-mini

Strengths

  • Cheaper than gpt-realtime

Limitations

  • Voice-only surface
  • Maintenance — quota pending

Use Cases

Cost-sensitive voice agents

API Example

# Create a voice session with llm_model=gpt-realtime-mini

Endpoint: WebSocket /v1/voice/sessions · Model ID: gpt-realtime-mini
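A hedged sketch of preparing such a session request follows. The endpoint path, `llm_model`, and `turn_detection="realtime_llm"` come from this page; the host name, the `voice` field, and the idea of sending the settings as a JSON body are assumptions, so check the CallMissed API reference for the real schema. The code only builds the URL and payload and does not open a connection.

```python
import json


def build_session_request(host: str, voice: str = "alloy"):
    """Return (websocket_url, json_payload) for a gpt-realtime-mini session.

    ASSUMPTION: `host` is a placeholder, and the payload field names other
    than llm_model and turn_detection are hypothetical.
    """
    url = f"wss://{host}/v1/voice/sessions"
    payload = {
        "llm_model": "gpt-realtime-mini",
        "turn_detection": "realtime_llm",
        "voice": voice,  # hypothetical field name
    }
    return url, json.dumps(payload)


# Placeholder host; replace with the real API host.
url, body = build_session_request("voice.example.invalid", voice="marin")
```

Run this on the server and hand the browser only a short-lived session token, so the master API key never leaves your backend.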
