Building Voice Agents on CallMissed: From WebRTC to Sub-Second Round-Trip
A voice agent in 2026 is no longer a research demo. It is a real product surface (phone support, scheduling, in-app conversational UIs, embedded copilots), and the difference between an agent users tolerate and one they enjoy comes down almost entirely to latency and turn-taking. CallMissed gives you the production plumbing without the months of WebRTC tuning.
What CallMissed actually does
A voice agent has four moving parts:
CallMissed handles the transport (WebRTC over our hosted media servers) and orchestrates the STT → LLM → TTS pipeline behind one API. You pick the models; we run the room.
The session lifecycle
A voice session looks like this:
POST /v1/voice/sessions
→ { session_id, livekit_url, livekit_token, stt_model, llm_model, tts_voice }
Your client connects to the WebRTC room with the returned token. The agent process, running server-side, joins the same room, subscribes to your microphone track, and publishes its synthesized response track back. Audio flows continuously in both directions: the user can interrupt, the agent stops speaking, and the next turn begins.
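To make the request/response shape concrete, here is a minimal server-side sketch using only the Python standard library. It assumes a hypothetical host (BASE_URL), Bearer authentication with the cm_* key, and a JSON body carrying a "bot_id" field; none of those details are confirmed by this page, so check the API reference before relying on them. The response fields match the shape listed above.

```python
import json
import urllib.request
from dataclasses import dataclass

# Hypothetical host: substitute your actual CallMissed API base URL.
BASE_URL = "https://api.callmissed.example"


@dataclass(frozen=True)
class VoiceSession:
    session_id: str
    livekit_url: str
    livekit_token: str
    stt_model: str
    llm_model: str
    tts_voice: str

    def browser_payload(self) -> dict:
        # Only the room URL and token need to reach the client;
        # the cm_* API key stays on your server.
        return {"livekit_url": self.livekit_url, "livekit_token": self.livekit_token}


def build_session_request(api_key: str, bot_id: str) -> urllib.request.Request:
    # Assumption: Bearer auth and a JSON body with a "bot_id" field.
    body = json.dumps({"bot_id": bot_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/voice/sessions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def parse_session_response(raw: bytes) -> VoiceSession:
    data = json.loads(raw)
    # Field names follow the documented response shape.
    return VoiceSession(
        session_id=data["session_id"],
        livekit_url=data["livekit_url"],
        livekit_token=data["livekit_token"],
        stt_model=data["stt_model"],
        llm_model=data["llm_model"],
        tts_voice=data["tts_voice"],
    )


def create_session(api_key: str, bot_id: str) -> VoiceSession:
    # Performs the actual network call; raises urllib.error.HTTPError on non-2xx.
    with urllib.request.urlopen(build_session_request(api_key, bot_id), timeout=10) as resp:
        return parse_session_response(resp.read())
```

Splitting request construction and response parsing from the network call keeps both testable offline, and browser_payload() makes explicit that the frontend only ever sees the LiveKit URL and token.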
Sessions are tenant-scoped. Each cm_* API key creates and manages sessions only for its own tenant. Session usage (minutes, model calls, audio bytes) is metered for billing and exposed via /api/v1/analytics.
Where latency actually comes from
Sub-second perceived latency is the goal, and time-to-first-audio (TTFA) is the metric that matters: the time from the moment the user finishes speaking until the agent's voice begins. In 2026, a tight stack looks like:
Adding these up, a well-tuned voice agent ships first audio in 600–900ms. CallMissed is configured for that envelope by default. The hot-path discipline (no DB writes, no pre-yield work) is enforced server-side, so adding instrumentation does not silently regress latency.
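If you want to check your own deployment against the 600–900ms envelope, a sketch like the following computes TTFA per turn and summarizes it. It assumes you already capture two client-side timestamps per turn (when the user stopped speaking and when the first agent audio frame played); how you obtain those depends on your client and is not something this page specifies. Turn and ttfa_report are hypothetical names.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class Turn:
    user_speech_end_ms: float    # when the user stopped speaking
    agent_first_audio_ms: float  # when the first synthesized audio frame played

    @property
    def ttfa_ms(self) -> float:
        return self.agent_first_audio_ms - self.user_speech_end_ms


def ttfa_report(turns: list[Turn], target_ms: float = 900.0) -> dict:
    samples = sorted(t.ttfa_ms for t in turns)
    if len(samples) < 2:
        raise ValueError("need at least two turns to compute percentiles")
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = quantiles(samples, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        # Fraction of turns whose TTFA is at or under the target.
        "within_target": sum(s <= target_ms for s in samples) / len(samples),
    }
```

Tracking p95 alongside the median matters for voice: a handful of slow turns is exactly what users notice as the agent "freezing".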
Picking models for your use case
CallMissed exposes a curated catalog through a single API call:
GET /api/v1/models?service=llm
GET /api/v1/models?service=stt
GET /api/v1/models?service=tts
For each service, we surface only the models we can route to in production. You set the llm_model, stt_model, and tts_voice per session. Common shapes:
We do not lock you into a single backend. If the best STT for your customers is multilingual, pick that; if they only need English, the catalog has English-only models too.
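Because the catalog lists only routable models, it is a natural place to validate a session's choices before creating it. The sketch below builds the three catalog URLs and checks llm_model, stt_model, and tts_voice against them. It assumes a hypothetical host and a hypothetical response shape (a JSON object with a "models" list of entries carrying an "id", where tts entries are voice identifiers); adjust model_ids to whatever the real response looks like.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.callmissed.example"  # hypothetical host
SERVICES = ("llm", "stt", "tts")

# Which per-session field is validated against which catalog.
SESSION_FIELDS = {"llm": "llm_model", "stt": "stt_model", "tts": "tts_voice"}


def catalog_url(service: str) -> str:
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service}")
    return f"{BASE_URL}/api/v1/models?{urlencode({'service': service})}"


def model_ids(raw: bytes) -> set[str]:
    # Assumption: {"models": [{"id": ...}, ...]}
    return {m["id"] for m in json.loads(raw)["models"]}


def validate_session_models(config: dict, catalogs: dict[str, set[str]]) -> list[str]:
    """Return a list of problems; an empty list means every choice is routable."""
    problems = []
    for service, field in SESSION_FIELDS.items():
        choice = config.get(field)
        if choice is not None and choice not in catalogs.get(service, set()):
            problems.append(f"{field}={choice!r} is not in the {service} catalog")
    return problems
```

Failing fast here gives a clearer error than discovering an unroutable model when the first turn runs.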
Knowledge bases and bots
A voice agent is most useful when it knows things. CallMissed bots are a structured wrapper around a system prompt, a knowledge base, and a model configuration. Attach a knowledge base to a bot, attach the bot to a voice session, and your agent answers from your documents instead of generic web knowledge.
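The three pieces a bot wraps can be pictured as a small data structure. This is an illustrative model only: Bot, ModelConfig, and every field name below are hypothetical and not CallMissed's actual bot schema.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ModelConfig:
    llm_model: str
    stt_model: str
    tts_voice: str


@dataclass
class Bot:
    name: str
    system_prompt: str
    models: ModelConfig
    # IDs of attached knowledge bases (hypothetical representation).
    knowledge_base_ids: list[str] = field(default_factory=list)

    def attach_knowledge_base(self, kb_id: str) -> None:
        # Attaching the same knowledge base twice is a no-op.
        if kb_id not in self.knowledge_base_ids:
            self.knowledge_base_ids.append(kb_id)

    def to_payload(self) -> dict:
        # asdict() recursively converts the nested ModelConfig as well.
        return asdict(self)
```

Keeping the model configuration on the bot means a voice session only needs the bot ID; per-session overrides of llm_model, stt_model, or tts_voice remain possible as described above.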
Knowledge bases accept Markdown, PDF, and structured documents. Retrieval runs on each turn, scoped by tenant.
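To illustrate what "retrieval on each turn, scoped by tenant" means, here is a toy retriever. It is not CallMissed's retrieval method: it swaps in simple keyword overlap where a production system would typically use embeddings. The point it shows is ordering: the tenant filter is applied before any scoring, so another tenant's documents can never be returned.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    source: str
    text: str


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(chunks: list[Chunk], tenant_id: str, utterance: str, k: int = 3) -> list[Chunk]:
    # Tenant filter first: other tenants' chunks are never even scored.
    candidates = [c for c in chunks if c.tenant_id == tenant_id]
    query = _tokens(utterance)
    # Score by keyword overlap (a stand-in for semantic similarity).
    scored = [(len(query & _tokens(c.text)), c) for c in candidates]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]
```

In a voice loop, a function like this would run once per user turn on the final transcript, and the returned chunks would be added to the LLM prompt for that turn only.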
What to build first
The fastest way from zero to working voice agent on CallMissed:
cm_* API key with the voice scope
POST /v1/voice/sessions with the bot ID
That is roughly 50 lines of frontend code and one server call. The hard parts (interruption handling, partial transcript stability, audio normalization) are already configured.
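Interruption handling is one of the parts CallMissed configures for you. To make concrete what it involves, here is a toy turn-taking state machine with barge-in. The states and event handler names are hypothetical and only illustrate the behavior described earlier: if the user starts speaking while the agent is thinking or talking, the in-flight response is dropped and the agent goes back to listening.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


class TurnTaker:
    """Toy turn-taking loop with barge-in; event names are hypothetical."""

    def __init__(self) -> None:
        self.state = State.LISTENING
        self.cancelled_responses = 0

    def on_user_speech_start(self) -> None:
        if self.state in (State.THINKING, State.SPEAKING):
            # Barge-in: cancel the in-flight response and yield the floor.
            self.cancelled_responses += 1
        self.state = State.LISTENING

    def on_user_speech_end(self) -> None:
        # End of user speech starts a new agent turn.
        if self.state is State.LISTENING:
            self.state = State.THINKING

    def on_first_audio(self) -> None:
        # The TTFA clock stops here.
        if self.state is State.THINKING:
            self.state = State.SPEAKING

    def on_playback_done(self) -> None:
        if self.state is State.SPEAKING:
            self.state = State.LISTENING
```

Ignoring events that arrive in the wrong state (for example, a late first-audio event after a barge-in) is what keeps a cancelled response from leaking back into the conversation.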
What we do not do
Two things are explicitly not CallMissed's job:
The next step
If you are evaluating the voice stack, the playground at /playground lets you spin up a session against your tenant in under a minute. From there, the API surface is small enough to integrate end-to-end in an afternoon.