Drop-In OpenAI-Compatible API: Switch Models Without Rewriting Your Code
The OpenAI Chat Completions API has won the LLM API design war. Whether you like the schema or not, every serious SDK and tool now speaks it natively — openai-python, openai-node, the LangChain/LlamaIndex adapters, the Anthropic CLI's compat mode, even some local model runners. CallMissed's /v1/chat/completions endpoint is OpenAI-compatible by design so your existing code keeps working when you switch to us.
What "compatible" actually means
You set two things:
from openai import OpenAI
client = OpenAI(
base_url="https://api.callmissed.com/v1",
api_key="cm_live_..."
)That is the entire migration. Every method on the openai client — chat.completions.create, streaming, tool calling, response_format, n, temperature, max_tokens, top_p — works against the same schema OpenAI ships.
What you get on top
Wire compatibility is the floor. Where CallMissed adds value:
One key, many models
A single cm_* API key calls every model in the catalog. There are no per-vendor accounts to manage, no per-vendor billing, no per-vendor rate limits to negotiate. Switch models by changing the model field:
client.chat.completions.create(
model="claude-opus-4-7", # frontier reasoning
messages=[...]
)
client.chat.completions.create(
model="gpt-5-5-instant", # latency-tuned default
messages=[...]
)
client.chat.completions.create(
model="kimi-k2.5", # cheaper, long context
messages=[...]
)Pricing, capabilities, and metadata for every model are exposed on /api/v1/models so your code can pick at runtime.
Streaming-correct usage tracking
Streaming responses are fragile to instrument. The wrong place to record usage is inside the generator — a client disconnect leaves transactions hanging. CallMissed handles the post-stream write outside the request lifecycle so disconnect mid-stream does not corrupt usage records or leak DB connections. You see exact token counts on every request, even partials.
Built-in tenancy and audit
Every call is scoped to the API key's tenant. The audit log records request shape, model, token counts, and latency. If you need per-key budgets, expiration windows, or scope restrictions (e.g. "this key can only call STT, not LLM"), set them at key creation time.
Vision support
Models that accept image inputs (the catalog flags supports_vision=true) take the standard OpenAI image_url content blocks. No proprietary wrapping — same JSON, just a different model.
What we changed in the schema
Almost nothing. Three minor additions, all backwards-compatible:
usage.cost_usd — pre-computed cost, since pricing varies per modelmodel in the response is the resolved model, not the requested alias (so gpt-5-5 may resolve to a specific point version)x-callmissed-request-id header on every response, for support ticketsIf your code only reads the OpenAI-standard fields, none of these matter. If you want them, they are there.
What about function/tool calling?
Tool calling works the same way it does on OpenAI: tools parameter, tool_choice, tool_calls in the response. Models that natively support tools route directly; models that do not get a structured-output fallback so the surface stays consistent. The catalog flags supports_tools=true so you can pick at runtime.
Streaming details
stream = client.chat.completions.create(
model="claude-opus-4-7",
messages=[...],
stream=True
)
for chunk in stream:
print(chunk.choices[0].delta.content or "", end="")Server-Sent Events, data: [DONE] terminator, exact same parser. Time-to-first-token (TTFT) is held tight — no pre-yield work in the hot path — because a streaming endpoint that takes 800ms to start streaming has already lost the latency battle.
Migration checklist
If you are coming from raw OpenAI:
base_url to https://api.callmissed.com/v1OPENAI_API_KEY for your cm_* key/api/v1/models — either the same one you were using or a cheaper alternativeMost teams ship the migration in a single PR.
Why this is the right design
We could have invented our own schema. The reason we did not is that API design is a network-effects business. Every integration that already speaks OpenAI is a potential customer who can adopt CallMissed without writing new code. Compatibility is the cheapest distribution we have.
The trade-off is that we cannot innovate on the schema. We are fine with that. The interesting innovation is in routing, pricing, and reliability — not in inventing a thirteenth way to express "user message followed by assistant message."