Drop-In OpenAI-Compatible API: Switch Models Without Rewriting Your Code

CallMissed
·5 min readGuide

The OpenAI Chat Completions API has won the LLM API design war. Whether you like the schema or not, every serious SDK and tool now speaks it natively — openai-python, openai-node, the LangChain/LlamaIndex adapters, the Anthropic CLI's compat mode, even some local model runners. CallMissed's /v1/chat/completions endpoint is OpenAI-compatible by design so your existing code keeps working when you switch to us.

What "compatible" actually means

You set two things:

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.callmissed.com/v1",
    api_key="cm_live_..."
)

That is the entire migration. Every method on the openai client — chat.completions.create, streaming, tool calling, response_format, n, temperature, max_tokens, top_p — works against the same schema OpenAI ships.

What you get on top

Wire compatibility is the floor. Where CallMissed adds value:

One key, many models

A single cm_* API key calls every model in the catalog. There are no per-vendor accounts to manage, no per-vendor billing, no per-vendor rate limits to negotiate. Switch models by changing the model field:

python
client.chat.completions.create(
    model="claude-opus-4-7",          # frontier reasoning
    messages=[...]
)

client.chat.completions.create(
    model="gpt-5-5-instant",          # latency-tuned default
    messages=[...]
)

client.chat.completions.create(
    model="kimi-k2.5",                # cheaper, long context
    messages=[...]
)

Pricing, capabilities, and metadata for every model are exposed on /api/v1/models so your code can pick at runtime.

Streaming-correct usage tracking

Streaming responses are fragile to instrument. The wrong place to record usage is inside the generator — a client disconnect leaves transactions hanging. CallMissed handles the post-stream write outside the request lifecycle so disconnect mid-stream does not corrupt usage records or leak DB connections. You see exact token counts on every request, even partials.

Built-in tenancy and audit

Every call is scoped to the API key's tenant. The audit log records request shape, model, token counts, and latency. If you need per-key budgets, expiration windows, or scope restrictions (e.g. "this key can only call STT, not LLM"), set them at key creation time.

Vision support

Models that accept image inputs (the catalog flags supports_vision=true) take the standard OpenAI image_url content blocks. No proprietary wrapping — same JSON, just a different model.

What we changed in the schema

Almost nothing. Three minor additions, all backwards-compatible:

  • usage.cost_usd — pre-computed cost, since pricing varies per model
  • model in the response is the resolved model, not the requested alias (so gpt-5-5 may resolve to a specific point version)
  • x-callmissed-request-id header on every response, for support tickets
  • If your code only reads the OpenAI-standard fields, none of these matter. If you want them, they are there.

    What about function/tool calling?

    Tool calling works the same way it does on OpenAI: tools parameter, tool_choice, tool_calls in the response. Models that natively support tools route directly; models that do not get a structured-output fallback so the surface stays consistent. The catalog flags supports_tools=true so you can pick at runtime.

    Streaming details

    python
    stream = client.chat.completions.create(
        model="claude-opus-4-7",
        messages=[...],
        stream=True
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

    Server-Sent Events, data: [DONE] terminator, exact same parser. Time-to-first-token (TTFT) is held tight — no pre-yield work in the hot path — because a streaming endpoint that takes 800ms to start streaming has already lost the latency battle.

    Migration checklist

    If you are coming from raw OpenAI:

  • Change base_url to https://api.callmissed.com/v1
  • Swap OPENAI_API_KEY for your cm_* key
  • Pick a model from /api/v1/models — either the same one you were using or a cheaper alternative
  • Keep everything else
  • Most teams ship the migration in a single PR.

    Why this is the right design

    We could have invented our own schema. The reason we did not is that API design is a network-effects business. Every integration that already speaks OpenAI is a potential customer who can adopt CallMissed without writing new code. Compatibility is the cheapest distribution we have.

    The trade-off is that we cannot innovate on the schema. We are fine with that. The interesting innovation is in routing, pricing, and reliability — not in inventing a thirteenth way to express "user message followed by assistant message."

    Related Posts