Structured Output vs Tool Use: Which When

CallMissed
·5 min readComparison

By 2026 the "JSON parsing with regex" era is over. Both major model APIs offer constrained-decoding paths that produce schema-valid output, and tool use is mature enough that one or the other handles 90% of structured generation workloads. The remaining question is which to reach for — and the answer depends on whether you're returning data or taking action.

The two paths

Structured output / response_format. You pass a JSON Schema; the model returns text that conforms to it. OpenAI calls this Structured Outputs (with response_format set to a json_schema). Anthropic offers similar capabilities via the tool-call mechanism, with the SDK packaging it as a structured-output convenience.

Tool use / function calling. You define one or more tools as JSON Schemas; the model picks one and emits arguments that conform. The arguments are validated against the schema; you execute the tool; the result goes back into the loop.

Same constraint mechanism (constrained decoding); different mental model.

How the two providers handle each

OpenAI integrates structured outputs directly with their API. With response_format: { type: "json_schema" } and strict: true, the model is forced into schema-valid output. This is widely described as guaranteed schema compliance; the constrained decoder rejects invalid token continuations during generation.

Anthropic delivers structured output primarily through the tool-use pattern. You define a tool with the schema; the model "calls" the tool; you read the arguments. Per several public comparisons, Anthropic's SDK silently transforms your schema by removing constraints like minimum, maximum, minLength, pattern and moving them to the description field. The constrained decoder validates after generation and retries if validation fails. [Inference]

Practical implication: schemas with rich constraints (e.g., string pattern: "^[A-Z]{3}\\d{4}$") round-trip more cleanly through OpenAI's structured output than through Anthropic's tool-use path. Adding a Pydantic / Zod validator on your end as a safety net is reasonable on either provider.

When to pick structured output

Reach for response_format / json_schema when:

  • The model is the terminus of the call — you want data back, not an action
  • You don't need conditional branching in the response (it's always the same schema)
  • You want the lowest possible latency overhead — no extra tool-call wrapper
  • The output goes directly into a database, UI, or another deterministic step
  • Examples: classification ("which intent does this fit?"), entity extraction, summarization with metadata, generating UI props.

    When to pick tool use

    Reach for tool calling when:

  • The model needs to choose between multiple actions (or choose to do nothing)
  • The action has side effects (write to DB, send email, call an API)
  • You want the agent loop to continue after the tool result returns
  • You need conditional logic in your control flow ("if the tool returned X, route differently")
  • Examples: any agent that does more than one thing, RAG with multiple retrievers, function dispatch, anything that loops.

    The blurry middle

    There's a class of tasks where either works:

  • "Extract these fields and decide whether to escalate" — could be one structured-output call returning {fields, escalate: bool} or one tool call to extract_and_decide
  • "Write a summary and tag it" — same shape
  • Rules of thumb in 2026:

  • One step, no branching → structured output. Lower latency, simpler code.
  • Multiple steps or branching → tool use. The agent loop is built for this.
  • Single output but you might add steps later → tool use. Easier to extend than to retrofit.
  • Latency considerations

    Structured output is usually a single round-trip. Tool use is at least one more — the model emits a tool call, you execute, you send the result back, the model produces the final response. For latency-sensitive surfaces (chat completions seen by humans), this can matter.

    A workaround for the tool-use latency tax: if your tool execution is fast and deterministic, return the result eagerly while the user is still seeing the model's first message. Some agent frameworks do this automatically; you can replicate the pattern in raw API code with parallel tool execution.

    Reliability considerations

    Both paths have edge cases:

  • Structured output — the model occasionally produces semantically wrong content within a syntactically valid schema (right shape, wrong values). Schema validity is not output validity. Add semantic checks.
  • Tool use — the model occasionally calls the wrong tool when descriptions overlap, or skips calling a required tool. Eval suites with explicit "did the model call X" assertions catch this.
  • In both cases, structured outputs alone don't replace evals. They make the parsing layer reliable; the reasoning layer still needs evidence.

    Cost considerations

    [Inference] Tool use generally costs more per round-trip because of the extra system-prompt overhead for tool descriptions and the extra round-trips. Structured output is closer to a vanilla completion in token consumption. If a task can be done as a single structured output call, that's typically the cheaper path.

    A pragmatic split

    Most agent systems in 2026 use both:

  • Routing and classification → structured output (cheap, fast, single round-trip)
  • Action execution → tool use (the loop is built for it)
  • Final response generation → either, depending on whether the response needs structure
  • Don't pick one for everything. They are complementary, not competitive.

    Frequently Asked Questions

    Is JSON mode still useful in 2026?
    Largely no. Plain JSON mode (without a schema) was a stopgap before constrained decoding shipped. Structured outputs with a JSON Schema are strictly better — same latency, far stronger guarantees on shape.
    Should I add Pydantic validation on top of structured outputs?
    [Inference] Yes, especially for Anthropic where the SDK strips some constraints from the schema before sending. A Pydantic / Zod model on the client side catches semantic errors structured output won't and decouples your downstream code from provider quirks.
    Can I stream structured output?
    Yes — both OpenAI and Anthropic stream JSON token-by-token. Streaming structured outputs is useful for partial UI rendering but tricky if you need full validation before any consumer sees the result. Buffer to first valid object boundary if you need atomic semantics.

    Related Posts