Tool Use Design Patterns for AI Agents
The single biggest determinant of agent quality is not the model — it's the tools. A capable model with badly designed tools wanders, retries, hallucinates parameters, and burns tokens. A weaker model with well-shaped tools often outperforms it. Tool design has accumulated a stable set of patterns; here are the ones that actually move the needle in production.
Granularity: aim for the agent's mental unit, not the API endpoint
The most common anti-pattern is exposing your REST API one-to-one as tools. The agent now needs to call get_orders, get_order_items, and get_customer to answer "what did Anuj buy last week" — three round-trips, three opportunities to drop a parameter. A single search_orders(customer, since, status?) tool is a stronger primitive.
Rule of thumb: a tool should map to a meaningful agent action, not a low-level resource. If you find yourself documenting "first call X, then Y, then Z" in the system prompt, you have the wrong granularity.
The opposite extreme is also a trap: a single mega-tool with 30 optional parameters lets the model pick the wrong combination silently. Aim for 5–20 tools, each doing one thing the model can describe in a sentence.
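As a rough illustration of the granularity point, here is a minimal sketch of a single search_orders tool. The Order record and the in-memory ORDERS list are hypothetical stand-ins for whatever stores sit behind get_orders, get_order_items, and get_customer; only the tool signature comes from the text above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Order:
    order_id: int
    customer: str
    placed_on: date
    status: str
    items: list[str]


# Hypothetical in-memory data standing in for the order, item, and customer stores.
ORDERS = [
    Order(1, "anuj@example.com", date(2026, 5, 4), "closed", ["keyboard"]),
    Order(2, "anuj@example.com", date(2026, 4, 1), "closed", ["monitor"]),
    Order(3, "mei@example.com", date(2026, 5, 5), "open", ["mouse"]),
]


def search_orders(customer: str, since: date, status: Optional[str] = None) -> list[dict]:
    """One call that returns orders with their items already joined in."""
    matches = [
        o for o in ORDERS
        if o.customer == customer
        and o.placed_on >= since
        and (status is None or o.status == status)
    ]
    return [
        {
            "order_id": o.order_id,
            "placed_on": o.placed_on.isoformat(),
            "status": o.status,
            "items": o.items,
        }
        for o in matches
    ]
```

With this shape, "what did Anuj buy last week" becomes one call with a customer and a date, and there is no intermediate ID for the model to carry between steps.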
Idempotency: assume the agent will retry
Models retry. Frameworks retry. Networks retry. Treat every tool that mutates state as if it will be called twice for the same logical operation:
Add an idempotency_key parameter. Stripe popularized this pattern for HTTP APIs, and it transfers directly to tools. The cost of an extra parameter is tiny; the cost of a duplicate charge or duplicate email is not.
Structured returns: never just stringify
Returning "User created: id=123" looks fine until the next agent step needs to use the ID and tries to parse it from the string. Always return structured content:
{
  "ok": true,
  "user_id": 123,
  "email": "anuj@example.com",
  "created_at": "2026-05-09T12:00:00Z"
}
In MCP and most function-calling protocols you can mix text and structured payloads. Use both: a one-line human-readable summary plus the structured data. The model uses the summary for chain-of-thought; downstream tools use the data.
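A minimal sketch of the summary-plus-data idea follows. The envelope keys summary and data are assumptions chosen for illustration, not the field names of any particular protocol, and the ID counter is a hypothetical stand-in for a database.

```python
import itertools
from datetime import datetime, timezone

_next_id = itertools.count(123)  # hypothetical ID source


def create_user(email: str) -> dict:
    """Return a one-line summary for the model plus structured data for later steps."""
    data = {
        "ok": True,
        "user_id": next(_next_id),
        "email": email,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    return {
        "summary": f"User created: id={data['user_id']}",
        "data": data,
    }


result = create_user("anuj@example.com")
user_id = result["data"]["user_id"]  # read the field directly, no string parsing
```

The next step reads the ID from the data field, so the summary text can change without breaking anything downstream.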
Error shapes: distinguish input errors from system errors
A tool error is information the model can act on. The shape matters:
Return a typed error code, not a stack trace. The classic mistake is to surface the upstream exception verbatim; the model treats "ConnectionError: HTTPConnectionPool(host=..." as content to reason about and produces nonsense.
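The split between input errors and system errors might look like the sketch below. The error codes, the retryable flag, and the fetch callable are hypothetical names introduced for this example.

```python
from enum import Enum


class ErrorCode(str, Enum):
    INVALID_INPUT = "invalid_input"                # the model should fix its arguments
    NOT_FOUND = "not_found"                        # arguments were valid, nothing matched
    UPSTREAM_UNAVAILABLE = "upstream_unavailable"  # same arguments may work later


def tool_error(code: ErrorCode, message: str, retryable: bool) -> dict:
    return {
        "ok": False,
        "error": {"code": code.value, "message": message, "retryable": retryable},
    }


def get_user(user_id, fetch) -> dict:
    """fetch is a hypothetical callable that queries the upstream user service."""
    if not isinstance(user_id, int) or user_id <= 0:
        return tool_error(
            ErrorCode.INVALID_INPUT, "user_id must be a positive integer", retryable=False
        )
    try:
        user = fetch(user_id)
    except ConnectionError:
        # Do not pass the exception text through; give the model a code it can act on.
        return tool_error(
            ErrorCode.UPSTREAM_UNAVAILABLE,
            "User service is temporarily unavailable",
            retryable=True,
        )
    if user is None:
        return tool_error(ErrorCode.NOT_FOUND, f"No user with id {user_id}", retryable=False)
    return {"ok": True, "user": user}
```

An input error tells the model to change its arguments, while a system error tells it that the same call may succeed later; the upstream exception text never reaches the model in either case.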
Scope permissions per call
A delete_user tool that can target any user in any tenant is a security incident waiting to happen. Permission scoping should sit at the tool layer, not the prompt:
The general rule: never trust the model to enforce a permission. The system prompt is not a security boundary.
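Here is one possible shape for tool-layer scoping, assuming the host application builds a per-call context from the authenticated session. CallContext, allowed_actions, and the USERS dictionary are hypothetical; the point is only that the tenant never arrives as a model-supplied argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallContext:
    """Built by the host from the authenticated session, never from model output."""
    tenant_id: str
    allowed_actions: frozenset


# Hypothetical user store mapping user ID to owning tenant.
USERS = {"u1": {"tenant_id": "t1"}, "u2": {"tenant_id": "t2"}}


def _error(code: str, message: str) -> dict:
    return {"ok": False, "error": {"code": code, "message": message}}


def delete_user(ctx: CallContext, user_id: str) -> dict:
    """The model supplies only user_id; scope comes from ctx."""
    if "delete_user" not in ctx.allowed_actions:
        return _error("forbidden", "This session may not delete users")

    user = USERS.get(user_id)
    # Report cross-tenant targets as not found so the tool does not reveal they exist.
    if user is None or user["tenant_id"] != ctx.tenant_id:
        return _error("not_found", f"No user {user_id} in this tenant")

    del USERS[user_id]
    return {"ok": True, "deleted_user_id": user_id}
```

Because the tenant and the allowed actions come from the context object, a prompt injection can change which user_id the model asks for but not which tenant the call is allowed to touch.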
Tool descriptions are prompts
Every tool description is part of your system prompt. Treat them with the same rigor:
The Anthropic and OpenAI tool-use guides both emphasize this. Generic descriptions ("Search the database") produce generic calls; specific descriptions ("Search recent orders by customer email or ID. Returns up to 50 most recent. Use this when the user references a past purchase.") produce focused calls.
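To make the contrast concrete, the sketch below writes the specific description as a tool definition and adds a small lint that flags thin descriptions. The dictionary layout is an assumption: the key that holds the JSON Schema differs between providers (for example parameters versus input_schema), and lint_description with its word threshold is a hypothetical helper, not part of any SDK.

```python
SEARCH_ORDERS_TOOL = {
    "name": "search_orders",
    "description": (
        "Search recent orders by customer email or ID. "
        "Returns up to 50 most recent. "
        "Use this when the user references a past purchase."
    ),
    # JSON Schema for the arguments; the key name here is an assumption.
    "parameters": {
        "type": "object",
        "properties": {
            "customer": {"type": "string", "description": "Customer email or ID."},
            "since": {
                "type": "string",
                "format": "date",
                "description": "Earliest order date to include, as YYYY-MM-DD.",
            },
            "status": {
                "type": "string",
                "enum": ["open", "closed", "archived"],
                "description": "Optional status filter.",
            },
        },
        "required": ["customer", "since"],
    },
}


def lint_description(tool: dict, min_words: int = 12) -> list[str]:
    """Hypothetical check: flag descriptions too thin to guide the model."""
    problems = []
    if len(tool.get("description", "").split()) < min_words:
        problems.append("description is too short to say what, when, and what is returned")
    for name, spec in tool["parameters"]["properties"].items():
        if not spec.get("description"):
            problems.append(f"parameter {name!r} has no description")
    return problems
```

Running a check like this in CI is one inexpensive way to treat descriptions with the same rigor as the rest of the prompt.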
Latency budgets per tool
Every tool call adds to the user's perceived latency. Two practical rules:
Aim to keep p95 tool latency under 1 second for hot-path agents. Slow tools should be moved off-path (background jobs, async polling) rather than blocking the loop.
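One way to keep a slow tool from blocking the loop is sketched below: run the call under a time budget and, if it overruns, return a job ID the agent can poll. The helper names, the thread pool, and the in-memory job table are assumptions for illustration, and generate_report is a hypothetical slow tool.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)
_jobs: dict[str, Future] = {}  # hypothetical in-memory job table


def generate_report() -> str:
    """Hypothetical slow tool."""
    time.sleep(0.3)
    return "report"


def call_with_budget(fn, *args, budget_s: float = 1.0) -> dict:
    """Run fn; if it exceeds the budget, return a job ID instead of blocking."""
    future = _pool.submit(fn, *args)
    try:
        return {"ok": True, "status": "done", "result": future.result(timeout=budget_s)}
    except FutureTimeout:
        job_id = str(uuid.uuid4())
        _jobs[job_id] = future  # the work keeps running in the background
        return {"ok": True, "status": "pending", "job_id": job_id}


def poll_job(job_id: str) -> dict:
    """Second tool the agent can call later to collect a pending result."""
    future = _jobs.get(job_id)
    if future is None:
        return {"ok": False, "error": {"code": "not_found", "message": "unknown job_id"}}
    if not future.done():
        return {"ok": True, "status": "pending", "job_id": job_id}
    del _jobs[job_id]
    return {"ok": True, "status": "done", "result": future.result()}
```

For example, call_with_budget(generate_report, budget_s=0.05) returns a pending status with a job_id, and a later poll_job call returns the finished report. Exceptions raised by the wrapped function propagate from both helpers in this sketch, so a production version would map them to typed errors.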
A short anti-pattern catalog
A parameter is documented as "status as one of: open, closed, archived", but the model passes "OPEN" or "open " or "active". Use a real enum schema; let validation reject the bad value and surface the canonical list.
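The enum advice can be sketched as strict validation that hands the canonical list back on failure. OrderStatus and parse_status are hypothetical names, and the error envelope is an assumption rather than a fixed format.

```python
from enum import Enum


class OrderStatus(str, Enum):
    OPEN = "open"
    CLOSED = "closed"
    ARCHIVED = "archived"


def parse_status(raw) -> dict:
    """Validate strictly; on failure, return the allowed values so the model can retry."""
    try:
        return {"ok": True, "status": OrderStatus(raw).value}
    except ValueError:
        allowed = [s.value for s in OrderStatus]
        return {
            "ok": False,
            "error": {
                "code": "invalid_input",
                "message": f"status must be one of {allowed}; got {raw!r}",
                "allowed": allowed,
            },
        }
```

Rejecting "OPEN" and "open " rather than silently normalizing them is a deliberate choice in this sketch: the model sees the exact allowed values and corrects the call on its next attempt.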
