Conversation Design for Voice: From Script to Flow

CallMissed
·6 min readGuide

Conversation design is the discipline that separates voice agents that are pleasant to use from voice agents that win lawsuits. The work happens before code: how should a turn unfold, what does the agent do when things go wrong, what is the persona, where does the conversation actually end. In 2026, "let the LLM figure it out" is not a strategy.

Turn structure: the smallest unit

A turn is one user input plus one agent response. Every turn has a structure:

  • Acknowledge. Show the user was heard. "Got it," "Sure," "One moment."
  • Answer. The substantive response.
  • Handoff. Hand the conversational floor back. A question, a pause, "anything else?"
  • Skip the acknowledgement and the user is unsure if they were heard. Skip the handoff and the user is unsure if it's their turn. The middle is what the LLM is good at; the bookends are what conversation design adds.

    Repair: when STT misheard

    STT errors are the most common conversation breakage. Repair design covers:

  • Implicit repair. Agent restates the action with the (potentially misheard) parameter — "Booking your appointment for Tuesday at 3pm." User can correct without being asked.
  • Explicit repair. Agent asks "Did I get that right? Tuesday at 3pm?" — slower but lower-risk for irreversible actions.
  • Re-ask. Agent says "Sorry, I didn't catch that — could you repeat the time?" Last resort; users hate this.
  • A common design pattern: implicit repair on low-risk parameters (preferred name, callback time), explicit repair on high-risk ones (payment amount, account changes).

    Error recovery: when the agent fails

    Errors come in three shapes:

  • STT/audio errors — couldn't hear, low confidence transcript.
  • Comprehension errors — heard, didn't understand the intent.
  • Tool/system errors — backend lookup failed, tool unavailable.
  • Each needs a different recovery script. Generic "I don't understand" for all three is the hallmark of a badly designed agent. Per the Pipecat instruction-following discussion, in 2026 the central tension is that smart models are slow and fast models can't follow nuanced instructions — error recovery is exactly where this matters.

    A practical recovery ladder:

  • First failure: ask once for clarification, in user-friendly language.
  • Second failure: offer a simpler alternative path.
  • Third failure: hand off to a human or fall back to a known-good action.
  • Three-strike escalation is a load-bearing design pattern. It bounds frustration and gives the user a clear off-ramp.

    Persona consistency

    Persona drift is when the agent sounds friendly in turn 1 and clinical in turn 5. The user's brain notices and trust drops. Two main causes:

  • Long context drift. As conversation history grows, the system prompt's persona instructions become a smaller share of the context. The model's default voice reasserts.
  • Tool result intrusion. Tool outputs come back as JSON or terse text and the model's response acquires their flavor.
  • Mitigations:

  • Re-state persona every N turns by promoting it in the system prompt.
  • Wrap tool results with "rephrase this in your normal voice" instructions.
  • Use a smaller, tightly-tuned model for the final user-facing response, even if a bigger model handles reasoning. [Inference]
  • Conversation arcs

    Most voice agent conversations follow one of a few arcs:

  • Transactional. "Book me a haircut at 3pm." Linear, slot-filling, ends with confirmation.
  • Informational. "What's my account balance?" One round, may have follow-ups.
  • Diagnostic. "Why isn't my router working?" Multi-turn troubleshooting.
  • Open. "Talk me through this contract clause." Long, branching, requires LLM judgment.
  • Design each arc explicitly. The transactional arc has clean step boundaries; the open arc is mostly about pacing and depth control. Mixing them — treating an informational query like an open arc — is how agents end up rambling.

    Confirming critical actions

    Some actions need confirmation. Some don't. Designing the threshold:

  • Confirm: money movement, irreversible changes, anything regulated.
  • Don't confirm: read-only lookups, conversation context, low-stakes preferences.
  • Over-confirming feels paranoid. Under-confirming creates incidents. The right line is product-specific; check it with users and lawyers before shipping.

    Designing the opening

    The first 5 seconds set every later expectation. A good opening:

  • Identifies the agent ("Hi, this is Aria from [company].")
  • Discloses it's AI, where regulation requires (most jurisdictions in 2026).
  • Asks an opening question that frames what comes next.
  • Bad openings: launch into a 30-second monologue, ask 4 questions in a row, fail to identify the company.

    Designing the closing

    Voice conversations end in three ways:

  • Task completed. Confirm and close. "All set — anything else?"
  • Handoff to human. Set expectation. "I'll connect you to an agent now — please hold."
  • User abandoned. Detect prolonged silence, close gracefully. "I'll let you go — call back anytime."
  • Each needs a designed script. The "user abandoned" path especially — a half-second too aggressive and you cut off thinking users; too lenient and the agent is talking to itself for minutes.

    Working with the LLM

    Conversation design and prompt engineering are different things. Prompt engineering is "what do I write in the system prompt." Conversation design is "what should the agent actually do across turns." They feed each other:

  • Conversation design produces the rules.
  • Prompt engineering encodes the rules into something the LLM can act on.
  • Testing reveals where rules are too vague or too rigid.
  • Treat conversation design as a separate document, owned by a designer, reviewed by product and legal. Don't bury it in the system prompt.

    Testing the design

    Conversation tests are not unit tests. Run:

  • Scripted scenarios. "User asks for X, gets distracted, comes back, finishes." Replay against the agent.
  • Adversarial scenarios. "User mumbles, switches languages, interrupts, lies about identity."
  • Real-call sampling. Listen to recordings of real production calls weekly. Cluster failures and update the design.
  • The shipped design is never finished — it evolves as you see how users actually behave, not how you imagined they would.

    The bottom line

    Conversation design is the layer above prompt engineering and below LLM choice. It defines turn structure, repair strategies, error recovery, persona consistency, openings and closings. Without it, voice agents drift. With it, they have a backbone the LLM can play within. In 2026, the agents users actually like are the ones with explicit design — not the ones that left it to the model.

    Frequently Asked Questions

    Who owns conversation design — the engineer or the writer?
    A designer who understands both linguistics and product. Pure engineers tend to under-design persona; pure writers under-design error recovery. The role is increasingly its own discipline at companies shipping serious voice agents.
    How do I detect persona drift in production?
    Sample real conversations and rate them on a persona checklist (warmth, professionalism, brevity). Watch the ratings drift over conversation length and over time. When ratings drop, retune the system prompt or shorten the context window.
    When should the agent escalate to a human?
    Three-strike rule is a useful default — three consecutive failed turns or any explicit user request escalates. Some products escalate sooner on regulated topics (medical, legal, financial advice) where AI mistakes are particularly costly.

    Related Posts