The Cost Economics of a Voice Minute in 2026
A voice minute is the smallest unit of revenue and cost for any voice AI product. Understanding what it actually costs to deliver one — and where the costs hide — is the difference between a healthy unit economics story and a graveyard of voice agent startups. Here is the 2026 breakdown.
The headline number
An end-to-end production voice agent in 2026 costs $0.12–$0.45 per conversation minute all-in, including telephony, STT, LLM, TTS, and platform overhead. Per industry pricing analyses, this range is consistent across vendors.
The variance comes from three things: which TTS you use, how chatty your agent is, and whether you're routing through a managed platform or running close to the metal.
The component breakdown
Per Klariqo's 2026 cost-per-minute analysis and Softcery's calculator, the line items at typical pricing:
Speech-to-text
STT is the cheapest component. Optimizing it rarely moves total cost meaningfully unless you're using premium STT in long-form conversations.
Large language model
LLM cost depends heavily on prompt size and conversation history length. A 4000-token system prompt repeated every turn is a major hidden cost driver. [Inference]
Text-to-speech
TTS is the most expensive and most variable component. Defaulting to the highest-quality voice is tempting, but it carries a per-minute premium that scales with usage.
Telephony
Telephony is often the surprise cost. Teams optimize the AI stack and forget that the dialer line item alone can dominate.
Platform / infrastructure
Managed platforms (Vapi, Retell, Bland) bundle this in a single per-minute number. Self-hosted stacks distribute it across compute, storage, monitoring, and engineering time.
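A minimal sketch of how the five line items roll up into one all-in number and how each contributes to it. The `MinuteCost` class is hypothetical, and the rates in the example are placeholders chosen only to illustrate the arithmetic. They are not vendor quotes and are not taken from the analyses cited above.

```python
from dataclasses import dataclass


@dataclass
class MinuteCost:
    """Per-minute cost of each line item in USD (caller-supplied values)."""
    stt: float
    llm: float
    tts: float
    telephony: float
    platform: float

    def total(self) -> float:
        """All-in cost per conversation minute."""
        return self.stt + self.llm + self.tts + self.telephony + self.platform

    def shares(self) -> dict:
        """Fraction of the all-in cost contributed by each line item."""
        total = self.total()
        return {name: value / total for name, value in vars(self).items()}


# Placeholder rates for illustration only, not real vendor pricing.
example = MinuteCost(stt=0.01, llm=0.04, tts=0.08, telephony=0.05, platform=0.02)

print(f"all-in: ${example.total():.2f}/min")
for name, share in sorted(example.shares().items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {share:.0%}")
```

Plugging in your own invoiced rates shows quickly which line item is worth optimizing first.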
Where costs hide
Five places the per-minute number can lie:
1. Idle time
A "5-minute conversation" includes user pauses, hold time, music-on-hold, and dead air. Some components bill on wall time (telephony) and some on processing time (STT). The wall-time billed components dominate for sparse conversations.
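The effect of idle time can be sketched with a simple two-bucket model: one rate billed on wall time and one billed only on processed audio. The function names, the `speech_fraction` parameter, and the rates below are assumptions made for illustration, not figures from any vendor.

```python
def call_cost(wall_minutes, speech_fraction, wall_rate, processing_rate):
    """Cost of one call when some components bill wall time and others bill processed audio.

    wall_rate: USD per wall-clock minute (e.g. telephony).
    processing_rate: USD per minute of audio actually processed (e.g. STT).
    speech_fraction: share of the call that contains speech, between 0 and 1.
    """
    wall_cost = wall_minutes * wall_rate
    processing_cost = wall_minutes * speech_fraction * processing_rate
    return wall_cost + processing_cost


def cost_per_active_minute(wall_minutes, speech_fraction, wall_rate, processing_rate):
    """Total call cost divided by the minutes in which anyone was actually speaking."""
    if not 0 < speech_fraction <= 1:
        raise ValueError("speech_fraction must be in (0, 1]")
    active_minutes = wall_minutes * speech_fraction
    return call_cost(wall_minutes, speech_fraction, wall_rate, processing_rate) / active_minutes


# Hypothetical rates: $0.05/min wall-billed, $0.10/min processing-billed.
dense = cost_per_active_minute(5, 1.0, wall_rate=0.05, processing_rate=0.10)
sparse = cost_per_active_minute(5, 0.4, wall_rate=0.05, processing_rate=0.10)
print(f"dense call:  ${dense:.3f} per active minute")
print(f"sparse call: ${sparse:.3f} per active minute")
```

Under these assumed rates, the sparse call costs more per minute of actual conversation because the wall-billed component keeps running through the silence.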
2. Long context windows
LLM cost grows with conversation history. By the final turn of a 30-turn conversation with 200 tokens per turn plus a 1500-token system prompt, each LLM call carries roughly 7500 input tokens (1500 + 30 × 200). At even cheap rates, that's a meaningful per-minute multiplier.
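The growth is easy to check with a few lines of arithmetic. This sketch assumes the full history and the system prompt are resent on every turn, with no prompt caching, truncation, or summarization. The function names are hypothetical.

```python
def input_tokens_on_turn(turn, system_prompt_tokens=1500, tokens_per_turn=200):
    """Input tokens sent on a given 1-indexed turn when the full history is resent."""
    return system_prompt_tokens + tokens_per_turn * turn


def cumulative_input_tokens(turns, system_prompt_tokens=1500, tokens_per_turn=200):
    """Total input tokens billed across all turns of one conversation."""
    return sum(
        input_tokens_on_turn(t, system_prompt_tokens, tokens_per_turn)
        for t in range(1, turns + 1)
    )


last_turn = input_tokens_on_turn(30)
total = cumulative_input_tokens(30)
print(last_turn)  # 7500
print(total)      # 138000
```

The per-turn figure grows linearly, so the cumulative bill for the conversation grows quadratically with the number of turns. That is why trimming the system prompt or summarizing history pays off most on long calls.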
3. Tool calls
Every tool call is an extra LLM round trip — sometimes two. For agents that look up a lot of state (calendar, CRM, knowledge base), tool-call overhead can be the second-largest LLM cost driver after history.
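As a rough sketch, the overhead can be counted in LLM round trips. The function name and the example figures (30 turns, 10 tool calls) are assumptions for illustration. The model ignores the fact that each extra round trip also resends the conversation context.

```python
def llm_round_trips(user_turns, tool_calls, extra_trips_per_tool_call=1):
    """Number of LLM calls in a conversation, counting the extra trips tool calls add."""
    return user_turns + tool_calls * extra_trips_per_tool_call


baseline = llm_round_trips(30, 0)
one_extra = llm_round_trips(30, 10, extra_trips_per_tool_call=1)
two_extra = llm_round_trips(30, 10, extra_trips_per_tool_call=2)

print(baseline, one_extra, two_extra)  # 30 40 50
print(f"{one_extra / baseline - 1:.0%} to {two_extra / baseline - 1:.0%} more LLM calls")
```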
4. Streaming inefficiency
Streaming STT and TTS are billed per second of audio. Jittery, restarting streams (from network blips or aggressive interruption handling) can increase billed seconds beyond actual conversation length.
5. Failed call retries
Calls that fail mid-flight, get retried, or hand off mid-conversation can incur double billing — once for the failed attempt and once for the retry.
Scaling math
A worked example for a midsize deployment:
At $0.20/min all-in, total cost is $30,000/month or roughly $360,000/year, which implies a volume of 150,000 minutes per month.
At $0.30/min, that becomes $45,000/month or $540,000/year. The 50% delta between low-end and mid-tier pricing is real money.
Compare that to human operations: [Inference] at $1.50/min loaded for US agents, the same volume costs $225,000/month. Even with all of AI's hidden costs and overhead, the AI stack is roughly 5–7.5x cheaper at these price points.
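The figures above can be reproduced in a few lines. The monthly volume of 150,000 minutes is inferred from the stated totals ($30,000/month at $0.20/min) and is not given explicitly. The function name is hypothetical.

```python
MINUTES_PER_MONTH = 150_000  # inferred: $30,000/month divided by $0.20/min


def monthly_cost(rate_per_minute, minutes=MINUTES_PER_MONTH):
    """Monthly spend in USD at a given all-in per-minute rate."""
    return rate_per_minute * minutes


scenarios = [
    ("AI at $0.20/min", 0.20),
    ("AI at $0.30/min", 0.30),
    ("Human at $1.50/min", 1.50),
]
for label, rate in scenarios:
    monthly = monthly_cost(rate)
    print(f"{label:20s} ${monthly:>9,.0f}/month  ${monthly * 12:>11,.0f}/year")

ratio_low = monthly_cost(1.50) / monthly_cost(0.20)
ratio_mid = monthly_cost(1.50) / monthly_cost(0.30)
print(f"human vs AI: {ratio_mid:.1f}x to {ratio_low:.1f}x")
```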
Optimizing the per-minute number
Five high-leverage cost optimizations:
Pricing your product
If you're charging customers for voice minutes, the standard 2026 markup is 2–4x cost. A $0.20/min cost charged at $0.40–$0.80/min reads as competitive against legacy IVR costs and gives you margin to absorb support, sales, and product investment. [Inference]
Below 2x, you're squeezing margin to where any one bad month wipes you out. Above 4x, customers are doing the math themselves and migrating to in-house deployments.
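A short sketch of what the 2–4x band means in price and gross margin terms, using the $0.20/min cost from the example. The function names are hypothetical. Gross margin here counts only delivery cost, so support, sales, and product spend still come out of it.

```python
def price_at_markup(cost_per_minute, markup):
    """Customer price per minute for a given markup multiple on cost."""
    return cost_per_minute * markup


def gross_margin(cost_per_minute, price_per_minute):
    """Share of the price left after paying the delivery cost."""
    return (price_per_minute - cost_per_minute) / price_per_minute


cost = 0.20
for markup in (2, 3, 4):
    price = price_at_markup(cost, markup)
    print(f"{markup}x: ${price:.2f}/min, gross margin {gross_margin(cost, price):.0%}")
```

A 2x markup leaves a 50% gross margin and a 4x markup leaves 75%, which is one way to read why the band sits where it does.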
The bottom line
A voice minute in 2026 costs $0.12–$0.45 to deliver, dominated by TTS and telephony, with LLM as a meaningful third and STT as a rounding error. The hidden costs — long context, tool calls, idle time, retries — can swing the number 30–50% if you're not watching. Optimize the right line items (TTS choice, prompt size, telephony rate) and the unit economics are durable. Optimize the wrong ones (squeezing 5% out of STT) and you're rearranging the deck chairs.