AI Product Pricing: Per-Token, Per-Seat, Per-Outcome

CallMissed

Pricing AI products is harder than pricing SaaS for one structural reason: unlike a database row, an AI inference has a real, variable cost. That single fact reshapes every pricing decision. Here are the four pricing models actually deployed in 2026, what each is good for, and where each breaks.

Why AI pricing is different from SaaS pricing

Three economic facts make AI pricing distinct:

  • Variable COGS. Each query has a non-trivial compute cost. AI gross margins typically run 50-60% versus 80-90% for SaaS, per multiple 2026 GTM analyses.
  • Workload variance. A power user can cost 100x a casual user. Per-seat pricing under-collects from the first and over-collects from the second.
  • Substitutability. Foundation models drop in price every quarter. A premium price today is a discount in six months.

    This is why pure per-seat is shrinking and hybrid is rising fast.
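
    To make the workload-variance point concrete, here is a minimal Python sketch. The seat price and per-user costs are hypothetical numbers chosen for illustration; only the 100x ratio between a power user and a casual user comes from the text above.

```python
# Hypothetical figures, for illustration only.
SEAT_PRICE = 100.0               # assumed monthly price per seat
CASUAL_COST = 2.0                # assumed monthly inference cost of a casual user
POWER_COST = CASUAL_COST * 100   # the "100x" power user


def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cogs) / revenue


def blended_margin(casual_users: int, power_users: int) -> float:
    """Margin on an account that pays the same seat price for every user."""
    revenue = SEAT_PRICE * (casual_users + power_users)
    cogs = CASUAL_COST * casual_users + POWER_COST * power_users
    return gross_margin(revenue, cogs)


casual_margin = gross_margin(SEAT_PRICE, CASUAL_COST)  # seat over-collects
power_margin = gross_margin(SEAT_PRICE, POWER_COST)    # seat under-collects
mixed_margin = blended_margin(casual_users=9, power_users=1)

print(f"casual: {casual_margin:.1%}, power: {power_margin:.1%}, "
      f"9 casual + 1 power: {mixed_margin:.1%}")
```

    Under these assumed numbers a single power user in a ten-seat account pulls the blended margin well below the casual-user margin, which is the pressure that pushes vendors toward usage-linked pricing.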

    Per-token / per-call pricing

    The most direct mapping from cost to price. Used by every foundation-model API and most infrastructure-layer AI products.

    Where it works:

  • Developer-facing APIs
  • Workloads where customers want to control spend
  • Products where usage is predictable per customer

    Where it breaks:

  • Buyer fear of "runaway bills" — must be paired with budget caps and alerts
  • Procurement teams cannot sign open-ended contracts; need committed-use discounts
  • Internal users avoid the tool because they "don't want to burn tokens"

    Practical defaults: offer a free tier, a fixed-budget tier (you absorb any overage, so price in enough margin to cover it), and committed-use enterprise contracts.
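
    The budget caps and alerts mentioned above can be sketched in a few lines of Python. This is an illustrative sketch, not a production billing component: the class name `BudgetGuard`, the alert thresholds, and the choice to reject requests at the cap are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetGuard:
    """Tracks spend against a hard monthly cap; reports each alert threshold once."""
    cap_usd: float
    alert_fractions: tuple = (0.5, 0.8, 1.0)   # assumed alert levels
    spent_usd: float = 0.0
    fired: set = field(default_factory=set)

    def charge(self, cost_usd: float) -> list:
        """Record a charge and return the alert thresholds it newly crossed.

        Raises RuntimeError if the charge would push spend past the hard cap.
        """
        if self.spent_usd + cost_usd > self.cap_usd:
            raise RuntimeError("hard cap reached; request rejected")
        self.spent_usd += cost_usd
        crossed = [
            f for f in self.alert_fractions
            if f not in self.fired and self.spent_usd >= f * self.cap_usd
        ]
        self.fired.update(crossed)
        return crossed


guard = BudgetGuard(cap_usd=100.0)
print(guard.charge(40.0))   # no threshold crossed yet
print(guard.charge(10.0))   # crosses the 50% alert
```

    A hard cap trades availability for predictability; a softer variant could downgrade to a cheaper model instead of rejecting the request.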

    Per-seat pricing

    The classic SaaS model. Still works for collaboration-style AI products where every user gets value continuously.

    Where it works:

  • Coding assistants used 8 hours a day
  • Email/calendar copilots
  • Tools where seat utilization is high

    Where it breaks:

  • Replaces-work products. If one AI does the job of 10 humans, you cannot bill 10 seats.
  • Heavy-power-user distributions. The top 10% of users can drive 80% of the cost.
  • Workflows where the user is the AI agent itself, not a human

    Q2 2026 per-seat prices range from $80 to $400 per seat depending on tier, per industry tracking, but the fastest-growing AI companies are shifting away from pure per-seat.

    Per-outcome pricing

    Bill when the AI does the job. Used by customer-support AI ("per resolved ticket"), sales AI ("per qualified lead"), and document AI ("per processed contract").

    Where it works:

  • Buyers can map AI output directly to a financial unit they already track
  • Outcomes are measurable and can be disputed cleanly
  • Margins on the outcome are wide enough to absorb COGS volatility

    Where it breaks:

  • Defining "success" with the buyer — what counts as a "resolved" ticket?
  • The metric can be gamed (e.g., the AI marks a ticket resolved and a human later reopens it)
  • Long sales cycles negotiating the outcome definition itself

    The clean version: anchor your outcome on a metric the customer already collects in their own system, not one you produce. If they cannot dispute it without changing their own books, you have a defensible price.
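
    One way to make "resolved" hard to game is to bill only on resolutions that survive a reopen window in the customer's own ticketing data. The Python sketch below illustrates that idea; the `Ticket` fields, the 7-day window, and the price per resolution are hypothetical and would be negotiated in the contract.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

REOPEN_WINDOW = timedelta(days=7)   # assumed dispute window
PRICE_PER_RESOLUTION = 1.50         # assumed price in USD


@dataclass
class Ticket:
    """A ticket row as exported from the customer's own system (hypothetical schema)."""
    ticket_id: str
    resolved_by_ai: bool
    resolved_at: datetime
    reopened_at: Optional[datetime] = None


def is_billable(ticket: Ticket) -> bool:
    """Bill only AI resolutions that were not reopened within the window."""
    if not ticket.resolved_by_ai:
        return False
    if ticket.reopened_at is None:
        return True
    return ticket.reopened_at - ticket.resolved_at > REOPEN_WINDOW


def invoice_total(tickets: Iterable[Ticket]) -> float:
    """Total charge for a batch of tickets."""
    return PRICE_PER_RESOLUTION * sum(is_billable(t) for t in tickets)
```

    Because the reopen timestamp lives in the customer's system, a premature "resolved" flag set by the AI does not turn into revenue unless the ticket stays closed.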

    Hybrid pricing (base + usage)

    This is what's winning in 2026. A predictable base subscription locks in budget; usage tiers capture upside as customer value grows. Hybrid adoption rose from 27% to 41% in 12 months, per Bessemer.

    The structure:

  • Base subscription: covers a baseline volume of usage and the management surface (admin, security, support)
  • Usage credits: a set number is included in the base; additional usage is billed at a per-unit rate
  • Tier upgrades: at each tier, the base subscription rises, included credits rise, and the per-unit overage rate falls

    This works because it gives buyers what they want (predictability) and gives builders what they need (variable revenue capture).
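
    The base-plus-usage structure reduces to simple arithmetic. The following Python sketch shows one possible shape; the tier names, prices, and credit amounts are invented for illustration and only follow the pattern described above (higher base, more included credits, lower overage rate).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    base_usd: float                 # monthly base subscription
    included_credits: int           # usage covered by the base
    overage_usd_per_credit: float   # rate for usage beyond the included credits


# Hypothetical price book.
TIERS = [
    Tier("starter", 500.0, 10_000, 0.06),
    Tier("growth", 2_000.0, 50_000, 0.05),
    Tier("scale", 6_000.0, 200_000, 0.04),
]


def monthly_invoice(tier: Tier, credits_used: int) -> float:
    """Base fee plus overage on any credits beyond the included amount."""
    overage_credits = max(0, credits_used - tier.included_credits)
    return tier.base_usd + overage_credits * tier.overage_usd_per_credit


def cheapest_tier(credits_used: int) -> Tier:
    """The tier that minimizes the invoice at a given usage level."""
    return min(TIERS, key=lambda t: monthly_invoice(t, credits_used))


print(cheapest_tier(40_000).name)  # growth
```

    With this price book, a customer at 40,000 credits is better off upgrading than paying starter-tier overage, which is the behavior the tier design is meant to encourage.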

    What customers actually push back on

    Across enterprise procurement conversations, the same buyer concerns recur:

  • "How do I cap my spend?" Answer: hard caps, budget alerts, role-based usage limits.
  • "How do I forecast next year?" Answer: usage-history dashboards and committed-use discount tiers.
  • "What happens when model prices drop?" Answer: pass through cost reductions, or contractually link price to a model price index.
  • "Why am I paying for failed outputs?" Answer: in outcome pricing, only bill on success; in token pricing, refund credits on errors.

    The companies that handle these well close enterprise deals 2-3x faster than the ones that don't. [Speculation: based on anecdotal founder reports]
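
    Linking price to a model price index can take many contractual forms. The Python sketch below shows one assumed form: a share of any index change is passed through to the customer, with a floor so the price never falls below a fixed fraction of the signing price. The function name, the pass-through share, and the floor are illustrative assumptions, not an industry standard.

```python
def indexed_unit_price(
    base_price: float,
    index_at_signing: float,
    index_now: float,
    pass_through: float = 0.5,     # assumed share of index movement passed on
    floor_fraction: float = 0.6,   # assumed floor relative to the signing price
) -> float:
    """Unit price adjusted for movement in an agreed model price index."""
    if index_at_signing <= 0:
        raise ValueError("index_at_signing must be positive")
    # Relative change in the index since signing, e.g. -0.4 for a 40% drop.
    change = index_now / index_at_signing - 1.0
    adjusted = base_price * (1.0 + pass_through * change)
    # Never go below the contractual floor.
    return max(adjusted, base_price * floor_fraction)


# A 40% drop in the index with 50% pass-through lowers the price by 20%.
print(round(indexed_unit_price(0.10, 100.0, 60.0), 4))
```

    The formula is symmetric, so a rising index would raise the price as well; a contract could cap that side separately.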

    Pricing pitfalls to avoid

  • Pricing in tokens to a non-technical buyer. Tokens are not a unit anyone outside engineering understands. Translate into "calls", "requests", or "tasks" before quoting.
  • Quoting flat unlimited tiers without ringfencing. A single power user will eat your gross margin.
  • Pricing on output length. Customers will pay you to be verbose. Price on input or task completion, not output volume.
  • Promising static pricing for 12 months. Compute prices move; lock in indexed contracts instead.

    How to choose

    Pick the model that matches the unit of value your buyer cares about:

  • Sells to developers → per-token, with caps
  • Sells to teams using the product daily → per-seat or hybrid
  • Sells to operations leaders replacing work → per-outcome or hybrid
  • Sells to enterprise procurement → hybrid with committed-use

    Most successful 2026 AI startups end up at hybrid within 18 months of launch, regardless of where they started. The transition itself is a strategic project: communicate clearly, grandfather generously, and tie it to a product moment that justifies the change.

    Frequently Asked Questions

    Is hybrid pricing really winning?
    Yes — Bessemer's 2026 tracking shows hybrid rose from 27% to 41% adoption while pure per-seat fell from 21% to 15%. Hybrid balances buyer predictability with builder upside.
    How do I set the per-token rate without losing money?
    Track your loaded compute cost per 1K tokens, add 30-50% for non-compute COGS (storage, eval, ops), then mark up 2-3x for gross margin. Re-price quarterly as model costs move.
    Should startups offer unlimited tiers?
    Only with strict ringfencing — rate limits, model-tier caps, fair-use clauses. Pure unlimited at a flat price is how AI startups go bankrupt at scale.
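
    The per-token rate recipe from the FAQ (loaded compute cost, plus 30-50% for non-compute COGS, times a 2-3x markup) works out as follows. This is a minimal Python sketch; the function names and the example compute cost are assumptions, and the defaults sit in the middle of the ranges quoted above.

```python
def loaded_cost_per_1k(compute_cost_per_1k: float, non_compute_uplift: float) -> float:
    """Compute cost per 1K tokens plus the non-compute COGS uplift (storage, eval, ops)."""
    return compute_cost_per_1k * (1.0 + non_compute_uplift)


def price_per_1k(
    compute_cost_per_1k: float,
    non_compute_uplift: float = 0.4,   # middle of the 30-50% range
    markup: float = 2.5,               # middle of the 2-3x range
) -> float:
    """List price per 1K tokens under the cost-plus recipe."""
    return loaded_cost_per_1k(compute_cost_per_1k, non_compute_uplift) * markup


def implied_gross_margin(markup: float) -> float:
    """Gross margin implied by a markup over loaded cost: 1 - 1/markup."""
    return 1.0 - 1.0 / markup


# Hypothetical compute cost of $0.002 per 1K tokens.
print(f"price per 1K tokens: ${price_per_1k(0.002):.4f}")
print(f"gross margin at 2x: {implied_gross_margin(2.0):.0%}")
```

    A 2x markup implies a 50% gross margin and a 3x markup about 67%, which brackets the 50-60% AI gross margins cited earlier in the article.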
