AI Voice Agents for Restaurant Ordering

CallMissed · May 8, 2026


Drive-thru voice automation has been the most public test case for production voice AI in 2024–2026. McDonald's piloted with IBM and ended that partnership; new pilots are running with newer voice stacks; Presto raised additional capital in 2026 to scale to thousands of locations. The technology has crossed the accuracy threshold that makes it deployable, but only barely, and only in tightly scoped configurations.

What changed in 2024–2026

McDonald's and IBM ran one of the largest drive-thru voice AI pilots through 2023, and ended the partnership in mid-2024 over accuracy concerns. The story drove a wave of "AI cannot do drive-thru" headlines.

The retreat was real, but so was the rebound. By 2026:

McDonald's is back in market with newer voice stacks. Recent pilot data suggests order accuracy of ~93%, with a reported +12 point lift in guest satisfaction in the test markets. [Unverified — early-pilot McDonald's-cited figure]

Presto reports ~95% accuracy on its drive-thru voice deployments, with a 20-second improvement in throughput and roughly 9 hours/day of labor savings per location. The company raised $10M in early 2026 to scale to thousands of restaurants.

Yum! Brands, Wendy's, CKE, Chipotle, and many smaller chains are running voice ordering pilots in some configuration.

The gap between 2024 and 2026 is approximately the gap between mid-generation Whisper-class STT and the latest streaming, low-latency voice stacks paired with restaurant-tuned language models.

The accuracy threshold

The most-debated number in restaurant voice AI is "what accuracy is good enough?" The honest answer:

Below ~90% — fails. Customers correct the bot constantly; throughput goes negative.

90–95% — marginal. Works for simple orders, fails on complex modifications. Labor savings, but customer-experience losses partially offset them.

95%+ — viable. Especially when the escalation path to a human is clean (one-tap from the headset).

97%+ — the threshold above which the AI advantage clearly dominates.

Presto's 95% claim sits at the very bottom of "viable"; McDonald's 93% claim sits just below it, in the upper part of "marginal". The two figures imply different operational tradeoffs.
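To make the bands concrete, here is a minimal sketch that maps a reported accuracy figure to the tiers listed above. The function name and the hard cutoffs are assumptions for illustration; the article's boundaries are approximate ("~90%", "95%+"), so treat the edges as fuzzy in practice.

```python
def accuracy_tier(accuracy_pct: float) -> str:
    """Map an order-accuracy percentage to the article's rough tiers.

    The cutoffs (90, 95, 97) are treated as hard boundaries here,
    which is a simplification of the approximate bands in the text.
    """
    if not 0.0 <= accuracy_pct <= 100.0:
        raise ValueError("accuracy_pct must be between 0 and 100")
    if accuracy_pct < 90.0:
        return "fails"
    if accuracy_pct < 95.0:
        return "marginal"
    if accuracy_pct < 97.0:
        return "viable"
    return "dominant"


# Vendor-reported figures quoted in the article (not independently verified).
for vendor, claimed in [("Presto", 95.0), ("McDonald's pilot", 93.0)]:
    print(f"{vendor}: {claimed:.0f}% -> {accuracy_tier(claimed)}")
```

Read strictly, a 93% figure lands in the marginal band and a 95% figure lands on the first rung of viable, which is why a point or two of measured accuracy matters so much when comparing vendors.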

What works and what does not

What works in production:

Standard menu items with simple modifications. "Number 3 with a Coke" is solved.

Upsell prompting. "Would you like to add an apple pie?" — the AI never forgets, and upsell attach rates rise.

Throughput compression. Removing the "let me get my colleague" interruptions tightens the line.

24/7 deployment for chains running late-night ordering.

What still fails:

Complex modifications. "No tomato, extra cheese, half-decaf, light ice." Stacked modifiers degrade accuracy fast.

Heavy accents and code-switching. Especially in markets with high non-native-English populations.

Background noise. Drive-thru environments are not anechoic chambers. Production stacks use directional mics, beam-forming, and noise cancellation, but extreme conditions still degrade STT.

Off-menu requests. Questions like "Do you have any vegan options?" are handled poorly.
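One way to see why stacked modifiers hurt is a toy compounding model. The sketch below assumes each item or modifier ("slot") is captured correctly with the same probability and that errors are independent. Both are simplifying assumptions, and the 98% per-slot figure is hypothetical, not a number from any vendor.

```python
def whole_order_accuracy(per_slot_accuracy: float, slots: int) -> float:
    """Probability that every slot in an order is captured correctly,
    assuming independent errors with the same per-slot accuracy."""
    if not 0.0 <= per_slot_accuracy <= 1.0:
        raise ValueError("per_slot_accuracy must be between 0 and 1")
    if slots < 1:
        raise ValueError("slots must be at least 1")
    return per_slot_accuracy ** slots


# Hypothetical 98% per-slot accuracy; watch the whole-order figure fall
# as the customer stacks more items and modifiers.
for slots in (1, 2, 4, 6):
    print(f"{slots} slots: {whole_order_accuracy(0.98, slots):.1%}")
```

Under these assumptions, an order with four slots is already down to roughly 92% whole-order accuracy even though each individual slot is recognized 98% of the time.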

Phone ordering vs. drive-thru

Phone-order automation is structurally easier than drive-thru:

The audio environment is cleaner

The customer is willing to talk slowly because they cannot see anyone

There is no throughput pressure

Failure modes are forgiving — the customer hangs up and tries again

Phone-order voice AI is in production at far more independent restaurants than drive-thru voice AI. Major POS vendors (Toast, Square for Restaurants) have launched native AI phone-ordering features in 2025–2026.

Unit economics

Rough numbers for a typical QSR location running drive-thru voice AI:

Capex: $3,000–$10,000 per location for headsets, networking, and edge hardware [Inference]

Opex: ~$300–$1,000/month per location for the voice AI service

Labor savings: ~9 hours/day in the Presto figure ≈ $7,500–$10,000/month, which over a 30-day month (270 hours) implies a loaded labor cost of roughly $28–$37/hour

Throughput lift: 20 seconds per order × 200 orders/day = ~67 minutes of capacity recovered per day

The math works at scale. It does not work for a single-location independent restaurant where the per-location overhead exceeds the wage savings.
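The arithmetic behind these figures can be sketched as follows. The scenario values are taken from the ranges listed above; the class and function names are illustrative, and the model deliberately ignores integration work, support, and whether the saved hours can actually be removed from the schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    capex_usd: float                    # one-time hardware cost per location
    opex_usd_per_month: float           # voice AI service fee per location
    labor_savings_usd_per_month: float  # claimed labor savings per location


def net_monthly_usd(s: Scenario) -> float:
    """Monthly labor savings minus the monthly service fee."""
    return s.labor_savings_usd_per_month - s.opex_usd_per_month


def payback_months(s: Scenario) -> float:
    """Months to recover capex; infinite if the net monthly figure is not positive."""
    net = net_monthly_usd(s)
    return s.capex_usd / net if net > 0 else float("inf")


def capacity_recovered_minutes(seconds_saved_per_order: float, orders_per_day: int) -> float:
    """Minutes of drive-thru capacity recovered per day."""
    return seconds_saved_per_order * orders_per_day / 60


# Ends of the ranges quoted in the list above.
conservative = Scenario(capex_usd=10_000, opex_usd_per_month=1_000, labor_savings_usd_per_month=7_500)
optimistic = Scenario(capex_usd=3_000, opex_usd_per_month=300, labor_savings_usd_per_month=10_000)

for name, s in [("conservative", conservative), ("optimistic", optimistic)]:
    print(f"{name}: net ${net_monthly_usd(s):,.0f}/month, payback {payback_months(s):.1f} months")
print(f"capacity recovered: {capacity_recovered_minutes(20, 200):.0f} minutes/day")
```

The result is dominated by the labor-savings input. A location that cannot actually remove close to 9 staffed hours per day will see a much smaller net figure, which is worth keeping in mind for single-location operators.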

What the technology actually is

The 2026 stack typically combines:

Streaming STT tuned on restaurant-domain audio

A constrained LLM that maps spoken orders to a fixed POS schema (the model cannot emit a menu item that does not exist)

Streaming TTS with a defined voice persona for the brand

A POS bridge that writes the order directly into the kitchen system

A human escalation channel that the AI can hand off to in under a second

The constraint matters: the LLM is not a free-form generator. It can only emit valid POS items, modifiers, and quantities. This is what makes 95%+ accuracy possible — the search space is dramatically smaller than open-domain conversation.
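As a rough illustration of the constrained-output idea, the sketch below validates a parsed order against a fixed menu schema and escalates anything that does not fit. This is post-hoc validation, a simpler stand-in for constraining the model at decode time. The menu, item names, modifier names, and quantity cap are all hypothetical and not drawn from any real POS.

```python
from dataclasses import dataclass, field

# Hypothetical menu schema: item id -> modifiers allowed on that item.
MENU: dict[str, set[str]] = {
    "combo_3": {"no_tomato", "extra_cheese"},
    "coke": {"light_ice", "no_ice"},
    "apple_pie": set(),
}
MAX_QUANTITY = 20  # assumed sanity cap on a single order line


@dataclass
class OrderLine:
    item: str
    quantity: int = 1
    modifiers: list[str] = field(default_factory=list)


def validate_line(line: OrderLine) -> list[str]:
    """Return a list of problems; an empty list means the line fits the schema."""
    allowed = MENU.get(line.item)
    if allowed is None:
        return [f"unknown item: {line.item}"]
    problems = []
    if not 1 <= line.quantity <= MAX_QUANTITY:
        problems.append(f"bad quantity for {line.item}: {line.quantity}")
    for modifier in line.modifiers:
        if modifier not in allowed:
            problems.append(f"modifier not allowed on {line.item}: {modifier}")
    return problems


def accept_or_escalate(lines: list[OrderLine]) -> tuple[str, list[str]]:
    """Send the order to the POS only if every line is valid; otherwise hand off."""
    problems = [p for line in lines for p in validate_line(line)]
    return ("escalate" if problems else "send_to_pos", problems)


print(accept_or_escalate([OrderLine("combo_3", 1, ["extra_cheese"]), OrderLine("coke")]))
print(accept_or_escalate([OrderLine("vegan_burger")]))
```

The design point is that nothing reaches the kitchen system unless every item, modifier, and quantity resolves to something the POS already knows about; everything else goes to the human escalation channel.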

What this means for restaurant operators

If you are evaluating voice AI in 2026:

Phone first, drive-thru second. Phone ordering is easier to deploy and has clearer ROI for most independent restaurants.

Demand a constrained-output contract. The LLM should not be able to emit menu items that do not exist on your POS.

Set a clear escalation policy. What does the customer hear when the bot is uncertain? How fast does a human pick up?

Pilot with realistic volume. Lunch rush is the test. Off-peak demos prove nothing.

Track three KPIs honestly: order accuracy, throughput, and customer-satisfaction delta.
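A pilot scorecard for the three KPIs could look like the sketch below. The record fields, function name, and satisfaction scale are assumptions for illustration; escalation rate is included as a fourth figure because it ties back to the escalation policy above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class OrderRecord:
    correct: bool               # order matched what the customer asked for
    seconds_at_speaker: float   # time spent ordering
    escalated: bool             # a human had to take over


def pilot_kpis(
    orders: list[OrderRecord],
    baseline_seconds: float,
    csat_pilot: float,
    csat_baseline: float,
) -> dict[str, float]:
    """Summarize a pilot against a pre-pilot baseline for the same daypart."""
    if not orders:
        raise ValueError("need at least one order record")
    return {
        "order_accuracy": sum(o.correct for o in orders) / len(orders),
        "escalation_rate": sum(o.escalated for o in orders) / len(orders),
        "seconds_saved_per_order": baseline_seconds - mean(o.seconds_at_speaker for o in orders),
        "csat_delta": csat_pilot - csat_baseline,
    }


# Made-up sample data, only to show the shape of the calculation.
sample = [
    OrderRecord(True, 40.0, False),
    OrderRecord(True, 50.0, False),
    OrderRecord(False, 60.0, True),
    OrderRecord(True, 50.0, False),
]
print(pilot_kpis(sample, baseline_seconds=70.0, csat_pilot=82.0, csat_baseline=70.0))
```

Collecting these records during lunch rush rather than off-peak, and comparing against a baseline from the same daypart, keeps the numbers honest.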

What is next

The 2026–2027 frontier is multi-modal drive-thru — voice plus a screen the customer can see, plus a kitchen-side display that shows the order being assembled. The bet is that confirming complex orders visually closes the modifier-accuracy gap that voice alone cannot.

Voice ordering is one of the cleanest production tests of voice-AI maturity. It has reached the threshold of viable but not the threshold of dominant, and 2027 is likely the year it tips one way or the other.

Frequently Asked Questions

How accurate is AI drive-thru ordering in 2026?

Production deployments report 93–95% order accuracy on simple-to-moderate orders, with degradation on complex modifications, heavy accents, and noisy environments. Roughly 95% is the threshold for viability; above about 97%, the AI advantage clearly dominates.

Why did McDonald's end the IBM AI drive-thru pilot?

McDonald's ended the IBM partnership in mid-2024, citing accuracy and reliability concerns. Newer pilots with different voice AI stacks are now running and report accuracy improvements, but the technology has not yet been rolled out across the full chain.

Should an independent restaurant adopt voice AI ordering?

For drive-thru, probably not yet at single-location economics. For phone ordering, the math is cleaner: most major restaurant POS systems (Toast, Square) now offer native AI phone ordering, and the orders recovered from otherwise missed calls alone often justify it.