Evaluating AI Vendors: A Procurement Checklist

CallMissed Team · May 31, 2026 · 6 min read

A 2026 procurement-grade AI vendor checklist — data handling, security, evals, output liability, escape hatches, and the red flags to watch for.

The standard SaaS procurement checklist does not cover AI risk. SOC 2 reports do not certify model behavior. Privacy reviews do not address training-data leakage. Indemnification clauses written in 2018 do not cover output liability. Below is a 2026 AI-specific vendor evaluation checklist for buyers — and a useful self-audit for vendors trying to close enterprise deals.

1. Data handling

Data handling is the single most material risk in an AI vendor relationship.

Where does our data go? List every model, sub-processor, and storage tier our prompts, completions, and retrieved context will touch.
Is it used for training? Get an explicit "no" in writing for both model training and model evaluation, with the option to opt in if useful.
Where is it stored, and for how long? Region, encryption-at-rest, retention policy, deletion-on-request SLA.
What happens to data on contract termination? Default should be 30-day deletion across all caches and logs.
Is there a zero-retention API option? Some providers offer it at the request level; confirm whether your vendor exposes it.

Anything weaker than "no training, customer-controlled retention, EU/India residency available" is a hard "no" for regulated industries in 2026.
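
The bar above can be written down as a simple gate that a reviewer runs against the vendor's written answers. The following is a minimal sketch, not a standard tool: `DataHandlingAnswers`, its field names, and `data_handling_gaps` are hypothetical, and the default treats both EU and India residency as required, which is an assumption you should adjust to the regions you actually need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataHandlingAnswers:
    """Hypothetical record of a vendor's written answers (field names are illustrative)."""
    trains_on_customer_data: bool
    customer_controlled_retention: bool
    residency_regions: frozenset  # regions the vendor can pin data to, e.g. {"EU", "IN"}


def data_handling_gaps(answers, required_regions=frozenset({"EU", "IN"})):
    """Return every hard-requirement gap; an empty list means the bar is met.

    Assumption: all regions in required_regions must be available.
    """
    gaps = []
    if answers.trains_on_customer_data:
        gaps.append("vendor trains on customer data")
    if not answers.customer_controlled_retention:
        gaps.append("retention is not customer-controlled")
    missing = required_regions - answers.residency_regions
    if missing:
        gaps.append("residency unavailable in: " + ", ".join(sorted(missing)))
    return gaps


if __name__ == "__main__":
    vendor = DataHandlingAnswers(
        trains_on_customer_data=False,
        customer_controlled_retention=True,
        residency_regions=frozenset({"US", "EU"}),
    )
    for gap in data_handling_gaps(vendor):
        print("GAP:", gap)
```

Returning the list of gaps rather than a single pass/fail flag keeps the output usable as a negotiation agenda: each entry is a clause to fix before signing.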

2. Security baseline

SOC 2 Type II. Not Type I: a Type I report attests only that controls are suitably designed at a point in time, while Type II covers their operating effectiveness over a review period.
ISO 27001. Standard for international procurement.
HIPAA / BAA. If touching health data.
Pen-test reports. Most recent within 12 months, with critical findings remediated.
Subprocessor list. Public, with notification on changes. AI vendors typically have many — every model provider is a subprocessor.
MFA / SSO / SAML. Table stakes; refuse vendors that put SSO for security-sensitive admin actions behind an enterprise-tier upgrade.

3. Model behavior and evals

This is the AI-specific addition that traditional vendor reviews skip.

What evals do they run before model upgrades? A vendor that swaps the underlying model with no eval discipline is a release-management risk to your product.
Will they freeze the model version on request? Many enterprise contracts now include "model version stability" clauses.
What is their hallucination rate on a representative task? Ask for benchmark scores on tasks from your domain. If they cannot produce any, that is a warning sign.
How do they handle bias / fairness? What slice analysis do they run? Do they expose disparity metrics?
What changes in vendor behavior trigger customer notification? System prompt changes, tool-use behavior changes, model swaps — all should require notice.
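
On the buyer side, the same discipline can be approximated with a small regression gate that compares eval scores for the pinned model version against a proposed upgrade. This is a sketch under stated assumptions: the function name, the task names, the 0 to 1 score scale, and the 0.02 tolerance are all illustrative, not taken from any vendor's process.

```python
def upgrade_regressions(baseline, candidate, max_drop=0.02):
    """Compare per-task eval scores (0..1, higher is better) for a pinned model
    version against a candidate upgrade.

    Returns {task: (baseline_score, candidate_score)} for every task whose score
    dropped by more than max_drop, or that is missing from the candidate run
    (reported with a candidate score of None).
    """
    regressions = {}
    for task, base_score in baseline.items():
        cand_score = candidate.get(task)
        if cand_score is None or base_score - cand_score > max_drop:
            regressions[task] = (base_score, cand_score)
    return regressions


if __name__ == "__main__":
    # Hypothetical scores from your own domain eval set.
    pinned = {"intent_routing": 0.91, "summarization": 0.80, "refusal_policy": 0.95}
    upgrade = {"intent_routing": 0.85, "summarization": 0.81}
    for task, (old, new) in upgrade_regressions(pinned, upgrade).items():
        print(f"{task}: {old} -> {new}")
```

Treating a task that is missing from the candidate run as a regression is a deliberate choice: an upgrade that was never evaluated on one of your tasks should block acceptance just as a measured drop would.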

4. Reliability and SLAs

Uptime SLA. AI vendors increasingly offer 99.9% but read the carve-outs — model-provider downtime is often excluded.
Latency commitment. p95 and p99, not just averages.
Failover behavior. What happens if the upstream model is unavailable? Many vendors fail soft (degrade to a weaker model) or fail hard (return an error). Both are acceptable; "ours fails silently to wrong answers" is not.
Capacity guarantees. During holiday traffic, election cycles, or major news events, will throughput hold?
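
Two of the numbers above are easy to sanity-check yourself during a pilot. The sketch below assumes you have logged per-request latencies in milliseconds; `percentile` uses the nearest-rank definition (one of several common percentile definitions), and both function names are illustrative.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def allowed_downtime_minutes(sla_pct, days=30):
    """Downtime budget implied by an uptime SLA over a period of `days`."""
    return (100 - sla_pct) / 100 * days * 24 * 60


if __name__ == "__main__":
    # Hypothetical pilot data: 18 fast calls and two slow outliers.
    latencies_ms = [200] * 18 + [1500, 4000]
    print("mean:", sum(latencies_ms) / len(latencies_ms))  # 455.0
    print("p95:", percentile(latencies_ms, 95))            # 1500
    print("p99:", percentile(latencies_ms, 99))            # 4000
    print("99.9% SLA, 30 days:", round(allowed_downtime_minutes(99.9), 1), "min")
```

In this sample the mean is 455 ms while p99 is 4000 ms, which is why an average-only commitment hides the requests your users notice. A 99.9% SLA allows roughly 43 minutes of downtime per 30 days before any carve-outs are applied.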

5. Compliance and governance

EU AI Act readiness. GPAI obligations began to apply on 2 August 2025; broader provisions apply from 2 August 2026, per the EU's implementation timeline. Ask the vendor for their classification (risk tier, GPAI vs. high-risk system).
GDPR. Data Processing Agreement on file, EU representative if applicable, sub-processor controls.
India DPDP. Phase 1 (Data Protection Board) effective November 2025; full substantive obligations by May 2027 per the IAPP timeline. Ask vendors with India users about consent flows and grievance mechanisms.
Sector-specific. HIPAA, PCI, FedRAMP, FERPA — match to your domain.

6. Output liability and indemnification

The clause most often missing from AI contracts.

IP indemnification. Does the vendor cover you if a model output is alleged to infringe copyright? Frontier labs and large vendors increasingly do; smaller vendors often don't.
Output-quality liability. Most vendors disclaim this aggressively. Push back where the output is actually used in a regulated decision.
Limitation-of-liability cap. Don't accept caps lower than 12 months of fees for material breaches.
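
The 12-month floor is simple arithmetic, and it helps to express a proposed cap in months of fees when comparing redlines. A minimal sketch, with hypothetical function names:

```python
def cap_in_months_of_fees(cap_usd, annual_fees_usd):
    """Express a liability cap as the number of months of fees it covers."""
    return cap_usd * 12 / annual_fees_usd


def cap_meets_floor(cap_usd, annual_fees_usd, floor_months=12):
    """True if the cap is at least floor_months of fees."""
    return cap_in_months_of_fees(cap_usd, annual_fees_usd) >= floor_months


if __name__ == "__main__":
    # A $100K cap on a $500K/year contract covers 2.4 months of fees.
    print(cap_in_months_of_fees(100_000, 500_000))
    print(cap_meets_floor(100_000, 500_000))  # False
    print(cap_meets_floor(500_000, 500_000))  # True
```

The first example uses the figures from the red-flags section: a $100K cap on a $500K/year contract covers under three months of fees, well short of the 12-month floor.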

7. Escape hatches

If the vendor goes sideways, can you leave?

Data export. Full export of your data, conversations, and embeddings in a documented format.
Model portability. Are prompts and tools written against an open API contract you can migrate, or are they locked into a proprietary system?
Notice period for material changes. 60-90 days minimum.
Termination rights for material breach. Including security incidents and model-behavior regressions.

8. Cost transparency

Per-call cost visibility. Real-time, not month-end.
Budget controls. Hard caps, soft alerts, role-based limits.
Pricing change notice. 90 days minimum, with grandfathering for the term you signed.
Committed-use discount terms. What happens if you under-consume? Most vendors do not roll over, but some do for enterprise terms.
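
If the vendor does not expose budget controls natively, a thin client-side guard can approximate the hard-cap and soft-alert behavior while you negotiate. This is a sketch only: `BudgetGuard`, `BudgetExceeded`, and the 80% alert threshold are assumptions for illustration, it relies on the vendor reporting per-call cost, and it does not cover role-based limits.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when recording a call would push spend past the hard cap."""


@dataclass
class BudgetGuard:
    hard_cap_usd: float
    soft_alert_fraction: float = 0.8  # assumed alert threshold: 80% of the cap
    spent_usd: float = 0.0

    def record(self, call_cost_usd):
        """Record one call's cost.

        Returns "ok" below the soft threshold and "alert" at or above it.
        Raises BudgetExceeded, without recording the cost, if the hard cap
        would be exceeded.
        """
        projected = self.spent_usd + call_cost_usd
        if projected > self.hard_cap_usd:
            raise BudgetExceeded(
                f"call of ${call_cost_usd:.2f} would exceed cap of ${self.hard_cap_usd:.2f}"
            )
        self.spent_usd = projected
        if projected >= self.soft_alert_fraction * self.hard_cap_usd:
            return "alert"
        return "ok"


if __name__ == "__main__":
    guard = BudgetGuard(hard_cap_usd=100.0)
    print(guard.record(50.0))  # ok
    print(guard.record(35.0))  # alert
    try:
        guard.record(20.0)
    except BudgetExceeded as exc:
        print("blocked:", exc)
```

Checking the projected total before recording means a rejected call never counts against the budget, so the guard stays accurate when callers retry with a cheaper request.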

9. Operational fit

Reference customers in your industry. Not just logos — actual conversations.
Onboarding timeline. A realistic estimate, not the sales pitch.
Support tier. SLA on response, dedicated TAM at what spend tier, escalation path for production incidents.
Roadmap visibility. Do you get pre-release notice on changes that affect your integration?

10. Red flags

If you see any of these, slow down:

No SOC 2. "We're working on it" is fine for seed-stage; not for a production deployment.
Vague training-data answer. If the vendor cannot give a clear "no" on training, assume the answer is yes.
No model-version stability clause available. Means your product behavior depends on an upstream change you cannot control.
One-page MSA. An AI contract deserves more than a one-pager. Push for an AI rider.
Low limitation-of-liability cap. A $100K cap on a $500K/year contract is not a real remedy.

Putting it together

For a $250K+ AI vendor contract, the buyer-side process should run roughly 4-6 weeks: security review, AI rider negotiation, eval review, reference checks, pilot, and only then signing. Cutting any of these steps creates risk that surfaces months later when the vendor swaps a model or a regulator asks about your data handling.

For vendors trying to close enterprise: pre-build the answers to all 10 sections of this checklist, publish them in a trust center, and you will accelerate every deal you close.

Frequently Asked Questions

What is the most overlooked AI vendor checklist item?

Model-version stability. Buyers focus heavily on data handling and security but rarely contract for the right to freeze the model version. When the vendor silently upgrades the model, your downstream product behavior can change overnight.

Is SOC 2 enough for AI vendors?

No. SOC 2 covers security controls but not model behavior, training on customer data, or output liability. Layer the AI-specific items in this checklist on top of standard SOC 2 / ISO 27001 review.

How long should an AI vendor procurement cycle take?

For contracts above $250K/year, plan 4-6 weeks: security, AI rider, eval review, references, pilot, signing. Skipping any step concentrates risk into post-signature surprises.