Evaluating AI Vendors: A Procurement Checklist

The standard SaaS procurement checklist does not cover AI risk. SOC 2 reports do not certify model behavior. Privacy reviews do not address training-data leakage. Indemnification clauses written in 2018 do not cover output liability. Below is a 2026 AI-specific vendor evaluation checklist for buyers — and a useful self-audit for vendors trying to close enterprise deals.
1. Data handling

Data handling is the single most material risk in an AI vendor relationship.
- Where does our data go? List every model, sub-processor, and storage tier our prompts, completions, and retrieved context will touch.
- Is it used for training? Get an explicit "no" in writing for both model training and model evaluation, with the option to opt in if useful.
- Where is it stored, and for how long? Region, encryption-at-rest, retention policy, deletion-on-request SLA.
- What happens to data on contract termination? The default should be deletion within 30 days across all stores, including caches and logs.
- Is there a zero-retention API option? Some providers offer it at the request level; confirm whether your vendor exposes it.
Anything weaker than "no training, customer-controlled retention, EU/India residency available" is a hard "no" for regulated industries in 2026.
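
The baseline above can be written down as an explicit check, which helps when comparing several vendors side by side. This is only a sketch: the `VendorDataTerms` record, its field names, and the region codes are hypothetical, and the required regions are an assumption that depends on where your users are.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VendorDataTerms:
    """Answers collected from the vendor in writing (hypothetical record shape)."""
    trains_on_customer_data: bool
    customer_controlled_retention: bool
    residency_regions: frozenset = field(default_factory=frozenset)


def data_handling_gaps(terms, required_regions):
    """List the ways a vendor falls short of the baseline in this section."""
    gaps = []
    if terms.trains_on_customer_data:
        gaps.append("customer data is used for training")
    if not terms.customer_controlled_retention:
        gaps.append("retention is not customer-controlled")
    # Assumed region codes; substitute whatever labels your vendor uses.
    missing = sorted(set(required_regions) - set(terms.residency_regions))
    if missing:
        gaps.append("no residency option for: " + ", ".join(missing))
    return gaps


# Made-up vendor answers for illustration
vendor = VendorDataTerms(
    trains_on_customer_data=False,
    customer_controlled_retention=True,
    residency_regions=frozenset({"EU"}),
)
print(data_handling_gaps(vendor, {"EU", "IN"}))
```

An empty list means the vendor clears the baseline; any entry is a gap to resolve in the contract before a regulated workload goes live.
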
2. Security baseline

- SOC 2 Type II. Not Type I: a Type I report attests only that controls are suitably designed at a point in time, while Type II covers their operating effectiveness over an audit period.
- ISO 27001. Standard for international procurement.
- HIPAA / BAA. If touching health data.
- Pen-test reports. Most recent within 12 months, with critical findings remediated.
- Subprocessor list. Public, with notification on changes. AI vendors typically have many — every model provider is a subprocessor.
- MFA / SSO / SAML. Table stakes; refuse vendors that put SSO, or protection for security-sensitive admin actions, behind an enterprise-tier upgrade.
3. Model behavior and evals
This is the AI-specific addition that traditional vendor reviews skip.
- What evals do they run before model upgrades? A vendor that swaps the underlying model with no eval discipline is a release-management risk to your product.
- Will they freeze the model version on request? Many enterprise contracts now include "model version stability" clauses.
- What is their hallucination rate on a representative task? Ask for benchmark scores on tasks from your domain. If they cannot produce any, treat that as a warning sign.
- How do they handle bias / fairness? What slice analysis do they run? Do they expose disparity metrics?
- What changes in vendor behavior trigger customer notification? System prompt changes, tool-use behavior changes, model swaps — all should require notice.
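
On the buyer side, the eval and version-stability questions can be backed by a small regression gate run on your own test set before you accept a model change. The sketch below is illustrative only: the `EvalResult` type, the slice names, the numbers, and the 2-point tolerance are all assumptions, not something a particular vendor is known to provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Pass counts for one slice of a buyer-owned eval set (hypothetical shape)."""
    slice_name: str
    passed: int
    total: int

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total if self.total else 0.0


def regressions(baseline, candidate, tolerance=0.02):
    """Return (slice, drop) pairs where the candidate's pass rate falls more
    than `tolerance` (an assumed threshold) below the pinned baseline."""
    base_by_slice = {r.slice_name: r for r in baseline}
    flagged = []
    for cand in candidate:
        base = base_by_slice.get(cand.slice_name)
        if base is None:
            continue  # new slice: nothing to compare against
        drop = base.pass_rate - cand.pass_rate
        if drop > tolerance:
            flagged.append((cand.slice_name, round(drop, 4)))
    return flagged


# Made-up numbers for illustration
pinned = [EvalResult("contracts", 94, 100), EvalResult("invoices", 90, 100)]
upgraded = [EvalResult("contracts", 95, 100), EvalResult("invoices", 84, 100)]
print(regressions(pinned, upgraded))
```

With these made-up numbers only the invoices slice is flagged. Reporting per slice rather than one aggregate score matters because an upgrade can improve the average while regressing the one workload you depend on.
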
4. Reliability and SLAs
- Uptime SLA. AI vendors increasingly offer 99.9%, but read the carve-outs: model-provider downtime is often excluded.
- Latency commitment. p95 and p99, not just averages.
- Failover behavior. What happens if the upstream model is unavailable? Many vendors either degrade gracefully (fall back to a weaker model) or fail hard (return an error). Both are acceptable if documented; "ours fails silently to wrong answers" is not.
- Capacity guarantees. During holiday traffic, election cycles, or major news events, will throughput hold?
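
To see why the uptime and latency items are phrased the way they are, it helps to run the arithmetic. The sketch below uses made-up latency samples and a nearest-rank percentile; vendors may compute percentiles differently, so treat the method as an assumption and ask which one their SLA uses.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = int(-(-p * len(ordered) // 100))  # ceiling of p * n / 100
    return ordered[rank - 1]


def downtime_budget_minutes(sla_percent, days=30):
    """Minutes of downtime an uptime SLA permits over `days` days."""
    return days * 24 * 60 * (100 - sla_percent) / 100


# Made-up latency samples (ms): most calls are fast, a few are very slow.
latencies_ms = [120] * 90 + [400] * 8 + [3000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean_ms:.0f}ms p95={percentile(latencies_ms, 95)}ms "
      f"p99={percentile(latencies_ms, 99)}ms")
print(f"99.9% over 30 days allows {downtime_budget_minutes(99.9):.1f} min of downtime")
```

In this sample the mean is 200 ms while p99 is 3000 ms, which is the gap an average-only commitment hides. A 99.9% SLA over a 30-day month permits about 43.2 minutes of downtime before any carve-outs are applied.
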
5. Compliance and governance
- EU AI Act readiness. GPAI obligations entered application on 2 August 2025; broader provisions apply 2 August 2026, per the EU's implementation timeline. Ask the vendor for their classification (risk tier, GPAI vs. high-risk system).
- GDPR. Data Processing Agreement on file, EU representative if applicable, sub-processor controls.
- India DPDP. Phase 1 (Data Protection Board) effective November 2025; full substantive obligations by May 2027 per the IAPP timeline. Ask vendors with India users about consent flows and grievance mechanisms.
- Sector-specific. HIPAA, PCI, FedRAMP, FERPA — match to your domain.
6. Output liability and indemnification
The clause most often missing from AI contracts.
- IP indemnification. Does the vendor cover you if a model output is alleged to infringe copyright? Frontier labs and large vendors increasingly do; smaller vendors often don't.
- Output-quality liability. Most vendors disclaim this aggressively. Push back where the output is actually used in a regulated decision.
- Limitation-of-liability cap. Don't accept caps lower than 12 months of fees for material breaches.
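
A quick way to compare cap proposals is to express each one in months of fees. This is a minimal sketch; the function names are hypothetical, and the 12-month floor is the rule of thumb from the bullet above rather than a legal standard.

```python
def cap_in_months_of_fees(cap_usd, annual_fees_usd):
    """Express a limitation-of-liability cap as months of contract fees."""
    if annual_fees_usd <= 0:
        raise ValueError("annual_fees_usd must be positive")
    return cap_usd * 12 / annual_fees_usd


def meets_floor(cap_usd, annual_fees_usd, floor_months=12):
    """True if the cap is at least `floor_months` of fees (assumed floor: 12)."""
    return cap_in_months_of_fees(cap_usd, annual_fees_usd) >= floor_months


# A $100K cap on a $500K/year contract
print(cap_in_months_of_fees(100_000, 500_000))
print(meets_floor(100_000, 500_000))
```

A $100K cap on a $500K/year contract works out to roughly 2.4 months of fees, well short of the 12-month floor.
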
7. Escape hatches
If the vendor goes sideways, can you leave?
- Data export. Full export of your data, conversations, and embeddings in a documented format.
- Model portability. Are prompts and tools written against an open API contract you can migrate, or are they locked into a proprietary system?
- Notice period for material changes. At least 60 days, preferably 90.
- Termination rights for material breach. Including security incidents and model-behavior regressions.
8. Cost transparency

- Per-call cost visibility. Real-time, not month-end.
- Budget controls. Hard caps, soft alerts, role-based limits.
- Pricing change notice. 90 days minimum, with grandfathering for the term you signed.
- Committed-use discount terms. What happens if you under-consume? Most vendors do not roll over, but some do for enterprise terms.
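
If the vendor's budget controls turn out to be weak, a hard cap and a soft alert can be approximated on your own side of the integration. The sketch below is an assumption-laden illustration: `BudgetGuard`, the 80% alert threshold, and the per-call cost estimates are hypothetical, and role-based limits are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class BudgetDecision(Enum):
    ALLOW = "allow"
    ALERT = "alert"  # allow the call, but notify the budget owner
    BLOCK = "block"  # hard cap would be exceeded; refuse the call


@dataclass
class BudgetGuard:
    hard_cap_usd: float
    soft_alert_ratio: float = 0.8  # assumed: alert at 80% of the cap
    spent_usd: float = 0.0

    def check(self, estimated_cost_usd: float) -> BudgetDecision:
        """Decide whether a call with this estimated cost may proceed."""
        projected = self.spent_usd + estimated_cost_usd
        if projected > self.hard_cap_usd:
            return BudgetDecision.BLOCK
        if projected >= self.hard_cap_usd * self.soft_alert_ratio:
            return BudgetDecision.ALERT
        return BudgetDecision.ALLOW

    def record(self, actual_cost_usd: float) -> None:
        """Add the actual cost of a completed call to the running total."""
        self.spent_usd += actual_cost_usd


guard = BudgetGuard(hard_cap_usd=100.0)
print(guard.check(10.0))   # well under the cap
guard.record(75.0)
print(guard.check(10.0))   # projected spend crosses the alert threshold
guard.record(20.0)
print(guard.check(10.0))   # projected spend would exceed the hard cap
```

A guard like this only works if the vendor exposes per-call cost in real time, which is why cost visibility comes first in this section.
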
9. Operational fit
- Reference customers in your industry. Not just logos — actual conversations.
- Onboarding timeline. A realistic estimate, not the sales pitch.
- Support tier. SLA on response, dedicated TAM at what spend tier, escalation path for production incidents.
- Roadmap visibility. Do you get pre-release notice on changes that affect your integration?
10. Red flags
If you see any of these, slow down:
- No SOC 2. "We're working on it" is fine for a seed-stage vendor, but not for a production deployment.
- Vague training-data answer. If the vendor cannot give a clear 'no' on training, assume yes.
- No model-version stability clause available. Means your product behavior depends on an upstream change you cannot control.
- One-page MSA. AI deserves longer than a one-pager. Push for the AI rider.
- Low limitation-of-liability cap. A $100K cap on a $500K/year contract is not a real remedy.
Putting it together
For a $250K+ AI vendor contract, the buyer-side process should run roughly 4-6 weeks: security review, AI rider negotiation, eval review, reference checks, pilot, and only then signing. Cutting any of these steps creates risk that surfaces months later when the vendor swaps a model or a regulator asks about your data handling.
For vendors trying to close enterprise deals: prepare answers to all 10 sections of this checklist in advance and publish them in a trust center. Doing so should shorten every sales cycle.




