The GPU Scarcity Story: H100, H200, and B200
"GPU shortage" was the defining infrastructure story of 2023 and 2024. By 2026 the story has shifted — but it has not gone away. Hopper-generation supply has loosened, Blackwell is ramping but constrained, and the gap between "what you can buy on a credit card" and "what you can buy with a multi-year commit" has only widened. Here is the picture as of mid-2026.
H100: from scarce to soft
The H100, NVIDIA's 2022-launched Hopper flagship, drove most of the 2023–2024 scarcity headlines. By 2026 it is widely available across cloud providers, with on-demand rates falling steadily as Blackwell capacity comes online. Spot and preemptible H100 pricing has slid first; on-demand list prices have followed. (Spheron blog) [Unverified — directional]
For most production inference workloads under ~70B parameters, H100 is now the value tier. It is plentiful, well-supported by every framework, and prices keep softening.
H200: the quiet workhorse
The H200 is essentially an H100 die paired with 141 GB of HBM3e memory and higher memory bandwidth than the H100. (NVIDIA H200 page) LLM token generation is typically limited by memory bandwidth rather than compute, so for models in the 70B–200B range the H200's larger, faster HBM often matters more than B200's raw FLOPS.
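A rough way to see why memory matters here is a back-of-the-envelope estimate. The sketch below is not a benchmark. It assumes nominal vendor figures for the SXM parts (80 GB and 3.35 TB/s for H100, 141 GB and 4.8 TB/s for H200), assumes every decoded token streams all model weights once, and ignores KV cache, activations, batching, and interconnect overhead. The `Gpu` class and the helper names are hypothetical, chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    hbm_gb: float         # HBM capacity in GB (nominal, assumed)
    bandwidth_tbs: float  # peak memory bandwidth in TB/s (nominal, assumed)


# Nominal datasheet-style figures; treat them as assumptions, not measurements.
H100 = Gpu("H100 SXM", hbm_gb=80, bandwidth_tbs=3.35)
H200 = Gpu("H200 SXM", hbm_gb=141, bandwidth_tbs=4.8)


def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight footprint in GB: 1e9 params * bytes per param / 1e9 bytes per GB."""
    return params_billion * bytes_per_param


def min_gpus_for_weights(gpu: Gpu, params_billion: float, bytes_per_param: float) -> int:
    """Fewest GPUs whose combined HBM holds the weights alone (no KV cache)."""
    return math.ceil(weight_gb(params_billion, bytes_per_param) / gpu.hbm_gb)


def decode_ceiling_tokens_per_s(gpu: Gpu, params_billion: float,
                                bytes_per_param: float, num_gpus: int = 1) -> float:
    """Upper bound on single-stream decode speed if bandwidth is the only limit."""
    aggregate_gb_per_s = num_gpus * gpu.bandwidth_tbs * 1000
    return aggregate_gb_per_s / weight_gb(params_billion, bytes_per_param)


if __name__ == "__main__":
    for gpu in (H100, H200):
        n = min_gpus_for_weights(gpu, params_billion=70, bytes_per_param=2)
        ceiling = decode_ceiling_tokens_per_s(gpu, 70, 1)
        print(f"{gpu.name}: {n} GPU(s) for 70B FP16 weights, "
              f"~{ceiling:.0f} tok/s ceiling per GPU at 1 byte/param")
```

Under these assumptions, 70B parameters at 2 bytes each come to 140 GB of weights, which spans two H100s but nominally fits in one H200's 141 GB (with essentially no room left for KV cache, so real deployments would still need headroom). The per-GPU decode ceiling scales with the bandwidth ratio, roughly 1.4x in the H200's favor.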
H200 supply caught up faster than H100 ever did, and as B200 ramps the H200 market is softening too. For a team that wants Hopper headroom without B200 lead times, H200 is the practical pick in 2026. [Inference]
B200: shipping but constrained
The Blackwell B200 began shipping in late 2025, and through 2026 hyperscalers (AWS, GCP, Azure, Oracle) are rolling out instances. (gpu.fm guide)
The mechanical reasons B200 is hard to get:
One reported figure — an estimated 3.6 million unit backlog as of April 2026 — captures the gap between order book and shipments. (gpu.fm) [Unverified — single-source estimate]
For builders, the practical implication: cloud rental is the only realistic path to B200 inside a 30–60 day window. Buying hardware through OEMs (Supermicro, Dell, HPE) is a multi-quarter commitment, often gated by minimum-purchase agreements.
What builders actually buy in 2026
[Inference] Based on cloud-provider pricing pages and reported deal patterns:
| Tier | Workload | Practical pick |
|---|---|---|
| Inference, ≤70B model | Most production serving | H100 (value) or H200 (memory headroom) |
| Inference, 70B–200B | Larger open models, MoE | H200 or B200 (when available) |
| Inference, 200B+ frontier | Self-hosted Llama-class | B200 / GB200 NVL72 (rented) |
| Training, fine-tune | LoRA, small SFT | H100 or A100 — cheap and plentiful |
| Training, frontier | Pre-training new models | B200 / GB200 — multi-year commits |
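For teams that want this table in a capacity-planning script, it can be encoded as a small lookup. This is a minimal sketch: the function name `practical_pick`, the workload keys, and the choice to treat exactly 200B as the middle tier are illustrative assumptions, and the returned strings are copied from the table rather than derived from live pricing.

```python
from typing import Optional

# Inference tiers from the table: (upper bound in billions of params, pick).
# Treating exactly 200B as the middle tier is an assumption of this sketch.
INFERENCE_TIERS = [
    (70.0, "H100 (value) or H200 (memory headroom)"),
    (200.0, "H200 or B200 (when available)"),
    (float("inf"), "B200 / GB200 NVL72 (rented)"),
]

# Training rows from the table, keyed by hypothetical workload labels.
TRAINING_PICKS = {
    "fine-tune": "H100 or A100",
    "frontier-training": "B200 / GB200",
}


def practical_pick(workload: str, params_billion: Optional[float] = None) -> str:
    """Return the table's practical pick for a workload.

    workload: "inference", "fine-tune", or "frontier-training".
    params_billion: model size in billions of parameters (inference only).
    """
    if workload == "inference":
        if params_billion is None or params_billion <= 0:
            raise ValueError("inference lookups need a positive params_billion")
        for upper_bound, pick in INFERENCE_TIERS:
            if params_billion <= upper_bound:
                return pick
    if workload in TRAINING_PICKS:
        return TRAINING_PICKS[workload]
    raise ValueError(f"unknown workload: {workload!r}")


if __name__ == "__main__":
    print(practical_pick("inference", 13))
    print(practical_pick("inference", 120))
    print(practical_pick("fine-tune"))
```

Keeping the tiers as data rather than nested conditionals makes it easy to revise the thresholds as pricing and availability shift.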
Alternatives: AMD, Trainium, TPU
The "NVIDIA-only" world is loosening at the edges:
For most teams in 2026, NVIDIA is still the default. The alternatives are now credible enough that single-vendor lock-in is a more interesting risk than "alternatives don't work."
Practical advice
Bottom line
The 2023 narrative, "you cannot get GPUs," has become 2026's "you cannot get the newest GPUs at retail timelines." For most inference workloads, that does not matter. Hopper-tier capacity is plentiful and cheaper than it was, and Blackwell rental is widely available even if Blackwell purchase is not. Plan capacity around what you can actually buy in your decision window, not around the press release.

