The Spring 2026 AI Roundup: Every Model That Shipped, and Why the Agent Wars Are Here

CallMissed
·6 min readNews

The last ninety days have been the most concentrated stretch of frontier AI releases in the field's short history. Between February and early May 2026, every major lab — and a handful of open-weight ones — pushed something new. Here is what shipped, what changed, and why all of them are starting to look like AI agents.

OpenAI: GPT-5.5 lands, then ships instantly

OpenAI announced GPT-5.5 on April 23, 2026, positioning it as a step up from GPT-5.4 in coding, scientific research, and tool use. According to the release coverage, the model posts 82.7% on Terminal-Bench 2.0 and 71.4% on the AI Security Institute's expert-level cybersecurity tasks, while holding GPT-5.4's per-token serving latency. Plus, Pro, Business, and Enterprise users got access first; the API followed a day later.

Less than two weeks later, on May 5, OpenAI replaced the default ChatGPT model. GPT-5.5 Instant rolls out to free, Plus, Pro, Go Business, and enterprise tiers. OpenAI's headline number is a 52.5% reduction in hallucinated claims on high-stakes prompts (medicine, law, finance) compared to GPT-5.3 Instant. The model can also reach back into past conversations, files, and connected Gmail to personalize answers — currently for Plus and Pro on the web.

The pacing tells you something. OpenAI is now shipping a default-model swap every two weeks. That is not a research-org cadence; that is a product cadence.

Anthropic: Claude Opus 4.7 quietly powers a new product

Anthropic shipped Claude Opus 4.7 in early April 2026, focused on coding, agents, vision, and multi-step task consistency. It keeps the 1M-token context window introduced with the 4.6 generation.

The bigger story is what Anthropic did with its own model. On April 17, the company released Claude Design, an experimental product that turns natural-language briefs into prototypes, slides, one-pagers, and mockups. Output ports to PDF, URL, PPTX, or directly to Canva. Anthropic positions it for non-designers — founders, product managers — and confirms the back end runs on Opus 4.7. For a lab that historically led with API access, this is a notable shift toward end-user surfaces. [Inference]

Google: two Gemini drops in two weeks

Google has been the busiest, with two distinct launches in late February.

Gemini 3.1 Pro went out on February 19. Google reports doubled reasoning performance over Gemini 3 Pro and 77.1% on ARC-AGI-2 — a benchmark designed around novel logic patterns rather than memorized templates. It is available in preview through the Gemini API, AI Studio, the Gemini CLI, Antigravity, and Android Studio for developers; via Vertex AI for enterprise; and in the Gemini app and NotebookLM for Pro/Ultra consumers.

A week later, on February 26, Google DeepMind shipped Nano Banana 2 — formally Gemini 3.1 Flash Image. It became the default image generator across the Gemini app, Search's AI Mode, Lens, Ads, and the Flow filmmaking tool on day one. Independent benchmarking placed it at the top of the Artificial Analysis Image Arena leaderboard within hours of launch, at roughly half the API price of its predecessor. The standout features per Google's announcement: precision text rendering for legible signage and marketing mockups, and image grounding via real-time web search.

Meta: Llama 4 brings MoE to open weights

On April 5, Meta released the first two models of the Llama 4 family: Scout and Maverick. Both use a mixture-of-experts architecture — a departure from the dense transformers of Llama 2 and 3 — with 17 billion active parameters per forward pass.

  • Llama 4 Scout routes through 16 experts and ships with a 10M-token context window, a notable open-weight figure.
  • Llama 4 Maverick uses 128 experts for stronger multimodal performance.
  • Meta has signaled a broader vision update at LlamaCon on April 29. The weights are on Hugging Face under the meta-llama org, and Llama 4 is also live on IBM watsonx.ai. For teams building on open weights, Llama 4 is currently among the strongest options for code generation. [Inference based on benchmark comparisons reported in secondary press]

    Mistral: one model, three product lines

    Mistral shipped Medium 3.5 on April 29 under a modified MIT open-weights license. The interesting part is not the parameter count — 128B dense, 256K context, $1.50 per million input tokens — but the consolidation. Mistral folded three previously separate product lines into one checkpoint:

  • Medium for general instruction following and reasoning
  • Devstral for coding
  • Pixtral for vision
  • A single dense checkpoint now serves all three with configurable reasoning effort. The model can be self-hosted on as few as four GPUs. It scores 77.6% on SWE-Bench Verified — competitive but behind Claude Sonnet 4.6 — and ships alongside Vibe, a remote cloud coding agent that opens pull requests directly on GitHub.

    The agent wars, finally

    If there is a single story tying the past month together, it is that everyone is suddenly building AI agents. Reports on May 8 covered Meta and Google accelerating agent products to compete with what Anthropic and OpenAI have already shipped. Mistral's Vibe is an agent. OpenAI's Codex is an agent. Anthropic's Claude Design is an agent surface. Even Google's robotics-focused gemini-robotics-er-1.6-preview, also released in April, is part of the same shift toward systems that take actions instead of just answering questions.

    Two things changed underneath this:

  • Model competence at multi-step tool use. GPT-5.5's Terminal-Bench scores, Claude Opus 4.7's multi-step consistency, and Llama 4 Maverick's coding strength all improved enough that "agent" stops being a demo and starts being a product.
  • Cost and latency parity. GPT-5.5 holding GPT-5.4's latency, Nano Banana 2 launching at half price, Mistral Medium 3.5 at $1.50/M input — the economics of running an agent loop dropped enough to ship to real users.
  • What this means for builders

    If you are building on any of these models, three practical takeaways:

  • Model defaults are unstable right now. OpenAI swapping the default ChatGPT model in two weeks means anything you are testing in the consumer surface today may behave differently next month. Pin model IDs in your API calls.
  • Open-weight is closing the gap. Llama 4, Mistral Medium 3.5, and the Qwen 3.5 / DeepSeek R2 releases earlier in the quarter put serious pressure on closed-source pricing. Self-hosted is back on the table for many use cases.
  • "AI agent" is now the integration target. If you are not exposing tool-use endpoints — function calling, MCP servers, API surfaces with clear contracts — you risk being invisible to the next wave of agent-powered products.
  • The pace is unlikely to slow. Google I/O hits May 19, with another Gemini wave widely expected. Anthropic has trailed something called Claude Mythos. OpenAI has not shown its hand on what comes after GPT-5.5 Instant, but two-week shipping cycles suggest the answer is soon.

    In AI, ninety days is the new year.

    Related Posts