LLM Chat · affordable

Gemma 4 26B A4B

by Google · Released April 2, 2026

Google DeepMind's open-weight MoE model from the Gemma 4 family. 26B total parameters with only 4B active per forward pass, so it runs nearly as fast as a 4B model while delivering the quality of a much larger one. Multimodal (text + image), 256K context, Apache 2.0 license.

Powered by Google · Mixture-of-Experts (26B total / 4B active)

Context Window: 128K
Parameters: 26B total / 4B active (MoE)
Max Output: 8K
Category: LLM Chat

Overview

Gemma 4 26B A4B, released April 2, 2026 by Google DeepMind, is an open-weight Mixture-of-Experts (MoE) model with 26B total parameters, of which only 4B are active per forward pass. Because per-token compute scales with the active parameters rather than the total, it runs nearly as fast as a 4B model while delivering quality comparable to much larger models, which makes it one of the most efficient open models available.
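To make the "total versus active" distinction concrete, here is a minimal sketch of generic top-k expert routing, the mechanism MoE layers commonly use. It is not Gemma 4's actual router: the expert count (8), the value of k (2), the scalar "experts", and the function names are all hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Weights are the softmax probabilities renormalized over the chosen experts.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]

def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by routing weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Hypothetical toy setup: 8 scalar "experts", only 2 of which run per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 1.5, 0.0, -0.5, 0.3, 0.2]

selected = route_top_k(router_logits, k=2)
output = moe_forward(3.0, experts, router_logits, k=2)
```

All eight experts exist and must be stored, but each call to `moe_forward` evaluates only the two that the router selects. That is the sense in which a model can hold far more parameters than it uses for any one token.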

The model is multimodal, supporting both text and image input (audio input is supported on smaller variants), and features a 256K-token context window. It supports 140+ languages, making it one of the most linguistically diverse open models. It is released under the Apache 2.0 license, which permits commercial use, modification, and distribution, subject to the license's attribution and notice conditions.

Gemma 4 26B A4B ranks #3 among open-source models on key benchmarks, punching well above its weight class thanks to the MoE architecture. It is particularly well-suited for cost-effective deployments, edge-friendly scenarios, and any application where the combination of multimodal capability, multilingual support, and permissive licensing matters.

Pricing

Metric | Price
Input / 1M tokens | ₹40.0000
Output / 1M tokens | ₹160.0000

1 credit = ₹1 = $0.01 USD. Prices shown are the provider's rates; CallMissed passes them through with a ~35% markup.
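As a rough illustration of how these rates turn into a per-request cost, the sketch below applies the listed per-1M-token prices and the credit conversion stated above. It assumes the listed prices are the amounts actually billed. The page does not say whether the ~35% markup is already included, so the markup is deliberately left out. The function and constant names are hypothetical.

```python
# Rates copied from the pricing table above (1 credit = ₹1 = $0.01 USD).
INPUT_CREDITS_PER_M = 40.0    # credits per 1M input tokens
OUTPUT_CREDITS_PER_M = 160.0  # credits per 1M output tokens
USD_PER_CREDIT = 0.01

def estimate_cost(input_tokens, output_tokens):
    """Estimate the cost of one request in credits, INR, and USD.

    ASSUMPTION: the listed prices are what is billed; no markup is applied here.
    """
    credits = (
        input_tokens * INPUT_CREDITS_PER_M + output_tokens * OUTPUT_CREDITS_PER_M
    ) / 1_000_000
    return {"credits": credits, "inr": credits, "usd": credits * USD_PER_CREDIT}

# Example: a 12,000-token prompt with an 800-token reply.
example = estimate_cost(12_000, 800)
```

Output tokens cost four times as much as input tokens at these rates, so long replies affect the bill more than long prompts do.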

Key Highlights

  • Apache 2.0 license — full commercial freedom
  • 26B total params, only 4B active (fast inference)
  • Multimodal: text and image input
  • 140+ language support

Benchmarks

Benchmark | Score
Open-Source Ranking | #3
MMLU-Pro | 72.8%
HumanEval | 80.5%
MATH-500 | 78.3%
GPQA Diamond | 55.2%

Technical Details

  • Architecture: MoE with 26B total / 4B active parameters per forward pass
  • Runs nearly as fast as a 4B model with much higher quality
  • Multimodal: text and image input (audio on smaller variants)
  • 256K native context window
  • 140+ language support — one of the most linguistically diverse open models
  • Apache 2.0 license: commercial use, modification, and distribution permitted (attribution and notice conditions apply)
  • #3 open-source model on key benchmarks
  • Available via Google AI API and CallMissed unified gateway
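The parameter counts above imply different things for memory and for compute. The back-of-the-envelope sketch below estimates weight storage at a few common precisions. It covers weights only and ignores the KV cache, activations, and runtime overhead. The precision choices and helper name are illustrative assumptions, not published deployment requirements.

```python
TOTAL_PARAMS = 26e9   # all experts must be stored
ACTIVE_PARAMS = 4e9   # parameters actually used per forward pass

def weight_memory_gb(params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

# Weights-only estimates at three illustrative precisions.
memory_by_precision = {
    "16-bit": weight_memory_gb(TOTAL_PARAMS, 16),
    "8-bit": weight_memory_gb(TOTAL_PARAMS, 8),
    "4-bit": weight_memory_gb(TOTAL_PARAMS, 4),
}

# Share of the weights that take part in computing any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
```

Under these assumptions the memory footprint follows the 26B total, while per-token compute follows the 4B active share (roughly 15% of the weights). This is why speed resembles a 4B model even though storage does not.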

Strengths

  • Apache 2.0, one of the most permissive licenses among top open models
  • Only 4B active params: low per-token compute suits consumer hardware and edge devices, provided the full 26B weights fit in memory
  • Multimodal text+image with 140+ language support
  • #3 open-source model — punches well above its weight class
  • Affordable at $0.40 input / $1.60 output per 1M tokens

Limitations

  • Lower absolute capability than larger models (GPT-OSS-120B, Kimi K2.5)
  • Having only 4B active parameters limits complex reasoning depth
  • Image understanding is less capable than dedicated vision models

Use Cases

  • Cost-effective chat
  • Image understanding
  • Multilingual tasks
  • Edge-friendly deployment

API Example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-4-26b-a4b-it", "messages": [{"role": "user", "content": "Describe what you see in this image"}]}'

Endpoint: POST /v1/chat/completions · Model ID: gemma-4-26b-a4b-it
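The same request can be built from Python using only the standard library. The sketch below also shows one plausible way to attach an image, assuming the gateway accepts OpenAI-style content parts (`type: "image_url"`). That is an assumption based on the endpoint path, not something this page confirms. The helper names and the example image URL are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.callmissed.com/v1/chat/completions"
MODEL_ID = "gemma-4-26b-a4b-it"

def build_chat_request(api_key, prompt, image_url=None):
    """Build (but do not send) a chat completion request."""
    if image_url is None:
        content = prompt
    else:
        # ASSUMPTION: OpenAI-style multimodal content parts are accepted.
        content = [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def send(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call might look like `send(build_chat_request("cm_YOUR_KEY", "Describe what you see in this image", "https://example.com/cat.png"))`. Check the gateway's own documentation for the exact image format before relying on it.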
