Embedding Models in 2026: OpenAI vs Cohere vs Open Source
The choice of embedding model shapes everything downstream in a RAG system, including retrieval quality, storage cost, query latency, and the ceiling on hybrid-search performance. In 2026 the field has narrowed to a clear set of contenders: OpenAI's text-embedding-3 family, Voyage AI's voyage-3 / voyage-3-large, Cohere's embed-v3 / embed-v4, Google's text-embedding-005, and the open-source BGE-M3.
The contenders
OpenAI text-embedding-3-small / -large
The default for most teams since 2024. text-embedding-3-small is $0.02 per 1M tokens with 1536 dimensions (truncatable via Matryoshka). text-embedding-3-large is $0.13 per 1M tokens at 3072 dims (also truncatable). Both support 8,191-token context. (pecollective) [Unverified — pricing as of early 2026]
Strengths: ubiquitous SDK support, multilingual, predictable. Weaknesses: no longer state of the art on public benchmarks.
Voyage AI voyage-3 / voyage-3-large
Voyage models lead public benchmarks (MTEB, RTEB) on retrieval-heavy tasks. voyage-3-large is reportedly priced at $0.18 per 1M tokens, with a 32,000-token context and a mixture-of-experts (MoE) architecture. (reintech, pecollective) [Unverified]
The newer Voyage 4 Large reportedly outperforms text-embedding-3-large by 14% and Cohere embed-v4 by 8.2% in NDCG@10 on Voyage's RTEB benchmark. [Unverified — vendor benchmark; treat with caution]
Strengths: best published retrieval quality; strong on code, legal, and medical text. Weaknesses: expects you to set the document/query input type on every call; smaller ecosystem.
Cohere embed-v3 / embed-v4
Strong multilingual performance (100+ languages) and separate document and query input modes. The embed-v3 model is limited to a 512-token context, much shorter than its competitors.
Strengths: multilingual, mature reranker pairing (Cohere Rerank). Weaknesses: embed-v3's short context window limits chunk-size flexibility.
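With a 512-token limit, chunking has to respect the token budget. A minimal sketch, assuming whitespace-separated words as a rough proxy for tokens: `chunk_words`, the 1.3 tokens-per-word safety factor, and the 64-word overlap are illustrative choices, not Cohere guidance. In production you would count tokens with the model's own tokenizer.

```python
from typing import List


def chunk_words(
    text: str,
    max_tokens: int = 512,
    overlap: int = 64,
    tokens_per_word: float = 1.3,
) -> List[str]:
    """Split text into overlapping word windows sized to fit a token limit.

    Words are only a proxy for tokens; tokens_per_word is an assumed safety
    factor, not a property of any specific tokenizer.
    """
    words = text.split()
    # Convert the token budget into a conservative word budget.
    window = max(1, int(max_tokens / tokens_per_word))
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than the window")
    step = window - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the price of embedding some words twice.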
Google text-embedding-005 / Vertex
Google's offering, at roughly $0.006 per 1M tokens, is dramatically cheaper than the rest, with quality often within a few points of the leaders on general-purpose retrieval. (tokenmix) [Unverified]
Strengths: cheapest, integrated with Gemini stack. Weaknesses: less momentum in third-party tooling.
BGE-M3 (open source)
Among the strongest open-source embedding models. BGE-M3 supports dense, sparse, and multi-vector retrieval in a single model, with an 8,192-token context and multilingual coverage.
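Because BGE-M3 emits both a dense vector and per-token lexical weights, a hybrid score can come from a single model's output. The sketch below assumes those outputs are already computed (the FlagEmbedding library's `BGEM3FlagModel` can return both; check its documentation for the exact call). The lexical score follows the idea described for BGE-M3, summing weight products over tokens the query and document share. The function names and the 0.7/0.3 weighting are hypothetical and should be tuned on your data.

```python
import math
from typing import Dict, List


def dense_score(q: List[float], d: List[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(q, d))
    nq = math.sqrt(sum(a * a for a in q))
    nd = math.sqrt(sum(b * b for b in d))
    return dot / (nq * nd) if nq and nd else 0.0


def sparse_score(q_w: Dict[str, float], d_w: Dict[str, float]) -> float:
    """Lexical match: sum of weight products over tokens present in both."""
    return sum(w * d_w[tok] for tok, w in q_w.items() if tok in d_w)


def hybrid_score(q_dense, d_dense, q_sparse, d_sparse,
                 w_dense: float = 0.7, w_sparse: float = 0.3) -> float:
    """Weighted sum of dense and lexical scores (weights are illustrative)."""
    return (w_dense * dense_score(q_dense, d_dense)
            + w_sparse * sparse_score(q_sparse, d_sparse))
```

Keeping both signals from one model avoids running a separate BM25 index, which is the main reason BGE-M3 is described as hybrid-friendly.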
Strengths: free, self-hostable, hybrid-friendly. Weaknesses: you operate it; quality is competitive but not always leading.
A 2026 comparison snapshot
| Model | Default dimensions | Max input (tokens) | $/1M tokens | Notes |
|---|---|---|---|---|
| OpenAI text-embedding-3-small | 1536 | 8,191 | $0.02 | default, cheap |
| OpenAI text-embedding-3-large | 3072 | 8,191 | $0.13 | better recall |
| Voyage voyage-3 | 1024 | 32,000 | $0.06 | best $/quality |
| Voyage voyage-3-large | 1024 | 32,000 | $0.18 | leading retrieval |
| Cohere embed-v3 | 1024 | 512 | $0.10 | multilingual |
| Google text-embedding-005 | 768 | 2,048 | $0.006 | cheapest |
| BGE-M3 (self-hosted) | 1024 | 8,192 | infra | OSS, hybrid |
[Unverified — pricing varies; check the vendor's pricing page before committing]
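Since embedding is priced per token, the one-off cost of embedding (or re-embedding) a corpus is simple arithmetic. A sketch using the table's prices, which carry the same [Unverified] caveat; the 500M-token corpus is a hypothetical size and the dictionary keys are labels, not API model IDs. BGE-M3 is omitted because its cost is infrastructure rather than a per-token fee.

```python
# Prices from the comparison table; [Unverified], so check vendor pages first.
PRICE_PER_1M_TOKENS = {
    "OpenAI text-embedding-3-small": 0.02,
    "OpenAI text-embedding-3-large": 0.13,
    "Voyage voyage-3": 0.06,
    "Voyage voyage-3-large": 0.18,
    "Cohere embed-v3": 0.10,
    "Google text-embedding-005": 0.006,
}


def embedding_cost(total_tokens: int, price_per_1m: float) -> float:
    """Dollar cost to embed total_tokens at a per-1M-token price."""
    return total_tokens / 1_000_000 * price_per_1m


corpus_tokens = 500_000_000  # hypothetical corpus size
for name, price in sorted(PRICE_PER_1M_TOKENS.items(), key=lambda kv: kv[1]):
    print(f"{name:<32} ${embedding_cost(corpus_tokens, price):,.2f}")
```

For the hypothetical corpus this ranges from $3.00 (Google) to $90.00 (voyage-3-large). It excludes query-time embedding, but it is roughly what switching models costs, since every stored vector has to be re-embedded.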
Practical recommendations
Dimensions and Matryoshka truncation
OpenAI's text-embedding-3 models support Matryoshka representation learning, so you can truncate an embedding to a smaller size (e.g., 256 or 512 dims from 1536) at a small cost in quality. This matters when storage or memory is tight: a 256-dim embedding is 1/6 the size of a 1536-dim one, often with only a few percentage points of recall@k loss. [Inference]
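When calling OpenAI's API you can request a shorter vector directly with the `dimensions` parameter. If you have already stored full-length vectors and want to shrink them yourself, keep the leading components and re-normalize to unit length so that dot-product scores remain equivalent to cosine similarity. A minimal sketch; `truncate_embedding` is a hypothetical helper name.

```python
import math
from typing import List


def truncate_embedding(vec: List[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale them to unit L2 norm."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]
```

For example, `truncate_embedding([3.0, 4.0, 12.0], 2)` returns `[0.6, 0.8]`. Truncating only works this way for models trained with Matryoshka-style objectives; cutting an ordinary embedding discards information unevenly.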
For most production use, 1024 dims is the practical sweet spot — large enough to retain quality, small enough to scale to hundreds of millions of vectors without breaking memory budgets.
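The sweet-spot argument is mostly storage arithmetic. A back-of-the-envelope sketch, assuming float32 values (4 bytes each) and ignoring index structures, metadata, and replication, which all add overhead on top; the 100M-vector count is hypothetical.

```python
def raw_vector_bytes(n_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone (no index, metadata, or replicas)."""
    return n_vectors * dims * bytes_per_value


n = 100_000_000  # hypothetical corpus of 100M chunks
for dims in (256, 1024, 1536, 3072):
    gb = raw_vector_bytes(n, dims) / 1e9
    print(f"{dims:>5} dims: {gb:,.1f} GB")
```

This prints 102.4 GB for 256 dims, 409.6 GB for 1024, 614.4 GB for 1536, and 1,228.8 GB for 3072, which is why 1024 dims is often the point where a single large-memory node or a modest cluster still suffices.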
Document vs query mode
Voyage and Cohere distinguish between embedding a document (for storage) and embedding a query (for retrieval). The same text gets a slightly different embedding depending on that input type, because the model is conditioned on it; Voyage, for example, prepends a different instruction prompt to queries than to documents.
OpenAI does not make this distinction; the same model handles both. If you use Voyage or Cohere, set the input type correctly: passing the query type for stored documents (or vice versa) silently degrades retrieval quality.
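One way to make the wrong input type hard to pass is to hide it behind two explicitly named methods. A sketch under stated assumptions: `Embedder` and the backend signature are hypothetical. The commented backend uses the `voyageai` client's `embed(texts, model=..., input_type=...)` call; a Cohere v3 backend would map the two roles to `search_document` and `search_query` instead.

```python
from typing import Callable, List, Literal, Sequence

InputType = Literal["document", "query"]
# Hypothetical backend signature: (texts, input_type) -> one vector per text.
EmbedFn = Callable[[Sequence[str], InputType], List[List[float]]]


class Embedder:
    """Wraps an embedding backend so call sites cannot omit the input type."""

    def __init__(self, backend: EmbedFn) -> None:
        self._backend = backend

    def embed_documents(self, texts: Sequence[str]) -> List[List[float]]:
        # Use when indexing: these texts are stored and searched later.
        return self._backend(list(texts), "document")

    def embed_query(self, text: str) -> List[float]:
        # Use at search time: a single user query.
        return self._backend([text], "query")[0]


# Example backend (assumes `pip install voyageai` and VOYAGE_API_KEY set):
# import voyageai
# vo = voyageai.Client()
# embedder = Embedder(
#     lambda texts, input_type: vo.embed(
#         list(texts), model="voyage-3", input_type=input_type
#     ).embeddings
# )
```

With OpenAI the backend can simply ignore `input_type`, so the same interface works across providers.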
Watch out for benchmark optics
Embedding leaderboards (MTEB, RTEB, BEIR) report performance averaged across many tasks. Your workload is a single task. The model that wins MTEB by 2 points may lose to a 30%-cheaper model on your specific data.
The 2026 default workflow: start with a cheap model (text-embedding-3-small or Google's), build a small golden set of representative queries labeled with the documents they should retrieve, evaluate the top three contenders on your data, and pick the one that wins on your task, not on the leaderboard.
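The evaluation step can be as small as recall@k over the golden set. A minimal sketch, assuming each candidate model has already embedded both the golden queries and the corpus (never mix vectors from different models) and that each query is labeled with the IDs of its relevant documents. It ranks by brute-force cosine similarity, which is fine for a small golden set; the function names and data layout are hypothetical.

```python
import math
from typing import Dict, List, Set


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall_at_k(
    query_vecs: Dict[str, List[float]],
    doc_vecs: Dict[str, List[float]],
    relevant: Dict[str, Set[str]],
    k: int = 3,
) -> float:
    """Mean fraction of each query's relevant docs that appear in its top k."""
    per_query = []
    for qid, qvec in query_vecs.items():
        # Brute-force ranking of every document for this query.
        ranked = sorted(doc_vecs, key=lambda d: cosine(qvec, doc_vecs[d]), reverse=True)
        hits = relevant[qid] & set(ranked[:k])
        per_query.append(len(hits) / len(relevant[qid]))
    return sum(per_query) / len(per_query)
```

Run it once per candidate model at the same `k`, then weigh the scores against each model's price and dimensions.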
Bottom line
The embedding model market in 2026 is mature: cheap-and-good defaults exist (OpenAI 3-small, Google), specialist leaders exist (Voyage), open-source is competitive (BGE-M3), and switching costs are mostly in your data pipeline (re-embedding) rather than model lock-in. Pick the cheapest model that meets your retrieval quality bar on your data; revisit once a year.
