Embedding Models in 2026: OpenAI vs Cohere vs Open Source

CallMissed · 5 min read · Comparison

The choice of embedding model shapes everything downstream in a RAG system: retrieval quality, storage cost, query latency, and the ceiling on hybrid-search performance. In 2026 the field has narrowed to a clear set of contenders: OpenAI's text-embedding-3 family, Voyage AI's voyage-3 / voyage-3-large, Cohere embed-v3 / embed-v4, Google's text-embedding-005, and open-source BGE-M3.

The contenders

OpenAI text-embedding-3-small / -large

The default for most teams since 2024. text-embedding-3-small is $0.02 per 1M tokens with 1536 dimensions (truncatable via Matryoshka). text-embedding-3-large is $0.13 per 1M tokens at 3072 dims (also truncatable). Both support 8,191-token context. (pecollective) [Unverified — pricing as of early 2026]

Strengths: ubiquitous SDK support, multilingual, predictable. Weaknesses: not state-of-the-art on benchmarks anymore.

Voyage AI voyage-3 / voyage-3-large

Voyage models lead public benchmarks (MTEB, RTEB) for retrieval-heavy tasks. voyage-3-large is reportedly $0.18 per 1M tokens, with up to 32,000-token context and a MoE architecture. (reintech, pecollective) [Unverified]

Voyage 4 Large reportedly outperforms text-embedding-3-large by 14% and Cohere embed-v4 by 8.2% on NDCG@10 on Voyage's RTEB benchmark. [Unverified — vendor benchmark; treat with caution]

Strengths: best published quality on retrieval, great for code, legal, medical. Weaknesses: requires document/query input distinction; smaller ecosystem.

Cohere embed-v3 / embed-v4

Cohere's models offer strong multilingual performance (100+ languages) and separate document and query input modes. embed-v3 has a 512-token context, smaller than the other models compared here.

Strengths: multilingual, mature reranker pairing (Cohere Rerank). Weaknesses: the short embed-v3 context window limits chunk size flexibility.
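To make the chunk-size constraint concrete, here is a minimal sketch of overlapping chunking under a token budget. It assumes whitespace-separated words as a crude stand-in for model tokens (a real tokenizer usually produces more tokens than words), so the default budget is set well below 512. The function name and defaults are hypothetical, not part of any vendor SDK.

```python
def chunk_words(text, max_tokens=400, overlap=50):
    """Split text into overlapping windows of at most max_tokens words.

    Words are only an approximation of model tokens; leave headroom
    below the model's real limit (512 for embed-v3 per this article).
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    words = text.split()
    chunks = []
    step = max_tokens - overlap  # how far each new window advances
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With a longer context window (8,192 or 32,000 tokens), the same function can be called with a larger `max_tokens`, which is the flexibility a 512-token model gives up.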

Google text-embedding-005 / Vertex

Google's offering at roughly $0.006 per 1M tokens is dramatically cheaper than the rest, with quality often within a few points of the leaders for general-purpose retrieval. (tokenmix) [Unverified]

Strengths: cheapest, integrated with Gemini stack. Weaknesses: less momentum in third-party tooling.

BGE-M3 (open source)

The strongest open-source embedding family. BGE-M3 produces dense, sparse, and multi-vector representations from a single model, with an 8,192-token context and multilingual coverage.

Strengths: free, self-hostable, hybrid-friendly. Weaknesses: you operate it; quality is competitive but not always leading.
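As a rough illustration of what "hybrid-friendly" means in practice, the sketch below combines a dense similarity with a sparse lexical score. It is not BGE-M3's own API: the dense vectors and the token-weight dictionaries are assumed to come from whatever model you run, and the function names and the `alpha` weight are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def sparse_overlap(query_weights, doc_weights):
    """Dot product over the tokens that appear in both weight dicts."""
    return sum(w * doc_weights[tok]
               for tok, w in query_weights.items() if tok in doc_weights)

def hybrid_score(q_dense, d_dense, q_sparse, d_sparse, alpha=0.7):
    """Weighted sum of dense and sparse scores (alpha is an assumed knob)."""
    dense = cosine(q_dense, d_dense)
    sparse = sparse_overlap(q_sparse, d_sparse)
    return alpha * dense + (1 - alpha) * sparse
```

Dense and sparse scores are on different scales, so a real system would typically normalize them or fuse ranks instead of raw scores; the weighted sum here only shows the shape of the idea.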

A 2026 comparison snapshot

| Model | Dimensions | Context (tokens) | $/1M tokens | Notes |
|---|---|---|---|---|
| OpenAI text-embedding-3-small | 1536 | 8,191 | $0.02 | default, cheap |
| OpenAI text-embedding-3-large | 3072 | 8,191 | $0.13 | better recall |
| Voyage voyage-3 | 1024 | 32,000 | $0.06 | best $/quality |
| Voyage voyage-3-large | 1024 | 32,000 | $0.18 | leading retrieval |
| Cohere embed-v3 | 1024 | 512 | $0.10 | multilingual |
| Google text-embedding-005 | 768 | 2,048 | $0.006 | cheapest |
| BGE-M3 (self-hosted) | 1024 | 8,192 | infra | OSS, hybrid |

[Unverified — pricing varies; check the vendor's pricing page before committing]
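To turn the per-token prices into a budget figure, a small calculation helps. The sketch below copies the unverified prices from the comparison above into a lookup table; the dictionary and function names are hypothetical, and the numbers should be replaced with current vendor pricing before use.

```python
# USD per 1M tokens, copied from this article's comparison (unverified).
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "voyage-3": 0.06,
    "voyage-3-large": 0.18,
    "embed-v3": 0.10,
    "text-embedding-005": 0.006,
}

def embedding_cost_usd(total_tokens, model):
    """Estimated one-off cost of embedding total_tokens with the given model."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]

# Example: a corpus of one billion tokens.
for name in PRICE_PER_MILLION_TOKENS:
    print(f"{name}: ${embedding_cost_usd(1_000_000_000, name):,.2f}")
```

The same function gives the cost of a full re-embed when switching models, since switching means paying for the whole corpus again.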

Practical recommendations

  • Default for cost-sensitive production: text-embedding-3-small or Google text-embedding-005. Both are cheap and both work for general retrieval; OpenAI has the broader third-party tooling support.
  • Default for retrieval-quality-sensitive workloads: voyage-3 or voyage-3-large. Worth the premium for legal, code, medical, technical documentation.
  • For multilingual: Cohere embed-v3 / embed-v4 or BGE-M3 (open source). Both are credible across many languages.
  • For self-hosting: BGE-M3. You get hybrid (dense + sparse) in one model, pay no per-token vendor fees, and keep your data in your own infrastructure.
Dimensions and Matryoshka truncation

OpenAI's v3 models support Matryoshka representation learning: you can truncate the embedding to a smaller size (e.g., 256 or 512 dims from 1536) at a small quality cost. This is meaningful when storage or memory matters: a 256-dim embedding is 1/6 the size of a 1536-dim one, often with only a few percentage points of recall@k loss. [Inference]
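Client-side truncation can be sketched as keeping the leading dimensions and re-normalizing to unit length so cosine and dot-product scores stay comparable. The helper below is a hypothetical illustration in plain Python; OpenAI's embeddings endpoint also accepts a `dimensions` parameter for the v3 models, which shortens the vector server-side.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` values of a Matryoshka-style embedding
    and rescale the result to unit L2 norm."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]
```

Truncating without re-normalizing leaves vectors with norms below 1, which skews dot-product scores, so the rescaling step matters if the index assumes unit-length vectors.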

For most production use, 1024 dims is the practical sweet spot: large enough to retain quality, small enough to scale to hundreds of millions of vectors without breaking memory budgets.
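The memory argument is simple arithmetic. The sketch below estimates raw vector storage only, assuming float32 values (4 bytes each) and ignoring index overhead, metadata, and replication; the function name is hypothetical.

```python
def raw_vector_size_gib(num_vectors, dims, bytes_per_value=4):
    """Raw storage for num_vectors embeddings, in GiB (float32 by default)."""
    return num_vectors * dims * bytes_per_value / 2**30

# 100 million vectors at a few common sizes.
for dims in (256, 1024, 3072):
    print(dims, round(raw_vector_size_gib(100_000_000, dims), 1), "GiB")
# 256  -> about 95 GiB
# 1024 -> about 381 GiB
# 3072 -> about 1144 GiB
```

At 100 million vectors, the step from 1024 to 3072 dims roughly triples raw storage, which is why truncation is worth testing before committing to the larger size.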

Document vs query mode

Voyage and Cohere distinguish between embedding a document (storing) and embedding a query (retrieving). The same text gets a slightly different embedding depending on the declared input type, because the input is prepared differently for each role before encoding.

OpenAI does not make this distinction; the same model handles both. For Voyage and Cohere users, set the input type correctly: passing the query type for stored documents (or vice versa) silently degrades retrieval quality.
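One way to avoid mixing the two modes up is to route every embedding call through a single helper that maps a role to the provider's `input_type` value. The sketch below makes no network calls; the mapping reflects the values the Voyage and Cohere SDKs accept as far as I know ("document"/"query" and "search_document"/"search_query"), and should be checked against current vendor docs. The helper name and provider keys are hypothetical.

```python
# Role -> provider-specific input_type value (assumed; verify with vendor docs).
INPUT_TYPES = {
    "voyage": {"document": "document", "query": "query"},
    "cohere": {"document": "search_document", "query": "search_query"},
    "openai": {"document": None, "query": None},  # no document/query distinction
}

def input_type_for(provider, role):
    """Return the input_type string for this provider and role,
    or None when the provider has no such parameter."""
    if role not in ("document", "query"):
        raise ValueError(f"unknown role: {role!r}")
    if provider not in INPUT_TYPES:
        raise ValueError(f"unknown provider: {provider!r}")
    return INPUT_TYPES[provider][role]
```

Indexing code would then call `input_type_for(provider, "document")` and search code `input_type_for(provider, "query")`, so the role is stated once at each call site instead of being repeated as a raw string.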

Watch out for benchmark optics

Embedding leaderboards (MTEB, RTEB, BEIR) measure averaged performance across many tasks. Your task is one task. The model that wins MTEB by 2 points may lose to a 30%-cheaper model on your specific data.

The 2026 default workflow: start with a cheap model (text-embedding-3-small or Google), build a small golden set of representative queries, evaluate the top three contenders on your data, and pick the one that wins on your task, not on the leaderboard.
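The golden-set comparison can be sketched in a few lines. The example assumes you have already run each candidate model and collected, per query, a ranked list of document IDs; recall@k is used as the metric here for simplicity (NDCG would be a reasonable alternative). All names and the data layout are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents found in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def mean_recall_at_k(results, golden, k=10):
    """Average recall@k over every query in the golden set."""
    scores = [recall_at_k(results.get(query, []), relevant, k)
              for query, relevant in golden.items()]
    return sum(scores) / len(scores) if scores else 0.0

def pick_best(model_results, golden, k=10):
    """Score each model on the golden set; return (best_model, all_scores)."""
    scored = {model: mean_recall_at_k(results, golden, k)
              for model, results in model_results.items()}
    return max(scored, key=scored.get), scored
```

Here `golden` maps each query to its set of relevant document IDs, and `model_results` maps each model name to a dictionary of query to ranked IDs. Even a few dozen carefully chosen queries are usually more informative for this decision than a leaderboard average.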

Bottom line

The embedding model market in 2026 is mature: cheap-and-good defaults exist (OpenAI 3-small, Google), specialist leaders exist (Voyage), open-source is competitive (BGE-M3), and switching costs are mostly in your data pipeline (re-embedding) rather than model lock-in. Pick the cheapest model that meets your retrieval quality bar on your data; revisit once a year.

Frequently Asked Questions

Is Voyage worth the price premium over OpenAI text-embedding-3?
For retrieval-quality-sensitive domains (legal, code, medical, technical), often yes: reported retrieval gains are 4–10 points on NDCG. For general use cases, OpenAI 3-small, at roughly one-ninth the listed price of voyage-3-large, often wins on cost-quality.

Can I switch embedding models later?
Yes, but you have to re-embed your entire corpus. For a 100M-vector store, that is hours-to-days of compute and a careful blue-green deploy. Plan switches at major architecture milestones, not casually.

Do I need a 3072-dimension embedding?
Usually not. 1024 dims is the typical production sweet spot. Higher dims help on very large heterogeneous corpora but cost more in storage, memory, and query time. Test with truncation before committing to large dims.
