The vector database market has consolidated. By mid-2026 four products account for the overwhelming share of production RAG and embedding-search workloads: Pinecone, Qdrant, Weaviate, and pgvector. Each represents a distinct philosophy — fully managed serverless, OSS-first with a managed tier, hybrid retrieval as a first-class feature, or "just use the database you already have." This guide walks through what each is good at in 2026, where the costs land, and which one fits which workload.
The four products at a glance
Pinecone — fully managed, serverless-first, no ops. Strong on raw query speed and zero-config scaling.
Qdrant — Rust-based open source, excellent self-hosted economics, also has a Cloud tier. Best price-performance at small-to-mid scale. [Inference]
Weaviate — Go-based open source with strong hybrid (BM25 + dense) search, modules for re-ranking, and a generous schema model.
pgvector — a PostgreSQL extension. Not a "vector DB" in the strict sense; it is your existing database with a vector column type and HNSW/IVF indexes.
Pricing snapshot (mid-2026)
Numbers move; treat these as anchors, not quotes. At roughly 10M vectors of typical 768-dim embeddings:
Pinecone Serverless — around $70/month in mixed read/write usage, billed as $0.33/GB/month storage + $8.25 per 1M Read Units + $2.00 per 1M Write Units. (leanopstech) [Unverified]
Qdrant Cloud — around $65/month for a comparable workload; self-hosted on a $30–50/month VPS handles the same scale with effort. [Unverified]
Weaviate Cloud — around $135/month at the same scale; self-hosted needs ~16 GB RAM. [Unverified]
pgvector on RDS — effectively the cost of your existing Postgres if you have headroom; a dedicated small instance lands at $30–80/month. [Unverified]
At 100M vectors the gap widens: Pinecone runs into the high hundreds of dollars per month, while self-hosted Qdrant or pgvector on an appropriately sized instance stays under $200 per month if you tune it well. [Inference]
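The serverless billing formula quoted above (storage + Read Units + Write Units) can be turned into a rough estimator. This is a minimal sketch, not a billing tool: the rates are the unverified figures from the snapshot, the Read Unit and Write Unit volumes in the usage example are made-up placeholders, storage is assumed to be metered in decimal GB, and only raw float32 vectors are counted (no metadata or index overhead). The names `ServerlessRates`, `raw_vector_gb`, and `monthly_cost` are hypothetical.

```python
from dataclasses import dataclass

BYTES_PER_FLOAT32 = 4
BYTES_PER_GB = 1_000_000_000  # assumption: storage is metered in decimal GB


@dataclass(frozen=True)
class ServerlessRates:
    """Per-unit rates as quoted in the pricing snapshot (unverified)."""
    storage_per_gb_month: float = 0.33
    per_million_read_units: float = 8.25
    per_million_write_units: float = 2.00


def raw_vector_gb(num_vectors: int, dims: int) -> float:
    """Size of the raw float32 vectors only; ignores metadata and index overhead."""
    return num_vectors * dims * BYTES_PER_FLOAT32 / BYTES_PER_GB


def monthly_cost(num_vectors, dims, read_units, write_units, rates=ServerlessRates()):
    """Break a monthly bill into storage, read, and write components (USD)."""
    storage = raw_vector_gb(num_vectors, dims) * rates.storage_per_gb_month
    reads = read_units / 1_000_000 * rates.per_million_read_units
    writes = write_units / 1_000_000 * rates.per_million_write_units
    return {
        "storage": storage,
        "reads": reads,
        "writes": writes,
        "total": storage + reads + writes,
    }


if __name__ == "__main__":
    # Hypothetical usage: 5M Read Units and 10M Write Units in a month.
    estimate = monthly_cost(10_000_000, 768, read_units=5_000_000, write_units=10_000_000)
    for item, dollars in estimate.items():
        print(f"{item:>8}: ${dollars:,.2f}")
```

With these placeholder volumes the storage line is the smallest part of the bill, which is why query and write traffic, not vector count alone, decides where a serverless bill lands.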
Performance: where each one shines
[Inference] What follows reflects the general pattern across public benchmarks rather than a single authoritative number; results swing with index settings and dimensionality.
Pinecone — consistent low-millisecond query latency at any scale because the index is partitioned and replicated for you. The trade-off is opacity: you cannot see the index parameters, and you cannot tune them.
Qdrant — extremely fast on HNSW with payload filtering, and its Rust core is memory-efficient. Strong at filtered vector search ("vectors where tenant_id = X").
Weaviate — competitive raw speed; its standout is hybrid search built into the query language. BM25 + dense fusion with re-ranking is a one-line query, not a service to assemble.
pgvector — slower than dedicated engines at very large scale, but for under ~10M vectors with HNSW it is fast enough and you get joins, transactions, and RBAC for free.
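Because published numbers swing with index settings, it can help to measure recall on your own data alongside latency. The sketch below assumes a corpus small enough to brute-force: it computes the exact top-k as ground truth and scores an approximate result against it. The helper names (`cosine`, `exact_top_k`, `recall_at_k`) are hypothetical, and `engine_result` stands in for whatever ids the engine under test returns.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: ids of the k vectors most similar to the query."""
    ranked = sorted(vectors, key=lambda vid: cosine(query, vectors[vid]), reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate result also returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    corpus = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [-1.0, 0.0]}
    truth = exact_top_k([1.0, 0.0], corpus, k=2)
    engine_result = ["a", "b"]  # hypothetical ids returned by the engine under test
    print("ground truth:", truth)
    print("recall@2:", recall_at_k(engine_result, truth))
```

Running the same query set against each candidate engine, at the index settings you would actually deploy, gives a recall figure to read next to any latency number.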
Hybrid search
In 2026 hybrid (lexical + dense) is no longer optional for production RAG. Lexical-only misses semantic matches; dense-only misses keyword-exact matches like product SKUs.
Weaviate — hybrid is native, with a single hybrid query operator and a configurable alpha.
Qdrant — supports sparse vectors (BM25-style) and dense vectors in the same collection, fused at query time. The setup is more manual than Weaviate's, but the runtime is just as fast.
Pinecone — supports hybrid via sparse-dense vector pairs; clean API, slightly higher cost per query.
pgvector — combine vector similarity with Postgres full-text search (tsvector); you write the fusion yourself, but you are also one SQL query away from the answer.
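When the fusion step is yours to write, as in the pgvector case, two common approaches are reciprocal rank fusion (RRF) and an alpha-weighted blend of normalized scores. The sketch below shows both in plain Python. It is an illustration of the general techniques, not any engine's internal implementation; the function names, the constant `k=60`, and the min-max normalization are assumptions chosen for the example. Here `alpha=1.0` means dense only and `alpha=0.0` means lexical only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list contributes 1 / (k + rank), rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def min_max(scores):
    """Rescale raw scores to [0, 1] so dense and lexical scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def alpha_fusion(dense_scores, lexical_scores, alpha=0.5):
    """Blend normalized scores: alpha weights dense, (1 - alpha) weights lexical."""
    dense, lexical = min_max(dense_scores), min_max(lexical_scores)
    fused = {
        doc_id: alpha * dense.get(doc_id, 0.0) + (1 - alpha) * lexical.get(doc_id, 0.0)
        for doc_id in set(dense) | set(lexical)
    }
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


if __name__ == "__main__":
    lexical_ranking = ["sku-123", "doc-a", "doc-b"]  # e.g. from full-text search
    dense_ranking = ["doc-a", "doc-c", "sku-123"]    # e.g. from vector similarity
    print(reciprocal_rank_fusion([lexical_ranking, dense_ranking]))
```

RRF needs only rank positions, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales; the alpha blend gives finer control but depends on how the scores are normalized.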
When each one fits
Use Pinecone if: you do not want to operate anything, your team is small, and your scale is under ~50M vectors. The "click and forget" convenience carries a real price premium, so budget for it.
Use Qdrant if: you want the best performance per dollar and are comfortable self-hosting with basic Docker or Kubernetes skills. Qdrant Cloud is also a credible managed option.
Use Weaviate if: hybrid retrieval and modules (re-ranking, generative-search) matter and you want them in the database, not assembled outside.
Use pgvector if: your data already lives in Postgres, you have under ~20M vectors, and you would rather extend a known system than introduce a new one. The operational simplicity is enormous.
Things to verify before committing
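Index type — HNSW (better recall and query speed, at a higher memory and build-time cost) vs IVF (smaller and faster to build, but typically lower recall at comparable query speed). Most production workloads pick HNSW.
Filter pushdown — can the engine prune by metadata before or during the ANN search, rather than after it? Pinecone, Qdrant, and Weaviate all do this; pgvector filters through SQL WHERE, but with an approximate index the filter is applied after the index scan, so a selective filter can return fewer rows than the LIMIT unless iterative index scans (added in pgvector 0.8.0) are enabled.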
Hybrid retrieval — does the engine do BM25-style sparse search, or do you have to bolt it on?
Multi-tenancy — does the engine isolate tenants in the index (Qdrant, Pinecone) or only at query time (pgvector via row filters)?
Backups, snapshots, and recovery — managed services hide this; for self-hosted Qdrant and Weaviate, snapshot strategy is your problem.
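The filter pushdown and multi-tenancy questions above come down to when the metadata filter runs relative to the nearest-neighbor search. The toy sketch below uses brute-force search over five hypothetical rows to show the difference: filtering after taking the global top-k can return too few results for a tenant, while filtering first always fills the result set when enough matching rows exist. Real engines do this inside the index; the data, field names, and function names here are assumptions for illustration only.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical rows: tenant "a" happens to sit closest to the query.
ITEMS = [
    {"id": 1, "tenant_id": "a", "vector": [1.0, 0.0]},
    {"id": 2, "tenant_id": "a", "vector": [0.9, 0.1]},
    {"id": 3, "tenant_id": "b", "vector": [0.8, 0.2]},
    {"id": 4, "tenant_id": "b", "vector": [0.1, 0.9]},
    {"id": 5, "tenant_id": "a", "vector": [0.7, 0.3]},
]
QUERY = [1.0, 0.0]


def post_filter_search(query, items, k, predicate):
    """Take the global top-k first, then drop rows that fail the filter."""
    top = sorted(items, key=lambda it: dot(query, it["vector"]), reverse=True)[:k]
    return [it["id"] for it in top if predicate(it)]


def pre_filter_search(query, items, k, predicate):
    """Restrict to matching rows first, then take the top-k among them."""
    candidates = [it for it in items if predicate(it)]
    top = sorted(candidates, key=lambda it: dot(query, it["vector"]), reverse=True)[:k]
    return [it["id"] for it in top]


def is_tenant_b(item):
    return item["tenant_id"] == "b"


if __name__ == "__main__":
    print("post-filter:", post_filter_search(QUERY, ITEMS, 2, is_tenant_b))
    print("pre-filter: ", pre_filter_search(QUERY, ITEMS, 2, is_tenant_b))
```

In this example the two nearest rows both belong to tenant "a", so the post-filter path has nothing left for tenant "b" even though two matching rows exist. Running the equivalent test against a candidate engine with a highly selective filter is a quick way to see which behavior you get.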
Bottom line
For a new project in 2026, the default choice is between Qdrant (best balance for builders) and pgvector (best for "we already have Postgres"). Pick Pinecone when you want zero ops and are willing to pay the premium. Pick Weaviate when hybrid retrieval is the workload's center of gravity. The wrong choice rarely kills a project — but the right choice removes a class of problems before they appear.
Frequently Asked Questions
Is pgvector fast enough for production RAG?
For workloads under ~10M vectors with proper HNSW indexing, yes. Past that, dedicated engines tend to pull ahead on tail latency, and migration becomes the harder problem to plan for.
Should I pick a managed service or self-host?
Managed wins on time-to-first-query and reduces operational surface area. Self-hosted wins on cost above ~10M vectors and on data residency. Most teams should start managed and migrate only when the math forces them to.
Do I need hybrid search if I have good embeddings?
Likely yes. Hybrid (BM25 + dense) consistently beats dense-only on real-world RAG, especially for queries containing names, codes, or product SKUs, which exact lexical matching handles better than semantic similarity.