We do at Discourse, in thousands of databases, and it's leveraged in most of the billions of page views we serve.
> Pre- vs. Post-Filtering (or: why you need to become a query planner expert)
This was fixed in version 0.8.0 via Iterative Scans (https://github.com/pgvector/pgvector?tab=readme-ov-file#iter...)
> Just use a real vector database
If you are running a single service that may be an easier sell, but it's not a silver bullet.
- halfvec (16-bit float) for storage
- bit (binary vectors) for indexes
Which makes the storage cost and on-going performance good enough that we could enable this in all our hosting.
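For anyone curious, the pattern looks roughly like this (a sketch, not our actual schema; assumes pgvector >= 0.7, and the table name and dimension are made up):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Store 16-bit floats; index only the 1-bit-per-dimension quantization.
CREATE TABLE topic_embeddings (
  topic_id  bigint PRIMARY KEY,
  embedding halfvec(1024)   -- half the storage of a full-precision vector
);

-- HNSW index over the binary quantization, far smaller than indexing halfvec directly
CREATE INDEX ON topic_embeddings
  USING hnsw ((binary_quantize(embedding)::bit(1024)) bit_hamming_ops);

-- Search: coarse candidate pass on the bit index, then re-rank on the halfvec column
-- ($1 is the query embedding, passed as halfvec(1024))
SELECT topic_id
FROM (
  SELECT topic_id, embedding
  FROM topic_embeddings
  ORDER BY binary_quantize(embedding)::bit(1024) <~> binary_quantize($1::halfvec(1024))
  LIMIT 100
) AS candidates
ORDER BY embedding <=> $1::halfvec(1024)
LIMIT 10;
```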
iterative scans are more of a bandaid for filtering than a solution. you will still run into issues with highly restrictive filters. you still need to understand ef_search and max_scan_tuples, strict vs relaxed ordering, etc. it's an improvement for sure, but the planner still doesn't deeply understand the cost model of filtered vector search
there isn't a general solution to the pre- vs post-filter problem—it comes down to having a smart planner that understands your data distribution. question is whether you have the resources to build and tune that yourself or want to offload it to a service that's able to focus on it directly
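For reference, these are the knobs in question; a minimal example assuming pgvector >= 0.8.0 (values are illustrative only):

```sql
-- Iterative scan settings for HNSW indexes (GUC names per the pgvector README)
SET hnsw.ef_search = 100;                 -- candidate list size per graph scan
SET hnsw.iterative_scan = relaxed_order;  -- off | relaxed_order | strict_order
SET hnsw.max_scan_tuples = 20000;         -- cap on tuples visited while filtering
```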
- Related Topics, a list of topics to read next, which uses embeddings of the current topic as the key to search for similar ones
- Suggesting tags and categories when composing a new topic
- Augmented search
- RAG for uploaded files
The point of YAGNI is that you shouldn't over-engineer up front until you've proven that you need the added complexity.
If you need vector search against 100,000 vectors and you already have PostgreSQL then pgvector is a great YAGNI solution.
10 million vectors that are changing constantly? Do a bit more research into alternative solutions.
But don't go integrating a separate vector database for 100,000 vectors on the assumption that you'll need it later.
Chroma implements SPANN and SPFresh (to avoid the limitations of HNSW), pre-filtering, hybrid search, and has a 100% usage-based tier (many bills are around $1 per month).
Chroma is also Apache 2.0 licensed - fully open source.
- We're IVF + quantization, and can support 15x more updates per second compared to pgvector's HNSW. Inserting or deleting an element in a posting list is a very light operation compared to modifying a graph (HNSW)
- Our main branch can now index 100M 768-dim vectors in 20 minutes with 16 vCPUs and 32 GB of memory. This lets users index and reindex very efficiently. We'll have a detailed blog about this soon. The core idea is that KMeans is just a description of the distribution, so we can do lots of approximation to accelerate the process.
- For reindexing, Postgres actually supports `CREATE INDEX CONCURRENTLY` and `REINDEX CONCURRENTLY`. Users won't experience any data loss or inconsistency during the whole process.
- We support both pre-filtering and post-filtering. Check https://blog.vectorchord.ai/vectorchord-04-faster-postgresql...
- We support hybrid search with BM25 through https://github.com/tensorchord/VectorChord-bm25
The author glosses over the complexity of synchronizing an existing database with a specialized vector database, as well as how to perform joint queries across them. This is also why we see most users choosing a vector solution on PostgreSQL.
maintenance_work_mem begs to differ.
> You rebuild the index periodically to fix this, but during the rebuild (which can take hours for large datasets), what do you do with new inserts? Queue them? Write to a separate unindexed table and merge later?
You use REINDEX CONCURRENTLY.
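For example (numbers are illustrative, not recommendations, and the index name is hypothetical):

```sql
-- Give the rebuild its own memory budget and parallelism, then rebuild online
SET maintenance_work_mem = '8GB';
SET max_parallel_maintenance_workers = 4;
REINDEX INDEX CONCURRENTLY items_embedding_idx;
```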
> But updating an HNSW graph isn’t free—you’re traversing the graph to find the right place to insert the new node and updating connections.
How do you think a B+tree gets updated?
This entire post reads like the author didn’t read Postgres’ docs, and is now upset at the poor DX/UX.
> maintenance_work_mem
sure, but the knob existing doesn't solve the operational challenge of safely allocating GBs of RAM on prod for hours-long index builds.
> REINDEX CONCURRENTLY
this is still not free: it takes longer, needs 2-3x the disk space, and still impacts performance.
> HNSW vs B+tree
it's not that graph updates are uniquely expensive. vector workloads have different characteristics than traditional OLTP, and pg wasn't originally designed for them
my broader point: these features exist, but using them correctly requires significant Postgres expertise. my thesis isn't "Postgres lacks features"—it's "most teams underestimate the operational complexity." dedicated vector DBs handle this automatically, and are often going to be much cheaper than the dev time put into maintaining pgvector (esp. for a small team)
That kills the indexing process; you cannot let it run with a limited amount of memory.
> How do you think a B+tree gets updated?
In a B+tree, an insert touches roughly the height of the tree in pages, O(log N), which is a handful of pages even for billions of rows. In an HNSW graph you need to touch literally thousands of vectors once your graph gets big enough.
And if one needs the transactional/consistency semantics, hybrid/filtered-search, low latencies, etc - consider a SOTA Postgres system like AlloyDB with AlloyDB ScaNN which has better scaling/performance (1B+ vectors), enhanced query optimization (adaptive pre-/post-/in-filtering), and improved index operations.
Full disclosure: I founded ScaNN in GCP databases and currently lead AlloyDB Semantic Search. And all these opinions are my own.
From what I've seen it is fast, has an excellent API, and is implemented by a brilliant engineer in the space (Antirez).
But since I haven't used these things beyond local tests, I can never really hold opinions over those using these systems in production.
Especially in the AI and startup space.
please ask your RDS rep to support it
we (Tiger Data) are also happy to help push that along if we can
> None of the blogs mention that building an HNSW index on a few million vectors
> can consume 10+ GB of RAM or more (depending on your vector dimensions and
> dataset size). On your production database. While it’s running. For potentially
> hours.
10 GB? Oh jolly gosh! That will almost show up as a pixel or two on my metrics dashboard. Who are these people that run production Postgres clusters on tiny hardware and then complain? Has AWS marketing really confused people into believing that some EC2 "instance size" is an actual server?
So basically, I'd love to have my storage provider give me a vector search API, which I guess is what Amazon S3 vectors is supposed to be (https://aws.amazon.com/s3/features/vectors/)?
Curious to hear what experience people have had with this.
As for inserts being difficult, we basically don't see that because we only update the vector store weekly. We're not trying to index rapidly-changing user data, so that's not a big deal for our use case.
How hard is it to move that process to another machine? Could you grab a dump of the relevant data, spin up a cloud instance with 16GB of RAM to build the index and then cheaply copy the results back to production when it finishes?
> The problem is that index builds are memory-intensive operations, and Postgres doesn’t have a great way to throttle them. You’re essentially asking your production database to allocate multiple (possibly dozens) gigabytes of RAM for an operation that might take hours, while continuing to serve queries.
> You end up with strategies like:
> - Write to a staging table, build the index offline, then swap it in (but now you have a window where searches miss new data)
> - Maintain two indexes and write to both (double the memory, double the update cost)
> - Build indexes on replicas and promote them
> - Accept eventual consistency (users upload documents that aren’t searchable for N minutes)
> - Provision significantly more RAM than your “working set” would suggest
> None of these are “wrong” exactly. But they’re all workarounds for the fact that pgvector wasn’t really designed for high-velocity real-time ingestion.

short answer--maybe not that _hard_, but it adds a lot of complexity to manage when you're trying to offer real-time search. most vector DB solutions offer this ootb. This post is meant to just point out the tradeoffs with pgvector (that most posts seem to skip over)
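for what it's worth, the build-a-new-index-and-swap step itself is only a few statements; a sketch with made-up names (and note that CREATE INDEX CONCURRENTLY can't run inside a transaction):

```sql
-- Build a replacement HNSW index next to the live one, then swap names
CREATE INDEX CONCURRENTLY documents_embedding_idx_new
  ON documents USING hnsw (embedding vector_cosine_ops);

DROP INDEX CONCURRENTLY documents_embedding_idx;   -- retire the old index
ALTER INDEX documents_embedding_idx_new RENAME TO documents_embedding_idx;
```

the hard part is the memory and the hours the build takes while the database keeps serving queries.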
Question is if that tradeoff is more or less complexity than maintaining a whole separate vector store.
I get this taste with most posts about Postgres that don’t come from “how we scaled Postgres to X”. It seems a lot of writers are trying to ride the wave of popularity, creating a ton of noise that can end up as tech debt for readers.
Is this really how it works? That seems like it’s returning an incorrect result.
ANN-Benchmark exists but it’s algorithm-focused rather than full-stack database testing, so it doesn’t capture real-world ops like concurrent writes, filtering, or resource management under load.
Would be great to see something more comprehensive and vendor-neutral emerge, especially testing things like: tail latencies under concurrent load, index build times vs quality tradeoffs, memory/disk usage, and behavior during failures/recovery
ClickBench has only 100M rows of data, which makes it not a comprehensive benchmark at all.
Ok yeah there's PGVector. Then you need something to do full text search. And if you put all that together, you have a complex Postgres deployment.
It seems to make sense for simple operations, but I'd rather just get a search engine / vector database, than try to twist Postgres's arm into a weird setup.
Search is also just an extension? So, it's a strong point: you have one self-contained server with a simple installation/maintenance story.
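And a hybrid query stays in one place too. A rough sketch (hypothetical schema; the query text, the embedding parameter, and the 50/50 score blend are all placeholders):

```sql
-- Built-in full-text search plus pgvector similarity in a single query
SELECT d.id,
       0.5 * ts_rank(d.tsv, websearch_to_tsquery('english', 'index build memory'))
     + 0.5 * (1 - (d.embedding <=> $1)) AS score   -- $1 is the query embedding
FROM documents d
WHERE d.tsv @@ websearch_to_tsquery('english', 'index build memory')
ORDER BY score DESC
LIMIT 10;
```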
1. Updates: I wrote my own implementation of HNSW with many changes compared to the paper. The result is that the data structure can be updated while it receives queries, like the other Redis data types. You add vectors with VADD, query for similarity with VSIM, delete with VREM. Also, deleting vectors does not perform just a tombstone deletion; the memory is actually reclaimed immediately.
2. Speed: The implementation is fast, with fully threaded reads and partially threaded writes: even for insertion it is easy to sustain a few hundred ops/sec, and querying with VSIM runs at around 50k ops/sec on normal hardware.
3. Trivial: You can reimplement your use case in 10 minutes, including learning how it works.
Of course it costs some memory, but less than you may guess: it supports quantization by default, transparently, and for a few million elements (most use cases) the memory usage is very low, totally affordable.
Bonus point: if you use vector sets you can ask for my help for free. At this stage I support people using vector sets directly.
BM25 with query rewriting & expansion can do a lot of heavy lifting if you invest any time at all in configuring things to match your problem space. The article touches on FTS engines and hybrid approaches, but I would start there. Figure out where lexical techniques actually break down and then reach for the "semantic" technology. I'd argue that an LLM in front of a traditional lexical search engine (i.e., tool use) would generally be more powerful than a sloppy semantic vector space or a fine tuning job. It would also be significantly easier to trace and shape retrieval behavior.
Lucene is often all you need. They've recently added vector search capabilities if you think you really need some kind of hybrid abomination.
cpursley•4h ago
Furthermore, when all the hipster vector databases die, go into maintenance mode, or get the license rug-pull when the investors come looking for revenue, Postgres will still be chugging along and getting better and better.
Anyways, all this vector stuff is going to fade away as context windows get larger (already started over the past 8 months or so).
qeternity•4h ago
People who say this really have not thought this through, or simply don't understand what the usecases for vector search are.
But even if you had infinite context, with perfect attention, attention isn't free. Even if you had linear attention. It's much much cheaper to index your data than it is to reprocess everything. You don't go around scanning entire databases when you're just interested in row id=X
foobar10000•3h ago
As an example, if one is chunking inputs into a RAG, one is basically hardcoding a feature based on locality - which may or may not work. If it works - as in, it is a good feature (the attention matrix is really tail-heavy - LSTMs would work, etc...) - then hey, vector DBs work beautifully. But for many things where people have trouble with RAG, the locality assumption is heavily violated - and there you _need_ the full-on attention matrix.
tacoooooooo•3h ago
We're searching across millions of documents, so I doubt it