The Case Against PGVector

https://alex-jacobs.com/posts/the-case-against-pgvector/

381•tacoooooooo•3mo ago

Comments

cpursley•3mo ago

Yeah, but just like all other bolt-on databases, now your vital data/biz logic is disconnected from the hot new VC database of the month's logic and you have to write balls of mud to connect it all. That's a very big tradeoff (logic, operations, etc).

Furthermore, when all the hipster vector databases die or go into maintenance mode or get the license rug-pull when the investors come looking for revenue, postgres will still be chugging along and getting better and better.

Anyways, all this vector stuff is going to fade away as context windows get larger (already started over the past 8 months or so).

qeternity•3mo ago

> Anyways, all this vector stuff is going to fade away as context windows get larger (already started over the past 8 months or so).

People who say this really have not thought this through, or simply don't understand what the usecases for vector search are.

But even if you had infinite context, with perfect attention, attention isn't free. Even if you had linear attention. It's much much cheaper to index your data than it is to reprocess everything. You don't go around scanning entire databases when you're just interested in row id=X

foobar10000•3mo ago

IMO for some things RAG works great, and for others you may need attention, hence the completely disparate experiences with RAG.

As an example, if one is chunking inputs into a RAG, one is basically hardcoding a feature based on locality - which may or may not work. If it works - as in, it is a good feature (the attention matrix is really tail-heavy - LSTMs would work, etc...) - then hey, vector DBs work beautifully. But for many things where people have trouble with RAG, the locality assumption is heavily violated - and there you _need_ the full-on attention matrix.

tacoooooooo•3mo ago

> Anyways, all this vector stuff is going to fade away as context windows get larger (already started over the past 8 months or so).

We're searching across millions of documents, so i doubt it

xfalcox•3mo ago

> Nobody’s actually run this in production

We do at Discourse, in thousands of databases, and it's leveraged in most of the billions of page views we serve.

> Pre- vs. Post-Filtering (or: why you need to become a query planner expert)

This was fixed in version 0.8.0 via Iterative Scans (https://github.com/pgvector/pgvector?tab=readme-ov-file#iter...)

> Just use a real vector database

If you are running a single service that may be an easier sell, but it's not a silver bullet.

xfalcox•3mo ago

Also worth mentioning that we use quantization extensively:

- halfvec (16bit float) for storage

- bit (binary vectors) for indexes

Which makes the storage cost and on-going performance good enough that we could enable this in all our hosting.
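For a rough sense of what that buys, here is a minimal sketch based on the per-value sizes given in the pgvector README (4 bytes per dimension for `vector`, 2 for `halfvec`, 1 bit per dimension for `bit`, each plus 8 bytes of overhead). The function names, the 1024 dimensions and the 1M rows are made up for illustration, and real index sizes will differ from raw column storage.

```python
def bytes_per_value(dims: int, kind: str) -> int:
    """Approximate storage for one pgvector value, per the pgvector README."""
    if kind == "vector":       # 32-bit floats
        return 4 * dims + 8
    if kind == "halfvec":      # 16-bit floats
        return 2 * dims + 8
    if kind == "bit":          # 1 bit per dimension, rounded up to whole bytes
        return (dims + 7) // 8 + 8
    raise ValueError(f"unknown kind: {kind}")


def total_mib(dims: int, rows: int, kind: str) -> float:
    """Raw storage for `rows` values, in MiB (ignores row and page overhead)."""
    return bytes_per_value(dims, kind) * rows / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical workload: 1M embeddings of 1024 dimensions.
    for kind in ("vector", "halfvec", "bit"):
        print(kind, round(total_mib(1024, 1_000_000, kind), 1), "MiB")
```

The interesting part is the ratio between the three types, not the absolute numbers.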

summarity•3mo ago

That's where it's at. I'm using the 1600D vectors from OpenAI models for findsight.ai, stored SuperBit-quantized. Even without fancy indexing, a full scan (1 search vector -> 5M stored vectors), takes less than 40ms. And with basic binning, it's nearly instant.

tacoooooooo•3mo ago

this is at the expense of precision/recall though isn't it?

summarity•3mo ago

With the quant size I'm using, recall is >95%.

pclmulqdq•3mo ago

Approximate nearest neighbor searches don't cost precision. Just recall.

simonw•3mo ago

It still amazes me that the binary trick works.

For anyone who hasn't seen it yet: it turns out many embedding vectors of e.g. 1024 floating point numbers can be reduced to a single bit per value that records if it's higher or lower than 0... and in this reduced form much of the embedding math still works!

This means you can e.g. filter to the top 100 using extremely memory efficient and fast bit vectors, then run a more expensive distance calculation against those top 100 with the full floating point vectors to pick the top 10.
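A minimal pure-Python sketch of that two-stage search, assuming random Gaussian vectors as stand-ins for real embeddings. All function names here are hypothetical, and a real system would use packed bit columns and SIMD rather than Python ints.

```python
import math
import random


def binarize(vec):
    """Pack the sign of each component into one int: bit i is 1 if vec[i] > 0."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def two_stage_search(query, vectors, codes, shortlist=100, k=10):
    """Shortlist by Hamming distance on bit codes, then rerank with full floats."""
    q_code = binarize(query)
    candidates = sorted(
        range(len(vectors)), key=lambda i: hamming(q_code, codes[i])
    )[:shortlist]
    candidates.sort(key=lambda i: cosine_distance(query, vectors[i]))
    return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    dims = 256
    vectors = [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(2000)]
    codes = [binarize(v) for v in vectors]
    query = [x + rng.gauss(0, 0.1) for x in vectors[42]]  # noisy copy of item 42
    print(two_stage_search(query, vectors, codes)[:3])
```

The shortlist size is the knob: a bigger shortlist costs more float comparisons but recovers more of the recall lost to the 1-bit codes.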

FuckButtons•3mo ago

why is this amazing, it’s just a 1 bit lossy compression representation of the original information? If you have a vector in n-dimensional space this is effectively just representing the basis vectors that the original has.

simonw•3mo ago

You can take 4096 bytes of information (1024 x 32 bit floats) and reduce that to 128 bytes (1024 bits, a 32x reduction in size!) and still get results that are about 95% as good.

I find that cool and surprising.

sa-code•3mo ago

I'm with you, it's very satisfying to see a simple technique work well. It's impressive

computably•3mo ago

1024 bits for a hash is pretty roomy. The embedding "just" has to be well-distributed across enough of the dimensions.

ImPostingOnHN•3mo ago

Yeah, that's what I was thinking: Did we think 32 bits across each of the 1024 dimensions would be necessary? Maybe 32768 bits is adding unnecessary precision to what is ~1024 bits of information in the first place.

FuckButtons•3mo ago

That’s a much more interesting question, I wonder if there is a way to put a lower bound on the number of bits you could use?

xfalcox•3mo ago

I was taken aback when I saw what was basically zero recall loss in the real world task of finding related topics, by doing the same thing you described where we over-capture with binary embeddings, and only use the full (or half) precision on the subset.

Making the storage cost of the index 32 times smaller is the difference of being able to offer this at scale without worrying too much about the overhead.

Someone•3mo ago

> I was taken aback when I saw what was basically zero recall loss in the real world task of finding related topics

By moving the values to a single bit, you’re lumping stuff together that was different before, so I don’t think recall loss would be expected.

Also: even if your vector is only 100-dimensional, there already are 2^100 different bit vectors. That’s over 10^30.

If your dataset isn’t gigantic and has documents that are even moderately dispersed in that space, the likelihood of having many with the same bit vector isn’t large.
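To put a number on that, here is a back-of-the-envelope birthday-style estimate. It assumes bit codes are uniformly random and independent, which real embeddings are not (bits are correlated and documents cluster), so treat it as an optimistic bound. The function name is made up for illustration.

```python
def expected_colliding_pairs(n_vectors: int, n_bits: int) -> float:
    """Expected number of vector pairs sharing the same code, for uniform random codes."""
    pairs = n_vectors * (n_vectors - 1) / 2   # number of distinct pairs
    return pairs / 2 ** n_bits                # each pair collides with probability 2^-bits


if __name__ == "__main__":
    # 5M documents with 100-bit codes: vanishingly few expected collisions.
    print(expected_colliding_pairs(5_000_000, 100))
    # The same documents with only 32 bits collide a lot.
    print(expected_colliding_pairs(5_000_000, 32))
```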

barrkel•3mo ago

And if dispersion isn't good, it would be worthwhile running the vectors through another model trained to disperse them.

tveita•3mo ago

Depending on your data you might also get better results by applying a random rotation to your vector before quantization.
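A minimal sketch of the plain random-rotation idea, assuming a random orthogonal matrix built with Gram-Schmidt (so strictly it may include a reflection). This is not necessarily the method from the linked paper, and the function names are hypothetical.

```python
import math
import random


def random_rotation(dims, seed=0):
    """Random orthogonal matrix via Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dims:
        v = [rng.gauss(0, 1) for _ in range(dims)]
        for r in rows:  # remove the components along rows already chosen
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip the (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix, vec):
    """Matrix-vector product: one dot product per output dimension."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def sign_bits(vec):
    """1-bit quantization: keep only the sign of each component."""
    return [1 if x > 0 else 0 for x in vec]


if __name__ == "__main__":
    R = random_rotation(8, seed=1)
    lopsided = [5.0, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]  # energy in one dimension
    print(sign_bits(lopsided))
    print(sign_bits(rotate(R, lopsided)))  # energy now spread across all bits
```

The same matrix has to be applied to both stored vectors and queries, since distances are only preserved when everything is rotated together.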

https://ieeexplore.ieee.org/abstract/document/6296665/ (https://refbase.cvc.uab.cat/files/GLG2012b.pdf)

3abiton•3mo ago

Now that you mention that, I wonder if LSH would perform better with slightly higher memory footprint

mfrye0•3mo ago

I was going to say the same. We're using binary vectors in prod as well. Makes a huge difference in the indexes. This wasn't mentioned once in the article.

tacoooooooo•3mo ago

for sure people are running pgvector in prd! i was more pointing at every tutorial

iterative scans are more of a bandaid for filtering than a solution. you will still run into issues with highly restrictive filters. you still need to understand ef_search and max_scan_tuples. strict vs relaxed ordering, etc. it's an improvement for sure, but the planner still doesn't deeply understand the cost model of filtered vector search

there isn't a general solution to the pre- vs post-filter problem—it comes down to having a smart planner that understands your data distribution. question is whether you have the resources to build and tune that yourself or want to offload it to a service that's able to focus on it directly
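A toy model of why restrictive filters hurt, assuming candidates arrive already sorted by distance. This is not pgvector's implementation; the parameter names only borrow from its `hnsw.ef_search` and `hnsw.max_scan_tuples` settings, and the 0.1% selectivity is made up.

```python
def post_filter(ranked_ids, passes_filter, k, ef_search):
    """Take the ef_search nearest candidates, then apply the filter."""
    return [i for i in ranked_ids[:ef_search] if passes_filter(i)][:k]


def iterative_scan(ranked_ids, passes_filter, k, ef_search, max_scan_tuples):
    """Keep pulling batches of candidates until k rows pass or the budget runs out."""
    limit = min(max_scan_tuples, len(ranked_ids))
    results, scanned = [], 0
    while len(results) < k and scanned < limit:
        batch = ranked_ids[scanned:min(scanned + ef_search, limit)]
        scanned += len(batch)
        results.extend(i for i in batch if passes_filter(i))
    return results[:k], scanned


if __name__ == "__main__":
    ranked_ids = list(range(100_000))        # pretend these are sorted by distance
    rare = lambda i: i % 1000 == 999         # filter that only 0.1% of rows pass

    print(post_filter(ranked_ids, rare, k=10, ef_search=40))
    print(iterative_scan(ranked_ids, rare, k=10, ef_search=40, max_scan_tuples=20_000))
    print(iterative_scan(ranked_ids, rare, k=10, ef_search=40, max_scan_tuples=5_000))
```

Plain post-filtering comes back empty here, and the iterative version only fills the result set if the scan budget is large enough for the filter's selectivity, which is the tuning burden described above.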

cortesoft•3mo ago

I feel like this is more of a general critique about technology writing; there are always a lot of “getting started” tutorials for things, but there is a dearth of “how to actually use this thing in anger” documentation.

dpflan•3mo ago

What are you using it for? Is it part of a hybrid search system (keyword + vector)?

xfalcox•3mo ago

In Discourse embeddings power:

- Related Topics, a list of topics to read next, which uses embeddings of the current topic as the key to search for similar ones

- Suggesting tags and categories when composing a new topic

- Augmented search

- RAG for uploaded files

dpflan•3mo ago

Thanks for the details. Also, always appreciated Discord's engineering blog posts. Lots of interesting stories, and nice to see a company discuss using Elixir at scale.

nextaccountic•3mo ago

what does the rag for uploaded files do in discourse?

also, when i run a discourse search does it really do both a regular keyword search and a vector search? how do you combine results?

does all discourse instances have those features? for example, internals.rust-lang.org, do they use pgvector?

xfalcox•3mo ago

> what does the rag for uploaded files do in discourse?

You can upload files that will act as RAG files for an AI bot. The bot can also have access to forum content, plus the ability to run tools in our sandboxed JS environment, making it possible for Discourse to host AI bots.

> also, when i run a discourse search does it really do both a regular keyword search and a vector search? how do you combine results?

Yes, it does both. In the full page search it does keyword first, then vector asynchronously, which can be toggled by the user in the UI. It's auto toggled when keyword has zero results now. Results are combined using reciprocal rank fusion.
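For anyone who hasn't seen reciprocal rank fusion, a minimal sketch: each result scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant k=60 is a commonly used default and an assumption here, and this is not Discourse's actual code.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]   # hypothetical keyword search results
    vector_hits = ["c", "a", "d"]    # hypothetical vector search results
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Only ranks are used, so the keyword and vector scores never have to be put on a common scale.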

In the quick header search we simply append vector search to keyword search results when keyword returns fewer than 4 results.

> does all discourse instances have those features? for example, internals.rust-lang.org, do they use pgvector?

Yes, all use PGvector. In our hosting all instances default to having the vector features enabled, we run embeddings using https://github.com/huggingface/text-embeddings-inference

jascha_eng•3mo ago

There are also approaches to doing the filtering while traversing a vector index (not just pre/post) e.g. this paper by microsoft explains an approach https://dl.acm.org/doi/10.1145/3543507.3583552 which pgvectorscale implements here: https://github.com/timescale/pgvectorscale?tab=readme-ov-fil...

In theory these can be more efficient than plain pre/post filtering.

tacoooooooo•3mo ago

pgvectorscale is not available in RDS so this wasn't a great solution for us! but it does likely solve many of the problems with vanilla pgvector (what this post was about)

whakim•3mo ago

Interested to hear more about your experience here. At Halcyon, we have trillions of embeddings and found Postgres to be unsuitable at several orders of magnitude less than we currently have.

On the iterative scan side, how do you prevent this from becoming too computationally intensive with a restrictive pre-filter, or simply not working at all? We use Vespa, which means effectively doing a map-reduce across all of our nodes; the effective number of graph traversals to do is smaller, and the computational burden mostly involves scanning posting lists on a per-node basis. I imagine to do something similar in postgres, you'd need sharded tables, and complicated application logic to control what you're actually searching.

How do you deal with re-indexing and/or denormalizing metadata for filtering? Do you simply accept that it'll take hours or days?

I agree with you, however, that vector databases are not a panacea (although they do remove a huge amount of devops work, which is worth a lot!). Vespa supports filtering across parent-child relationships (like a relational database) which means we don't have to reindex a trillion things every time we want to add a new type of filter, which with a previous vector database vendor we used took us almost a week.

xfalcox•3mo ago

We host thousands of forums but each one has its own database, which means we get a sort of free sharding of the data where each instance has less than a million topics on average.

I can totally see that at a trillion scale for a single shard you want a specialized dedicated service, but that is also true for most things in tech when you get to the extreme scale.

whakim•3mo ago

Thanks for the reply! This makes much more sense now. To preface, I think pgvector is incredibly awesome software, and I have to give huge kudos to the folks working on it. Super cool. That being said, I do think the author isn't being unreasonable in that the limitations of pgvector are very real when you're talking indices that grow beyond millions of things, and the "just use pgvector" crowd in general doesn't have a lot of experience with scaling things beyond toy examples. Folks should take a hard look at what size they expect their indices to grow to in the near-to-medium-term future.

gerardatkonvo•3mo ago

Another thing is that consolidation means that you can scale less granularly. If suddenly vector searching becomes the bottleneck of your app you can't scale just the vector side of things.

BenGosub•3mo ago

The limitations of PGVector are touched upon in this podcast episode. https://open.spotify.com/episode/2rvn0ZhNoNFtozxpnMIqmo?si=i...

clickety_clack•3mo ago

My default is basically YAGNI. You should use as few services as possible, and only add something new when there’s issues. If everything is possible in Postgres, great! If not, at least I’ll know exactly what I need from the New Thing.

esafak•3mo ago

Databases are hard to swap out when you realize you need a different one.

morshu9001•3mo ago

That's true when you're talking about a generalized rdbms, but if this is an isolated set of tables for embeddings or something and you don't entangle it with everything else, it can be fine. See also, using Postgres as a KV store.

Fripplebubby•3mo ago

The post is a clear example of when YAGNI backfires, because you think YAGNI but then, you actually do need it. I had this experience, the author had this experience, you might too - the things you think you AGN are actually pretty basic expectations and not luxuries: being able to write vectors real-time without having to run other processes out of band to keep the recall from degrading over time, being able to write a query that uses normal SQL filter predicates and similarity in one go for retrieval. These things matter and you won't notice that they actually don't work at scale until later on!

simonw•3mo ago

That's not YAGNI backfiring.

The point of YAGNI is that you shouldn't over-engineer up front until you've proven that you need the added complexity.

If you need vector search against 100,000 vectors and you already have PostgreSQL then pgvector is a great YAGNI solution.

10 million vectors that are changing constantly? Do a bit more research into alternative solutions.

But don't go integrating a separate vector database for 100,000 vectors on the assumption that you'll need it later.

Fripplebubby•3mo ago

I think the tricky thing here is that the specific things I referred to (real time writes and pushing SQL predicates into your similarity search) work fine at small scale in such a way that you might not actually notice that they're going to stop working at scale. When you have 100,000 vectors, you can write these SQL predicates (return the 5 top hits where category = x and feature = y) and they'll work fine up until one day they don't work fine anymore because the vector space has gotten large. So, I suppose it is fair to say this isn't YAGNI backfiring, this is me not recognizing the shape of the problem to come and not recognizing that I do, in fact, need it (to me that feels a lot like YAGNI backfiring, because I didn't think I needed it, but suddenly I do)

morshu9001•3mo ago

If the consequence of being wrong about the scalability is that you just have to migrate later instead of sooner, that's a win for YAGNI. It's only a loss if hitting this limit later causes service disruption or makes the migration way harder than if you'd done it sooner.

simonw•3mo ago

And honestly, even then YAGNI might still win.

There's a big opportunity cost involved in optimizing prematurely. 9/10 times you're wasting your time, and you may have found product-market fit faster if you had spent that time trying out other feature ideas instead.

If you hit a point where you have to do a painful migration because your product is succeeding that's a point to be celebrated in my opinion. You might never have got there if you'd spent more time on optimistic scaling work and less time iterating towards the right set of features.

Fripplebubby•3mo ago

I think I see this point now. I thought of YAGNI as, "don't ever over-engineer because you get it wrong a lot of the time" but really, "don't over-engineer out of the gate and be thankful if you get a chance to come back and do it right later". That fits my case exactly, and that's what we did (and it wasn't actually that painful to migrate).

simonw•3mo ago

Yeah, that's a great way of putting it.

kevstev•3mo ago

At my last job I took over eng at a Series B startup, and my (non-technical) CEO was an ill-tempered type and pretty much wanted me to tell him that the entire tech stack was shit and the previous architect/pseudo head of eng was shit, etc. And I was like no... some tradeoffs were made that make a ton of sense for an early stage startup, and the great news is that you are still here and now have the revenue and customer base to start thinking in terms of building things for the next 3-5 years, even though some of the things are starting to break. And even better, nothing was so dire that it required stopping the world, we could continue to build and shore up some of the struggling things at the same time.

He seemed to really want me to blame everything on my predecessor and call some kind of crisis, and seemed annoyed by my analysis, which was confusing at the time. But yeah, there are absolutely tradeoffs you make early in a startup's life, you just have to know where to take shortcuts and where you at least leave the architecture open to scaling. My biggest critique is that they were at least a year, if not two, past the point where they should have left ultra scrappy startup mode that just throws things at the wall and started building with a longer view.

I have also seen a friend build out a flawless architecture ready to scale to millions of users, but never got close to a product fit. I felt he wasted at least 6 months building out all this infra scaffolding for nothing.

morshu9001•3mo ago

Yeah the "only if" is more like a "necessary, not sufficient." The future migration pain had better be extremely bad to worry about it so far in advance.

Or it should be a well defined problem. It's easier to determine the right solution after you've already encountered the problem, maybe in a past project. If you're unsure, just keep your options open.

simonw•3mo ago

A few years ago I coined the term PAGNI for "Probably Are Gonna Need It" to cover things that are worth putting in there from the start because they're relatively cheap to implement early but quite expensive to add later on: https://simonwillison.net/2021/Jul/1/pagnis/

hobofan•3mo ago

> When you have 100,000 vectors [...] and they'll work fine

So 95% of use-cases.

samus•3mo ago

In that case you might not even really need optimized vector search though.
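As a rough illustration of what "no optimized vector search" looks like, here is a minimal exact (brute-force) cosine top-k in plain Python. The function names are made up, and in practice the same thing is a sequential scan with an ORDER BY on distance, or a single matrix multiply.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_exact(query, vectors, k=5):
    """Exact cosine top-k by scanning every vector (no index at all)."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(v))), i)
        for i, v in enumerate(vectors)
    )
    return [i for _, i in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    print(top_k_exact([1.0, 0.1], vectors, k=2))
```

Recall is 100% by construction, so the only question is whether the scan is fast enough for your corpus size.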

Jnr•3mo ago

I think Immich (Google photos alternative) uses pgvector. And while you can't really call it a "production" system, because it is self hosted, I have about 100,000 assets there and the vector search works great!

throwway120385•3mo ago

Many of the concerns in the article could be addressed by standing up a separate PG database that's used exclusively for vector ops and then not using it for your relational data. Then your vector use cases get served from your vector DB and your relational use cases get served from your relational DB. Separating concerns like that doesn't solve the underlying concern but it limits the blast radius so you can operate in a degraded state instead of falling over completely.

SoftTalker•3mo ago

I've always tried to separate transactional databases from those supporting analytical queries if there's going to be any question that there might be contention. The latter often don't need to be real-time or even near-time.

samus•3mo ago

That is a workaround and precisely the point the author makes. It increases operational complexity and creates a divide between records in the vector DB and the relational DB.

anentropic•3mo ago

But if you do that, why use Postgres for the vector db?

jeffchuber•3mo ago

Good article - the most use cases i see of pg_vector are typically “chat over their technical docs”:

- small corpus

- doesn’t change often / can rebuild the index

- no multi-tenancy, which avoids many of the issues with post-filtering

Chroma implements SPANN and SPFresh (to avoid the limitations of HNSW), pre-filtering, hybrid search, and has a 100% usage-based tier (many bills are around $1 per month).

Chroma is also apache 2.0 - fully open source.

VoVAllen•3mo ago

We at https://github.com/tensorchord/VectorChord solved most of the pgvector issues mentioned in this blog:

- We're IVF + quantization, and can support 15x more updates per second compared to pgvector's HNSW. Inserting or deleting an element in a posting list is a super light operation compared to modifying a graph (HNSW).

- Our main branch can now index 100M 768-dim vectors in 20min with 16 vCPUs and 32G of memory. This enables users to index/reindex in a very efficient way. We'll have a detailed blog about this soon. The core idea is that KMeans is just a description of the distribution, so we can do lots of approximation here to accelerate the process.

- For reindexing, Postgres actually supports `CREATE INDEX CONCURRENTLY` or `REINDEX CONCURRENTLY`. Users won't experience any data loss or inconsistency during the whole process.

- We support both pre-filtering and post-filtering. Check https://blog.vectorchord.ai/vectorchord-04-faster-postgresql...

- We support hybrid search with BM25 through https://github.com/tensorchord/VectorChord-bm25
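To make the posting-list point from the first bullet concrete, here is a toy IVF index in plain Python. It assumes the centroids are already given (no KMeans step) and does no quantization, so it is only a sketch of the general technique and not VectorChord's implementation. The class and method names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinyIVF:
    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]   # one posting list per centroid

    def _nearest_lists(self, vec, n):
        order = sorted(
            range(len(self.centroids)),
            key=lambda c: sq_dist(vec, self.centroids[c]),
        )
        return order[:n]

    def insert(self, item_id, vec):
        """Insert = find the nearest centroid and append to its posting list."""
        (c,) = self._nearest_lists(vec, 1)
        self.lists[c].append((item_id, vec))

    def delete(self, item_id):
        """Delete = drop the entry from whichever list holds it; nothing else changes."""
        for lst in self.lists:
            lst[:] = [(i, v) for i, v in lst if i != item_id]

    def search(self, query, k=5, nprobe=1):
        """Scan only the nprobe nearest posting lists, then sort those candidates."""
        candidates = [
            e for c in self._nearest_lists(query, nprobe) for e in self.lists[c]
        ]
        candidates.sort(key=lambda e: sq_dist(query, e[1]))
        return [i for i, _ in candidates[:k]]


if __name__ == "__main__":
    index = TinyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]])
    index.insert(1, [0.1, 0.0])
    index.insert(2, [9.0, 9.0])
    index.insert(3, [0.0, 0.2])
    print(index.search([0.0, 0.0], k=2))
    print(index.search([6.0, 6.0], k=1, nprobe=1))
```

Writes never touch other entries, which is why they are cheap. The cost shows up on reads, where a small `nprobe` can miss neighbors that sit in a list that was not probed.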

The author oversimplifies the complexity of synchronizing between an existing database and a specialized vector database, as well as how to perform joint queries on them. This is also why we see most users choosing a vector solution on PostgreSQL.

VoVAllen•3mo ago

And we do have a user hosting 3 billion vectors with Postgres + VectorChord with sharding. And they're using vectors to save the earth! Check https://blog.vectorchord.ai/3-billion-vectors-in-postgresql-...

nostrebored•3mo ago

So you’re quantizing and using IVF — what are your recall numbers with actual use cases?

VoVAllen•3mo ago

We do have some benchmark numbers at https://blog.vectorchord.ai/vector-search-over-postgresql-a-.... It varies across datasets, but in most cases it's 2x or more QPS compared to pgvector's hnsw at the same recall.

nostrebored•3mo ago

Your graphs are measuring accuracy [1] (i'm assuming precision?), not recall? My impression is that your approach would miss surfacing potentially relevant candidates, because that is the tradeoff IVF makes for memory optimization. I'd expect that this especially struggles with high dim vectors and large datasets.

[1] https://cdn.hashnode.com/res/hashnode/image/upload/v17434120...

VoVAllen•3mo ago

It's recall. Thanks for pointing out this, we'll update the diagram.

The core part is a quantization technique called RaBitQ. We can scan over the bit vectors to get an estimate of the real distance between query and data. I'm not sure what you mean by "miss" here. As approximate nearest neighbor indexes, all of them, including HNSW, will miss some potential candidates.

tacoooooooo•3mo ago

We actually looked into vectorchord--it looks really cool, but it's not supported by RDS so it is an additional service for us to add anyways.

inadequatespace•3mo ago

Another extremely solid win for Cunningham’s Law.

rudderdev•3mo ago

As others have commented, all the mentioned issues are resolved, so I would favour using PGVector. If Postgres can be a good choice over Kafka to deliver 100k events/sec [1], then why not PGVector over Chroma or other specialized vector search (unless there is a specific requirement that can't be solved with minor code/config changes)!

[1] Ref: https://news.ycombinator.com/item?id=44659678

tacoooooooo•3mo ago

how are all of the mentioned issues resolved?

hunterpayne•3mo ago

So it's a longish article and doing a point by point explanation is probably too much for a single post. But several of the points are solved by just standing up a specific Postgres instance for the vector use cases instead of doing this inside an existing instance.

Most of the rest of his complaints come down to this being complex stuff. True, but it's not a solution, it's a tool used in making a solution. So when using pg_vector directly, you probably need to understand databases to a more significant degree than with a custom solution that won't work for you the moment your requirements change. You surely need to understand databases more than the author does. He doesn't point to a single thing that pg_vector doesn't do or doesn't do well. He just complains it's hard to do.

In summary, pg_vector is a toolkit for building vector based functionality, not a custom solution for a specific use case. What is best for you comes down to your team's skills and expertise with databases and if your specific requirements will change. Choose poorly and it could go very badly.

samus•3mo ago

> He doesn't point to a single thing that pg_vector doesn't do or doesn't do well. He just complains it's hard to do.

He very clearly complains that IVFFlat indexes have to be periodically rebuilt, that HNSW has high overhead (both during inserts and rebuilds) and that the query planner is not particularly good at optimizing queries involving these kinds of indexes. None of this is a problem if the dataset is puny enough, but deadly if you want to scale up without investing significant engineering.

sgarland•3mo ago

> The problem is that index builds are memory-intensive operations, and Postgres doesn’t have a great way to throttle them.

maintenance_work_mem begs to differ.

> You rebuild the index periodically to fix this, but during the rebuild (which can take hours for large datasets), what do you do with new inserts? Queue them? Write to a separate unindexed table and merge later?

You use REINDEX CONCURRENTLY.

> But updating an HNSW graph isn’t free—you’re traversing the graph to find the right place to insert the new node and updating connections.

How do you think a B+tree gets updated?

This entire post reads like the author didn’t read Postgres’ docs, and is now upset at the poor DX/UX.

tacoooooooo•3mo ago

some fair points on the specifics.

> maintenance_work_mem

sure, but the knob existing doesn't solve the operational challenge of safely allocating GBs of RAM on prod for hours-long index builds.

> REINDEX CONCURRENTLY

this is still not free: it takes longer, needs 2-3x disk space, and still impacts performance.

> HNSW vs B+tree

it's not that graph updates are uniquely expensive. vector workloads have different characteristics than traditional OLTP, and pg wasn't originally designed for them

my broader point: these features exist, but using them correctly requires significant Postgres expertise. my thesis isn't "Postgres lacks features"—it's "most teams underestimate the operational complexity." dedicated vector DBs handle this automatically, and are often going to be much cheaper than the dev time put into maintaining pgvector (esp. for a small team)

sgarland•3mo ago

> sure, but the knob existing doesn't solve the operational challenge of safely allocating GBs of RAM on prod for hours-long index builds.

How does it not? You should know the amount of freeable memory your DB has, and a rough idea of peak requirements. Give the index build some amount below that.
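A minimal sketch of that budgeting, assuming "freeable" and "peak" are numbers you already track. The helper name, the 20% headroom and the index name are made up; `maintenance_work_mem` and `REINDEX INDEX CONCURRENTLY` are standard Postgres.

```python
def index_build_statements(index_name, freeable_mb, peak_workload_mb, headroom_percent=20):
    """Return SQL that caps index-build memory below what the box can spare.

    index_name is assumed to be a trusted identifier (no quoting is done here).
    """
    spare_mb = freeable_mb - peak_workload_mb
    budget_mb = spare_mb * (100 - headroom_percent) // 100   # keep some headroom
    if budget_mb <= 0:
        raise ValueError("no memory to spare for the index build")
    return [
        f"SET maintenance_work_mem = '{budget_mb}MB';",
        # REINDEX ... CONCURRENTLY cannot run inside a transaction block.
        f"REINDEX INDEX CONCURRENTLY {index_name};",
    ]


if __name__ == "__main__":
    # Hypothetical host: 16 GB freeable, normal workload peaks at 6 GB.
    for stmt in index_build_statements("items_embedding_idx", 16_000, 6_000):
        print(stmt)
```

Both statements need to run in the same session, since `SET` only affects the current one.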

> this is still not free: it takes longer, needs 2-3x disk space, and still impacts performance.

Yes, those are the trade-offs for not locking the table during the entire build. They’re generally considered acceptable.

> it's "most teams underestimate the operational complexity."

Agreed, which is why I don’t think dev teams should be running DBs if they lack expertise. Managed solutions (for Postgres; no idea on Pinecone et al.) only remove backup and failover complexity; tuning various parameters and understanding the optimizer’s decisions are still wholly on the human. RDBMS are some of the most complicated pieces of software that exist, and it’s absurd that the hyperscalers pretend that they aren’t.

ayende•3mo ago

> maintenance_work_mem

That kills the indexing process, you cannot let it run with limited amount of memory.

> How do you think a B+tree gets updated?

In a B+Tree, you need to touch on the order of log(N) pages (the height of the tree). In an HNSW graph, you need to touch literally thousands of vectors once your graph gets big enough.

sgarland•3mo ago

> That kills the indexing process, you cannot let it run with limited amount of memory.

Considering the default value is 64 MB, it’s already throttled quite a bit.

whakim•3mo ago

> maintenance_work_mem begs to differ.

HNSW indices are big. Let's suppose I have an HNSW index which fits in a few hundred gigabytes of memory, or perhaps a few terabytes. How do I reasonably rebuild this using maintenance_work_mem? Double the size of my database for a week? What about the knock-on impacts on the performance for the rest of my database-stuff - presumably I'm relying on this memory for shared_buffers and caching? This seems like the type of workload that is being discussed here, not a toy 20GB index or something.

> You use REINDEX CONCURRENTLY.

Even with a bunch of worker processes, how do I do this within a reasonable timeframe?

> How do you think a B+tree gets updated?

Sure, the computational complexity of insertion into an HNSW index is sublinear, but the constant factors are significant and do actually add up. That being said, I do find this the weakest of the author's arguments.

alanwli•3mo ago

I've seen a decent amount of production use of pgvector HNSW from our customers on GCP, but as the author noted it is not without flaws, and deployments are typically in the smallish range (0-10M vectors) because of the system characteristics that he pointed out, i.e. build times and memory use. The tradeoffs to consider are whether you want to ETL data into yet another system and deal with operational overhead, eventual consistency, and application logic to join vector search with the rest of your operational data. Whether the tradeoffs are worth it really depends on your business requirements.

And if one needs the transactional/consistency semantics, hybrid/filtered-search, low latencies, etc - consider a SOTA Postgres system like AlloyDB with AlloyDB ScaNN which has better scaling/performance (1B+ vectors), enhanced query optimization (adaptive pre-/post-/in-filtering), and improved index operations.

Full disclosure: I founded ScaNN in GCP databases and currently lead AlloyDB Semantic Search. And all these opinions are my own.

riku_iki•3mo ago

AlloyDB is not open source, so it is kind of a different niche.

epolanski•3mo ago

Curious if the author tried the new Redis module that brings HNSW vector search to redis.

From what I've seen it is fast, has an excellent API, and is implemented by a brilliant engineer in the space (Antirez).

But not using these things beyond local tests, I can never really hold opinions over those using these systems in production.

mkesper•3mo ago

It's fast...because everything needs to be in memory. Expect astronomical cloud costs even for mid-sized data requirements.

epolanski•3mo ago

I don't know what mid-sized data requirement is or how this is used in prod, but I have huge doubts that if performance is the need cost is the problem.

Especially in the AI and startup space.

antirez•3mo ago

It's not a module, it is part of every new Redis version now. Well, actually: it is written in the form of a module and with the modules API in order to improve modularity of the Redis internals, but it is a "merged module", a new implementation/concept I implemented in Redis exactly to support the Vector Sets use case. Thank you for mentioning this.

arunmu•3mo ago

There is pgvectorscale from Timescale, which uses a DiskANN-based data structure and supports pre- and post-filtering.

tacoooooooo•3mo ago

I mention this towards the end of the post. it looks like a good solution, but it's not available on RDS

akulkarni•3mo ago

pgvectorscale is 100% open source

please ask your RDS rep to support it

we (tiger data) are also happy to help push that along if we can help

remich•3mo ago

Is this something that can happen? We just ran into this limitation and I really want to keep using pgvectorscale... am exploring other solutions on EKS but RDS would be so much easier. From my reading it seems like this isn't something we can get done as a single AWS customer though.

akulkarni•3mo ago

It is up to RDS. But there should be nothing stopping them. AFAIK they respond to customer interest.

indigo945•3mo ago

    > None of the blogs mention that building an HNSW index on a few million vectors 
    > can consume 10+ GB of RAM or more (depending on your vector dimensions and 
    > dataset size). On your production database. While it’s running. For potentially 
    > hours.

10 GB? Oh jolly gosh! That will almost show up as a pixel or two on my metrics dashboard.

Who are these people that run production Postgres clusters on tiny hardware and then complain? Has AWS marketing really confused people into believing that some EC2 "instance size" is an actual server?

tacoooooooo•3mo ago

guess it depends on your scale? for some, 10+ GB of RAM being consumed on an index build is > 25% of the DB's RAM. apply that same proportion to your setup and maybe it'll make more sense

cdelsolar•3mo ago

10GB of ram is a pixel? how big is your company?

jjfoooo4•3mo ago

When using vectors / embeddings models, I think there's a lot of low hanging fruit to be had with non-massive datasets - your support documentation, your product info, a lot of search use cases. For these, the interface I really want is more like a file system than a database - I want to be able to just write and update documents like a file system and have the indexes update automatically and invisibly.

So basically, I'd love to have my storage provider give me a vector search API, which I guess is what Amazon S3 vectors is supposed to be (https://aws.amazon.com/s3/features/vectors/)?

Curious to hear what experience people have had with this.

auraham•3mo ago

Have you tried cocoindex?

[1] https://cocoindex.io/

[2] https://dev.to/cocoindex/how-to-build-index-with-text-embedd...

eigencoder•3mo ago

I think these are the salient concerns I've faced at work using pgvector. Especially getting bit by the query planning when filtering -- it's hard to predict when postgres will decide to use pre- vs post-filtering.

As for inserts being difficult, we basically don't see that because we only update the vector store weekly. We're not trying to index rapidly-changing user data, so that's not a big deal for our use case.

pqdbr•3mo ago

I'd love to read a blog post like this about S3 Vector buckets. Does anyone have experience with it in production?

bashtoni•3mo ago

The service is still in preview, so AWS are explicitly telling people not to put it into production.

From my non-production experiments with it, the main limitation is that you can only retrieve up to 30 top_k results, which means you can't use it with a re-ranker, or at least not as effectively. For many production use cases that will be a deal breaker.

ComputerGuru•3mo ago

My issue with it is that it requires a lot of duplication between it and a traditional rdbms; you can’t use it alone because it doesn’t offer filtering without a search vector (i.e. what some vendors call a scroll function).

simonw•3mo ago

"HNSW index on a few million vectors can consume 10+ GB of RAM or more (depending on your vector dimensions and dataset size). On your production database. While it’s running. For potentially hours."

How hard is it to move that process to another machine? Could you grab a dump of the relevant data, spin up a cloud instance with 16GB of RAM to build the index and then cheaply copy the results back to production when it finishes?
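For a rough sense of where a figure like "10+ GB" can come from, here is a back-of-envelope sketch. The vector count (3 million), the dimensionality (1536) and float32 storage are illustrative assumptions, not numbers from the post, and the estimate covers only the raw vectors; graph links and build-time overhead would come on top.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Bytes needed just to hold the vectors (float32 by default)."""
    return num_vectors * dims * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Assumed workload: 3 million embeddings with 1536 dimensions each.
total = raw_vector_bytes(3_000_000, 1536)
print(f"{total} bytes = {to_gib(total):.1f} GiB")
```

Under these assumptions the raw vectors alone would not fit comfortably on a 16 GB instance, so the size of a dedicated build machine depends heavily on dimensionality and row count.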

tacoooooooo•3mo ago

i discuss that specifically!

> The problem is that index builds are memory-intensive operations, and Postgres doesn’t have a great way to throttle them. You’re essentially asking your production database to allocate multiple (possibly dozens) gigabytes of RAM for an operation that might take hours, while continuing to serve queries.

> You end up with strategies like:

> • Write to a staging table, build the index offline, then swap it in (but now you have a window where searches miss new data)
> • Maintain two indexes and write to both (double the memory, double the update cost)
> • Build indexes on replicas and promote them
> • Accept eventual consistency (users upload documents that aren’t searchable for N minutes)
> • Provision significantly more RAM than your “working set” would suggest

> None of these are “wrong” exactly. But they’re all workarounds for the fact that pgvector wasn’t really designed for high-velocity real-time ingestion.

short answer--maybe not that _hard_, but it adds a lot of complexity to manage when you're trying to offer real-time search. most vector DB solutions offer this ootb. This post is meant to just point out the tradeoffs with pgvector (that most posts seem to skip over)

the_mitsuhiko•3mo ago

> short answer--maybe not that _hard_, but it adds a lot of complexity to manage when you're trying to offer real-time search. most vector DB solutions offer this ootb. This post is meant to just point out the tradeoffs with pgvector (that most posts seem to skip over)

Question is if that tradeoff is more or less complexity than maintaining a whole separate vector store.

machiaweliczny•3mo ago

Is there a fast way to do hybrid search that combines vector similarity with scalars using pgvector? Or do I need to migrate to another tool?

dangoodmanUT•3mo ago

> What bothers me most: the majority of content about pgvector reads like it was written by someone who spun up a local Postgres instance, inserted 10,000 vectors, ran a few queries, and called it a day.

I get this taste from most posts about Postgres that don’t come from “how we scaled Postgres to X”. It seems a lot of writers are trying to ride the wave of popularity, creating a ton of noise that can end up as tech debt for readers.

SoftTalker•3mo ago

AI + Docker has made it really easy to set up trivial demo systems and write an article about it.

IntrepidPig•3mo ago

> Post-filter works when your filter is permissive. Here’s where it breaks: imagine you ask for 10 results with LIMIT 10. pgvector finds the 10 nearest neighbors, then applies your filter. Only 3 of those 10 are published. You get 3 results back, even though there might be hundreds of relevant published documents slightly further away in the embedding space.

Is this really how it works? That seems like it’s returning an incorrect result.
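The behavior described in the quote can be illustrated with a toy sketch. It is not pgvector's code: an exact brute-force scan stands in for the index, and the corpus, the `published` flag and the function names are made up for illustration. The point is only the order of operations: take the k nearest first, filter second.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def post_filter_search(query, docs, k, predicate):
    """Take the k nearest documents first, then drop those failing the filter."""
    nearest = sorted(docs, key=lambda d: squared_distance(query, d["embedding"]))[:k]
    return [d for d in nearest if predicate(d)]


# Toy corpus: 100 documents on a line, every fourth one is published.
docs = [
    {"id": i, "embedding": (float(i), 0.0), "published": i % 4 == 0}
    for i in range(100)
]

results = post_filter_search((0.0, 0.0), docs, k=10, predicate=lambda d: d["published"])
print([d["id"] for d in results])  # [0, 4, 8]
```

Only 3 rows come back for a request of 10, even though 25 published documents exist in the corpus. Whether that counts as "incorrect" depends on whether the caller expects exactly k filtered rows or the filtered subset of the k nearest.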

chandureddyvari•3mo ago

Is there a comprehensive leaderboard like ClickBench but for vector DBs? Something that measures both the qualitative (precision/recall) and quantitative aspects (query perf at 95th/99th percentile, QPS at load, compression ratios, etc.)?

ANN-Benchmark exists but it’s algorithm-focused rather than full-stack database testing, so it doesn’t capture real-world ops like concurrent writes, filtering, or resource management under load.

Would be great to see something more comprehensive and vendor-neutral emerge, especially testing things like: tail latencies under concurrent load, index build times vs quality tradeoffs, memory/disk usage, and behavior during failures/recovery

riku_iki•3mo ago

> Is there a comprehensive leaderboard like ClickBench

ClickBench has only 100M rows of data, which makes it not a comprehensive benchmark at all.

redskyluan•3mo ago

check https://github.com/zilliztech/VectorDBBench

softwaredoug•3mo ago

My real icky feeling is the layering on of postgres plugins to get a search solution to work.

Ok yeah there's PGVector. Then you need something to do full text search. And if you put all that together, you have a complex Postgres deployment.

It seems to make sense for simple operations, but I'd rather just get a search engine / vector database, than try to twist Postgres's arm into a weird setup.

riku_iki•3mo ago

> do full text search. And if you put all that together, you have a complex Postgres deployment.

Search is also just an extension? So it's a strong point: you have one self-contained server with a simple installation/maintenance story.

antirez•3mo ago

Redis Vector Sets, my work for the last year, I believe address many of such points:

1. Updates: I wrote my own implementation of HNSW with many changes compared to the paper. The result is that the data structure can be updated while it receives queries, like the other Redis data types. You add vectors with VADD, query for similarity with VSIM, delete with VREM. Also, deleting vectors does not just perform a tombstone deletion. The memory is actually reclaimed immediately.

2. Speed: The implementation is fast, fully threaded reads, partially threaded writes: even for insertion it is easy to stay in the few hundreds of ops/sec, and querying with VSIM is like 50k ops/sec in normal hardware.

3. Trivial: You can reimplement your use case in 10 minutes, including learning how it works.

Of course it costs some memory, but less than you may guess: it supports quantization by default, transparently, and for a few millions of elements (most use cases) the memory usage is very low, totally affordable.

Bonus point: if you use vector sets you can ask my help for free. At this stage I support people using vector sets directly.

I'll link here the documentation I wrote myself as it is a bit hard to find, you know... a README inside the repository, in 2025, so odd: https://github.com/redis/redis/blob/unstable/modules/vector-...

P.S. in the README there is a stale mention of the replication code not really being tested. I filled the gap later and added tests, fixed bugs and so forth.

bob1029•3mo ago

I'm still stuck on whether or not vector search (regardless of vendor) is actually the right way to solve the kinds of problems that everyone seems to believe it's great at.

BM25 with query rewriting & expansion can do a lot of heavy lifting if you invest any time at all in configuring things to match your problem space. The article touches on FTS engines and hybrid approaches, but I would start there. Figure out where lexical techniques actually break down and then reach for the "semantic" technology. I'd argue that an LLM in front of a traditional lexical search engine (i.e., tool use) would generally be more powerful than a sloppy semantic vector space or a fine tuning job. It would also be significantly easier to trace and shape retrieval behavior.

Lucene is often all you need. They've recently added vector search capabilities if you think you really need some kind of hybrid abomination.

mhuffman•3mo ago

I like Lucene and have used it for many years, but sometimes a conceptually close match is what you want. Lucene and friends are fantastic at word matching, fuzzy searches, stem searches, phonetic searches, faceting and more, but have nothing for conceptually or semantically close searches (I understand that they recently added new document vector searches). Also, vector searches almost always return something, which is not ideal in a lot of cases. I like Reciprocal Rank Fusion myself as it gives the best of both worlds. As a fun trick I use DuckDB to do RRF with 5 million+ documents and get low double-digit ms response times even under load.
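For readers who have not met Reciprocal Rank Fusion, here is a minimal sketch of the idea: each result list contributes 1 / (k + rank) per document, and the sums decide the fused order. The constant k=60 is a commonly used default, and the document ids and the two input rankings are invented for illustration; this is not the commenter's DuckDB setup.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a lexical search and a vector search.
bm25_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['a', 'c', 'b', 'd']
```

Because only ranks are used, the lexical and vector scores never need to be put on a common scale, which is a large part of the method's appeal for hybrid search.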

kgeist•3mo ago

I'm currently building RAG for our product (using Lucene). What I've found is that embeddings alone don't help much. With hybrid search (BM25+HNSW) they gave me only like +10% boost compared to BM25 alone (on average). In my evaluation datasets, the only case where they helped tremendously was for cases like "a user asks a question in French but the documents are all in English", it went from 6% retrieval to 65% on some datasets.

I got a significant boost (from 65% on average to over 80%) by adding a proper reranker and query rewriting (3 additional phrases to search for).

I think embeddings are overrated in that blog posts often make you believe they are the end of the story. What I've found is that they should be rather treated as a lightweight filtering/screening tool to quickly find a pool of candidates as a first stage, before you do the actual stuff (apply a reranker). If BM25 already works as well as a pre-filtering tool, you don't even need embeddings (with all the indexing headaches).
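The two-stage shape described above (cheap candidate screening, then a more careful reranker) can be sketched as follows. Both scoring functions are toy stand-ins chosen so the example runs without dependencies: term overlap plays the role of BM25 or an embedding lookup, and Jaccard similarity plays the role of a real reranker. All names and documents are hypothetical.

```python
def term_overlap(query, doc):
    """Cheap first-stage score: number of query terms present in the document."""
    doc_terms = set(doc.split())
    return sum(1 for term in set(query.split()) if term in doc_terms)


def jaccard(query, doc):
    """Stand-in for a costlier reranker: shared terms over all terms."""
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / len(q | d)


def retrieve_then_rerank(query, docs, first_stage, reranker, pool_size, top_n):
    """Screen a candidate pool with the cheap scorer, then rerank only that pool."""
    pool = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:pool_size]
    return sorted(pool, key=lambda d: reranker(query, d), reverse=True)[:top_n]


docs = [
    "postgres index build memory",
    "postgres vector index",
    "cooking pasta at home",
    "vector index build time in postgres and other long unrelated words here",
]

top = retrieve_then_rerank(
    "postgres vector index", docs, term_overlap, jaccard, pool_size=3, top_n=2
)
print(top)  # ['postgres vector index', 'postgres index build memory']
```

The long, loosely related document survives the first stage on raw overlap but is pushed down by the second, which is the division of labor the comment argues for.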

semiquaver•3mo ago

  > You rebuild the index periodically to fix this, but during the rebuild (which can take hours for large datasets), what do you do with new inserts? Queue them? Write to a separate unindexed table and merge later?

What is wrong with REINDEX CONCURRENTLY?

jankovicsandras•3mo ago

Shameless plug: https://github.com/jankovicsandras/plpgsql_bm25 BM25 search implemented in PL/pgSQL ( Unlicense / Public domain )

The repo includes plpgsql_bm25rrf.sql : PL/pgSQL function for Hybrid search ( plpgsql_bm25 + pgvector ) with Reciprocal Rank Fusion; and Jupyter notebook examples.

dmezzetti•3mo ago

You can make it even simpler and not bother with any of this. With even something as large as 100M vectors, you can just use Torch or GGUF with compression. Even NumPy can take you a long way. Example below.

https://github.com/neuml/txtai/blob/master/examples/78_Acces...

ezekiel68•3mo ago

> Your database is now handling your normal transactional workload, analytical queries, AND maintaining graph structures in memory for vector search.

No. No one in production is trying to use the same instance for all of these use-cases at scale. The fundamental misunderstanding here is assuming or even "demanding" that one instance should be able to provide OLTP, OLAP and vector ops with no compromises. The workloads are fundamentally different and doing serious work requires architecting the solution much more intelligently.

muzani•3mo ago

"Turbopuffer starts at $64 month with generous limits."

Yup, I think this here explains the popularity of pgvector. If $64/month seems like a lot to you, just use pgvector. If it seems cheap, then your usage is complex enough to want a proper vector DB.

PeterStuer•3mo ago

It is most often not the $64. It is about being in sovereign control of your dataplane.

muzani•3mo ago

Many people using pgvector have about 200 rows of data.

Yesterday, I had a conversation with someone about whether it's better to squeeze it all into the system prompt. Someone else argued that even with large context, it's difficult to fit 10k questions into a system prompt.

So I think there's just this mismatch in usage.

codingjaguar•3mo ago

This quite aligns with our observation at Milvus. Recently, we helped several users migrate from pgvector as the workload grew substantially.

It’s worth recognising the strengths of pgvector:

• For small-to-medium scale workloads (e.g., up to millions of vectors, relatively static data), embedding storage and similarity queries inside Postgres can be a simple, familiar architecture.

• If you already use Postgres and your vector workloads are light (low QPS, few dimensions, little metadata filtering / low concurrency), then piggy-backing vector search on Postgres is attractive: minimal added infrastructure.

• For teams that don’t want to introduce a separate vector service, or want to keep things within an existing RDBMS, pgvector is a compelling choice.

From our experience helping users scale vector search in production, several pain-points emerge when scaling vector workloads inside a general-purpose RDBMS like Postgres:

1. Index build / update overhead

• Postgres isn’t built from the ground up for high-velocity vector insertions plus large-scale approximate nearest neighbour (ANN) index maintenance; for example, it lacks the RaBitQ binary quantization supported in purpose-built vector DBs like Milvus.

• For large datasets (tens/hundreds of millions or beyond), building or rebuilding HNSW/IVF indices inside Postgres can be memory- and time-intensive.

• In production systems where vectors are continuously ingested, updated, deleted, this becomes operationally tricky.

2. Filtered search

• Many use-cases require combining vector similarity with scalar/metadata filters (e.g., “give me top 10 similar embeddings where user_status = ‘active’ AND time > X”).

• You need to understand the low-level planner to juggle pre-filtering and post-filtering, and the planner’s cost model wasn’t built for vector similarity search. For a system not designed primarily as a vector DB, this gets complex. Users shouldn't have to worry about such low-level details.

3. Lack of support for full-text search / hybrid search

• Purpose-built vector DBs such as Milvus have mature full-text search / BM25 / sparse vector support.

tacoooooooo•3mo ago

well said! we demo'd milvus (or zilliz, i should say) and while we didn't ultimately go with it, it seems like a great option

jrochkind1•3mo ago

this is a big problem in programmer blog posts. It used to be I could find blog posts by people who had actually done the thing ("in anger").

Now it's someone who decided writing up the thing would draw clicks, and googled just enough to write the thing, may or may not have actually even fired it up at all -- may not have even written it, perhaps had AI write it.

It makes any of these blog posts pretty terrible guides.

I used to try at least downvoting these on say reddit when it was obviously not written by someone who had their own actual earned knowledge about the thing, but just gave up, because it's nearly everything.

ComputerGuru•3mo ago

I had only heard positive things about pgvector but when you Google comparisons with leading vector dbs you keep getting seo slop from Tiger Data pushing pgvector with very suspicious benchmarks that turned me off it altogether instead https://www.tigerdata.com/blog/pgvector-vs-qdrant

tjwebbnorfolk•3mo ago

> None of the blogs mention that building an HNSW index on a few million vectors can consume 10+ GB of RAM or more

Speaking of "production" -- in what world is "10+ GB" a lot of RAM for a database server?

I have to agree: the author should definitely not use Postgres or pgvector in production...

jmspring•3mo ago

'Nobody’s actually run this in production' - the majority of people who work with postgres don't talk about it or gloat about it because it's a tool that works - including its add-ons.

Yes, young engineers get all hot and bothered over the most recent tools but - they have no idea how things work and run.

I worked on a project that wanted to use a hot and frothy vector database. The issue - ok, where are we getting the 1/4-1/2 time person to manage it? Product engineers - derp? what? People who live in node and python cutting edge don't really think about the actual production implications of their choices.

neya•3mo ago

Man, that table comparison definitely looks like it was AI generated. I'm starting to question the whole article itself, now :/

smallerfish•3mo ago

The copy reeks of being AI written, which is ironic given:

> It’s a compelling story. And like most of the AI influencer bullshit that fills my timeline, it glosses over the inconvenient details.

neya•3mo ago

Haha, nice catch

inbx0•3mo ago

I don't have much experience in dedicated vector databases, I've only used pgvector, so pardon me if there's an obvious answer to this, but how do people do similarity search combined with other filters and pagination with separate vector DB? It's a pretty common use case at least in my circles.

For example, give me product listings that match the search term (by vector search), and are made by company X (companies being a separate table). Sort by vector similarity of the search term and give me the top 100.

We have even largely moved away from ElasticSearch to Postgres where we can, because it's just so much easier to implement with new complex filters without needing to add those other tables' data to the index of e.g. "products" every time.

Edit: Ah I guess this is touched a bit in the article with "Pre- vs. Post-Filtering" - I guess you just do the same as with ElasticSearch, predict what you'll want to filter with, add all of that to metadata and keep it up to date.

Xx_crazy420_xX•3mo ago

The author (human or llm) flips between performance ("millions of vectors") and semantic accuracy ("only 3 match your filter") to push its point, depending on what needs to look worse. AI framing switch that was probably introduced by RLHF on humans that don't think critically but want somewhat convincing answers.

For pre-filtering "You’re still searching millions of vectors" isn't valid argument, because the author does not relate to any alternative, and post-filtering is even worse.

tacoooooooo•3mo ago

Author is a human :). Performance and semantic accuracy are both important. The point about pre-filtering _you're still searching millions of vectors_ is important because once you apply a filter you can no longer use your vector index. And doing a full scan on millions of vectors is quite expensive.
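To make the cost argument concrete, here is a toy sketch of the pre-filter path as described: filter first, then compute an exact distance for every surviving row. It is an illustration under made-up data, not pgvector's execution plan; the corpus, the `published` flag and the function names are hypothetical.

```python
import heapq


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pre_filter_search(query, docs, k, predicate):
    """Filter first, then scan every surviving vector exactly (no index used)."""
    candidates = [d for d in docs if predicate(d)]
    top = heapq.nsmallest(
        k, candidates, key=lambda d: squared_distance(query, d["embedding"])
    )
    # The second value is the number of distance computations performed.
    return top, len(candidates)


# Toy corpus: 100 documents on a line, every fourth one is published.
docs = [
    {"id": i, "embedding": (float(i), 0.0), "published": i % 4 == 0}
    for i in range(100)
]

top, scanned = pre_filter_search((0.0, 0.0), docs, k=10, predicate=lambda d: d["published"])
print([d["id"] for d in top], scanned)
```

The result set is always complete (10 rows here), but the work grows linearly with the number of rows that pass the filter, which is the expense being pointed out when that number is in the millions.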

Xx_crazy420_xX•3mo ago

Maybe I was just too narrowly focused on the comparison itself and did not get that point. Anyway, as a whole it was a valuable read, and along with the HN comments it made me reconsider current implementation details in my projects.

jgalt212•3mo ago

Has anyone used PGVector and sqlite-vss or sqlite-vector?

redwood•3mo ago

MongoDB's implementation separates the vector index runtime from the transactional processing, enabling independent scaling and workload isolation while preserving unified query richness and scale-out via sharding. This is the best of both worlds in my view...

mannyv•3mo ago

"It works, until it doesn't."

The question is, at what point does it not work?

Vacuuming is grief enough. Rebuilding the index sounds like more of a nightmare than with Solr/Lucene. And what happens when indexing fails? In Solr/Lucene it used to mean you were dead. I'm sure they fixed that, but at some level you need to either be behind on one while you reindex or figure out some queueing system that works like transactions.
