Scaling HNSWs

81•cyndunlop•6h ago

https://en.wikipedia.org/wiki/Hierarchical_navigable_small_w...

Comments

softwaredoug•1h ago

At very high scale, graphs see less use, or there's a layer of clustering on top of the graphs.

Graphs can be complex to build and rebalance. Graph-like data structures with a thing, then a pointer out to another thing, aren't that cache friendly.

Add to that, people almost always want to *filter* vector search results. And this is a huge blind spot for consumers and providers. It's where the ugly performance surprises come from. Filtered HNSW isn't straightforward, and requires you to just keep traversing the graph looking for results that satisfy your filter.

HNSW came out of a benchmark regime where we just indexed some vectors and tried only to maximize recall at a given query latency. It doesn't take into account the filtering / indexing almost everyone wants.

Turbopuffer, for example, doesn't use graphs at all; it uses SPFresh. And they recently got 200ms latency on 100B vectors.

https://turbopuffer.com/docs/vector

curl-up•52m ago

I'm facing the problem you describe daily. It's especially bad because it's very difficult for me to predict if the set of filters will reduce the dataset by ~1% (in which case following the original vector index is fine) or by 99.99% (in which case you just want to brute force the remaining vectors).

Tried a million different things, but haven't heard of Turbopuffer yet. Any references on how they perform with such additional filters?

inertiatic•24m ago

Lucene and ES implement a shortcut for filters that are restrictive enough. Since Lucene is already optimized for figuring out whether a document falls into your filter set, you first determine the size of that set. You traverse the HNSW normally, then if you have traversed more nodes than your filter set's cardinality, you just switch to brute forcing the distance comparisons over your filter set. So the worst case scenario is that you do 2x your filter set size in vector distance operations. Quite neat.
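
A minimal sketch of the idea described above, not Lucene's actual code. It assumes a single-layer neighbor graph instead of a real hierarchical HNSW, and every name in it (`filtered_search`, `brute_force`, `neighbors`, `allowed`, `ef`) is hypothetical. The visit budget equals the filter set's cardinality; once the traversal has touched more nodes than that, it gives up and scans the filter set exactly.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force(vectors, allowed, query, k):
    """Exact top-k over only the ids that pass the filter."""
    return heapq.nsmallest(k, ((l2(vectors[i], query), i) for i in allowed))


def filtered_search(vectors, neighbors, allowed, query, k, entry=0, ef=16):
    """Best-first graph search that falls back to brute force.

    `allowed` is the set of ids passing the filter. `ef` (assumed >= k) is
    the size of the result pool. Returns (hits, mode) where hits is a list
    of (distance, id) and mode says which path produced them.
    """
    budget = len(allowed)          # visit budget = filter set cardinality
    visited = {entry}
    d0 = l2(vectors[entry], query)
    candidates = [(d0, entry)]     # min-heap of nodes still to expand
    results = []                   # max-heap (negated distance), filter-passing only
    if entry in allowed:
        results.append((-d0, entry))

    while candidates:
        dist, node = heapq.heappop(candidates)
        # Stop once the pool is full and the closest open node is worse than it.
        if len(results) >= ef and dist > -results[0][0]:
            break
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            if len(visited) > budget:
                # Traversal already cost more than scanning the filter set.
                return brute_force(vectors, allowed, query, k), "brute_force"
            d = l2(vectors[nb], query)
            heapq.heappush(candidates, (d, nb))
            if nb in allowed:
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the farthest hit

    hits = sorted((-nd, i) for nd, i in results)[:k]
    return hits, "graph"


# Toy data: 100 points on a line, each linked to its immediate neighbors.
vectors = [(float(i),) for i in range(100)]
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 100] for i in range(100)}

restrictive = filtered_search(vectors, neighbors, {88, 95}, (90.0,), k=2)
permissive = filtered_search(vectors, neighbors, set(range(100)), (90.0,), k=2)
```

In the graph phase the sketch computes at most roughly one distance per budgeted visit, and the fallback computes one per filtered id, which is where the "2x your filter set size" bound comes from.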

curl-up•20m ago

Oh that's nice! Any references on this shortcut? How do you activate that behavior? I was playing around with ES, but the only suggestion I found was to use `count` on filters before deciding (manually) which path to take.
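
The manual approach mentioned here (count the filter matches first, then pick a path) might look roughly like the sketch below. The function name, the strategy labels, and the 1% threshold are all assumptions for illustration; the right cutoff depends on the index and the data.

```python
def choose_strategy(match_count, total, brute_force_ratio=0.01):
    """Pick a search path from a filter's match count.

    `brute_force_ratio` is an assumed tuning knob: at or below this
    fraction of surviving documents, an exact scan over the matches is
    taken to be cheaper than walking the vector index with a filter.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    selectivity = match_count / total
    if selectivity <= brute_force_ratio:
        return "brute_force"
    return "filtered_ann"


# A filter that keeps 100 of 1,000,000 docs vs. one that keeps 990,000.
narrow = choose_strategy(100, 1_000_000)
broad = choose_strategy(990_000, 1_000_000)
```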

inertiatic•15m ago

Here you go https://github.com/apache/lucene/pull/656 - no need to do anything from the user side to trigger it as far as I know.

spullara•24m ago

I think hybrid search with vector similarity and filtering has mostly been solved by Vespa, and not even recently.

https://blog.vespa.ai/vespa-hybrid-billion-scale-vector-sear...

softwaredoug•19m ago

For sure. But it's "solved" differently by every vector database. You have to pay attention to how it's solved.

cfors•4m ago

Just curious what the state of the art around filtered vector search results is? I took a quick look at the SPFresh paper and didn't see it specifically address filtering.

simonw•1h ago

This is well worth reading in full. The section about threading is particularly interesting: most of Redis is single-threaded, but antirez decided to use threads for the HNSW implementation and explains why.

dizzant•4m ago

> many programmers are smart, and if instead of creating a magic system they have no access to, you show them the data structure, the tradeoffs, they can build more things, and model their use cases in specific ways. And your system will be simpler, too.

Basically my entire full-time job is spent prosecuting this argument. It is indeed true that many programmers are smart, but it is equally true that many programmers _are not_ smart, and those programmers have to contribute too. More hands is usually better than simpler systems for reasons that have nothing to do with technical proficiency.
