Am I missing anything? (I love Vespa.ai)
The README.md contains a screenshot from local testing that includes more results: https://github.com/zilliztech/VectorDBBench?tab=readme-ov-fi...
And, about such benchmarks: I tried another vector DB benchmark, dug into it a bit, and found that it was mostly measuring client implementation latencies and other internal inefficiencies...
In Redis with VSIM I can easily get 50k VSIM calls/second on 300-component vectors using redis-benchmark, yet when I wrote a quick test against one of those engines I got much lower numbers, simply because the vectors are large (which makes serialization in Python slow if not well coded), these tests are often written in high-level languages, and they don't account for speed differences between client libraries.
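To make the serialization point concrete, here is a minimal sketch (not my actual test; the function names and the 100k iteration count are purely illustrative) timing a naive text encoding of a 300-component vector against a single struct.pack call:

    import random
    import struct
    import timeit

    vec = [random.random() for _ in range(300)]

    def naive_serialize(v):
        # Text-format every float, as a quick-and-dirty client might.
        return ",".join(repr(x) for x in v).encode()

    def packed_serialize(v):
        # One struct.pack call: raw FP32 bytes, far less Python-level work.
        return struct.pack("300f", *v)

    for fn in (naive_serialize, packed_serialize):
        secs = timeit.timeit(lambda: fn(vec), number=100_000)
        print(f"{fn.__name__}: {secs:.2f}s for 100k vectors")

If the text encoding turns out several times slower, that client-side cost alone can dominate what the benchmark reports, without the engine being touched at all.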
TLDR? Benchmarking is hard, for vector systems it is harder, and the results of most such tests are totally irrelevant.
As you said, benchmarking is hard, but doesn't the end-to-end latency customers see in their workloads usually include the client library overheads?
IMO, benchmarks should closely resemble real-world scenarios (while excluding variables such as the network latency of different cloud providers).
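For example, here is a hedged sketch of measuring VSIM truly end to end from the client using redis-py's generic execute_command (it assumes a local Redis with vector sets and a populated key I'm calling "myset"; the FP32/COUNT argument form is from the vector-sets docs, so double-check it against your version):

    import random
    import struct
    import time

    import redis

    r = redis.Redis()
    # A 300-component query vector, packed once as raw FP32 bytes.
    query = struct.pack("300f", *(random.random() for _ in range(300)))

    N = 1000
    t0 = time.perf_counter()
    for _ in range(N):
        # This timing includes everything a customer pays for: serialization,
        # the client library, the protocol round trip, and the search itself.
        r.execute_command("VSIM", "myset", "FP32", query, "COUNT", 10)
    elapsed = time.perf_counter() - t0
    print(f"{N / elapsed:.0f} end-to-end VSIM/s from this client")

The gap between this number and what redis-benchmark reports is roughly the client overhead being debated above.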
falcor84•1d ago
https://aws.amazon.com/s3/features/vectors/