Ask HN: MCP/API search vs. vector search – what's winning for you?

4•ngkw•5mo ago

TL;DR: I have a hunch that demand for classic RAG (embeddings + vector DB) will shrink. Reasons:

1. Embedding ops cost (re-indexing, freshness) is high.

2. LLMs are getting good at iterative query expansion over plain search APIs (BM25-style).

3. Embedding quality is still uneven across domains/languages.

Curious what you are actually seeing in production.

Context: We’re a ~10-person team inside a large company. People use different UIs (ChatGPT, Claude, Dify, etc.). Cost/security aren’t our main issues; we just want higher throughput. We can wire MCP-style connectors (Notion/Slack/Drive) or run our own vector index; we’re trying to pick the battles that really move the needle.

Hypotheses I’m testing:

* For fast-changing corp knowledge, BM25 + LLM query expansion + light re-ranking beats maintaining a vector store (lower ops, decent recall).

* MCP/API search gives “good enough” docs if you union a few expanded queries and re-rank.

* Vectors still win for long-tail semantic matches and noisy phrasing—but only when content is relatively stable or you can afford frequent re-embeds.
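To make the first two hypotheses concrete, here is a minimal sketch of BM25 scoring plus a union over expanded queries. Everything in it is an assumption for illustration: the three-document corpus, the regex tokenizer, the `k1`/`b` defaults, and the hand-written `EXPANDED` list, which stands in for whatever an LLM would generate. It is not a claim about how any particular search backend works.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive lowercase word tokenizer (assumption; real systems use analyzers).
    return re.findall(r"\w+", text.lower())


class BM25:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.term_counts = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(c.values()) for c in self.term_counts]
        self.avg_len = sum(self.lengths) / len(docs)
        # Document frequency: number of docs containing each term.
        doc_freq = Counter(t for c in self.term_counts for t in c)
        n = len(docs)
        self.idf = {
            t: math.log(1 + (n - df + 0.5) / (df + 0.5))
            for t, df in doc_freq.items()
        }

    def search(self, query, top_n=3):
        # Returns (doc_index, score) pairs for docs with a non-zero score.
        hits = []
        for i, counts in enumerate(self.term_counts):
            norm = self.k1 * (1 - self.b + self.b * self.lengths[i] / self.avg_len)
            score = sum(
                self.idf[t] * counts[t] * (self.k1 + 1) / (counts[t] + norm)
                for t in tokenize(query)
                if t in counts
            )
            if score > 0:
                hits.append((i, score))
        return sorted(hits, key=lambda h: (-h[1], h[0]))[:top_n]


def multi_query_search(index, queries, top_n=3):
    # Union the hits of all expanded queries, keeping each doc's best score.
    best = {}
    for q in queries:
        for doc_id, score in index.search(q, top_n):
            best[doc_id] = max(score, best.get(doc_id, 0.0))
    return sorted(best, key=lambda d: (-best[d], d))[:top_n]


DOCS = [
    "Vacation policy: how to request paid leave",
    "Expense report submission deadline",
    "Onboarding checklist for new engineers",
]
index = BM25(DOCS)

# Hypothetical LLM expansions of the user query "time off".
EXPANDED = ["time off", "paid leave", "vacation policy"]

print(index.search("time off"))             # lexical miss on the raw query
print(multi_query_search(index, EXPANDED))  # expansions recover doc 0
```

The raw query shares no terms with the corpus, so plain BM25 returns nothing; the expanded variants are what recover the vacation-policy document. That gap is the case where vectors would otherwise be expected to help.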

What I want from HN (war stories, not vendor pitches):

1. Have you sunset or avoided vector DBs because ops/freshness pain outweighed gains? What were the data size, update rate, and latency targets?

2. If you kept vectors, what made them clearly superior (metrics, error classes, language/domain)? Any concrete thresholds (docs/day churn, avg doc length, query mix) where vectors start paying off?

3. Anyone running pure API search + LLM query expansion (multi-query, aggregation, re-rank) at scale? How many queries per task? Latency/cost vs. vector search?

4. Hybrid setups that worked: e.g., API search to narrow → vector re-rank; or vector recall → LLM judge → final set. What cut false positives/negatives the most?

5. Multilingual/Japanese/domain jargon: where do embeddings still fail you? Did re-ranking (LLM or classic) fix it?

6. Freshness strategies without vectors: caching, recency boosts, metadata filters? What actually reduced “stale answer” complaints?

7. For MCP-style connectors (Notion/Slack/Drive): do you rely on vendor search, or do you replicate content and index yourself? Why?

8. If you’d start from scratch today for a 10-person team, what baseline would you ship first?
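For question 6, a "recency boost" could be as simple as the sketch below: blend the retrieval score with an exponential decay on document age. The function name, the 30-day half-life, and the 0.5 blend weight are all hypothetical values chosen for illustration, not recommendations.

```python
from datetime import datetime, timedelta, timezone


def recency_boost(score, updated_at, now, half_life_days=30.0, weight=0.5):
    # Age in days, clamped at zero in case of clock skew.
    age_days = max((now - updated_at).total_seconds() / 86400.0, 0.0)
    # Decay halves every `half_life_days`.
    decay = 0.5 ** (age_days / half_life_days)
    # `weight` controls how much of the score is subject to decay.
    return score * ((1.0 - weight) + weight * decay)


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
fresh = recency_boost(1.0, now, now)
month_old = recency_boost(1.0, now - timedelta(days=30), now)
print(fresh, month_old)
```

Keeping part of the score immune to decay (the `1.0 - weight` term) means an old but highly relevant document is demoted rather than buried.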

Why I’m asking: Our goal is throughput (less time hunting, more time shipping). I’m leaning toward:

* Phase 1: MCP/API search + LLM query expansion (3–5 queries), union top-N, local re-rank; no vectors.

* Phase 2 (only if needed): add a vector index for the failure cases we can’t fix with expansion/re-rank.
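A rough sketch of what Phase 1 could look like, assuming the search APIs return ranked lists without comparable scores. In that case reciprocal rank fusion (RRF) is one common way to union and re-rank. `fake_expand` and `fake_search` are hypothetical stubs standing in for an LLM call and an MCP/API search call, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_n=5):
    # Each doc earns 1 / (k + rank) per list it appears in; k=60 is a
    # commonly used default, not a tuned value.
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


def phase1_retrieve(question, expand, search, top_n=5):
    # expand: question -> list of query strings (e.g. 3-5 LLM rewrites)
    # search: query -> ranked list of doc IDs from the search API
    queries = expand(question)
    ranked_lists = [search(q) for q in queries]
    return reciprocal_rank_fusion(ranked_lists, top_n=top_n)


# Hypothetical canned data in place of real LLM and API calls.
FAKE_RESULTS = {
    "deploy rollback": ["runbook", "postmortem-12"],
    "revert release": ["runbook", "release-notes"],
    "undo production deploy": ["postmortem-12", "runbook"],
}


def fake_expand(question):
    return list(FAKE_RESULTS)


def fake_search(query):
    return FAKE_RESULTS[query]


top_docs = phase1_retrieve("how do I roll back a deploy?", fake_expand, fake_search)
print(top_docs)
```

Documents that show up across several expansions rise to the top, which is the "union top-N, local re-rank" step without any embedding index. An LLM or cross-encoder re-ranker could replace or follow the fusion step if rank agreement alone is not precise enough.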

Happy to share a summary of takeaways after the thread. Thanks!

Comments

SquidJack•5mo ago

If you want high throughput, you need to optimize every component in the pipeline. I tried DragonflyDB and it's pretty good compared with other databases. Also, if you add re-ranking-style methods, latency (ms) is going to go up.
