Ask HN: How would you architect a RAG system for 10M+ documents today?

23•Ftrea•2mo ago

I'm tasked with building a private AI assistant for a corpus of 10 million text documents (living in PostgreSQL). The goal is semantic search and chat, with a requirement for regular incremental updates.

I'm trying to decide between:

Bleeding edge: Implementing something like LightRAG or GraphRAG.

Proven stack: Standard Hybrid Search (Weaviate/Elastic + Reranking) orchestrated by tools like Dify.

For those who have built RAG at this scale:

What is your preferred stack for 2025?

Is the complexity of Graph/LightRAG worth it over standard chunking/retrieval for this volume?

How do you handle maintenance and updates efficiently?

Looking for architectural advice and war stories.

Comments

parentheses•2mo ago

If it's < 100M vectors of size 1024, you could fit all of that in ~100G of memory at one byte per dimension (float32 would need roughly 4x that). So, maybe storing it in memory is an easy way to go about it. This ignores a lot of "database problems". If the docs are changing constantly, or you have other scalability concerns, you may be better off using a "proper" vector DB. There have been HN postings which indicate vector DB choice matters. Do your research there.
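
A quick back-of-the-envelope check of the memory estimate above. This is only a sketch: it counts raw vector storage and ignores index overhead, metadata, and the documents themselves. The function name and the byte widths are illustrative assumptions, not something stated in the thread.

```python
def raw_vector_gb(num_vectors: int, dims: int, bytes_per_dim: int) -> float:
    """Raw vector storage in GB (10**9 bytes), ignoring any index overhead."""
    return num_vectors * dims * bytes_per_dim / 1e9


# 100M vectors of 1024 dimensions, stored as float32 (4 bytes per dimension)
print(raw_vector_gb(100_000_000, 1024, 4))  # 409.6

# Same vectors quantized to one byte per dimension (e.g. int8)
print(raw_vector_gb(100_000_000, 1024, 1))  # 102.4

# The 10M-document corpus from the question, one float32 vector per document
print(raw_vector_gb(10_000_000, 1024, 4))   # 40.96
```

So the ~100G figure lines up with one byte per dimension at 100M vectors, while 10M float32 vectors come to roughly 41 GB before any index structures are added.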

Ftrea•2mo ago

Agreed. Pure in-memory is too risky for us given the persistence requirements and monthly updates. We are definitely going with a 'proper' DB (likely Postgres+pgvector or Weaviate) to handle the state and updates reliably.

walpurginacht•2mo ago

Do you have an evaluation in place that necessitates complex stuff? If not, I'd start simple with proven stuff and collect usage data to determine what's next.

Ftrea•2mo ago

This is the sanity check we needed. We don't have a benchmark yet that necessitates complex graph architectures. We will stick to 'proven stuff' first: a solid Hybrid Search (Vector + Keyword) baseline. We'll collect usage data and only complicate the stack if the baseline fails on specific queries.

journal•2mo ago

Ranked hierarchical pagination and intermediate context control. Also, are these text documents in a database, or text data worth 10 million documents? If you OCR, why not cache the result? Also, Lucene whitespace tokenization is pretty good for dumb exact or close-enough matching to get a filtered result that might fit the context window better. Imagine having to OCR and LLM, instantly. I would do everything to avoid architecting a system like that.

Not sure if you're pointing the right end of the stick at the right problem. Are you intending to max out your allowed context? What's going on here? You can usually extract a rough set before you LLM, so ideally you'd never exceed 50% of context. How big are the responses you expect? You have a lot of options; just throw everything at the problem that's easy to implement first and see what sticks. Make sure you have terminal access wherever you do this for max flexibility. I obviously prefer ASP.NET with psql.

What kind of data do you need indexed? Let's say you have something stupid like origin and destination based on locations, and you need a geo index and maybe a zipcode database, and some intermediate step to calculate assets within a radius, calc some distances and make a decision. Adding geo to any problem is a nightmare, but fun, but only the first time, because you know how to do it now but it takes so long you don't want to.

If you have terminal and source you have enough space to maneuver updates. It'll probably end up being a one-liner to execute an update that takes some time to rebuild your solution, and then it seems to automatically slide it under the working app; I never experienced any problems. As for database schema changes, push out your production release to the point where schema changes drop to less than 5% or something extreme. Be aware there could be schema changes that are hard to implement even later, and after you're in production it's much harder.
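
The "never exceed 50% of context" rule of thumb above could be sketched roughly as follows. Everything here is an assumption for illustration: the function names are hypothetical, the chunks are assumed to arrive already ranked best-first, and the 4-characters-per-token estimate is a crude stand-in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate (assumed ~4 characters per token)."""
    return max(1, len(text) // 4)


def pack_context(ranked_chunks: list[str],
                 context_window_tokens: int,
                 budget_fraction: float = 0.5) -> list[str]:
    """Take chunks in rank order until the token budget would be exceeded."""
    budget = int(context_window_tokens * budget_fraction)
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        used += cost
    return selected


# Three chunks of ~10 tokens each against a 50-token window (25-token budget)
chunks = ["a" * 40, "b" * 40, "c" * 40]
print(len(pack_context(chunks, context_window_tokens=50)))  # 2
```

The remaining half of the window is left for the system prompt, chat history, and the model's response.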

Ftrea•2mo ago

Thanks for the tips. We are strictly doing offline processing (docs are already converted to Markdown stored in DB) to avoid any live OCR latency. Also 100% agreed on filtering—we plan to use metadata/keyword filters (Lucene style) to narrow down the search space before hitting the LLM context window. No intention to verify zipcodes though! :)

osigurdson•2mo ago

Are the documents individually large or fairly small, like a page or two each? If they are small docs, since you already have Postgres, you can just add the pgvector extension, determine what embeddings you want to use, and try it out without committing too much. Maybe add a hash column first so that you can avoid paying to compute the embeddings again if you decide to use a different approach. They are all basically doing the same math to find things, so you aren't going to get magically better results with other things. If the docs are larger then you have to do chunking anyway.
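
One way to read the hash-column suggestion above: store a content hash next to each embedding, and only re-embed documents whose hash is missing or has changed. This is a minimal sketch under that assumption; the function names and the in-memory dictionaries standing in for database columns are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text (SHA-256, hex encoded)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def docs_needing_embedding(docs: dict[str, str],
                           stored_hashes: dict[str, str]) -> list[str]:
    """Return IDs of documents that are new or whose content changed."""
    return [
        doc_id
        for doc_id, text in docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]


# "a" is unchanged, "b" is new, so only "b" needs an embedding call
stored = {"a": content_hash("hello")}
current = {"a": "hello", "b": "a new document"}
print(docs_needing_embedding(current, stored))  # ['b']
```

The same comparison also covers periodic incremental updates: a batch job can hash each row and skip the embedding call for anything unchanged.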

Would the 10M documents be searched with a single vector search, or would they be pre-filtered by other columns in your table first? If some prefiltering is happening, it naturally makes things faster. You will likely want to use regular text / tsvector based search as well, and potentially feed the LLM with those results too, since vector search isn't perfect.
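
Combining the vector results with the text / tsvector results needs some merging rule. The thread does not name one; reciprocal rank fusion (RRF) is a common choice and is used here purely as an illustration. The function name and the constant `k = 60` are assumptions, and the two input lists stand in for the ID lists each search would return.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists; each hit scores 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["d1", "d2", "d3"]   # e.g. from a pgvector similarity query
keyword_hits = ["d3", "d1", "d4"]  # e.g. from a tsvector full-text query
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['d1', 'd3', 'd2', 'd4']
```

Documents found by both searches rise to the top, and the fused list can then go to a reranker or straight into the LLM context.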

You would then decide if you want to do re-ranking or not before handing it to the final LLM context window. These days, models are pretty good, so they will do their own re-ranking to some extent, but it depends a bit on cost, latency, and the quality of result that you are looking for.

Ftrea•2mo ago

This is extremely helpful. Our docs are indeed small (1-2 pages mostly), so distinct chunking might not even be needed—maybe one vector per doc or page. Since we are already on Postgres, pgvector + tsvector (for hybrid search) seems like the most logical MVP. Question: In your experience, does pgvector with HNSW indexes handle the 10M row scale with low latency (<200ms) for real-time chat? Or does a dedicated DB like Weaviate still offer a significant edge there?
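
For reference, a sketch of the SQL involved in trying pgvector with an HNSW index. The table name `documents`, the column name `embedding`, and the helper function names are hypothetical; `m = 16` and `ef_construction = 64` match pgvector's documented defaults, and the `%s` placeholders assume a psycopg-style driver. Whether this meets a <200ms target at 10M rows is something to measure on your own data rather than assume.

```python
def hnsw_setup_sql(table: str = "documents", column: str = "embedding",
                   m: int = 16, ef_construction: int = 64) -> list[str]:
    """SQL statements that enable pgvector and build an HNSW cosine index."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_hnsw ON {table} "
        f"USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});",
    ]


def knn_query_sql(table: str = "documents", column: str = "embedding") -> str:
    """Nearest-neighbour query by cosine distance (pgvector's <=> operator)."""
    return f"SELECT id FROM {table} ORDER BY {column} <=> %s::vector LIMIT %s;"


# Per-session recall/latency knob; higher values search more of the graph
EF_SEARCH_SQL = "SET hnsw.ef_search = 100;"

for statement in hnsw_setup_sql():
    print(statement)
print(EF_SEARCH_SQL)
print(knn_query_sql())
```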

mikert89•2mo ago

Chunk the documents, use contextual embeddings, and put them into the vector DB in Postgres.
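
A minimal sketch of the chunking step, under one simple reading of "contextual embeddings": each chunk gets document-level context (here just the title) prepended before it is embedded. The function name, the word-based window sizes, and the "Document:" prefix are all assumptions; a fuller version might prepend an LLM-written summary instead.

```python
def chunk_with_context(title: str, text: str,
                       chunk_words: int = 200,
                       overlap_words: int = 40) -> list[str]:
    """Split text into overlapping word windows, each prefixed with the title."""
    if overlap_words >= chunk_words:
        raise ValueError("overlap_words must be smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap_words
    chunks: list[str] = []
    for start in range(0, len(words), step):
        body = " ".join(words[start:start + chunk_words])
        chunks.append(f"Document: {title}\n\n{body}")
        if start + chunk_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_with_context("Example", sample, chunk_words=4, overlap_words=1):
    print(repr(chunk))
```

For the 1-2 page documents described earlier in the thread, a large enough `chunk_words` yields a single chunk per document, which matches the one-vector-per-doc idea.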
