It's pretty much the same process I would use in an unfamiliar code base. Just ctrl+f the file system till I find the right starting point.
(Well, I didn't overcome my laziness directly. I just switched from being lazy and not setting up vim and Emacs with the integrations, to trying out vscode where this was trivial or already built in.)
It depends, for some languages 'jump to definition' tools ask the same compiler/interpreter that you use to build your code, so it's as accurate as it gets, and it's not 'best effort'.
It also depends a bit on your project; some projects are more prone to re-using names or symbols.
> If I was as quick at opening and reading files as claude code, I'd prefer grep with context around the searched term.
Well, Claude probably also doesn't want to have to 'learn' how to use all kinds of different tools for different languages and eco-systems.
I believe that was my experience with IDEs too?
I use both grep and JTD fairly frequently for different use cases.
I meant 'Jump to Definition' as one clear example, not as a definitive enumeration of everything that compiler integration can help you with.
Eg compiler integration is also really useful to show you the inferred types. Even dinosaurs like old-school Java and C have (limited) type inference: inside of expressions. But of course in a language like Haskell or Rust (or even Python) this becomes much more important.
No amount of find+grep+LLM is even remotely there yet.
What do you mean Turing complete? Obviously all 3 programs are running on a Turing complete machine. Xargs is a runner for other commands, obviously those commands can be Turing complete.
I haven't heard of anybody working on a _proof_ for the Turing completeness of xargs, and I think the only conference willing to publish it would be Sigbovik.
These corpora have a high degree of semantic ambiguity among other tricky and difficult to alleviate issues.
Other types of text are far more amenable to RAG and some are large enough that RAG will probably be the best approach for a good while.
For example: maintenance manuals and regulation compendiums.
LLMs have a similar issue with their context windows. Go back to GPT-2 and you wouldn't have been able to load a text file into its memory. Slowly the memory is increasing, same as it did for the early computers.
So if one were building say a memory system for an AI chat bot, how would you save all the data related to a user? Mother's name, favorite meals, allergies? If not a Vector database like pinecone, then what? Just a big .txt file per user?
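One hedged sketch of an alternative, in the spirit of this thread (my own illustration, not something anyone above proposed): discrete user facts like these fit a plain structured store that the agent can query directly, no embeddings needed. The table layout and fact keys below are made up.

```python
# Minimal sketch: per-user facts in a small SQLite key-value table.
# Table name and keys are illustrative assumptions.
import sqlite3

conn = sqlite3.connect("user_memory.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS facts (
        user_id TEXT,
        key     TEXT,
        value   TEXT,
        PRIMARY KEY (user_id, key)
    )
""")

def remember(user_id: str, key: str, value: str) -> None:
    # Upsert a single fact, e.g. ("u42", "allergies", "peanuts")
    conn.execute(
        "INSERT INTO facts VALUES (?, ?, ?) "
        "ON CONFLICT(user_id, key) DO UPDATE SET value = excluded.value",
        (user_id, key, value),
    )
    conn.commit()

def recall(user_id: str) -> dict[str, str]:
    # Pull every stored fact for a user and hand it to the model as context.
    rows = conn.execute(
        "SELECT key, value FROM facts WHERE user_id = ?", (user_id,)
    )
    return dict(rows.fetchall())

remember("u42", "mothers_name", "Maria")
remember("u42", "favorite_meal", "ramen")
print(recall("u42"))
```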
Grep works great when you have thousands of files on a local filesystem that you can scan in milliseconds. But most enterprise RAG use cases involve millions of documents across distributed systems. Even with 2M token context windows, you can't fit an entire enterprise knowledge base into context. The author acknowledges this briefly ("might still use hybrid search") but then continues arguing RAG is obsolete.
The bigger issue is semantic understanding. Grep does exact keyword matching. If a user searches for "revenue growth drivers" and the document discusses "factors contributing to increased sales," grep returns nothing. This is the vocabulary mismatch problem that embeddings actually solve. The author spent half the article complaining about RAG's limitations with this exact scenario (his $5.1B litigation example), then proposes grep as the solution, which would perform even worse.
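A concrete illustration of that vocabulary mismatch, as a sketch (assumes the sentence-transformers package and the all-MiniLM-L6-v2 model; the two phrases are the ones from this example):

```python
# Sketch: exact keyword matching misses a paraphrase that an embedding model
# scores as highly similar. Assumes `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer, util

query = "revenue growth drivers"
passage = "factors contributing to increased sales"

# Grep-style exact matching: no query term appears in the passage.
print(any(term in passage for term in query.split()))  # False

# Embedding similarity: the paraphrase scores high despite zero shared terms.
model = SentenceTransformer("all-MiniLM-L6-v2")
q_emb, p_emb = model.encode([query, passage])
print(util.cos_sim(q_emb, p_emb).item())  # roughly 0.5-0.7 in practice
```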
Also, the claim that "agentic search" replaces RAG is misleading. Recent research shows agentic RAG systems embed agents INTO the RAG pipeline to improve retrieval, they don't replace chunking and embeddings. LlamaIndex's "agentic retrieval" still uses vector databases and hybrid search, just with smarter routing.
Context windows are impressive, but they're not magic. The article reads like someone who solved a specific problem (code search) and declared victory over a much broader domain.
More importantly, it’s a lot easier to fine tune a reranker on behavior data than an LLM that makes dozens of irrelevant queries.
Of course, the devil is in the details and there’s five dozen reasons why you might choose one approach over the other. But it is not clear that using a reranker is always slower.
From there the model can handle 100–200 full docs and jot notes into a markdown file to stay within context. That’s a very different workflow than classic RAG.
You could expand grep queries with synonyms, but now you're reimplementing query expansion, which is already part of modern RAG. And doing that intelligently means you're back to using embeddings anyway.
The workflow works great for codebases with consistent terminology. For enterprise knowledge bases with varied language and conceptual queries, grep alone can't get you to the right candidates.
The chunk, embed, similarity search method was just a way to get a decent classical search pipeline up and running with not too much effort.
> You could expand grep queries with synonyms, but now you're reimplementing query expansion, which is already part of modern RAG.
In this scenario "you" are not implementing anything - the agent will do this on its own.
This is based on my experience using Claude Code in a codebase that definitely does not have consistent terminology.
It doesn't always work, but it seems like you were thinking in terms of getting things right in a single grep, when it's actually a series of greps, each informed by the results of the previous ones.
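A rough sketch of that series-of-greps loop (the candidate patterns are illustrative; in the real workflow the model proposes each next pattern itself, and ripgrep is assumed to be on PATH):

```python
# Sketch of "a series of greps informed by the previous ones". A fixed list
# of progressively broader patterns stands in for the model's own refinement.
import subprocess

def rg(pattern: str, path: str = ".") -> list[str]:
    out = subprocess.run(
        ["rg", "--ignore-case", "--line-number", "-e", pattern, path],
        capture_output=True, text=True,
    )
    return out.stdout.splitlines()

# Each pattern is informed by the (empty) results of the previous one.
candidate_patterns = [
    "revenue growth drivers",
    "revenue growth",
    "revenue|sales|top-line",           # widen to synonyms
]
for pattern in candidate_patterns:
    hits = rg(pattern)
    if hits:
        print(f"{len(hits)} hits for {pattern!r}")
        break
```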
But on top of this I would also use AI to create semantic maps, like a hierarchical structure of the content, and put that table of contents in the context and let the AI explore it. This helps with information spread across documents/chapters. It provides a directory to access anything without RAG, simply by following links in a tree. Deep Research agents build this kind of schema while they operate across sources.
To explore this I built a graph MCP memory system where the agent can search both by RAG and by text matching, and when it finds the top-k nodes it can expand outward along their links. Writing a node implies first having the relevant nodes loaded up, and, while generating the text, placing contextual links embedded [1] like this. So simply writing a node also connects it to the graph at all the right points. This structure fits better with the kind of iterative work LLMs do.
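For illustration only (not the parent's actual MCP implementation), a minimal sketch of that search-then-expand-by-links idea:

```python
# Sketch: find entry nodes by plain text match, then expand outward along
# the links embedded in each node. Field and function names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    text: str
    links: list[str] = field(default_factory=list)  # ids referenced in the text

graph: dict[str, Node] = {}

def search(query: str, k: int = 3) -> list[Node]:
    # Stand-in for the RAG / text-match step: naive substring scoring.
    scored = [(n.text.lower().count(query.lower()), n) for n in graph.values()]
    return [n for score, n in sorted(scored, key=lambda s: -s[0]) if score][:k]

def expand(nodes: list[Node], hops: int = 1) -> list[Node]:
    # Follow embedded links outward so the agent sees the local neighborhood.
    seen = {n.node_id: n for n in nodes}
    frontier = list(nodes)
    for _ in range(hops):
        frontier = [graph[l] for n in frontier for l in n.links
                    if l in graph and l not in seen]
        seen.update({n.node_id: n for n in frontier})
    return list(seen.values())
```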
Obviously that's not the optimal approach for every use case, but there's a lot where IMO it was better. In particular I was hoping to spend more time exploring it in an enterprise context, where you've got complicated sharing and permission models to take into consideration. If you have agents simply passing through the permissions of the user executing the search, whatever you get back is automatically constrained to only the things they had access to in that moment. As opposed to other approaches where you're storing a representation of the data in one place, then trying to work out the intersection of permissions from one or more other systems and sanitise the results on the way out. That always seemed messy and fraught with problems and the risk of leaking something you shouldn't.
Great point, but this grep-in-a-loop probably falls apart (i.e. becomes non-performant) at thousands of docs and tens of simultaneous users, not millions.
And if you think those deals are bogus, like I do, you still need to explain surging electricity prices.
Generative AI is here to stay, but I have a feeling we will look back on this period of time in software engineering as a sort of dark age of the discipline. We've seemingly decided to abandon almost every hard won insight and practice about building robust and secure computational systems overnight. It's pathetic that this industry so easily sold itself to the illogical sway of marketers and capital.
What are you implying? Capital has always owned the industry, except for some really small co-ops and FOSS communities.
At that point, you are just doing Agentic RAG, or even just Query Review + RAG.
I mean, yeah, agentic RAG is the future. It's still RAG though.
A great many pundits don't get that RAG means "a technique that enables large language models (LLMs) to retrieve and incorporate new information".
So RAG is a pattern that, as a principle, is applied to almost every process. Context windows? OK, I won't get into all the nitty-gritty details here (embedded systems, small storage devices, security, RAM defects, the cost and storage of contexts for different contexts, etc.), just a hint: the act of filling a context is what? Applied RAG.
RAG is not an architecture, it is a principle: a structured approach. There is a reason why nowadays many refer to RAG as a search engine.
For all we know about knowledge, there is only one entity with an infinite context window. We still call it God, not the cloud.
The improvements needed for the retrieval part are then another topic.
A search is a search. The architecture doesn't care if it's doing an vector search or a text search or a keyword search or a regex search, it's all the same. Deploying a RAG app means trying different search methods, or using multiple methods simultaneously or sequentially, to get the best performance for your corpus and use case.
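One common way to use multiple methods simultaneously is reciprocal rank fusion over whatever ranked lists the individual searches return; a minimal sketch, with placeholder result lists:

```python
# Sketch: merge ranked result lists from different search methods with
# reciprocal rank fusion (RRF). Which searches produced the lists is up to
# the deployment -- keyword, vector, regex, etc.
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc7", "doc2", "doc9"]     # e.g. from BM25 or grep
vector_hits  = ["doc2", "doc5", "doc7"]     # e.g. from an embedding index
print(rrf([keyword_hits, vector_hits]))     # doc2 and doc7 float to the top
```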
> The architecture doesn't care
The architecture does care because latency, recall shape, and failure modes differ.
I don't know of any serious RAG deployments that don't use vectors. I'm referring to large scale systems, not hobby projects or small sites.
RAG means any kind of data lookup which improves LLM generation results. I work in this area and speak to tons of companies doing RAG and almost all these days have realised that hybrid approaches are way better than pure vector searches.
Standard understanding of RAG now is simply adding any data to the context to improve the result.
I'm not a super smart AI person, but grepping through a codebase sounds exactly like what RAG is. Isn't tool use just (more sophisticated) RAG?
Only the most basic "hello world" type RAG systems rely exclusively on vector search. Everybody has been doing hybrid search or multiple simultaneous searches exposed through tools for quite some time now.
It bugs me, because the acronym should encompass any form of retrieval - but in practice, people use RAG to specifically refer to embedding-vector-lookups, hence it making sense to say that it's "dying" now that other forms of retrieval are better.
Almost all tool calls would result in RAG.
"RAG is dead" just means that rolling your own search and manually injecting the results into context is dead (just use tools). It means the chunking techniques are dead.
If you want to know "how are tartans officially registered" you don't want to feed the entire 554kb wikipedia article on Tartan to your model, using 138,500 tokens, over 35% of gpt-5's context window, with significant monetary and latency cost. You want to feed it just the "Regulation>Registration" subsection and get an answer 1000x cheaper and faster.
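Rough arithmetic behind that claim (the ~4 bytes-per-token ratio is a rule of thumb for English prose, and the subsection size is an assumption):

```python
# Back-of-envelope token math for the Tartan example above.
article_bytes = 554 * 1000          # ~554 kB Wikipedia article
bytes_per_token = 4                 # rough rule of thumb for English text
article_tokens = article_bytes / bytes_per_token
print(article_tokens)               # ~138,500 tokens

subsection_bytes = 500              # assume a short "Regulation > Registration" blurb
print(article_tokens / (subsection_bytes / bytes_per_token))  # ~1000x fewer tokens
```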
If I make a semantic search over my organization's Policy As Code procedures or whatever and give it to Claude Code as an MCP, does Claude Code suddenly stop being agentic?
However, RAG has been used as a stand-in for a specific design pattern where you retrieve data at the start of a conversation or request and then inject it into the request. This simple pattern has benefits compared to just sending a prompt by itself.
The point the author is trying to make is that this pattern kind of sucks compared to Agentic Search, where instead of shoving a bunch of extra context in at the start you give the model the ability to pull context in as needed. By switching from a "push" to a "pull" pattern, we allow the model to augment and clarify the queries it's making as it goes through a task which in turn gives the model better data to work with (and thus better results).
I am definitely more aligned with needing what I would rather call 'Deep Semantic Search and Generation' - the ability to query text-chunk embeddings of... 100k PDFs, using the semantics to search for closeness of the 'ideas', feed those into the context of the LLM, and then have the LLM generate a response to the prompt citing the source PDF(s) that the closest-matched vectors came from...
That is the killer app of a 'deep research' assistant IMO and you don't get that via just grepping words and feeding related files into the context window.
The downside is: how do you generate embeddings of massive amounts of mixed-media files and store them in a database quickly and cheaply, compared to just grepping a few terms from those files? A CPU grep of text in files in RAM is something like five orders of magnitude faster than an embedding model on a GPU generating semantic embeddings of the chunked file and then storing them for later.
What was described as 'RAG' a year ago now is a 'knowledge search in vector db MCP', with the actual tool and mechanism of knowledge retrieval being the exact same.
Claude Code is better, but still frustrating.
Yeah I found this very confusing. Sad to see such a poor quality article being promoted to this extent.
Saying grep is also RAG is like saying ext4 + grep is a database.
grep + agentic LLM is not RAG.
The only difference is that the advertising will be much more insidious and manipulative, the data collection far easier since people are already willingly giving it up, and the business much more profitable.
I can hardly wait.
Still, that single tender can be on the order of a billion tokens. Even if an LLM supported that insane context window, that's roughly 4GB that would need to be moved, and at current LLM prices inference would cost thousands of dollars. I detailed this a bit more at https://www.tenderstrike.com/en/blog/billion-token-tender-ra...
And that's just one (though granted, a very large) tender.
For the corpus of a larger company, you'd probably be looking at trillions of tokens.
While I agree that delivering tiny, chopped up parts of context to the LLM might not be a good strategy anymore, sending thousands of ultimately irrelevant pages isn't either, and embeddings definitely give you a much superior search experience compared to (only) classic BM25 text search.
Embeddings had some context size limitations in our case - we were looking at large technical manuals. Gemini was the first to have a 1m context window, but for some reason its embedding window is tiny. I suspect the embeddings might start to break down when there's too much information.
I've used LightRAG and am looking to integrate it with OpenWebUI, and possibly Airweave, which was a Show HN earlier.
My data is highly structured and has references between documents, so I wanted to leverage that structure for better retrieval and reasoning.
For graph/tree document representations, it’s common in RAG to use summaries and aggregation. For example, the search yields a match on a chunk, but you want to include context from adjacent chunks — either laterally, in the same document section, or vertically, going up a level to include the title and summary of the parent node. How you integrate and aggregate the surrounding context is up to you. Different RAG systems handle it differently, each with its own trade offs. The point is that the system is static and hardcoded.
The agentic approach is: instead of trying to synthesize and rank/re-rank your search results into a single deliverable, why not leave that to the LLM, which can dynamically traverse your data. For a document tree, I would try exposing the tree structure to the LLM. Return the result with pointers to relevant neighbor nodes, each with a short description. Then the LLM can decide, based on what it finds, to run a new search or explore local nodes.
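A sketch of what "return the result with pointers to relevant neighbor nodes" could look like as a tool response; the tree and field names are illustrative, not a standard:

```python
# Sketch: instead of pre-aggregating context, hand the model the matched
# chunk plus pointers it can choose to follow on its next call.
tree = {
    "sec-2":   {"title": "Risk Factors", "text": "...", "parent": None,
                "children": ["sec-2.1", "sec-2.2"]},
    "sec-2.1": {"title": "Litigation",   "text": "The $5.1B claim ...",
                "parent": "sec-2", "children": []},
    "sec-2.2": {"title": "Currency",     "text": "FX exposure ...",
                "parent": "sec-2", "children": []},
}

def describe_hit(node_id: str) -> dict:
    node = tree[node_id]
    parent = tree.get(node["parent"]) if node["parent"] else None
    return {
        "id": node_id,
        "text": node["text"],
        "parent": {"id": node["parent"], "title": parent["title"]} if parent else None,
        # Siblings are returned as pointers + titles; the model decides
        # whether to fetch any of them in a follow-up call.
        "neighbors": [{"id": s, "title": tree[s]["title"]}
                      for s in (parent["children"] if parent else [])
                      if s != node_id],
    }

print(describe_hit("sec-2.1"))
```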
The trick that has elevated RAG, at least for my use cases, has been having different representations of your documents, as well as sending multiple permutations of the input query. Do as much as you can in the VectorDB for speed. I'll sometimes have 10-11 different "batched" calls to our vectorDB that are lightning quick. Then also being smart about what payloads I'm actually pulling so that if I do use the LLM to re-rank in the end, I'm not blowing up the context.
TLDR: Yes, you actually do have to put in significant work to build an efficient RAG pipeline, but that's fine and probably should be expected. And I don't think we are in a world yet where we can just "assume" that large context windows will be viable for really precise work, or that costs will drop to 0 anytime soon for those context windows.
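For what that "multiple permutations, batched" step can look like, a rough sketch; the embed and batch-search functions below are trivial stand-ins for a real embedding model and vector DB client, so only the control flow is meaningful:

```python
# Sketch of fanning one query out into several permutations and batching them
# against the vector store, then merging/deduping before any LLM re-rank.
def embed(text: str) -> list[float]:            # stand-in embedder
    return [float(len(text))]

def batch_search(vectors, limit):               # stand-in for a vector DB batch call
    return [[{"doc_id": f"doc-{i}", "snippet": "..."} for i in range(limit)]
            for _ in vectors]

def retrieve(user_query: str, k: int = 10) -> list[dict]:
    permutations = [
        user_query,
        f"What does the filing say about {user_query}?",        # question form
        " ".join(w for w in user_query.split() if len(w) > 3),  # keywords only
    ]
    batches = batch_search([embed(q) for q in permutations], limit=k)
    # Merge + dedupe, keeping payloads small so an optional LLM re-ranking
    # step doesn't blow up the context.
    seen, merged = set(), []
    for hits in batches:
        for hit in hits:
            if hit["doc_id"] not in seen:
                seen.add(hit["doc_id"])
                merged.append(hit)
    return merged

print(len(retrieve("revenue growth drivers")))
```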
Actually, let me be specific: everything from "The Rise of Retrieval-Augmented Generation" up to "The Fundamental Limitations of RAG for Complex Documents" is good and fine as given, then from "The Emergence of Agentic Search - A New Paradigm" to "The Claude Code Insight: Why Context Changes Everything" (okay, so the tone of these generated headings is cringey but not entirely beyond the pale) is also workable. Everything else should have been cut. The last four paragraphs are embarrassing and I really want to caution non-native English speakers: you may not intuitively pick up on the associations that your reader has built with this loudly LLM prose style, but they're closer to quotidian versions of the [NYT] delusion reporting than you likely mean to associate with your ideas.
[NYT]: https://www.nytimes.com/2025/08/08/technology/ai-chatbots-de...
Frankly, reading through this makes me feel as though I am a business analyst or engineering manager being presented with a project proposal from someone very worried that a competing proposal will take away their chance to shine.
As it reaches the end, I feel like I'm reading the same thing, but presented to a Buzzfeed reader.
This makes it possible to quickly deploy it on Coolify and build an agent that can use ripgrep on any of your uploaded files.
These types of articles regularly come from people who don't actually build at-scale systems with LLMs, or from people who want to sell you on a new tech. And the frustrating thing is: they ain't even wrong.
Top-K RAG via vector search is not a sufficient solution. It never really was for most interesting use-cases.
Of course, take the easiest and most structured - in a sense, perfectly indexed - data (code repos) and claim that "RAG is dead". Again. Now try this with billions of unstructured tokens where the LLM really needs to do something with the entire context (like confirm that something is NOT in the documents), where even the best LLM loses context coherence after about 64k tokens on complex tasks. Good luck!
The truth is: whether it's Agentic RAG, Graph RAG, or a combination of these with ye olde top-k RAG - it's still RAG. You are going to Retrieve, and then you are going to use a system of LLM agents to generate stuff with it. You may now be able to do the first step smarter. It's still RAG tho.
The latest Anthropic whoopsy showed that they also haven't solved the context-rot issue. Yes, you can get a 1M-context scaled version of Claude, but then the small-scale/detail performance is so garbage that misrouted customers lose their effing minds.
"My LLM is just gonna ripgrep through millions of technical doc pdfs identified only via undecipherable number-based filenames and inconsistent folder structures"
lol, and also, lmao
My first real AI use (beyond copy-paste ChatGPT) was Claude Code. Within a few days I figured out to just write scripts, and to document in CLAUDE.md how to use them. For instance, one that prints comments and function names in a file is a few lines of Python. MCP seemed like context bloat when a `tools/my-script -h` would put it in context on request.
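For example, a minimal version of that kind of helper might look like this (my own sketch, not the actual script; path handling and output format are guesses):

```python
# Print the function/class names and comments from a Python file so an agent
# can skim its structure cheaply. Usage: python outline.py path/to/file.py
import ast
import sys
import tokenize

path = sys.argv[1]

# Function and class names via the ast module.
with open(path, "r", encoding="utf-8") as f:
    tree = ast.parse(f.read(), filename=path)
for node in ast.walk(tree):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
        print(f"{node.lineno}: {node.name}")

# Comments via the tokenize module.
with open(path, "rb") as f:
    for tok in tokenize.tokenize(f.readline):
        if tok.type == tokenize.COMMENT:
            print(f"{tok.start[0]}: {tok.string}")
```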
Eventually I stumbled on some more RAG material a few weeks later, so I decided to read up on it and... what? That's it? A 'prelude function' to dump 'probably related' things into the context?
It seems so obviously the wrong way to go from my perspective, so am I missing something here?
This is a weakness of agentic search, not a strength.
In 10-Ks and 10-Qs there are often no table headers. This is particularly true for the consolidated notes to financial statements section. Standalone tables can be pretty much meaningless because you won't even know what they are reporting. For example, a table that simply mentions terms like beginning balance and ending balance could be reporting inventory, warranty, or short-term debt, but the table does not mention those metrics at all and there are no headers. So I am curious to know how Fintool uses standalone tables. Do you retain the text surrounding a table in the same chunk as the table?
One can use any and all available search mechanisms, SQL, graph db, regex, keyword and so on, for the retrieval part.
If you ever saw Claude Code/Codex use grep, you will find that it constructs complex queries that encompass a whole range of keywords which may not even be present in the original user query. So the 'semantic meaning' isn't actually lost.
And nobody is putting an entire enterprise's knowledge base inside the context window. How many enterprise tasks are there that need to reference more than a dozen docs? And even those that do can be broken down into sub-tasks of manageable size.
Lastly, nobody here mentions how much of a pain it is to build, maintain, and secure an enterprise vector database. People spend months cleaning the data, chunking and vectorizing it, only for newer versions of the same data to make it redundant overnight. And good luck recreating your entire permissioning and access-control stack on top of the vector database you just created.
The RAG obituary is a bit provocative, and maybe that's intentional. But it's surprising how negative/dismissive the reactions in this thread are.
I think this sums it up well. Working with LLMs is already confusing and unpredictable. Adding a convoluted RAG pipeline (unless it is truly necessary because of context size limitations) only makes things worse compared to simply emulating what we would normally do.
I was using Qdrant, but I'm considering moving to OpenSearch since I want something more complete, with a dashboard that I can muck around with.
Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting.
https://news.ycombinator.com/newsguidelines.html
I think it's fair to point out that many articles today are essentially a little bit of a human wrapper around a core of ChatGPT content.
Whether or not this was AI-generated, the tells of AI-written text are all throughout it. There are some people who have learned to write like the AI talks to them, which is really not much of an improvement over just using the AI as your word processor.
The problem is that HN is one of the few places left where original thoughts are the main reason people are here. Letting LLMs write articles for us here is just not all that useful or fun.
Maybe quarantining AI-related articles to their own thing, a la Show HN, would be a good move. I know it is the predominant topic here for the moment, but there is other interesting stuff too. And articles about AI written by AI so that Google's AI can rank them higher and show them to more AI models to train on is just gross.
Having a company name pitched at you within the first two sentences is a pretty good giveaway.
Is anyone disagreeing with that?
On the whole though, I still learned a lot.
We don't want LLM-generated content on HN, but we also don't want a substantial portion of any thread being devoted to meta-discussion about whether a post is LLM-generated, and the merits of discussing whether a post is LLM-generated, etc. This all belongs in the generic tangent category that we're explicitly trying to avoid here.
If you suspect it, please use the established approaches for reacting to inappropriate content: if it's bad content for HN, flag it; if it's a bad comment, downvote it; and if there's evidence that it's LLM-generated, email us to point it out. We'll investigate it the same way we do when there are accusations of shilling etc., and we'll take the appropriate action. This way we can cut down on repetitive, generic tangents, and unfair accusations.