It's pretty much the same process I would use in an unfamiliar code base. Just ctrl+f the file system till I find the right starting point.
(Well, I didn't overcome my laziness directly. I just switched from being lazy and not setting up vim and Emacs with the integrations, to trying out vscode where this was trivial or already built in.)
It depends, for some languages 'jump to definition' tools ask the same compiler/interpreter that you use to build your code, so it's as accurate as it gets, and it's not 'best effort'.
It also depends a bit on your project; some projects are more prone to re-using names or symbols.
> If I was as quick at opening and reading files as claude code, I'd prefer grep with context around the searched term.
Well, Claude probably also doesn't want to have to 'learn' how to use all kinds of different tools for different languages and eco-systems.
I believe that was my experience with IDEs too?
I use both grep and JTD fairly frequently for different use cases.
I meant 'Jump to Definition' as one clear example, not as a definitive enumeration of everything that compiler integration can help you with.
Eg compiler integration is also really useful to show you the inferred types. Even dinosaurs like old-school Java and C have (limited) type inference: inside of expressions. But of course in a language like Haskell or Rust (or even Python) this becomes much more important.
No amount of find+grep+LLM is even remotely there yet.
What do you mean Turing complete? Obviously all 3 programs are running on a Turing complete machine. Xargs is a runner for other commands, obviously those commands can be Turing complete.
I haven't heard of anybody working on a _proof_ for the Turing completeness of xargs, and I think the only conference willing to publish it would be Sigbovik.
These corpora have a high degree of semantic ambiguity among other tricky and difficult to alleviate issues.
Other types of text are far more amenable to RAG and some are large enough that RAG will probably be the best approach for a good while.
For example: maintenance manuals and regulation compendiums.
LLMs have a similar issue with their context windows. Go back to GPT-2 and you wouldn't have been able to load a text file into its memory. Slowly the memory is increasing, same as it did for the early computers.
So if one were building say a memory system for an AI chat bot, how would you save all the data related to a user? Mother's name, favorite meals, allergies? If not a Vector database like pinecone, then what? Just a big .txt file per user?
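One hedged sketch of an alternative, in the spirit of this thread (my own illustration, not something anyone above proposed): discrete user facts like these fit a plain structured store that the agent can query directly, no embeddings needed. The table layout and fact keys below are made up.

```python
# Minimal sketch: per-user facts in a small SQLite key-value table.
# Table name and keys are illustrative assumptions.
import sqlite3

conn = sqlite3.connect("user_memory.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS facts (
        user_id TEXT,
        key     TEXT,
        value   TEXT,
        PRIMARY KEY (user_id, key)
    )
""")

def remember(user_id: str, key: str, value: str) -> None:
    # Upsert a single fact, e.g. ("u42", "allergies", "peanuts")
    conn.execute(
        "INSERT INTO facts VALUES (?, ?, ?) "
        "ON CONFLICT(user_id, key) DO UPDATE SET value = excluded.value",
        (user_id, key, value),
    )
    conn.commit()

def recall(user_id: str) -> dict[str, str]:
    # Pull every stored fact for a user and hand it to the model as context.
    rows = conn.execute(
        "SELECT key, value FROM facts WHERE user_id = ?", (user_id,)
    )
    return dict(rows.fetchall())

remember("u42", "mothers_name", "Maria")
remember("u42", "favorite_meal", "ramen")
print(recall("u42"))
```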
Grep works great when you have thousands of files on a local filesystem that you can scan in milliseconds. But most enterprise RAG use cases involve millions of documents across distributed systems. Even with 2M token context windows, you can't fit an entire enterprise knowledge base into context. The author acknowledges this briefly ("might still use hybrid search") but then continues arguing RAG is obsolete.
The bigger issue is semantic understanding. Grep does exact keyword matching. If a user searches for "revenue growth drivers" and the document discusses "factors contributing to increased sales," grep returns nothing. This is the vocabulary mismatch problem that embeddings actually solve. The author spent half the article complaining about RAG's limitations with this exact scenario (his $5.1B litigation example), then proposes grep as the solution, which would perform even worse.
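A concrete illustration of that vocabulary mismatch, as a sketch (assumes the sentence-transformers package and the all-MiniLM-L6-v2 model; the two phrases are the ones from this example):

```python
# Sketch: exact keyword matching misses a paraphrase that an embedding model
# scores as highly similar. Assumes `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer, util

query = "revenue growth drivers"
passage = "factors contributing to increased sales"

# Grep-style exact matching: no query term appears in the passage.
print(any(term in passage for term in query.split()))  # False

# Embedding similarity: the paraphrase scores high despite zero shared terms.
model = SentenceTransformer("all-MiniLM-L6-v2")
q_emb, p_emb = model.encode([query, passage])
print(util.cos_sim(q_emb, p_emb).item())  # roughly 0.5-0.7 in practice
```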
Also, the claim that "agentic search" replaces RAG is misleading. Recent research shows agentic RAG systems embed agents INTO the RAG pipeline to improve retrieval, they don't replace chunking and embeddings. LlamaIndex's "agentic retrieval" still uses vector databases and hybrid search, just with smarter routing.
Context windows are impressive, but they're not magic. The article reads like someone who solved a specific problem (code search) and declared victory over a much broader domain.
More importantly, it’s a lot easier to fine tune a reranker on behavior data than an LLM that makes dozens of irrelevant queries.
Of course, the devil is in the details and there’s five dozen reasons why you might choose one approach over the other. But it is not clear that using a reranker is always slower.
From there the model can handle 100–200 full docs and jot notes into a markdown file to stay within context. That’s a very different workflow than classic RAG.
You could expand grep queries with synonyms, but now you're reimplementing query expansion, which is already part of modern RAG. And doing that intelligently means you're back to using embeddings anyway.
The workflow works great for codebases with consistent terminology. For enterprise knowledge bases with varied language and conceptual queries, grep alone can't get you to the right candidates.
The chunk, embed, similarity search method was just a way to get a decent classical search pipeline up and running with not too much effort.
> You could expand grep queries with synonyms, but now you're reimplementing query expansion, which is already part of modern RAG.
In this scenario "you" are not implementing anything - the agent will do this on its own.
This is based on my experience using Claude Code in a codebase that definitely does not have consistent terminology.
It doesn't always work, but it seems like you were thinking in terms of getting things right in a single grep, when it's actually a series of greps, each informed by the results of the previous ones.
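A rough sketch of that series-of-greps loop (the candidate patterns are illustrative; in the real workflow the model proposes each next pattern itself, and ripgrep is assumed to be on PATH):

```python
# Sketch of "a series of greps informed by the previous ones". A fixed list
# of progressively broader patterns stands in for the model's own refinement.
import subprocess

def rg(pattern: str, path: str = ".") -> list[str]:
    out = subprocess.run(
        ["rg", "--ignore-case", "--line-number", "-e", pattern, path],
        capture_output=True, text=True,
    )
    return out.stdout.splitlines()

# Each pattern is informed by the (empty) results of the previous one.
candidate_patterns = [
    "revenue growth drivers",
    "revenue growth",
    "revenue|sales|top-line",           # widen to synonyms
]
for pattern in candidate_patterns:
    hits = rg(pattern)
    if hits:
        print(f"{len(hits)} hits for {pattern!r}")
        break
```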
But on top of this I would also use AI to create semantic maps, like a hierarchical structure of the content, and put that table of contents in the context and let the AI explore it. This helps with information spread across documents/chapters. It provides a directory to access anything without RAG, simply by following links in a tree. Deep Research agents build this kind of schema while they operate across sources.
To explore this I built a graph MCP memory system where the agent can search both by RAG and by text matching, and when it finds the top-k nodes it can expand outward along their links. Writing a node implies first having the relevant nodes loaded up, and, while generating the text, placing contextual links embedded [1] like this. So simply writing a node also connects it to the graph at all the right points. This structure fits better with the kind of iterative work LLMs do.
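For illustration only (not the parent's actual MCP implementation), a minimal sketch of that search-then-expand-by-links idea:

```python
# Sketch: find entry nodes by plain text match, then expand outward along
# the links embedded in each node. Field and function names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    text: str
    links: list[str] = field(default_factory=list)  # ids referenced in the text

graph: dict[str, Node] = {}

def search(query: str, k: int = 3) -> list[Node]:
    # Stand-in for the RAG / text-match step: naive substring scoring.
    scored = [(n.text.lower().count(query.lower()), n) for n in graph.values()]
    return [n for score, n in sorted(scored, key=lambda s: -s[0]) if score][:k]

def expand(nodes: list[Node], hops: int = 1) -> list[Node]:
    # Follow embedded links outward so the agent sees the local neighborhood.
    seen = {n.node_id: n for n in nodes}
    frontier = list(nodes)
    for _ in range(hops):
        frontier = [graph[l] for n in frontier for l in n.links
                    if l in graph and l not in seen]
        seen.update({n.node_id: n for n in frontier})
    return list(seen.values())
```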
Obviously that's not the optimal approach for every use case, but there's a lot where IMO it was better. In particular I was hoping to spend more time exploring it in an enterprise context, where you've got complicated sharing and permission models to take into consideration. If you have agents simply passing through the permissions of the user executing the search, whatever you get back is automatically constrained to only the things they had access to in that moment. As opposed to other approaches where you're storing a representation of the data in one place, then trying to work out the intersection of permissions from one or more other systems and sanitise the results on the way out. That always seemed messy and fraught with problems and the risk of leaking something you shouldn't.
Great point, but this grep-in-a-loop probably falls apart (i.e. becomes non-performant) at thousands of docs and tens of simultaneous users, not millions.
And if you think those deals are bogus, like I do, you still need to explain surging electricity prices.
Generative AI is here to stay, but I have a feeling we will look back on this period of time in software engineering as a sort of dark age of the discipline. We've seemingly decided to abandon almost every hard won insight and practice about building robust and secure computational systems overnight. It's pathetic that this industry so easily sold itself to the illogical sway of marketers and capital.
What are you implying? Capital has always owned the industry, except for some really small co-ops and FOSS communities.
At that point, you are just doing Agentic RAG, or even just Query Review + RAG.
I mean, yeah, agentic RAG is the future. It's still RAG though.
A great many pundits don't get that RAG means "a technique that enables large language models (LLMs) to retrieve and incorporate new information".
So RAG is a pattern that, as a principle, is applied to almost every process. Context windows? OK, I won't get into all the nitty-gritty details here (embedded systems, small storage devices, security, RAM defects, the cost and storage of contexts for different contexts, etc.), just a hint: the act of filling a context is what? Applied RAG.
RAG is not an architecture, it is a principle: a structured approach. There is a reason why nowadays many refer to RAG as a search engine.
For all we know about knowledge, there is only one entity with an infinite context window. We still call it God, not the cloud.
The improvements needed for the retrieval part are then another topic.
A search is a search. The architecture doesn't care if it's doing an vector search or a text search or a keyword search or a regex search, it's all the same. Deploying a RAG app means trying different search methods, or using multiple methods simultaneously or sequentially, to get the best performance for your corpus and use case.
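One common way to use multiple methods simultaneously is reciprocal rank fusion over whatever ranked lists the individual searches return; a minimal sketch, with placeholder result lists:

```python
# Sketch: merge ranked result lists from different search methods with
# reciprocal rank fusion (RRF). Which searches produced the lists is up to
# the deployment -- keyword, vector, regex, etc.
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc7", "doc2", "doc9"]     # e.g. from BM25 or grep
vector_hits  = ["doc2", "doc5", "doc7"]     # e.g. from an embedding index
print(rrf([keyword_hits, vector_hits]))     # doc2 and doc7 float to the top
```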
> The architecture doesn't care
The architecture does care because latency, recall shape, and failure modes differ.
I don't know of any serious RAG deployments that don't use vectors. I'm referring to large scale systems, not hobby projects or small sites.
RAG means any kind of data lookup which improves LLM generation results. I work in this area and speak to tons of companies doing RAG and almost all these days have realised that hybrid approaches are way better than pure vector searches.
Standard understanding of RAG now is simply adding any data to the context to improve the result.
I'm not a super smart AI person, but grepping through a codebase sounds exactly like what RAG is. Isn't tool use just (more sophisticated) RAG?
Only the most basic "hello world" type RAG systems rely exclusively on vector search. Everybody has been doing hybrid search or multiple simultaneous searches exposed through tools for quite some time now.
It bugs me, because the acronym should encompass any form of retrieval - but in practice, people use RAG to specifically refer to embedding-vector-lookups, hence it making sense to say that it's "dying" now that other forms of retrieval are better.
Almost all tool calls would result in RAG.
"RAG is dead" just means that rolling your own search and manually injecting the results into context is dead (just use tools). It means the chunking techniques are dead.
If you want to know "how are tartans officially registered" you don't want to feed the entire 554kb wikipedia article on Tartan to your model, using 138,500 tokens, over 35% of gpt-5's context window, with significant monetary and latency cost. You want to feed it just the "Regulation>Registration" subsection and get an answer 1000x cheaper and faster.
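Rough arithmetic behind that claim (the ~4 bytes-per-token ratio is a rule of thumb for English prose, and the subsection size is an assumption):

```python
# Back-of-envelope token math for the Tartan example above.
article_bytes = 554 * 1000          # ~554 kB Wikipedia article
bytes_per_token = 4                 # rough rule of thumb for English text
article_tokens = article_bytes / bytes_per_token
print(article_tokens)               # ~138,500 tokens

subsection_bytes = 500              # assume a short "Regulation > Registration" blurb
print(article_tokens / (subsection_bytes / bytes_per_token))  # ~1000x fewer tokens
```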
If I make a semantic search over my organization's Policy As Code procedures or whatever and give it to Claude Code as an MCP, does Claude Code suddenly stop being agentic?
However, RAG has been used as a stand-in for a specific design pattern where you retrieve data at the start of a conversation or request and then inject it into the request. This simple pattern has benefits compared to just sending a prompt by itself.
The point the author is trying to make is that this pattern kind of sucks compared to Agentic Search, where instead of shoving a bunch of extra context in at the start you give the model the ability to pull context in as needed. By switching from a "push" to a "pull" pattern, we allow the model to augment and clarify the queries it's making as it goes through a task which in turn gives the model better data to work with (and thus better results).
I am definitely more aligned with needing what I would rather call 'Deep Semantic Search and Generation' - the ability to query text-chunk embeddings of... 100k PDFs, using the semantics to search for closeness of the 'ideas', feed those into the context of the LLM, and then have the LLM generate a response to the prompt citing the source PDF(s) that the closest-matched vectors came from...
That is the killer app of a 'deep research' assistant IMO and you don't get that via just grepping words and feeding related files into the context window.
The downside is: how do you generate embeddings of massive amounts of mixed-media files and store them in a database quickly and cheaply, compared to just grepping a few terms from those files? A CPU grep of text in files in RAM is something like five orders of magnitude faster than an embedding model on a GPU generating semantic embeddings of the chunked file and then storing them for later.
What was described as 'RAG' a year ago now is a 'knowledge search in vector db MCP', with the actual tool and mechanism of knowledge retrieval being the exact same.
Claude Code is better, but still frustrating.
Yeah I found this very confusing. Sad to see such a poor quality article being promoted to this extent.
Saying grep is also RAG is like saying ext4 + grep is a database.
grep + agentic LLM is not RAG.
The only difference is that the advertising will be much more insidious and manipulative, the data collection far easier since people are already willingly giving it up, and the business much more profitable.
I can hardly wait.
Still, that single tender can be on the order of a billion tokens. Even if an LLM supported that insane context window, that's roughly 4GB that would need to be moved, and at current LLM prices inference would cost thousands of dollars. I detailed this a bit more at https://www.tenderstrike.com/en/blog/billion-token-tender-ra...
And that's just one (though granted, a very large) tender.
For the corpus of a larger company, you'd probably be looking at trillions of tokens.
While I agree that delivering tiny, chopped up parts of context to the LLM might not be a good strategy anymore, sending thousands of ultimately irrelevant pages isn't either, and embeddings definitely give you a much superior search experience compared to (only) classic BM25 text search.
Embeddings had some context size limitations in our case - we were looking at large technical manuals. Gemini was the first to have a 1m context window, but for some reason its embedding window is tiny. I suspect the embeddings might start to break down when there's too much information.
I've used LightRAG and am looking to integrate it with OpenWebUI, and possibly Airweave, which was a Show HN earlier.
My data is highly structured and has references between documents, so I wanted to leverage that structure for better retrieval and reasoning.
For graph/tree document representations, it’s common in RAG to use summaries and aggregation. For example, the search yields a match on a chunk, but you want to include context from adjacent chunks — either laterally, in the same document section, or vertically, going up a level to include the title and summary of the parent node. How you integrate and aggregate the surrounding context is up to you. Different RAG systems handle it differently, each with its own trade offs. The point is that the system is static and hardcoded.
The agentic approach is: instead of trying to synthesize and rank/re-rank your search results into a single deliverable, why not leave that to the LLM, which can dynamically traverse your data. For a document tree, I would try exposing the tree structure to the LLM. Return the result with pointers to relevant neighbor nodes, each with a short description. Then the LLM can decide, based on what it finds, to run a new search or explore local nodes.
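A sketch of what "return the result with pointers to relevant neighbor nodes" could look like as a tool response; the tree and field names are illustrative, not a standard:

```python
# Sketch: instead of pre-aggregating context, hand the model the matched
# chunk plus pointers it can choose to follow on its next call.
tree = {
    "sec-2":   {"title": "Risk Factors", "text": "...", "parent": None,
                "children": ["sec-2.1", "sec-2.2"]},
    "sec-2.1": {"title": "Litigation",   "text": "The $5.1B claim ...",
                "parent": "sec-2", "children": []},
    "sec-2.2": {"title": "Currency",     "text": "FX exposure ...",
                "parent": "sec-2", "children": []},
}

def describe_hit(node_id: str) -> dict:
    node = tree[node_id]
    parent = tree.get(node["parent"]) if node["parent"] else None
    return {
        "id": node_id,
        "text": node["text"],
        "parent": {"id": node["parent"], "title": parent["title"]} if parent else None,
        # Siblings are returned as pointers + titles; the model decides
        # whether to fetch any of them in a follow-up call.
        "neighbors": [{"id": s, "title": tree[s]["title"]}
                      for s in (parent["children"] if parent else [])
                      if s != node_id],
    }

print(describe_hit("sec-2.1"))
```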
The trick that has elevated RAG, at least for my use cases, has been having different representations of your documents, as well as sending multiple permutations of the input query. Do as much as you can in the VectorDB for speed. I'll sometimes have 10-11 different "batched" calls to our vectorDB that are lightning quick. Then also being smart about what payloads I'm actually pulling so that if I do use the LLM to re-rank in the end, I'm not blowing up the context.
TLDR: Yes, you actually do have to put in significant work to build an efficient RAG pipeline, but that's fine and probably should be expected. And I don't think we are in a world yet where we can just "assume" that large context windows will be viable for really precise work, or that costs will drop to 0 anytime soon for those context windows.
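For what that "multiple permutations, batched" step can look like, a rough sketch; the embed and batch-search functions below are trivial stand-ins for a real embedding model and vector DB client, so only the control flow is meaningful:

```python
# Sketch of fanning one query out into several permutations and batching them
# against the vector store, then merging/deduping before any LLM re-rank.
def embed(text: str) -> list[float]:            # stand-in embedder
    return [float(len(text))]

def batch_search(vectors, limit):               # stand-in for a vector DB batch call
    return [[{"doc_id": f"doc-{i}", "snippet": "..."} for i in range(limit)]
            for _ in vectors]

def retrieve(user_query: str, k: int = 10) -> list[dict]:
    permutations = [
        user_query,
        f"What does the filing say about {user_query}?",        # question form
        " ".join(w for w in user_query.split() if len(w) > 3),  # keywords only
    ]
    batches = batch_search([embed(q) for q in permutations], limit=k)
    # Merge + dedupe, keeping payloads small so an optional LLM re-ranking
    # step doesn't blow up the context.
    seen, merged = set(), []
    for hits in batches:
        for hit in hits:
            if hit["doc_id"] not in seen:
                seen.add(hit["doc_id"])
                merged.append(hit)
    return merged

print(len(retrieve("revenue growth drivers")))
```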
Actually, let me be specific: everything from "The Rise of Retrieval-Augmented Generation" up to "The Fundamental Limitations of RAG for Complex Documents" is good and fine as given, then from "The Emergence of Agentic Search - A New Paradigm" to "The Claude Code Insight: Why Context Changes Everything" (okay, so the tone of these generated headings is cringey but not entirely beyond the pale) is also workable. Everything else should have been cut. The last four paragraphs are embarrassing and I really want to caution non-native English speakers: you may not intuitively pick up on the associations that your reader has built with this loudly LLM prose style, but they're closer to quotidian versions of the [NYT] delusion reporting than you likely mean to associate with your ideas.
[NYT]: https://www.nytimes.com/2025/08/08/technology/ai-chatbots-de...
Frankly, reading through this makes me feel as though I am a business analyst or engineering manager being presented with a project proposal from someone very worried that a competing proposal will take away their chance to shine.
As it reaches the end, I feel like I'm reading the same thing, but presented to a Buzzfeed reader.
This makes it possible to quickly deploy it on Coolify and build an agent that can use ripgrep on any of your uploaded files.
These types of articles regularly come from people who don't actually build at-scale systems with LLMs, or from people who want to sell you on a new tech. And the frustrating thing is: they ain't even wrong.
Top-K RAG via vector search is not a sufficient solution. It never really was for most interesting use-cases.
Of course, take the easiest and most structured - in a sense, perfectly indexed - data (code repos) and claim that "RAG is dead". Again. Now try this with billions of unstructured tokens where the LLM really needs to do something with the entire context (like confirm that something is NOT in the documents), where even the best LLM loses context coherence after about 64k tokens on complex tasks. Good luck!
The truth is: whether it's Agentic RAG, Graph RAG, or a combination of these with ye olde top-k RAG - it's still RAG. You are going to Retrieve, and then you are going to use a system of LLM agents to generate stuff with it. You may now be able to do the first step smarter. It's still RAG tho.
The latest Anthropic whoopsy showed that they also haven't solved the context-rot issue. Yes, you can get a 1M-context scaled version of Claude, but then the small-scale/detail performance is so garbage that misrouted customers lose their effing minds.
"My LLM is just gonna ripgrep through millions of technical doc pdfs identified only via undecipherable number-based filenames and inconsistent folder structures"
lol, and also, lmao
My first real AI use (beyond copy-paste ChatGPT) was Claude Code. Within a few days I figured out to just write scripts, and to document in CLAUDE.md how to use them. For instance, one that prints comments and function names in a file is a few lines of Python. MCP seemed like context bloat when a `tools/my-script -h` would put it in context on request.
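For example, a minimal version of that kind of helper might look like this (my own sketch, not the actual script; path handling and output format are guesses):

```python
# Print the function/class names and comments from a Python file so an agent
# can skim its structure cheaply. Usage: python outline.py path/to/file.py
import ast
import sys
import tokenize

path = sys.argv[1]

# Function and class names via the ast module.
with open(path, "r", encoding="utf-8") as f:
    tree = ast.parse(f.read(), filename=path)
for node in ast.walk(tree):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
        print(f"{node.lineno}: {node.name}")

# Comments via the tokenize module.
with open(path, "rb") as f:
    for tok in tokenize.tokenize(f.readline):
        if tok.type == tokenize.COMMENT:
            print(f"{tok.start[0]}: {tok.string}")
```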
Eventually I stumbled on some more RAG material a few weeks later, so I decided to read up on it and... what? That's it? A 'prelude function' to dump 'probably related' things into the context?
It seems so obviously the wrong way to go from my perspective, so am I missing something here?
This is a weakness of agentic search, not a strength.
In 10-Ks and 10-Qs there are often no table headers. This is particularly true for the consolidated notes to financial statements section. Standalone tables can be pretty much meaningless because you won't even know what they are reporting. For example, a table that simply mentions terms like beginning balance and ending balance could be reporting inventory, warranty, or short-term debt, but the table does not mention those metrics at all and there are no headers. So I am curious to know how Fintool uses standalone tables. Do you retain the text surrounding a table in the same chunk as the table?
One can use any and all available search mechanisms, SQL, graph db, regex, keyword and so on, for the retrieval part.
If you ever saw Claude Code/Codex use grep, you will find that it constructs complex queries that encompass a whole range of keywords which may not even be present in the original user query. So the 'semantic meaning' isn't actually lost.
And nobody is putting an entire enterprise's knowledge base inside the context window. How many enterprise tasks are there that need to reference more than a dozen docs? And even those that do can be broken down into sub-tasks of manageable size.
Lastly, nobody here mentions how much of a pain it is to build, maintain, and secure an enterprise vector database. People spend months cleaning the data, chunking and vectorizing it, only for newer versions of the same data to make it redundant overnight. And good luck recreating your entire permissioning and access-control stack on top of the vector database you just created.
The RAG obituary is a bit provocative, and maybe that's intentional. But it's surprising how negative/dismissive the reactions in this thread are.
I think this sums it up well. Working with LLMs is already confusing and unpredictable. Adding a convoluted RAG pipeline (unless it is truly necessary because of context size limitations) only makes things worse compared to simply emulating what we would normally do.
I was using Qdrant, but I'm considering moving to OpenSearch since I want something more complete, with a dashboard that I can muck around with.
Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting.
https://news.ycombinator.com/newsguidelines.html
I think it's fair to point out that many articles today are essentially a little bit of a human wrapper around a core of ChatGPT content.
Whether or not this was AI-generated, the tells of AI-written text are all throughout it. There are some people who have learned to write like the AI talks to them, which is really not much of an improvement over just using the AI as your word processor.
The problem is that HN is one of the few places left where original thoughts are the main reason people are here. Letting LLMs write articles for us here is just not all that useful or fun.
Maybe quarantining AI-related articles to their own thing, a la Show HN, would be a good move. I know it is the predominant topic here for the moment, but there is other interesting stuff too. And articles about AI written by AI so that Google's AI can rank them higher and show them to more AI models to train on is just gross.
Having a company name pitched at you within the first two sentences is a pretty good giveaway.
Is anyone disagreeing with that?
On the whole though, I still learned a lot.
We don't want LLM-generated content on HN, but we also don't want a substantial portion of any thread being devoted to meta-discussion about whether a post is LLM-generated, and the merits of discussing whether a post is LLM-generated, etc. This all belongs in the generic tangent category that we're explicitly trying to avoid here.
If you suspect it, please use the established approaches for reacting to inappropriate content: if it's bad content for HN, flag it; if it's a bad comment, downvote it; and if there's evidence that it's LLM-generated, email us to point it out. We'll investigate it the same way we do when there are accusations of shilling etc., and we'll take the appropriate action. This way we can cut down on repetitive, generic tangents, and unfair accusations.