Traditional RAG for code uses vector embeddings and similarity search. We use filesystem traversal and AST parsing - following imports, tracing dependencies, reading files in logical order. It's retrieval guided by code structure rather than semantic similarity.
I highly recommend checking out what the Claude Code team discovered (48:00 https://youtu.be/zDmW5hJPsvQ?si=wdGyiBGqmo4YHjrn&t=2880). They initially experimented with RAG using embeddings but found that giving the agent filesystem tools to explore code naturally delivered significantly better results.
From our experience, vector similarity often retrieves fragments that mention the right keywords but miss the actual implementation logic. Following code structure retrieves the files a developer would actually need to understand the problem.
So yes -- I should have been clearer about the terminology. It's not "no retrieval" -- it's structured retrieval vs similarity-based retrieval. And with today's frontier models having massive context windows and sophisticated reasoning capabilities, they're perfectly designed to build understanding by exploring code the way developers do, rather than needing pre-digested embeddings.
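To make "structured retrieval" concrete, here's a rough sketch of the idea in Python: follow imports outward from the file a task touches and read the dependencies in order. It's illustrative only (it assumes a pure-Python repo, and helpers like `collect_context` are made up), not Cline's actual implementation:

```python
# Sketch: structure-guided retrieval by following imports (illustrative, not Cline's code).
import ast
from pathlib import Path

def local_imports(file_path: Path, repo_root: Path) -> list[Path]:
    """Parse a file and return the repo-local files it imports."""
    tree = ast.parse(file_path.read_text())
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            candidate = repo_root / (name.replace(".", "/") + ".py")
            if candidate.exists():
                found.append(candidate)
    return found

def collect_context(entry: Path, repo_root: Path, limit: int = 20) -> list[Path]:
    """Breadth-first walk of the import graph, starting from the file the task touches."""
    queue, seen = [entry], []
    while queue and len(seen) < limit:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        queue.extend(local_imports(current, repo_root))
    return seen  # read these files into context, in dependency order
```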
Indeed, industry at large sees RAG as equivalent to "vector indexes and cosine similarity w.r.t. input query", and the rest of the article explains thoroughly why that's not the right approach.
Yep, and this is getting really old. Information retrieval is not a new problem domain. Somehow, when retrieved info is fed into an LLM, all nuance is lost and we end up with endless pronouncements about whether retrieval is/is not "dead".
For instance, everyone seems to believe that em dashes are something only an AI would use -- but I've been using them in my writing for a long time.
It would be wonderful if some of the tools the project uses were exposed to build on -- like the tools related to ASTs, finding definitions, and more.
Yes by technicality RAG could mean any retrieval, but in practice when people use the term it’s almost always referring to some sort of vector embedding and similarity searching.
Is it? None of these terms even existed a couple of years ago, and their meaning is changing day by day.
But more importantly, why double and triple down on no RAG? As with most techniques, it has its merits in certain scenarios. I understand getting VC money so you have to prove differentiation and conviction in your approach, but why do it like this? What if RAG does end up being useful? You'll just have to admit you were wrong and cursor and others were right? I don't get it.
Just say we don't believe RAG is as useful for now and we take a different approach. But tripling down on a technique so early into such a new field seems immature to me. It screams of wanting to look different for the sake of it.
Cursor, Zed, Cline (VSCode), and anything else?
I have not tried either. I wanted to try Cursor but it has a bug that is a blocker.
My workflow involves manual copying and pasting from the browser into my own text editor.
Cline with mcp-playwright: https://github.com/executeautomation/mcp-playwright
There is no free tier like other tools, but it's far more transparent in that regard; it uses an Anthropic/OpenAI/etc. API key directly, and your only costs are direct token usage.
The real "aha" moment with this kind of thing is in realizing that the agent can not just use the tool as provided, but it can write scripts to dynamically execute on the page as well. So it's not just going to a page and dumping a bunch of HTML back into the client; it actually analyzes the DOM structure and writes JS snippets to extract relevant information which are then executed in the browser's runtime.
You can also just use Claude Desktop and hook it up to MCP servers.
> ...and this choice isn't an oversight -- it's a fundamental design decision that delivers better code quality, stronger security, and more reliable results
RAG is useful for natural text because there is no innate logic in how it's structured. RAG chunking based on punctuation for natural language doesn't work well because people use punctuation pretty poorly and the RAG models are too small to learn how they can do it themselves.
Source code, unlike natural text, comes with grammar that must be followed for it to even run. From being able to find a definition deterministically, to having explicit code blocks, you've gotten rid of 90% of the reason why you need chunking and ranking in RAG systems.
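As a rough sketch of what chunking along grammar boundaries can look like (using Python's ast module; not any particular tool's implementation):

```python
# Sketch: chunk a Python file at function/class boundaries instead of arbitrary text windows.
import ast

def ast_chunks(source: str) -> list[str]:
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Each chunk is a complete definition (end_lineno needs Python 3.8+).
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks
```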
Just using etags with a rule that captures the full scope of a function, I've gotten much better than SOTA results when it comes to working with large existing code bases. Of course, the fact that I was working in Lisp made dealing with code blocks and context essentially trivial. If you want to look at blub languages like Python and JavaScript you need a whole team of engineers to deal with all the syntactic cancer.
Since that rag system doesn't, and probably will never, exist we are stuck with vector embeddings as the common definition everyone working in the field uses and understands.
For alternatives to vector search, see GraphRAG and AST parsing; e.g., https://vxrl.medium.com/enhancing-llm-code-generation-with-r... or https://github.com/sankalp1999/code_qa
Which incidentally shows why RAG just means vector store + embedding model, since your definition means different things to different people and an implementation can't exist until we figure out AGI.
1. This argument seems flawed. Codebase search gives it a "foot in the door"; from that point it can read the rest of the file to get the remaining context. This is what Cursor does. It's the benefit of the agentic loop; no single tool call needs to provide the whole picture.
2. This argument is "because it's hard we shouldn't do it". Cursor does it. Just update the index when the code changes. Come on.
3. This argument is also "because it's hard we shouldn't do it". Cursor does it. The embeddings go in the cloud and the code is local. Enforced Privacy Mode exists. You can just actually implement these features rather than throwing your hands up and saying it's too hard.
This honestly makes me think less of Cline. They're wrong about this and it seems like they're trying to do damage control because they're missing a major feature.
Claude Code doesn't do vector indexing, and neither does Zed. There aren't any rigorous studies comparing these tools, but you can find plenty of anecdotes of people preferring the output of Claude Code and/or Zed to Cursor's, and search technique is certainly a factor there!
The code is the authoritative reference.
The files are generated from external sources, pulling together as much information as I could collect. It's a script so I can keep it up to date. I think there is roughly no programmer out there who needs to be told that documentation needs to be up-to-date; this is obvious enough that I'm trying not to be offended by your strawman. You could have politely assumed, since I said I have it working, that it does actually work. I am doing productive work with this; it's not theoretical.
External documentation is presumably already in the LLM's training data, so it should be extraneous to pull it into context. Obviously there's a huge difference between "should be" and "is" otherwise you wouldn't be putting in the work to pull it into context.
- 80%: Information about databases. Schemas, sample rows, sample SQL usages (including buried inside string literals and obscured by ORMs), comments, hand-written docs. I collect everything I can find about each table/view/procedure and stick it in a file named after it.
- 10%: Swagger JSONs for internal APIs I have access to, plus sample responses.
- 10%: Public API documentation that it should know but doesn't.
The last 10% isn't nothing; I shouldn't have to do that, and it's as you say. I've particularly had problems with Apple's documentation: a higher-than-expected hallucination rate in Swift when I don't provide the docs explicitly. Their docs require JavaScript (and don't work with Cursor's documentation indexing feature), which gives me a hunch about what might have happened. It was a pain in the neck for me to scrape it. I expect this part to go away as tooling gets better.
The first 90% I expect to be replaced by better MCP tools over time, which integrate vector indexing along with traditional indexing/exploration techniques. I've got one written to allow AI to interactively poke around the database, but I've found it's not as effective as the vector index.
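For anyone wondering what such a collection script can look like, here's a minimal sqlite-flavored sketch; the database path, output directory, and sample-row count are placeholders, and a real setup would be much more involved:

```python
# Sketch: dump per-table schema + sample rows into files an agent can read (all names are placeholders).
import sqlite3
from pathlib import Path

conn = sqlite3.connect("app.db")   # placeholder database
out_dir = Path("db_docs")
out_dir.mkdir(exist_ok=True)

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')")]

for table in tables:
    ddl = conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = ?", (table,)).fetchone()[0]
    rows = conn.execute(f"SELECT * FROM {table} LIMIT 5").fetchall()
    # One file per table/view: schema first, then a handful of sample rows.
    (out_dir / f"{table}.md").write_text(
        f"## {table}\n\nSchema:\n{ddl}\n\nSample rows:\n"
        + "\n".join(str(r) for r in rows) + "\n")
```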
LLMs are already stochastic. I don't want yet another layer of randomness on top.
Do most models support that much context? I don't think anything close to "most" models support 1M+ context. I'm only aware of Gemini, but I'd love to learn about others.
> We evaluate 12 popular LLMs that claim to support contexts of at least 128K tokens. While they perform well in short contexts (<1K), performance degrades significantly as context length increases. At 32K, for instance, 10 models drop below 50% of their strong short-length baselines. Even GPT-4o, one of the top-performing exceptions, experiences a reduction from an almost-perfect baseline of 99.3% to 69.7%.
Ideally, isn't this a metric that should be included on all model cards? It seems like a crucial metric.
I think the pattern that coined "RAG" is outdated, that pattern being relying on cosine similarities against an index. It was a stopgap for the 4K token window era. For AI copilots, I love Claude Code's and Cline's approach of just following imports and dependencies naturally. Land on a file and let it traverse.
No more crossing your fingers with cosine matching and hoping your reranker didn't drop a critical piece.
Using Gemini 2.5 pro is also pretty cheap, I think they figured out prompt caching because it definitely was not cheap when it came out.
But the killer app that keeps me using Cursor is Cursor Tab, which helps you WHILE you code.
Whatever model they have for that works beautifully for me, whereas Zed's autocomplete model is the last thing that keeps me away from it.
What do Cline users use for the inline autocomplete model?
I also don't love that Cursor generally plays "context compression" games since they have an incentive to keep their costs minimal. I just don't love any of these tools that try to be a bit too smart as a middleman between you and the LLM (where smart is often defined as 'try to save us the most money or be maximally efficient without the user noticing').
Cline also tries to be smart, but it's all for the benefit of the user. I like the transparent pricing -- you bring your own API key, so you're paying the underlying API costs without a middleman markup.
Am I being pennywise and should I just use Cursor directly? Maybe...but I've found my Cline results to be generally better for the more complex queries.
Wasting time and tokens like this is not something to brag about. If indexing is so hard maybe someone should start a company just to do that, like https://news.ycombinator.com/item?id=44097699
It doesn't seem like what they are doing necessarily replaces RAG, even if it could
The times I’ve implemented RAG, I’ve seen an immediate significant improvement in the answers provided by the model
Maybe they need some metrics to properly assess RAG vs no-RAG
First, like some other comments have mentioned, RAG is more than result = library.rag(). I get that a lot of people feel RAG is overhyped, but it's important to have the right mental model around it. It is a technique first. A pattern. Whenever you choose what to include in the context you are performing RAG: Retrieve something from somewhere and put it in context. Cline seems to delegate this task to the model via agentic flows, and that's OK. But it's still RAG. The model chooses (via tool calls) what to Retrieve.
I'm also not convinced that embeddings can't be productive. I think nick is right to point out some flaws in the current implementations, but that doesn't mean the concept in itself is flawed. You can always improve the flows. I think there's a lot to gain from having embeddings, especially since they capture things that ASTs don't (comments, doc files, etc.).
Another aspect is the overall efficiency. If you have somewhat repetitive tasks, you'll do this dance every time. Hey, fix that thing in auth. Well, let's see where's auth. Read file1. Read file2. Read fileN. OK, the issue is in ... You can RAG this whole process once and re-use (some) of this computation. Or you can do "graphRAG" and do this heavy lifting once per project and have AST + graph + model dump that can be RAGd. There's a lot of cool things you can do.
In general I don't think we know enough about the subject, best practices and useful flows to confidently say "NO, never, nuh-huuh". I think there might be value there, and efficiencies to be gained, and some of them seem like really low hanging fruit. Why not take them?
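As one example of that low-hanging fruit, here's a tiny sketch of the "do the heavy lifting once per project" idea: build the import graph once, cache it on disk, and reuse it across tasks. The file names are placeholders, and a real version would invalidate the cache when the code changes:

```python
# Sketch: compute the project's import graph once, cache it, and reuse it across queries.
import ast
import json
from pathlib import Path

CACHE = Path(".import_graph.json")  # hypothetical cache location

def build_graph(repo_root: Path) -> dict[str, list[str]]:
    graph = {}
    for path in repo_root.rglob("*.py"):
        tree = ast.parse(path.read_text())
        deps = [n.names[0].name for n in ast.walk(tree) if isinstance(n, ast.Import)]
        deps += [n.module for n in ast.walk(tree)
                 if isinstance(n, ast.ImportFrom) and n.module]
        graph[str(path)] = deps
    return graph

def load_graph(repo_root: Path) -> dict[str, list[str]]:
    if CACHE.exists():
        return json.loads(CACHE.read_text())  # reuse the earlier computation
    graph = build_graph(repo_root)
    CACHE.write_text(json.dumps(graph))
    return graph
```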
Cline doesn't.
Aider goes the middle way with repo maps.
Let's see what works best.
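For reference, a repo map in the spirit of Aider's is roughly a listing of every file with its top-level symbols. A crude sketch (the real thing does more, like ranking which symbols to include):

```python
# Sketch: a crude repo map - every file with its top-level definitions.
import ast
from pathlib import Path

def repo_map(repo_root: Path) -> str:
    lines = []
    for path in sorted(repo_root.rglob("*.py")):
        tree = ast.parse(path.read_text())
        symbols = [node.name for node in tree.body
                   if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]
        lines.append(f"{path.relative_to(repo_root)}: {', '.join(symbols) or '-'}")
    return "\n".join(lines)  # small enough to drop into the prompt whole
```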
Btw, you can also trigger Aider via comments. This way you can mix it in with your changes immediately.
Anyway context to me enables a lot more assurance and guarantees. RAG never did.
My favorite workflow right now is:
- Create context with https://github.com/backnotprop/prompt-tower
- Feed it to Gemini
- Gemini Plans
- I pass the plan into my local PM framework
- Claude Code picks it up and executes
- repeat
It's not clear how the context is used by Gemini to plan, and how the plan is then fed to the local framework. Do I have to replan every time the context changes?
Then they put the plan into their "PM framework" (some markdown files?) to have Claude Code pick tasks out of.
The vector/keyword-based RAG results I've seen so far for large code bases (my experience is Cody) have been quite bad. For smaller projects (using Cursor) it seems to work quite well though.
Roo Code's experimental code indexing using a vector DB dropped 3 days ago. They're using Tree-sitter (the same as Aider) to parse sources into ASTs and do vector embedding on that output instead of on plain text.
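Roughly, the embed-the-AST-chunks idea looks like this; sentence-transformers here is just a stand-in for whatever embedding model they actually use, and the chunks would come from a Tree-sitter/AST chunker like the one sketched further up:

```python
# Sketch: embed AST-level chunks and query by cosine similarity (model choice is illustrative).
import numpy as np
from sentence_transformers import SentenceTransformer

chunks = ["def login(user): ...", "class SessionStore: ..."]  # e.g. output of an AST chunker
model = SentenceTransformer("all-MiniLM-L6-v2")

chunk_vecs = model.encode(chunks, normalize_embeddings=True)
query_vec = model.encode(["where is auth session handling?"], normalize_embeddings=True)[0]

scores = chunk_vecs @ query_vec      # cosine similarity, since the vectors are normalized
top = np.argsort(scores)[::-1][:2]   # top-k chunks to pull into context
print([chunks[i] for i in top])
```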
This is a hilariously obvious LLM sentence by the way:
> Your codebase isn't just text – it's your competitive advantage
When creating articles aimed at LLM power users (which this one is), just have a human write it. We can see through the slop. Come on, you're VC backed, if you were bootstrapping I wouldn't even be calling this out.
The other arguments I used to agree with: that compression and RAG mean a loss of quality, and that increasing context windows and decreasing prices mean you should just send a lot of context.
Then I tried Augment and their indexing/RAG just works, period, so now I'm not convinced anymore.
> > Your codebase isn't just text – it's your competitive advantage
An LLM would have correctly used an em dash, not an en dash. ;)
This pattern is so prevalent that in any decent LLM business content generation product you're forced to hardcode avoidance/removal of that phrase, otherwise it's bound to show up in every article.
Using Gemini 2.5’s 1MM token context window to work with large systems of code at once immediately feels far superior to any other approach. It allows using an LLM for things that are not possible otherwise.
Of course it’s damn expensive and so hard to do in a high quality way it’s rare luxury, for now…
I feed long context tasks to each new model and snapshot just to test the performance improvements, and every time it's immediately obvious that no current model can handle its own max context. I do not believe any benchmarks, because contrary to the results of many of them, no matter what the (coding) task is, the results start getting worse after just a couple dozen thousand tokens, and after a hundred thousand the accuracy becomes unacceptable. Lost-in-the-middle is still a big issue as well, at least for reasoning if not for direct recall, despite benchmarks showing it's not. LLMs are still pretty unreliable at one-shotting big things, and everything around it is still alchemy.
Ultimately we came to a similar conclusion and put the project on ice: chunking and vector similarity search are fundamentally not great approaches for code RAG.
I don't really agree with most of Cline's other assertions, because those are pretty easy to work around (I suspect they may just be content slop?). It's pretty easy to vectorize and re-chunk code as you change it as long as you have a fixed way of encoding vectors, and you can also generate indices or do more expensive changes to encoding as part of your CI/CD. Indices can be stored in your git repo itself, so there's not really a security risk either. Our tool made this pretty easy to do. An index can literally just be a file.
No, the problem is really that vector search (especially with a-kNN) is fundamentally a fuzzy/lossy kind of search, and even when the vector search part works perfectly, your choice of k will usually either include more information than you intend or miss information that didn't meet the top-K threshold. And naive implementations that don't add additional context or are unconditionally searching based on your prompt will probably bias or confuse your model with code that might seem relevant but isn't (eg if you are trying to debug deployments, you include a bunch of your code related to deployments, but the bug is in the application code, and also you have a bunch of deployment scripts in your codebase that are for different platforms and are extra irrelevant).
It's significantly more work to make a vector based approach to code-RAG actually good than it is to get a naive implementation working. We have a different approach to Cline but it's similar in that it uses things like references and how the developer actually understands their codebase.
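To make the fixed-k point concrete, a toy illustration with made-up similarity scores:

```python
# Toy illustration of the fixed-k problem: any fixed cutoff over- or under-includes.
import numpy as np

scores = np.array([0.91, 0.90, 0.52, 0.51, 0.50, 0.49])  # made-up similarity scores

top_k = np.argsort(scores)[::-1][:5]   # k=5 drags in barely-relevant chunks
above_t = np.where(scores >= 0.8)[0]   # a 0.8 threshold may drop needed context in another repo
print(top_k, above_t)
```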
First, large context models essentially index their context as it grows bigger, or else they can't access the relevant parts of it. However it can't be as comprehensive as with RAG. There is also nothing that makes navigating the context from point to point easier than with RAG.
It seems they're trying to convince people of their superiority, but it's BS, so they're trying to bank on less knowledgeable customers.
Indexing is essentially a sorted projection of a larger space, based on the traits and context you care about. There's no magical way for a context to be more accessible if it has no such semantic indexing, implicit or explicit. Also, RAG doesn't mean you can't embed the AST and file structure as a concern. A vector is a set of dimensions, and a dimension can be literally anything at all. AI is about finding suitable meaning for each dimension and embedding instances in that dimension (and others in combo).
Would that work?
https://docs.roocode.com/features/experimental/codebase-inde...
Augment Code's secret sauce is largely its code indexer, and I find it to be the best coding agent around.
I agree with you that RAG should be a generic term that is agnostic to the method of retrieval and augmentation. But at the moment, in a lot of people's minds, it specifically means using a vector database.