I 100% agree with the author here. Most of the "LLMs are slowing me down/are trash/etc" discussions I've had at work come from people who are not great developers to begin with - they end up tangled in a web of barely vetted code that was generated for them.
I also like how it uses git, and it’s good at using less context (tool calling eats context like crazy!)
This might be your anecdotal experience but in mine, reviewing large diffs of (unvetted agent-written) code is usually not much faster than writing it yourself (especially when you have some mileage in the codebase), nor does it offset the mental burden of thinking how things interconnect and what the side effects might be.
What IMO moves the needle towards slower is that you have to steer the robot (often back and forth to keep it from undoing its own previous changes). You can say it's bad prompting but there's no guarantee that a certain prompt will yield the desired results.
You have to be really good at skimming code quickly and looking at abstractions/logic.
Throughout my career, I almost never asked another team a question about their services, etc. I always built the habit of just looking at their code first. 9 times out of 10, I could answer my question or find a workaround for the bug in their code.
I think this built the skill of holding a lot of structure in my head and quickly accumulating it.
This is the exact thing you need when letting an agent run wild and then making sure everything looks ok.
EDIT: It shows the side-by-side view by default, but it is easy to toggle to a unified view. There's probably a way to permanently set this somewhere.
This seems to be something both sides of the debate agree on: Their opponents are wrong because they are subpar developers.
It seems uncharitable to me in both cases, and of course it is a textbook example of an ad hominem fallacy.
Claude Code does this, you just have to not click “Yes and accept all changes”
Can't wait until we have useful heuristics for comparing LLMs. This is a problem that comes up constantly (especially in HN comments...)
Agree
> and AI Studio is the only serious product for human-in-the-loop SWE
Disagree. I use Claude Code and Codex daily, and I couldn't be happier. I started with Cursor, switched to CLI-based agents, and never looked back. I use WezTerm, tmux, neovim, and Zoxide, create several tabs and panes, and run Claude Code not only for vibe coding but also for scripting, analysing files, and letting it write concepts, texts, and documentation. Totally different kind of computing experience. As if I have several assistants 24/7 at my fingertips.
I was always hesitant to jump into the vibe coding buzz.
A month ago I tried Codex w/ CLI agents and they now take care of all the menial tasks I used to hate that come w/ coding.
My last use case was like this: I had an old codebase that was using Backbone.js for the UI, with jQuery and a bunch of old JS with little documentation, generating the UI for a Clojure web application.
Gemini was able to unravel this hairball of code and guide me step by step to htmx. I am not using AI Studio; I am using a Gemini subscription.
Since I manually patch the code, it's like pair programming with an incredibly patient and smart programmer.
For the record, I am too old for vibe coding... I like to maintain total control over my code and all the abstractions and logic.
AI Studio is just another IDE like Cursor, so it's a very odd choice to say one is bad and the other is the holy grail :)
But I guess this is what guerilla advertising is these days.
Just another random account with 8 karma points that just happens to post an article about how one IDE is bad and its almost identical cousin is the best.

Google does tend to have large contexts and sometimes reasonable prices for them. So if one of the main takeaways is to load everything into context, then I can certainly understand why the author is a fan.
OP is actually advocating against Google's latest products here. Surely a submarine would hype Antigravity and Gemini 3 Pro instead?
if google wants to send a check, my email is open, lmao, but for now i'm optimizing for tokens per dollar
This has made a big difference on my side. I write a prompt.md that is mostly very natural-language markdown, then ask the LLM to turn it into a plan.md containing phases, emphasising that each should be fairly self-contained. This usually needs some editing but is mostly fine. Then I just have it implement each phase one by one.
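The phased workflow above can be sketched roughly like this. The `## Phase N` heading format, `split_phases`, and the driver loop are assumptions for illustration, not the commenter's actual setup:

```python
import re

def split_phases(plan_text: str) -> list[str]:
    """Split a plan.md into self-contained phases on '## Phase N' headings."""
    parts = re.split(r"(?m)^## Phase \d+", plan_text)
    # re.split keeps any preamble before the first heading; drop empties
    return [p.strip() for p in parts[1:] if p.strip()]

plan = """## Phase 1
Set up the project skeleton.
## Phase 2
Implement the parser.
## Phase 3
Wire up the CLI.
"""

phases = split_phases(plan)
for i, phase in enumerate(phases, start=1):
    # In practice, each phase would be handed to the agent as its own
    # prompt, e.g. "Implement phase {i} of plan.md: {phase}"
    print(f"Phase {i}: {phase.splitlines()[0]}")
```

The point of the split is that each agent run starts from a small, self-contained chunk of the plan instead of the whole prompt.md at once.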
There are literally hundreds of engineering improvements that we will see along the way, like an intelligent replacement for compacting to deal with diff explosion, more raw memory availability and dedicated inference hardware, models that can actually handle >1M context windows without attention loss, and so on.
> By prioritizing the vibe coding use case, Cursor made itself unusable for full-time SWEs.
This has actually been the opposite direction we're building for. If you are just vibing, building prototypes or throwaway code or whatever, then you don't even need to use an IDE or look at the code. That doesn't really make sense for most people, which is why Cursor has different levels of autonomy you can use it for. Write the code manually, or just autocomplete assistance, or use the agent with guardrails - or use the agent in yolo mode.
> One way to achieve that would be to limit the number of lines seen by an LLM in a single read: read first 100 lines
Cursor uses shell commands like `grep` and `ripgrep`, similar to other coding agents, as well as semantic search (by indexing the codebase). The agent has only been around for a year (pretty wild how fast things have moved), and 8 months or so ago, when models weren't as good, you had to be more careful about how much context you let the agent read. For example, not immediately putting a massive file into the context window and blowing it up. This is more or less a solved problem today, as models and agents are much better at reliably calling tools and only pulling in relevant bits, in Cursor and elsewhere.
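The "limit the number of lines seen in a single read" idea from the quote can be sketched as a minimal tool, a hypothetical illustration, not how Cursor actually implements its file reads:

```python
import tempfile

def read_lines(path: str, start: int = 1, count: int = 100) -> str:
    """Return up to `count` lines starting at 1-based `start`, with a
    truncation marker so the model knows the file continues."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    end = start - 1 + count
    body = "".join(lines[start - 1 : end])
    if end < len(lines):
        body += f"[... {len(lines) - end} more lines; request start={end + 1} to continue]"
    return body

# Usage: a 250-line file is surfaced 100 lines at a time instead of
# blowing up the context window in one read.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("".join(f"line {i}\n" for i in range(1, 251)))
    path = f.name

first_chunk = read_lines(path, start=1, count=100)
```

The truncation marker matters: the model can decide whether the remaining lines are worth another tool call, rather than never learning they exist.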
> Try to write a prompt in build mode, and then separately first run it in plan mode before switching to build mode. The difference will be night and day.
Agree. Cursor has plan mode, and I generally recommend everyone start with a plan before building anything of significance. Much higher quality context and results.
> Very careful with asking the models to write tests or fix code when some of those tests are failing. If the problem is not trivial, and the model reaches the innate context limit, it might just comment out certain assertions to ensure the test passes.
Agree you have to be careful, but with the latest models (Codex Max / Opus 4.5) this is becoming less of a problem. They're much better now. Starting with TDD actually helps quite a bit.
On substance: my critique is less about the quality of the retrieval tools (ripgrep/semantic search are great) and more about the epistemic limits of search. An agent only sees what its query retrieves. For complex architectural changes, the most critical file might be one that shares no keywords with the task but contains a structural pattern that must be mirrored. In those cases, tunnel vision isn't a bug in the search tool but in the concept of search vs. full-context reasoning.
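A toy illustration of that failure mode (file names and contents invented): a keyword retriever can never surface a file that is structurally required but lexically unrelated to the task.

```python
# Toy retrieval: the agent only "sees" files whose text shares a keyword
# with the task. All paths and contents here are hypothetical.
files = {
    "billing/invoice.py": "def create_invoice(customer): ...",
    "billing/refund.py": "def create_refund(customer): ...",
    # Structurally critical, but shares no keyword with the task:
    "core/registry.py": "HANDLERS = {}  # every new handler must register here",
}

task = "add a create_invoice variant for subscriptions"

def keyword_search(query: str) -> list[str]:
    terms = set(query.lower().split())
    return [
        path for path, text in files.items()
        if terms & set(text.lower().replace("(", " ").split())
    ]

retrieved = keyword_search(task)
print(retrieved)
```

No matter how good the search implementation is, `core/registry.py` is never retrieved, so an agent relying on search alone would miss the registration pattern, while full-context reasoning would have it in view.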
One other friction point I hit before churning was what felt like prompt-level regression to the mean. For trivial changes, the agent would sometimes spin up a full planning phase, creating todo lists and implementation strategies for what should have been a one-shot diff. It felt like a guardrail designed for users who don't know how to decompose tasks, ergo the conclusion about emphasis on vibe coders.
That said, Cursor moves fast, and I'll be curious to see what solution you'll come up with to the unknown unknown dependency problem!
Appropriate.
tcdent•2h ago
And then goes on to recommend AI Studio is a primary dev tool?! Baffling.
esafak•2h ago
> Second, and no less important, AI Studio is genuinely the best chat interface on the market. It was the first platform where you could edit any message in the conversation, not just the last one, and I think it's still the only platform where you can edit AI responses as well! So if the model goes on an unnecessary tangent, you can just remove it from the context. It's still the only platform where if you have a long conversation like R(equest)1, O(utput)1, R2, O2, R3, O3, R4, O4, R5, O5, you can click regenerate on R3 and it will only regenerate O3, keeping R4 and all subsequent messages intact.
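The regeneration behavior described in the quote can be modeled as a simple operation on a list of request/output turns. `Turn`, `regenerate`, and the stub model are assumptions for illustration, not AI Studio's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    request: str
    output: str

def regenerate(history: list[Turn], i: int, model) -> list[Turn]:
    """Regenerate only turn i's output, conditioned on turns 0..i-1 and
    turn i's request; all later turns are kept verbatim."""
    new_output = model(history[:i], history[i].request)
    return history[:i] + [Turn(history[i].request, new_output)] + history[i + 1:]

# R1/O1 .. R5/O5, then regenerate only O3 (index 2) with a stub model.
chat = [Turn(f"R{n}", f"O{n}") for n in range(1, 6)]
chat2 = regenerate(chat, 2, lambda ctx, req: "O3'")
```

The design point is that later turns are treated as immutable data rather than as a chain that must be replayed whenever an earlier output changes.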
NitpickLawyer•1h ago
What's a use case for this? I'm trying to imagine why you'd want that, but I can't see it. Is it for the horny people? If you're trying to do anything useful, having messages edited should re-generate the following conversation as well (tool calls, etc).
badsectoracula•1h ago
And yeah it can be useful for coding since you can edit the LLM's response to fix mistakes (and add minor features/tweaks to the code) and pretend it was correct from the get go instead of trying to roleplay with someone who makes mistakes you then have to correct :-P