I can see where the author is coming from, but “marketing image is dumb” is by far the least interesting critique possible for these tools.
Give me an argument for why SWE-bench is flawed, or an analysis of what areas didn’t improve in Claude 4.
Meanwhile I’m vibe coding with Claude and having a great time. Sure, I wouldn’t use it for anything high stakes, but vs. Claude 3.7 I’m seeing a much higher success rate on tens to hundreds of lines of code.
It’s really, really easy to get a better code sample than the one shown here.
The problem is that in a team setting, you can't control the type of "code sample" that your teammates send for review.
It's frequently the case that some teammates will request review for this kind of slop, while outsourcing all the "thinking" to the reviewer.
As a reviewer, I now have to review much more code (LLMs absolutely increase the amount of code per hour a developer can put out), much of it unfiltered LLM output, so it needs more careful and thorough review.
Essentially, a careless developer with access to an LLM can now perform a cheap DoS on reviewers.
Exactly. And not just a careless developer, but one intentionally doing so.
Recently [this project][1] made it to the front page and amassed quite a few upvotes, positive comments, and GitHub stars. Yet anyone giving it a cursory look could tell that it's AI-generated slop: completely unnecessary at best, and possibly malware at worst.
Not only that, but the author[2] fired off several AI-slop PRs to popular projects in a single day (May 12th), which is exactly the cheap DoS you mention. The author of cURL wrote last year about AI-generated bogus security reports[3] and the extra work they pile onto open source maintainers who are already stretched thin. So this is a real problem that most AI proponents are ignoring.
[1]: https://news.ycombinator.com/item?id=44009321
[2]: https://github.com/dipampaul17
[3]: https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-f...
I'm curious: do you review the AI-generated code? In your experience, what percentage of the time have you found it to be correct and shippable? I.e. not just that it compiles and does what you asked, but that it doesn't contain security or performance issues, that it doesn't include useless or dead code, that it's idiomatic, etc. Do you add tests for that code, written by yourself or the AI, and if so, how exhaustive are they?
I review a lot, revert a lot, and reevaluate my life choices at times; it’s like supervising a toddler with 20 years of experience. It definitely contains security issues at times, though less obvious ones: subtle and harder to detect (the issues move up the stack).
It’s almost never correct and shippable on the first go, but it still saves me and my team a lot of time on boilerplate and heavy lifting, and lets us focus on creating value for customers. It’s still software engineering; only the actual coding part is much more productive. (Coding has never been the main activity software engineers spend most of their time on. They design, gather requirements, define scope, architect, plan, test, review, gather feedback, iterate, etc. Now they simply have more time for those activities.)
Do this in C++ and I'm pretty sure you won't have any issue
Then just ask it to do that
I think this perfectly encompasses "vibe coding" [0], "Hey, it looks cool. Ship it. Don't worry about what's under the hood!".
[0] I'm using this term to specifically mean people using LLMs to write code while doing very little or no checking of the code it writes, judging only by what the website/app looks like.
nyrulez•16h ago
I think part of that comes from the difficulty of working with probabilistic tools that need plenty of prompting to get things right, especially for more complex tasks. To me, it's a training issue for programmers, not a fundamental flaw in the approach. These tools have different strengths, and it can take a few weeks of working closely with them to reach a level where it starts feeling natural. I personally can't imagine going back to the pre-LLM era of coding for me and my team.
add-sub-mul-div•15h ago
nyrulez•15h ago
I can give you a concrete example, since things sometimes get so philosophical. The other day I needed an LIS (longest increasing subsequence) implementation with some very specific constraints. It would honestly have taken me a few hours to get right, as it's been a while since I coded that kind of thing. I was able to generate the solution with o3 in around 10 minutes, with some back and forth. It wasn't one shot, but took me 2-3 iteration cycles. I got highly performant code that worked for my very specific constraints. It used Fenwick trees (https://en.wikipedia.org/wiki/Fenwick_tree), which I honestly hadn't programmed myself before. It felt like a science fiction moment, as the code certainly wasn't trivial. In fact, I'm pretty sure most senior programmers would fail at this task, let alone be fast at it.
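For context, here is a minimal sketch of the standard technique being described (not the commenter's actual code, and ignoring their extra constraints): LIS length in O(n log n), using a Fenwick tree as a prefix-max structure over coordinate-compressed values.

```python
# Sketch only (not the commenter's code): LIS length in O(n log n)
# via a Fenwick tree maintaining prefix maximums over compressed values.

def lis_length(nums: list[int]) -> int:
    if not nums:
        return 0
    # Coordinate-compress values to ranks 1..m so they index the tree.
    ranks = {v: i + 1 for i, v in enumerate(sorted(set(nums)))}
    m = len(ranks)
    tree = [0] * (m + 1)  # supports prefix-max query and point update

    def prefix_max(i: int) -> int:
        # Max LIS length over all elements with rank <= i.
        best = 0
        while i > 0:
            best = max(best, tree[i])
            i -= i & -i
        return best

    def update(i: int, length: int) -> None:
        # Record an increasing subsequence of `length` ending at rank i.
        while i <= m:
            tree[i] = max(tree[i], length)
            i += i & -i

    best = 0
    for x in nums:
        r = ranks[x]
        # Extend the best subsequence over strictly smaller values.
        length = prefix_max(r - 1) + 1
        update(r, length)
        best = max(best, length)
    return best

assert lis_length([3, 1, 4, 1, 5, 9, 2, 6]) == 4  # e.g. 1, 4, 5, 6
```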
As a professional programmer, I deal with 20 examples every day where using a quality LLM saved me significant time, sometimes hours per task. I still do manual surgery a bunch of times everyday but I see no need to write most functions anymore or do multi-file refactors myself. In a few weeks, you get very good at applying Cursor and all its various features intelligently, like an amazing pair programmer who has different strengths than you. I'll go so far as to say I wouldn't hire an engineer who isn't very adept at utilizing the latest LLMs. The difference is just so stark - it really is like science fiction.
Cursor is popular for a reason. Lot of incredible programmers still get incredible value out of it, it isn't just for vibe coding. Implying that Cursor can be a net negative to programmers based on an example is a lot of fear mongering.
codr7•14h ago
norir•14h ago
add-sub-mul-div•13h ago
It means you shouldn't run with weights on your shoes even if running with weights on shoes is a more efficient way for others to run.
LLM tech is popular because (1) people like taking shortcuts and (2) their bosses like the prospect of hiring fewer people.
tonyedgecombe•2h ago
>which I honestly hadn't programmed myself before.
How can you be sure it is correct if you haven't mastered the data structure yourself?
alehlopeh•15h ago
nyrulez•15h ago
alehlopeh•14h ago
That said, this article is very obviously not rhetoric. It seems almost dumb to have to argue this point; maybe we should ask an AI whether it is or not. I mean, I don’t know the author, nor do I have anything to gain from debating this, but you can’t just go calling everything “rhetoric” when it’s clearly not. Yes, there’s plenty of negative rhetoric about LLMs out there, but that doesn’t make everything critical of LLMs negative rhetoric. I’m very much pro-AI, btw.
nyrulez•14h ago
Anyways, it doesn't matter that much :) we could both be right.
sjdrc•13h ago
naikrovek•14h ago
Those of us who consider software development to be “typing until you more or less get the outcome that you want” love LLMs. Non-deterministic vibes all around.
This is also why executives love LLMs; executives speak words and little people do what was asked of them, generally, sometimes wrong, but are later corrected. An LLM takes instructions and does what was asked, generally, sometimes wrong, and is later corrected, but much faster than unreliable human plebs who get sick all the time and demand vacation and time to mourn the deaths of other plebs.
nyrulez•14h ago
If you choose to accept bad code, that's on you. But I am not seeing that in practice, especially if you learn how to give quality prompts with proper rules. You have to get good at prompts - there is no escaping that. Now programmers do suck at communicating sometimes and that might be an issue. But in my experience, it can write far higher quality code than most programmers if used correctly.
o11c•13h ago
selcuka•13h ago
Curious. Do you write deterministic code? Because I don't think I can write the same code for any non-trivial task twice. Granted, I would probably remember which algorithm or design pattern I used before, and I can try and use the same methods, but you can also prompt that information to an LLM.
Another question: Can you hire software developers who write code in a deterministic way? If you give the same task to multiple developers with the same seniority level, do you always get the same output?
> "typing until you more or less get the outcome that you want”
For the record, I don't use LLMs for anything beyond auto-completion, but I think you are being unfair to them. They are actually pretty good at getting atomic tasks right when prompted properly.
thegrim33•14h ago
saithound•13h ago
The other hypotheses in this thread (e.g. that it's largely a matter of programming language) seem much more plausible.
ost-ing•12h ago
But there is a difference between using LLMs and relying on LLMs. The hype is geared toward the idea that we can rely on these tools to do all the work for us and fire everyone, but it's bollocks.
It becomes an increasingly ridiculous proposition as the work becomes more specialized, in-depth, cross-functional, regulated, and critical.
You can use it to help at any level of complexity, but nobody is going to vibe code a flight control system.
saithound•10h ago
tensorturtle•9h ago
codr7•14h ago
Did you read the post? Have you read any of them?
Everything people claim about them as far as writing code goes is delusional; this is clearly the wrong tool.
moozilla•13h ago
I struggled to find benchmark data to support this hunch; the best I could find was [1], which shows 81% performance with Python/TypeScript vs. 62% with Rust, but that fits my intuition. I primarily code in Python for work, and despite trying, I didn't get much use out of LLMs until the Claude 3.6 release, when it suddenly crossed that invisible threshold and became dramatically more useful. I suspect that for devs who are not using Python or JS, LLMs just haven't crossed this threshold yet.
[1] https://terhech.de/posts/2025-01-31-llms-vs-programming-lang...
imiric•7h ago
LLMs will routinely generate code that uses non-existent APIs and has subtle and not-so-subtle bugs. They will make useless suggestions, often leading me down the wrong path or in circles. The worst part is that they do so confidently and reassuringly: if I give any hint of what I think the issue might be, after spending time reviewing their non-working code, the answer is almost certainly "You're right! Here's the fix...", and either it turns out I was wrong and that wasn't the issue, or their fix creates new issues. It's a huge waste of my time, which would be better spent reading documentation and writing the code myself.
I suspect that vibe coding is popular with developers who don't bother reviewing the generated code, whether out of inexperience or laziness. They prompt their way into building something that on the surface does what they want, but will fail spectacularly in any scenario they didn't consider. To say nothing of the security and other issues that an actual code review by an experienced human programmer would flag.