It also takes a while to learn using an LLM and get value from it.
The keys are how to build prompts, ways of working, and guidelines that help the AI stay focused.
You end up spending much more time guiding and coaching rather than coding, that can take a while to get used to.
Eventually though, you will master it and be able to write secure, fast code far beyond what you could have done by yourself.
Note: Also, prep yourself for incoming hate every time you make claims like that! If you write bad code, it's your fault. If your LLM writes bad code, you're a moron! hah
You’re going to get vastly different responses if you’re using Opus versus 4o.
Frontier models seems remarkably similar in performance.
Yeah some nuances for sure, but the whole article could apply to every model.
I did have some great luck producing quite useful and impactful code. But also lost time chasing tiny changes.
Having the AI ask me questions and think about the PRD/spec ultimately made me a better system designer.
This all feels like spinning the roulette wheel. I sometimes wonder if AI proponents are just gamblers who had the unfortunate luck of winning the first few prompts.
I recently started a company in another field and haven’t done any real development for about 4 years.
Earlier this summer I took a vacation and decided to start a small software hobby project specific to my industry. I decided to try out Cursor for the first time.
I found it incredibly helpful at saving time implementing all the bullshit involved in starting a new code base - setting up a build system, looking up libraries and APIs, implementing a framework for configuration and I/O, etc.
Yes, I still had to do some of the hard parts myself, and (probably most relevant) I still had to understand the code it was writing and correct it when it went down the wrong direction. I literally just told Cursor “No, why do it that way when you could do it much simpler by X”, and usually it fixed it.
A few times, after writing a bunch of code myself, I compiled the project for the first time in a while and (as one does) ran into a forest of inscrutable C++ template errors. Rather than spend my time scrolling through all of them I just told cursor “fix the compile errors”, and sure enough, it did it.
Another example - you can tell it things like “implement comparison operators for this class”, and it’s done in 5 seconds.
As the project got more complicated, I found it super useful to write tests for behaviors I wanted, and just tell it “make this test pass”. It really does a decent job of understanding the codebase and adding onto it like a junior developer would.
Using an IDE that gives it access to your whole codebase (including build system and tests) is key. Using ChatGPT standalone and pasting stuff in is not where the value is.
It’s nowhere near able to do the entire project from scratch, but it saved me from a bunch of tedious work that I don’t enjoy anyway.
Seems valuable enough to me!
I haven't had this much fun programming since I was at university hacking away on sun workstations, but admittedly I only write about 10% of the code myself these days.
I'm currently getting Claude Code to pair program with GPT-5 and they delegate the file edits to Gemini Flash. It's pretty cool.
Using, like, gpt-4o is extremely not useful for programming. But using Claude Code in your actual repo is insanely useful.
Gotta use the right tool + model.
With my experience, I wonder what the author of this blog post has tried to do to complete a task as that might make a difference on why they couldn't get much use out of it. Maybe other posters can chime in on how big of a difference programming language and size of project can make. I did find that it was able to glean how I had architected an app and it was able to give feedback on potential refactors, although I didn't ask it to go that far.
Prior to trying out Claude Code, I had only used ChatGPT and DeepSeek to post general questions on how to use APIs and frameworks and asking for short snippets of code like functions to do text parsing with regexes, so to be honest I was very surprised at what the state of the art could actually do, at least for my projects.
I've went through multiple phases of LLM usage for development.
GPT3.5 era: wow this is amazing, oh. everything is hallucinated. not actually as useful as I first thought
GPT4 era: very helpful as stackoverflow on steroids.
Claude 3.5 Sonnet: have it open pretty much all the time, constantly asking questions and getting it to generate simple code (in the web UI) when it goes down actually feels very old school googling stuff. Tried a lot of in IDE AI "chat" stuff but hugely underwhelmed.
Now: rarely open IDE as I can do (nearly) absolutely everything in Claude Code. I do have to refactor stuff every so often "manually", but this is more for my sanity and understanding of the codebase..
To give an example of a task I got Claude code to do today in a few minutes which would take me hours. Had a janky looking old admin panel in bootstrap styles that I wanted to make look nice. Told Claude code to fetch the marketing site for the project. Got it to pull CSS, logos, fonts from there using curl and apply similar styling to the admin panel project. Within 10 mins it was looking far, far better than I would have ever got it looking (at least without a designers help). Then got it to go through the entire project (dozens of screens) and update "explanation" copy - most of which was TODO placeholders to explain what everything did properly. I then got it to add an e2e test suite to the core flows.
This took less than an hour while I was watching TV. I would have almost certainly _never_ got around to this before. I'd been meaning to do all this and I always sigh when I go into this panel at how clunky it all is and hard to explain to people.
Basically, a lot of people who are experts are being told this story and they think they are the only one who doesn't get it.
There are plenty of gains to be had with AI/LLMs but just not in the way it's typically marketed.
PaulHoule•1h ago
People think "Oh, it works better when somebody else does it" or "There must be some model that does better than the one I am using" or "If I knew how to prompt better I'd get better results" or "There must be some other agentic IDE which is better than the one I am using."
All those things might be true but they just change the odds, they don't change the fact that it works sometimes and fails other times.
For instance I asked an agent to write me a screen to display some well-typed data. It came up with something great right away that was missing some fields and had some inconsistent formatting but it fixed all those problems when I mentioned them -- all speaking the language of product managers and end users. The code quality was just great, as good as if I wrote it, maybe better.
Plenty of times it doesn't work out like that.
I was working on some code where I didn't really understand the typescript types and fed it the crazy error messages I was getting and it made a try to understand them and didn't really, I used it as a "rubber duck" over the course of a day or two and working with it I eventually came to understand what was wrong and how to fix and I got into a place that I like and when there is an error I can understand it and it can understand it too.
Sometimes it writes something that doesn't typecheck and I tell it to run tsc and fix the errors and sometimes it does a job I am proud of and other times it adds lame typeguards like
Give it essentially the same problem, say writing tests in Java, and it might take very different approaches. One time it will use the same dependency injection framework used in other tests to inject mocks into private fields, other times it will write some a helper method to inject the mocks into private fields with introspection directly.You might be able to somewhat tame this randomness with better techniques but sometimes it works and sometimes it doesn't and if I just told you about the good times or just told you about the bad times it would be a very different story.
leptons•29m ago
I have to wonder if you tried a simple google search and read through some docs if you couldn't have figured this out quicker than trying to coax a result out of the LLM.
solarkraft•2m ago
I have had cases in which a web search and some good old fashioned thinking have yielded better results than using an LLM, but on average I’m pretty sure the LLM has the edge.