90% of my usage of Copilot is just fancy autocomplete: I know exactly what I want, and as I'm typing out the line of code it finishes it off for me. Or, I have a rough idea of the syntax I need to use a specific package that I use once every few months, and it helps remind me what the syntax is, because once I see it I know it's right. This usage isn't really glamorous, but it does save me tiny bits of time in terms of literal typing, or a simple search I might need to do. Articles like this make me wonder if people who don't like coding tools are trying to copy and paste huge blocks of code; of course it's slower.
I know what function I want to write, start writing it, and then bam! The screen fills with ghost text that may partly be what I want but probably not quite.
Focus shifts from writing to code review. I wrest my attention back to the task at hand, type some more, and bam! New ghost text to distract me.
Ever had the misfortune of having a conversation with a sentence-finisher? Feels like that.
Perhaps I need to bind it to a hotkey instead of using the default always-on setting.
---
I suspect people using the agentic approaches skip this entirely and therefore have a more pleasant experience overall.
Autocomplete is a total focus destroyer for me when it comes to text, e.g. when writing a design document. When I'm editing code, it sometimes trips me up (hitting tab to indent but ending up accepting a suggestion instead), but without destroying my focus.
I believe your reported experience, but mine (and presumably many others') is different.
With unfamiliar syntax, I only need a few minutes and a cheatsheet to get back in the groove. Then typing goes back to that flow state.
Typing code is always semi-unconscious. Just like you don't pay that much attention to every character when you're writing notes on paper.
Editing code is where I focus on it, but I'm also reading docs, running tests,...
Eventually: well, but, the AI coding agent isn't better than a top 10%/5%/1% software developer.
And it'll be that the coding agents can't do narrow X thing better than a top tier specialist at that thing.
The skeptics will forever move the goal posts.
However, assuming we are still having this conversation, that alone is proof to me that the AI is not that capable. We're several years into "replace all devs in six months." We will just have to keep waiting and see how it plays out.
IDEs outperform any “dumb” editor in the full context of work, yet you don’t see any fewer posts about “I use Vim, btw” (and I say this as a Vim user).
Compare to a hand saw. You still see them in specialty work and hobby shops, but you don't see them on construction sites. You see circular saws. Same with hammers. You'll probably still see them in job sites, but with far less usage than nail guns. And in many contexts nail guns have completely replaced hammers. There are still people griping about power tools but the industry doesn't care. I know a fair number of people in the trades and I can't imagine any of them seriously suggesting that you don't need to know how to use power tools.
My argument is that, assuming AI fulfills the expectation of those who hype it (and that assumption has yet to be proven), we will see a similar effect in software. The results will speak for themselves and make the arguments irrelevant. That hasn't happened yet, leaving room for genuine debate.
This. The devs outcompeting others by using AI today are too busy shipping to waste time writing blog posts about what is, ultimately, a skill issue.
It's very possible that AI is literally making us less productive and dumber. Yet these tools are being pushed by subscription-peddling companies as if it were impossible to operate without them. I'm glad some people are calling it out.
[1] https://devops.com/study-finds-no-devops-productivity-gains-...
There are other times when I am building a stand-alone tool and am fine with whatever it wants to do, because it's not something I plan to maintain and its functional correctness is self-evident. In that case I don't even review what it's doing unless it's stuck. This is closer to actual vibe coding. It isn't something I would do for something I am integrating into a larger system, but I will for something like a CLI tool that I use to enhance my workflow.
Is that what you and your buddies talk about at two hour long coffee/smoke breaks while “terrible” programmers work?
I don't send my coworkers lists of micromanaged directions that give me a pretty clear expectation of what their PR is going to look like. I do however, occasionally get tagged on a review for some feature I had no part in designing, in a part of some code base I have almost no experience with.
Reviewing that the components you asked for do what you asked is a much easier scenario.
Maybe if people are asking an LLM to build an entire product from scratch with no guidance it would take a lot more effort to read and understand the output. But I don't think most people do that on a daily basis.
Run three, run five. Prompt with voice annotation. Run them when normally you need a cognitive break. Run them while you watch netflix on another screen. Have them do TDD. Use an orchestrator. So many more options.
I feel like another problem is that, deep down, most developers hate debugging other people's code, and that's effectively what this is at times. It doesn't matter if your Associate ran off and saved you 50k lines of typing, you would still rather do it yourself than debug the code.
I would give you grave warnings, telling you the time is nigh, adapt or die, etc, but it doesn't matter. Eventually these agents will be good enough that the results will surpass you even in simple one task at a time mode.
Closest parallel I can think of is the code-generation-from-UML era, but that explicitly kept the design decisions on the human side, and never really took over the world.
But despite all that, the tools can find problems, get information, and propose solutions so much faster and across such a vast set of challenges that I simply cannot imagine going back to working without them.
This fellow should keep on working without AIs. All the more power to him. And he can ride that horse all the way into retirement, most likely. But it's like ignoring the rise of IDEs, or Google search, or AWS.
None of these things introduced the risk of directly breaking your codebase without very close oversight. If LLMs can surpass that hurdle, then we’ll all be having a different conversation.
And besides, not all LLMs are the same when it comes to breaking existing functions. I've noticed that Claude 3.7 is far better at not breaking things that already work than whatever it is that comes with Cursor by default, for example.
Those are the two sides of the argument. It could only be settled, in principle, if both sides were directly observing each other's work in real-time.
But, I've tried that, too. 20 years ago in a debate between dedicated testers and a group of Agilists who believed all testing should be automated. We worked together for a week on a project, and the last day broke down in chaos. Each side interpreted the events and evidence differently. To this day the same debate continues.
People's lives are literally at stake. If my systems screw up, people can die.
And I will continue to use AI to help get through all that. It doesn't make me any less responsible for the result.
We don't have a theory of LLMs that provides a basis on which to trust them. The people who create them do not test them in a way that passes muster with experts in the field of testing. Numerous articles by people at least as qualified as you cast strong doubt on the reliability of LLMs.
But you say "trust me!"
Stockton Rush assured us that his submersible was safe, despite warnings from experts. He also made noises about being responsible.
The fact there is AI involved doesn't change the nature of the work. Engineers and coders are paid to produce functioning results, and thorough code review is sometimes but not always involved. None of that changes. Software developers make mistakes, regardless of whether there is an AI involved or not. So introducing AI literally changes nothing in terms of the validation chain.
If you're trying to prevent a Stockton Rush type personality from creating larger social problems, then you're talking about regulating the software industry, presumably like how the engineering industry is regulated. Again, though, that doesn't change anything about the tools, only who is responsible and how that responsibility flows.
The concept of why can get nebulous in a corporate setting, but it's nevertheless fun to explore. At the end of the day, someone has a problem and you're the one getting the computer to solve it. The process of getting there is fun in that you learn about what irks someone else (or yourself).
Thinking about the problem and its solution can be augmented with computers (I'm not memorizing the Go standard library). But computers are simple machines with very complex abstractions built on top of them. The thrill is in thinking in terms of two worlds: the real one where the problem occurs and the computing one where the solution will come forth. The analogy may be more understandable to someone who's learned two or more languages and thinks about the nuances of using them to depict the same reality.
Same as the TFA, I'm spending most of my time manipulating a mental model of the solution. When I get to code, it's just a translation. But the mental model is diffuse, so getting it written gives it a firmer existence. LLM generation mostly disrupts that process. The only way they really help is as a more pliable form of Stack Overflow, but I've only ever used Stack Overflow as human-authored annotations of the official docs.
But it's important to realize that AI coding is itself a skill that you can develop. It's not just picking the best tool and letting it go. Managing prompts and managing context has a much higher skill ceiling than many people realize. You might prefer manual coding, but you might just be bad at AI coding, and you might prefer it if you improved at it.
With that said, I'm still very skeptical of letting the AI drive the majority of the software work, despite meeting people who swear it works. I personally am currently preferring "let the AI do most of the grunt work but get good at managing it and shepherding the high level software design".
It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.
ok but how much am I supposed to spend before I supposedly just "get good"? Because based on the free trials and the pocket change I've spent, I don't consider the ROI worth it.
- Employers, not employees, should provide workplace equipment or compensation for equipment. Don't buy bits for the shop, nails for the foreman, or Cursor for the tech lead.
- The workplace is not a meritocracy. People are not defined by their wealth.
- If $1,000 does not represent an appreciable amount of someone's assets, they are doing well in life. Approximately half of US citizens cannot afford rent if they lose a paycheck.
- Sometimes the money needs to go somewhere else. Got kids? Sick and in the hospital? Loan sharks? A pool full of sharks and they need a lot of food?
- Folks can have different priorities and it's as simple as that
We're (my employer) still unsure if new dev tooling is improving productivity. If we find out it was unhelpful, I'll be very glad I didn't lose my own money.
Before, a poor kid with computer access could learn to code nearly for free, but if it costs $1k just to get started with AI, that poor kid will never have that opportunity.
Instead you can get comfortable prompting and managing context with aider.
Or you can use claude code with a pro subscription for a fair amount of usage.
I agree that seeing the tools just waste several dollars to just make a mess you need to discard is frustrating.
While it wasn't the fanciest integration (nor the best of codegen), it was good enough to "get going" (the loop was to ask the LLM to do something, then me do something else in the background, then fix and merge the changes it did - even though i often had to fix stuff[2], sometimes it was less of a hassle than if i had to start from scratch[3]).
It can give you a vague idea that with more dedicated tooling (i.e. something that does automatically what you'd do by hand[4]) you could do more interesting things (combining with some sort of LSP functionality to pass function bodies to the LLM would also help), though personally i'm not a fan of the "dedicated editor" that seems to be used and i think something more LSP-like (especially if it can also work with existing LSPs) would be neat.
IMO it can be useful for a bunch of boilerplate-y or boring work. The biggest issue i can see is that the context is too small to include everything (imagine, e.g., throwing the entire Blender source code in an LLM which i don't think even the largest of cloud-hosted LLMs can handle) so there needs to be some external way to store stuff dynamically but also the LLM to know that external stuff are available, look them up and store stuff if needed. Not sure how exactly that'd work though to the extent where you could -say- open up a random Blender source code file, point to a function, ask the LLM to make a modification, have it reuse any existing functions in the codebase where appropriate (without you pointing them out) and then, if needed, have the LLM also update the code where the function you modified is used (e.g. if you added/removed some argument or changed the semantics of its use).
[0] https://i.imgur.com/FevOm0o.png
[1] https://app.filen.io/#/d/e05ae468-6741-453c-a18d-e83dcc3de92...
[2] e.g. when i asked it to implement a BVH to speed up things it made something that wasn't hierarchical and actually slowed down things
[3] the code it produced for [2] was fixable to do a simple BVH
[4] i tried a larger project and wrote a script that `cat`ed and `xclip`ed a bunch of header files to pass to the LLM so it knows the available functions and each function had a single line comment about what it does - when the LLM wrote new functions it also added that comment. 99% of these oneliner comments were written by the LLM actually.
No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spent setting things up). But it's not like GDB or using UNIX as an IDE, where you need a whole book just to get started.
> It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.
While they share a lot of principles (around composition, poses,...), they are different activities with different output. No one conflates the two. You don't draw and think you're going to capture a moment in time. The intent is to share an observation with the world.
If anything, prompting well is akin to learning a new programming language. What words do you use to explain what you want to achieve? How do you reference files/sections so you don't waste context on meaningless things?
I've been using AI tools to code for the past year and a half (Github Copilot, Cursor, Claude Code, OpenAI APIs) and they all need slightly different things to be successful and they're all better at different things.
AI isn't a panacea, but it can be the right tool for the job.
>I do not agree it is something you can pick up in an hour.
But it's also interesting that the industry is selling the opposite ( with AI anyone can code / write / draw / make music ).
>You have to learn what AI is good at.
More often than not I find you need to learn what the AI is bad at, and that is not a fun experience.
"Write me a server in Go" only gets you so far. What is the auth strategy, what endpoints do you need, do you need to integrate with a library or API, are there any security issues, how easy is the code to extend, how do you get it to follow existing patterns?
I find I need to think AND write more than I would if I was doing it myself because the feedback loop is longer. Like the article says, you have to review the code instead of having implicit knowledge of what was written.
That being said, it is faster for some tasks, like writing tests (if you have good examples) and doing basic scaffolding. It needs quite a bit of hand holding which is why I believe those with more experience get more value from AI code because they have a better bullshit meter.
That is the realm of software engineering, not of using LLMs. You have to answer all of these questions even with traditional coding, because they’re not coding questions, they’re software design questions. And before that, there were software analysis questions, preceded by requirements gathering questions.
A lot of replies around the thread are conflating coding activities with the parent set of software engineering activities.
LLMs can help answer the questions. However, they're not going to necessarily make the correct choices or implementation without significant input from the user.
You can start in a few minutes, sure. (Also you can start using gdb in minutes) But GP is talking about the ceiling. Do you know which models work better for what kind of task? Do you know what format is better for extra files? Do you know when it's beneficial to restart / compress context? Are you using single prompts or multi stage planning trees? How are you managing project-specific expectations? What type of testing gives better results in guiding the model? What kind of issues are more common for which languages?
Correct prompting is what makes the difference these days in tasks like SWE-verified.
For example, I have a custom planning prompt that I will give a paragraph or two of information to, and then it will produce a specification document from that by searching the web and reading the code and documentation. And then I will review that specification document before passing it back to Claude Code to implement the change.
This works because it is a lot easier to review a specification document than it is to review the final code changes. So, if I understand it and guide it towards how I would want the feature to be implemented at the specification stage, that sets me up to have a much easier time reviewing the final result as well. Because it will more closely match my own mental model of the codebase and how things should be implemented.
And it feels like that is barely scratching the surface of setting up the coding environment for Claude Code to work in.
I like a similar workflow where I iterate on the spec, then convert that into a plan, then feed that step by step to the agent, forcing full feature testing after each one.
I've actually been playing around with languages that separate implementation from specification under the theory that it will be better for this sort of stuff, but that leaves an extremely limited number of options (C, C++, Ada... not sure what else).
I've been using C and the various LLMs I've tried seem to have issues with the lack of memory safety there.
My basic initial prompt for that is: "we're creating a markdown specification for (...). I'll start with basic description and at each step you should refine the spec to include the new information and note what information is missing or could use refinement."
For example, it might include: Overview, Database Design (Migration, Schema Updates), Backend Implementation (Model Updates, API updates), Frontend Implementation (Page Updates, Component Design), Implementation Order, Testing Considerations, Security Considerations, Performance Considerations.
It sounds like a lot when I type it out, but it is pretty quick to read through and edit.
The specification document is generated by a planning prompt that tells Claude to analyse the feature description (the couple paragraphs I wrote), research the repository context, research best practices, present a plan, gather specific requirements, perform quality control, and finally generate the planning document.
I'm not sure if this is the best process, but it seems to work pretty well.
The problem with overinvesting in a brand new, developing field is that you get skills that are soon to be redundant. You can hope that the skills are gonna transfer to what will be needed after, but I am not sure that will be the case here. There was a lot of talk about prompting techniques ("prompt engineering") last year, and now most of these are redundant; I really don't think I have learnt anything that is useful enough for the new models, nor have I actually understood something deeper. It's all tips-and-tricks-level, shallow stuff.
I think these skills are just like learning how to use some tools in an IDE. They increase productivity, which is great, but if you have to switch IDEs they may not actually help you with the new things you have to learn in the new environment. Moreover, these are just skills in how to use some tools; they allow you to do things, but we cannot compare learning how to use tools with actually learning and understanding the structure of a program. The former is obviously a shallow form of knowledge/skill, easily replaceable, easily made redundant, and probably not transferable (in the current context). I would rather invest more time in the latter and actually get somewhere.
The things that will change may be prompts or MCP setups or more specific optimisations like subagents. Those may require more consideration of how much you want to invest in setting them up. But the majority of setup you do for Claude Code is not only useful to Claude Code. It is useful to human developers and other agent systems as well.
> There was a lot of talk about prompting techniques ("prompt engineering") last year and now most of these are redundant.
Not true, prompting techniques still matter a lot to a lot of applications. It's just less flashy now. In fact, prompting techniques matter a ton for optimising Claude Code and creating commands like the planning prompt I created. It matters a lot when you are trying to optimise for costs and use cheaper models.
> I think these skills are just like learning how to use some tools in an IDE.
> if you have to switch IDEs they may not actually help you
A lot of the skills you learn in one IDE do transfer to new IDEs. I started using Eclipse and that was a steep learning curve. But later I switched to IntelliJ IDEA and all I had to re-learn were key-bindings and some other minor differences. The core functionality is the same.
Similarly, a lot of these "agent frameworks" like Claude Code are very similar in functionality, and switching between them as the landscape shifts is probably not as large of a cost as you think it is. Often it is just a matter of changing a model parameter or changing the command that you pass your prompt to.
Of course it is a tradeoff, and that tradeoff probably changes a lot depending upon what type of work you do, your level of experience, how old your codebases are, how big your codebases are, the size of your team, etc... it's not a slam dunk that it is definitely worthwhile, but it is at least interesting.
Here's what today's task list looks like for me:
1. Test TRAE/Refact.ai/Zencoder: 70% on SWE verified
2. https://github.com/kbwo/ccmanager: use git tree to manage multiple Claude Code sessions
3. https://github.com/julep-ai/julep/blob/dev/AGENTS.md: Read and implement
4. https://github.com/snagasuri/deebo-prototype: Autonomous debugging agent (MCP)
5. https://github.com/claude-did-this/claude-hub: connects Claude Code to GitHub repositories.
The skill floor is something you can pick up in a few minutes and find it useful, yes. I have been spending dedicated effort toward finding the skill ceiling and haven't found it.
I've picked up lots of skills in my career, some of which were easy, but some of which required dedicated learning, or practice, or experimentation. LLM-assisted coding is probably in the top 3 in terms of effort I've put into learning it.
I'm trying to learn the right patterns to use to keep the LLM on track and keeping the codebase in check. Most importantly, and quite relevant to OP, I'd like to use LLMs to get work done much faster while still becoming an expert in the system that is produced.
Finding the line has been really tough. You can get a LOT done fast without this requirement, but personally I don't want to work anywhere that has a bunch of systems that nobody's an expert in. On the flip side, as in the OP, you can have this requirement and end up slower by using an LLM than by writing the code yourself.
This doesn’t give you any time to experiment with alternative approaches. It’s equivalent to saying that the first approach you try as a beginner will be as good as it possibly gets, that there’s nothing at all to learn.
i.e. continually gambling and praying the model spits something out that works instead of thinking.
But more seriously, in the ideal case refining a prompt based on a misunderstanding of an LLM due to ambiguity in your task description is actually doing the meaningful part of the work in software development. It is exactly about defining the edge cases, and converting into language what is it that you need for a task. Iterating on that is not gambling.
But of course if you are not doing that, and are instead just trying to coax a "smarter" LLM with "prompt engineering" tricks (hopefully a soon-to-be-deprecated field of study), then you are building yourself a skill that can become useless tomorrow.
If the outcome is indistinguishable from using "thinking" as the process rather than brute force, why would the process matter regarding how the outcome was achieved?
Your concept of thinking is the classic rhetoric: as soon as some "AI" manages to achieve something it previously wasn't capable of, it's no longer AI and is just some xyz process. It happened with chess engines, with AlphaGo, and with LLMs. The implication being that human "thinking" is somehow unique and only AI that replicates it can be considered to have "thinking".
From what I see of AI programming tools today, I highly doubt the skills developed are going to transfer to tools we'll see even a year from now.
> a couple Claude instances running in the background chewing through simple yet time consuming tasks.
If you don't mind, I'd love to hear more about this. How exactly are they running in the background? What are they doing? How do you interact with them? Do they have access to your file system? Thank you!
If so, it does seem that AI just replaced me at my job... don't let them know. A significant portion of my projects are writing small business tools.
Maybe not hours, but extended periods of time, yes. Agents are very quick, so they can frequently complete tasks that would have taken me hours in minutes.
> The page says $17 per month. That's unlimited usage?
Each plan has a limited quota; the Pro plan offers you enough to get in and try out Claude Code, but not enough for serious use. The $100 and $200 plans still have quotas, but they're quite generous; people have been able to get orders of magnitude of API-cost-equivalents out of them [0].
> If so, it does seem that AI just replaced me at my job... don't let them know. A significant portion of my projects are writing small business tools.
Perhaps, but for now, you still need to have some degree of vague competence to know what to look out for and what works best. Might I suggest using the tools to get work done faster so that you can relax for the rest of the day? ;)
[0]: https://xcancel.com/HaasOnSaaS/status/1932713637371916341
From what I see of the tools, I think the skills developed largely consist of skills you need to develop as you get more senior anyway, namely writing detail-oriented specs and understanding how to chunk tasks. Those skills aren't going to stop having value.
Detailed specs are certainly a transferable skill, what isn't is the tedious hand holding and defensive prompting. In my entire career I've worked with a lot of people, only one required as much hand holding as AI. That person was using AI to do all their work.
LLM-based[1] coding, at least beyond simple auto-complete enhancements (using it directly & interactively as what it is: Glorified Predictive Text) is more akin to managing a junior or outsourcing your work. You give a definition/prompt, some work is done, you refine the prompt and repeat (or fix any issues yourself), much like you would with an external human. The key differences are turnaround time (in favour of LLMs), reliability (in favour of humans, though that is mitigated largely by the quick turnaround), and (though I suspect this is a limit that will go away with time, possibly not much time) lack of usefulness for "bigger picture" work.
This is one of my (several) objections to using it: I want to deal with and understand the minutia of what I am doing, I got into programming, database bothering, and infrastructure kicking, because I enjoyed it, enjoyed learning it, and wanted to do it. For years I've avoided managing people at all, at the known expense of reduced salary potential, for similar reasons: I want to be a tinkerer, not a manager of tinkerers. Perhaps call me back when you have an AGI that I can work alongside.
--------
[1] Yes, I'm a bit of a stick-in-the-mud about calling these things AI. Next decade they won't generally be considered AI like many things previously called AI are not now. I'll call something AI when it is, or very closely approaches, AGI.
Also if my junior argued back and was wrong repeatedly, that'd be bad. Lucky that has never happened with AIs ...
LLMs absolutely can improve over time.
We all want many things; that doesn't mean someone will pay you for it. You want to tinker? Great, awesome, more power to you, tinker on personal projects to your heart's content. However, if someone pays you to solve a problem, then it is your job to find the best, most efficient way to cleanly do it. Can LLMs do this on their own most of the time? I think not, not right now at least. The combination of a skilled human and an LLM? Most likely, yes.
Maybe I'll retrain for lab work, I know a few people in the area, yeah I'd need a pay cut, but… Heck, I've got the mortgage paid, so I could take quite a cut and not be destitute, especially if I get sensible and keep my savings where they are and building instead of getting tempted to spend them! I don't think it'll get to that point for quite a few years though, and I might have been due to throw the towel in by that point anyway. It might be nice to reclaim tinkering as a hobby rather than a chore!
A million times yes.
And we live in a time in which people want to be called "programmers" because it's oh-so-cool but not doing the work necessary to earn the title.
This is the piece that confuses me about the comparison to a junior or an intern. Humans learn about the business, the code, the history of the system. And then they get better. Of course there’s a world where agents can do that, and some of the readme/doc solutions do that but the limitations are still massive and so much time is spent reexplaining the business context.
*dusts off hands* Problem solved! Man, am I great at management or what?
Hard disagree. It's still way faster to review code than to manually write it. Also the speed at which agents can find files and the right places to add/edit stuff alone is a game changer.
Although tbh, even in the worst case I think I am still faster at reviewing than writing. The only difference is, though, that those reviews will never have had the same depth of thought and consideration as when I write the code myself. So reviews are quicker, but also less thorough/robust than writing, for me.
This strikes me as a tradeoff I'm absolutely not willing to make, not when my name is on the PR
This is a recipe for disaster with AI agents. You have to read every single line carefully, and this is much more difficult for the large majority of people out there than if you had written it yourself. It's like reviewing a Junior's work, except I don't mind reviewing my Junior colleague's work because I know they'll at least learn from the mistakes and they're not a black box that just spews bullshit.
Which is kind of like if AI wrote it: except someone is standing behind those words.
I guess the author is not aware of Cursor rules, AGENTS.md, CLAUDE.md, etc. Task-list oriented rules specifically help with long term context.
Or are you talking about OP not knowing AI tools enough?
The saying is, "You can lead a horse to water, but you can't make him drink." I intended no more profound meaning than that. A quip. Nothing more.
Human experts excel at first-principles thinking precisely because they can strip away assumptions, identify core constraints, and reason forward from fundamental truths. They might recognize that a novel problem requires abandoning conventional approaches entirely. AI, by contrast, often gets anchored to what "looks similar" and applies familiar frameworks even when they're not optimal.
Even when explicitly prompted to use first-principles analysis, AI models can struggle because:
- They lack the intuitive understanding of when to discard prior assumptions
- They don't naturally distinguish between surface-level similarity and deep structural similarity
- They're optimized for confident responses based on pattern recognition rather than uncertain exploration from basics
This is particularly problematic in domains requiring genuine innovation or when dealing with edge cases where conventional wisdom doesn't apply.
Context poisoning, intended or not, is a real problem that humans are able to solve relatively easily while current SotA models struggle.
Humans are also not as susceptible to context poisoning as LLMs are.
I haven't observed any software developers operating at even a slight multiplier from the pre-LLM days at the organisations I've worked at. I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.
I think that's a really elegant way to put it. Google Research tried to measure LLM impacts on productivity in 2024 [1]. They gave their subjects an exam and assigned them different resources (a book versus an LLM). They found that the LLM users actually took more time to finish than those who used a book, and that only novices on the subject material actually improved their scores when using an LLM.
But the participants also perceived that they were more accurate and efficient using the LLM, when that was not the case. The researchers suggested that it was due to "reduced cognitive load" - asking an LLM something is easy and mostly passive. Searching through a book is active and can feel more tiresome. Like you said: people are getting addicted to not having to expend brain energy to solve problems, and mistaking that for productivity.
[1] https://storage.googleapis.com/gweb-research2023-media/pubto...
Personally, I don't know if this is always a win, mostly because I enjoy the creative and problem solving aspect of coding, and reducing that to something that is more about prompting, correcting, and mentoring an AI agent doesn't bring me the same satisfaction and joy.
After doing programming for a decade or two, the actual act of programming is not enough to be ”creative problem solving”, it’s the domain and set of problems you get to apply it to that need to be interesting.
>90% of programming tasks at a company are usually reimplementing things and algorithms that have been done a thousand times before by others, and you’ve done something similar a dozen times. Nothing interesting there. That is exactly what should and can now be automated (to some extent).
In fact solving problems creatively to keep yourself interested, when the problem itself is boring is how you get code that sucks to maintain for the next guy. You should usually be doing the most clear and boring implementation possible. Which is not what ”I love coding” -people usually do (I’m definitely guilty).
To be honest this is why I went back to get a PhD, ”just coding” stuff got boring after a few years of doing it for a living. Now it feels like I’m just doing hobby projects again, because I work exactly on what I think could be interesting for others.
One person might feel like their job is just coding the same CRUD app over and over re-skinned. Where-as I feel my job is to simplify code by figuring out better structures and abstractions to model the problem domain which together solve systemic issues with the delivered system and enables more features to work together without issue and be added to the system, as well as making changes and new features/use-cases delivery faster.
The latter I find a creative exercise, the former I might get bored and wish AI could automate it away.
I think what exactly you are tasked with doing at your job will also determine whether your use of agentic AI actually makes you more productive or not.
I went with OP's hypothesis that you are not faster, you throw things at the wall, wait, and see if it sticks, or re-throw it until it does. This reduces your cognitive load, but might not actually make you more productive.
I'm assuming here that "you are not more productive" already accounted for what you are saying. Like in a 8h day, without AI, you get X done, and with AI you also get X done, likely because during the peak productivity hours of your day you get more done without AI, but when you are mentally tired you get less done, and it evens out with a full day of AI work.
There's no data here, it's all just people's intuition and impression, not actually measuring their productivity in any quantifiable way.
What you hypothesize could also be true: if the mental load is reduced, can you sustain higher productivity for longer? We don't know; maybe.
It's not maybe, it's confirmed fact. Otherwise there wouldn't be a burnout epidemic.
https://www.mayoclinic.org/healthy-lifestyle/adult-health/in...
Of the six general causes listed, four are institutional or social, having to do more with the workplace or coworkers: lack of control, lack of clarity, interpersonal conflicts, lack of support. IME, in tech, these are far more common causes and more deeply tied to the root of the issue than specifics of work.
The remaining two are productivity-related issues: too much/little to do, problems with WLB.
I would note these are tied into lack of control/clarity/support, and conflict. In a healthy work environment, expectations should be clear and at least somewhat flexible depending on employee feedback, and adequate support should be provided by the employer.
That aside, it's unclear, and I would argue unlikely, that AI-related productivity gains will help with workload issues. If you do disproportionately more work in an overworked team/org, you will simply be given more work. If many people see gains in productivity, then either the bar for productivity goes up, or there's layoffs. Even if you manage to squeak by / quiet quit with much reduced cognitive load for coding, and that's most of your job, unless you are fully remote the most likely change is your butt-in-seat time will go from "mentally taxing coding" to "mentally toxic doomscrolling."
AI hits 3, but not the other two. Given the current human condition, this is a dangerous combination! It will win, but at the cost of the other two.
https://news.ycombinator.com/item?id=44297190
Already replied better.
Couldn't this result in being able to work longer for less energy, though? With really hard mentally challenging tasks I find I cap out at around 3-4 hours a day currently
Like imagine if you could walk at running speed. You're not going faster.. but you can do it for way longer so your output goes up if you want it to
The latter is not making any neuron embedding tradeoff when they hand off the slog to agents.
There’s a lot of software development in that latter category.
Apparently models are not doing great on problems out of distribution.
But it’s also faster to read code than to write it. And it’s faster to loop a prompt back into fixed code and re-review it than to write it yourself.
write a stub for a react context based on this section (which will function as a modal):
```
<section>
// a bunch of stuff
</section>
```
Worked great, it created a few files (the hook, the provider component, etc.), and I then added them to my project. I've done this a zillion times, but I don't want to do it again, it's not interesting to me, and I'd have to look up stuff if I messed it up from memory (which I likely would, because provider/context boilerplate sucks).
Now, I can just do `const myModal = useModal(...)` in all my components. Cool. This saved me at least 30 minutes, and 30 minutes of my time is worth way more than 20 bucks a month. (N.B.: All this boilerplate might be a side effect of React being terrible, but that's beside the point.)
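For anyone who hasn't written this particular flavor of boilerplate, the generated files are roughly the shape below. This is a minimal illustrative sketch, not the exact output I got; the `ModalProvider` name and the TypeScript typings are mine, and the real hook takes a few options where this one takes none.
```
import React, { createContext, useCallback, useContext, useMemo, useState } from "react";

type ModalContextValue = {
  open: (content: React.ReactNode) => void;
  close: () => void;
};

const ModalContext = createContext<ModalContextValue | null>(null);

// Wraps the app once; renders whatever content a component asks to show.
export function ModalProvider({ children }: { children: React.ReactNode }) {
  const [content, setContent] = useState<React.ReactNode | null>(null);

  const open = useCallback((c: React.ReactNode) => setContent(c), []);
  const close = useCallback(() => setContent(null), []);
  const value = useMemo(() => ({ open, close }), [open, close]);

  return (
    <ModalContext.Provider value={value}>
      {children}
      {content !== null && (
        <section role="dialog" aria-modal="true">
          {content}
          <button onClick={close}>Close</button>
        </section>
      )}
    </ModalContext.Provider>
  );
}

// The hook consumed as `const myModal = useModal()` in components.
export function useModal(): ModalContextValue {
  const ctx = useContext(ModalContext);
  if (!ctx) throw new Error("useModal must be used inside a ModalProvider");
  return ctx;
}
```
Nothing in there is hard; it's just the kind of rote wiring I'd otherwise have to re-derive from the React docs.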
For harder problems, my experience is that it falls over, although I haven't been refining my LLM skills as much as some do. It seems that the bigger the project, the more it integrates with other things, the worse AI is. And moreover, for those tasks it's important for me or a human to do it because (a) we think about edge cases while we work through the problem intellectually, and (b) it gives us a deep understanding of the system.
That’s an issue I have with generated code. More often, I start with a basic design that evolves based on the project needs. It’s an iterative process that can span the whole timeline. But with generated code, it’s a whole solution that fits the current needs, but it’s a pain to refactor.
Both of these would take longer than 5 minutes. There's also no "lifting" as this case involves both Provider and Context, so you'd have to combine React doc examples.
The only alternative would be knowing it by heart, which you might, but I don't (nor do I particularly care to). There's definitely a force multiplier here, even if just in the boring boilerplate cases.
Also, the auto-complete with tools like Cursor are mind blowing. When I can press tab to have it finish the next 4 lines of a prepared statement, or it just knows the next 5 variables I need to define because I just set up a function that will use them.... that's a huge time saver when you add it all up.
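To make the prepared-statement example concrete, here's a hypothetical sketch (node-postgres, with a made-up table and columns) of the kind of block where I type the first line and tab through the rest:
```
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the usual PG* env vars

// Everything after the function signature is the sort of thing autocomplete fills in.
export async function insertOrder(customerId: number, total: number, note: string) {
  const result = await pool.query(
    `INSERT INTO orders (customer_id, total, note, created_at)
     VALUES ($1, $2, $3, NOW())
     RETURNING id`,
    [customerId, total, note]
  );
  return result.rows[0].id as number;
}
```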
My policy is simple, don't put anything AI creates into production if you don't understand what it's doing. Essentially, I use it for speed and efficiency, not to fill in where I don't know at all what I'm doing.
How much do you believe a programmer needs to layout to “get good”?
I've probably fed $100 in API tokens into the OpenAI and Anthropic consoles over the last two years or so.
I was subscribed to Cursor for a while too, though I'm kinda souring on it and looking at other options.
At one point I had a ChatGPT pro sub, I have found Claude more valuable lately. Same goes for Gemini, I think it's pretty good but I haven't felt compelled to pay for it.
I guess my overall point is you don't have to break the bank to try this stuff out. Shell out the $20 for a month, cancel immediately, and if you miss it when it expires, resub. $20 is frankly a very low bar to clear - if it's making me even 1% more productive, $20 is an easy win.
I think that getting "good" at using AI means that you figure out exactly how to formulate your prompts so that the results are what you are looking for given your code base. It also means knowing when to start new chats, and when to have it focus on very specific pieces of code, and finally, knowing what it's really bad at doing.
For example, if I need to have it take a list of 20 fields and create the HTML view for the form, it can do it in a few seconds, and I know to tell it, for example, to use Bootstrap, Bootstrap icons, Bootstrap modals, responsive rows and columns, and I may want certain fields aligned certain ways, buttons in certain places for later, etc, and then I have a form - and just saved myself probably 30 minutes of typing it out and testing the alignment etc. If I do things like this 8 times a day, that's 4 hours of saved time, which is game changing for me.
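As a rough illustration of the output (hypothetical field names, and only two of the twenty fields), the markup it hands back looks something like this:
```
import React from "react";

// Two-field slice of the generated Bootstrap form; the real thing repeats
// this pattern for every field and wires up the modals and buttons too.
export function CustomerForm() {
  return (
    <form>
      <div className="row g-3">
        <div className="col-md-6">
          <label htmlFor="firstName" className="form-label">First name</label>
          <input id="firstName" type="text" className="form-control" />
        </div>
        <div className="col-md-6">
          <label htmlFor="email" className="form-label">Email</label>
          <input id="email" type="email" className="form-control" />
        </div>
      </div>
      <div className="mt-3 text-end">
        <button type="submit" className="btn btn-primary">
          <i className="bi bi-save me-1"></i>Save
        </button>
      </div>
    </form>
  );
}
```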
Another great example is the power of tabbing with Cursor. If I want to change the parameters of a function in my React app, I can be at one of the functions anywhere on my screen, add a variable that relates to what is being rendered, and I can now quickly tab through to find all the spots in that screen that are also affected, and then it usually helps apply the changes to the function. It's like smart search and replace where I can see every change that needs to be made, but it knows how to make it more intelligently than just replacing a line of code - and I didn't have to write the regex to find it, AND it usually helps get the work done in the function as well to reflect the change. That could save me 3-5 minutes, and I could do that maybe 5 times a day, so another almost half-hour is saved.
The point is, these small things add up SO fast. Now I'm incredibly efficient because the tedious part of programming has been sped up so much.
This truly is shocking. If you are reviewing every single line of every package you intend to use how do you ever write any code?
This remains to be seen. It's still early days, but self-attention scales quadratically. This is a major red flag for the future potential of these systems.
Using a package that hundreds of thousands of other people use is low risk, it is battle tested
It doesn't matter how good AI code gets, a unique solution that no one else has ever touched is always going to be more brittle and risky than an open source package with tons of deployments
And yes, if you are using an Open Source package that has low usage, you should be reviewing it very carefully before you embrace it
Treat AI code as if you were importing from a git repo with 5 installs, not a huge package with Mozilla funding
I had AI create me a k8s device plugin for supporting SR-IOV-only vGPUs. It's something Nvidia calls "vendor specific" and basically offers little to no support for in their public repositories for Linux KVM.
I loaded up a new go project in goland, opened up Junie, typed what I needed and what I have, went to make tea, came back, looked over the code to make sure it wasn't going to destroy my cluster (thankfully most operations were read-only), deployed it with the generated helm chart and it worked (nearly) first try.
Before this I really had no idea how to create device plugins other than knowing what they are and even if I did, it would have easily taken me an hour or more to have something working.
The only thing AI got wrong is that the virtual functions were symlinks and not directories.
The entire project is good enough that I would consider opensourcing it. With 2 more prompts I had configmap parsing to initialize virtual functions on-demand.
That is the mental model I have for the work (computer programming) I like to do and am good at.
Plumbing
In contrast, when I’m trying to do something truly novel, I might spend days with a pen and paper working out exactly what I want to do and maybe under an hour coding up the core logic.
On the latter type of work, I find LLM’s to be high variance with mostly negative ROI. I could probably improve the ROI by developing a better sense of what they are and aren’t good at, but of course that itself is rapidly changing!
The entire code? Not there, but with debuggers, I've even started doing that a bit.
The author writes "reviewing code is actually harder than most people think. It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself". That sounds within an SD of true for me, too, and I had a full-time job close-reading code (for security vulnerabilities) for many years.
But it's important to know that when you're dealing with AI-generated code for simple, tedious, or rote tasks --- what they're currently best at --- you're not on the hook for reading the code that carefully, or at least, not on the same hook. Hold on before you jump on me.
Modern Linux kernels allow almost-arbitrary code to be injected at runtime, via eBPF (which is just a C program compiled to an imaginary virtual RISC). The kernel can mostly reliably keep these programs from crashing the kernel. The reason for that isn't that we've solved the halting problem; it's that eBPF doesn't allow most programs at all --- for instance, it must be easily statically determined that any backwards branch in the program runs for a finite and small number of iterations. eBPF isn't even good at determining that condition holds; it just knows a bunch of patterns in the CFG that it's sure about and rejects anything that doesn't fit.
That's how you should be reviewing agent-generated code, at least at first; not like a human security auditor, but like the eBPF verifier. If I so much as need to blink when reviewing agent output, I just kill the PR.
If you want to tell me that every kind of code you've ever had to review is equally tricky to review, I'll stipulate to that. But that's not true for me. It is in fact very easy to me to look at a rote recitation of an idiomatic Go function and say "yep, that's what that's supposed to be".
This might be the defining line for Gen AI - people who can read code faster will find it useful, and those who write faster than they can read won't use it.
I also haven't found any benefit in aiming for smaller or larger PRs. The aggregate efficiency seems to even out: smaller PRs are easier to weed through, but they are not less likely to be trash.
It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.
It is 100% a function of what you are trying to build, what language and libraries you are building it in, and how sensitive that thing is to factors like performance and getting the architecture just right. I've experienced building functioning systems with hardly any intervention, and repeatedly failing to get code that even compiles after over an hour of effort. There exists small, but popular, subset of programming tasks where gen AI excels, and a massive tail of tasks where it is much less useful.
This will only be resolved out there in the real world. If AI turns a bad developer, or even a non-developer, into somebody that can replace a good developer, the workplace will transform extremely quickly.
So I'll wait for the world to prove me wrong but my expectation, and observation so far, is that AI multiplies the "productivity" of the worst sort of developer: the ones that think they are factory workers who produce a product called "code". I expect that to increase, not decrease, the value of the best sort of developer: the ones who spend the week thinking, then on Friday write 100 lines of code, delete 2000 and leave a system that solves more problems than it did the week before.
I have known and worked with many, many engineers across a wide range of skill levels. Not a single one has ever said or implied this, and in not one case have I ever found it to be true, least of all in my own case.
I don't think it's humanly possible to read and understand code faster than you can write and understand it to the same degree of depth. The brain just doesn't work that way. We learn by doing.
The same goes with shell scripting.
But more importantly, you don’t have to understand code to the same degree and depth. When I read code I understand what the code is doing and whether it looks correct. I’m not going over other design decisions or implementation strategies (unless they’re obvious). If I did that then I’d agree. I’d also stop doing code reviews and just write everything myself.
There is a certain style, let's say, of programming that encourages highly non-reusable code that is at once boring and tedious, and impossible to maintain, and thus not especially worthwhile.
The "rote code" could probably have been expressed, succinctly, in terms that border on "plain text", but with more rigueur de jour, with less overpriced, wasteful, potentially dangerous models in-between.
And yes, machines like the eBPF verifier must follow strict rules to cut out the chaff, of which there is quite a lot, but it neither follows that we should write everything in eBPF, nor does it follow that because something can throw out the proverbial "garbage", that makes it a good model to follow...
Put another way, if it was that rote, you likely didn't need nor benefit from the AI to begin with, a couple well tested library calls probably sufficed.
Important tangential note: the eBPF verifier doesn't "cut out the chaff". It rejects good, valid programs. It does not care that the programs are valid or good; it cares that it is not smart enough to understand them; that's all that matters. That's the point I'm making about reviewing LLM code: you are not on the hook for making it work. If it looks even faintly off, you can't hurt the LLM's feelings by killing it.
Certainly, however:
> That's the point I'm making about reviewing LLM code: you are not on the hook for making it work
The second portion of your statement is either confusing (something unsaid) or untrue (you are still ultimately on the hook).
Agentic AI is just yet another, as you put it way to "get in trouble trying to be clever".
My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code. If your only real use of AI is to replace template systems, congratulations on perpetuating the most over-engineered template system ever. I'll stick with a provable, free template system, or just not write the code at all.
You're missing the point.
tptacek is saying he isn't the one who needs to fix the issue because he can just reject the PR and either have the AI agent refine it or start over. Or ultimately resort to writing the code himself.
He doesn't need to make the AI written code work, and so he doesn't need to spend a lot of time reading the AI written code - he can skim it for any sign it looks even faintly off and just kill it if that's the case instead of spending more time on it.
> My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code.
There's a vast chasm between simple enough that a non-AI code generator can generate it using templates and simple enough that a fast read-through is enough to show that it's okay to run.
As an example, the other day I had my own agent generate a 1kloc API client for an API. The worst case scenario other than failing to work would be that it would do something really stupid, like deleting all my files. Since it passes its tests, skimming it was enough for me to have confidence that nowhere does it do any file manipulation other than reading the files passed in. For that use, that's sufficient since it otherwise passes the tests and I'll be the only user for some time during development of the server it's a client for.
But no template based generator could write that code, even though it's fairly trivial - it involved reading the backend API implementation and rote-implementation of a client that matched the server.
Not true at all; in fact this sort of thing used to happen all the time 10 years ago: tools reading APIs and generating clients...
> He doesn't need to make the AI written code work, and so he doesn't need to spend a lot of time reading the AI written code - he can skim it for any sign it looks even faintly off and just kill it if that's the case instead of spending more time on it.
I think you are missing the point as well, that's still review, that's still being on the hook.
Words like "skim" and "kill" are the problem here, not a solution. They point to a broken process that looks like its working...until it doesn't.
But I hear you say "all software works like that", well, yes, to some degree. The difference being, one you hopefully actually wrote and have some idea what's going wrong, the other one?
Well, you just have to sort of hope it works and when it doesn't, well you said it yourself. Your code was garbage anyways, time to "kill" it and generate some new slop...
Where is this template based code generator that can read my code, understand it, and generate a full client including a CLI, that include knowing how to format the data, and implement the required protocols?
I'm 30 years into development, and I've seen nothing like it.
> I think you are missing the point as well, that's still review, that's still being on the hook.
I don't know if you're being intentionally obtuse, or what, but while, yes, you're on the hook for the final deliverable, you're not on the hook for fixing a specific instance of code, because you can just throw it away and have the AI do it all over.
The point you seem intent on missing is that the cost of throwing out the work of another developer is high, while the cost of throwing out the work of an AI assistant is next to nothing, and so where you need to carefully review a co-workers code because throwing it away and starting over from scratch is rarely an option, with AI generated code you can do that at the slightest whiff of an issue.
> Words like "skim" and "kill" are the problem here, not a solution. They point to a broken process that looks like its working...until it doesn't.
No, they are not a problem at all. They point to a difference in opportunity cost. If the rate at which you kill code is too high, it's a problem irrespective of source. But the point is that this rate can be much higher for AI code than for co-workers before it becomes a problem, because the cost of starting over is orders of magnitude different, and this allows for a very different way of treating code.
> Well, you just have to sort of hope it works and when it doesn't
No, I don't "hope it works" - I have tests.
I'd argue you are quite a bit beyond "rote" code at that point (with the understanding and protocol bits). But generating client code is not hard; there are numerous generators around, e.g. Swagger:
https://swagger.io/ https://swagger.io/tools/swagger-codegen/
In the ten years since, I expect other generators/platforms have appeared too; that's merely the one I'm familiar with.
> you're not on the hook for fixing a specific instance of code, because you can just throw it away and have the AI do it all over.
> ...
> No, I don't "hope it works" - I have tests.
These are contradictory statements. Every instance of that code you are responsible for, or you wouldn't test it and you wouldn't deign to "need" to throw it away.
> They point to a difference in opportunity cost.
Yes, we are all ultimately concerned with this. However, this is not an easy metric to quantify. Clearly you feel your OC (opportunity cost) is lower this way, maybe because you don't work well with other humans, ok whatever; but you are likely overestimating the supposed savings, and underestimating the lost OC of working with other developers, or of simply writing code that doesn't need to be thrown out at all...
I explicitly wasn't trying to persuade anyone that the cost/benefit tradeoff for LLM coding was positive. I obviously believe it is, but reasonable people can disagree.
With an arbitrary PR from a colleague or a security audit, you have to come up with a mental model first, which is the hardest part.
Yes I have been burned. But 99% of the time, with proper test coverage it is not an issue, and the time (money) savings have been enormous.
"Ship it!" - me
I guess it is far removed from the advertised use case. Also, I feel one would be better off having autocomplete powered by an LLM in this case.
I don't think code is ever "obviously right" unless it is trivially simple.
So with this "obviously right" rubric I would wind up rejecting 95% of submissions, which is a waste of my time and energy. How about instead I just write it myself? At least then I know who's responsible for cleaning up after it.
The more I use this, the longer the LLM works before I even look at the output, beyond maybe having it chug along on another screen and glancing over occasionally.
My shortest runs now usually take minutes of the LLM expanding my prompt into a plan, writing the tests, writing the code, linting its code, fixing any issues, and writing a commit message before I even review things.
The alternative where I boil a few small lakes + a few bucks in return for a PR that maybe sometimes hopefully kinda solves the ticket sounds miserable. I simply do not want to work like that, and it doesn't sound even close to efficient or speedier or anything like that, we're just creating extra work and extra waste for literally no reason other than vague marketing promises about efficiency.
But in my experience this is _signal_. If the AI can't get to it with minor back and forth, then something needs work: your understanding, the specification, the tests, your code factoring, etc.
The best case scenario is your agent one-shots the problem. But close behind that is that your agent finds a place where a little cleanup makes life easier for everybody: you, your colleagues, and the bot. And your company is now incentivized to invest in that.
The worst case is you took the time to write 2 prompts that didn’t work.
I have not found it useful for large programming tasks. But for small tasks, a sort of personalised boilerplate, I find it useful.
However, AI code reviewers have been really impressive. We run three separate AI reviewers right now and are considering adding more. One of these reviewers is kind of noisy, so we may drop it, but the others have been great. Sure, they have false positives sometimes and they don't catch everything. But they do catch real issues and prevent customer impact.
The Copilot style inline suggestions are also decent. You can't rely on it for things you don't know about, but it's great at predicting what you were going to type anyway.
That’s fine, but it’s an arbitrary constraint he chooses, and it’s wrong to say AI is not faster. It is. He just won’t let it be faster.
Some won’t like to hear this, but no-one reviews the machine code that a compiler outputs. That’s the future, like it or not.
You can’t say compilers are slow because I add on the time I take to analyse the machine code. That’s you being slow.
That's because compilers are generally pretty trustworthy. They aren't necessarily bug free, and when you do encounter compiler bugs it can be extremely nasty, but mostly they just work
If compilers were wrong as often as LLMs are, we would be reviewing machine code constantly
A stochastic parrot can never be trusted, let alone one that tweaks its model every other night.
I totally get that not all code ever written needs to be correct.
Some throw-away experiments can totally be one-shot by AI, nothing wrong with that. Depending on the industry one works in, people might be on different points of the expectation spectrum for correctness, and so their experience with LLMs vary.
It's the RAD tool discussion of the 2000s, or the "No-Code" tools debate of the last decade, all over again.
The distinction isn't whether code comes from AI or humans, but how we integrate and take responsibility for it. If you're encapsulating AI-generated code behind a well-defined interface and treating it like any third party dependency, then testing that interface for correctness is a reasonable approach.
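As a sketch of what that can look like in practice (the `RateLimiter` protocol, the contract test, and the generated module named in the comment are all hypothetical, not anyone's real API):

```python
# Sketch: treat AI-generated code like a third-party dependency. We own the
# interface and the contract test; the implementation behind it is whatever
# the tool produced. All names here are hypothetical.
from typing import Protocol


class RateLimiter(Protocol):
    """The boundary we define and commit to."""

    def allow(self, key: str) -> bool: ...


def check_contract(limiter: RateLimiter) -> None:
    """Contract test, runnable against any implementation, generated or hand-written."""
    # A fresh key should be allowed at least once.
    assert limiter.allow("user-1") is True
    # Hammering the same key should eventually be refused.
    results = [limiter.allow("user-1") for _ in range(1000)]
    assert False in results, "limiter never limits"


# Usage sketch: plug in the generated implementation and test only the interface.
# from generated.rate_limiter import TokenBucketLimiter  # hypothetical
# check_contract(TokenBucketLimiter(capacity=10, refill_per_second=1.0))
```

The point of the boundary is that the generated internals can be regenerated or swapped without the rest of the codebase noticing.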
The real complexity arises when you have AI help write code you'll commit under your name. In this scenario, code review absolutely matters because you're assuming direct responsibility.
I'm also questioning whether AI truly increases productivity or just reduces cognitive load. Sometimes "easier" feels faster but doesn't translate to actual time savings. And when we do move quicker with AI, we should ask if it's because we've unconsciously lowered our quality bar. Are we accepting verbose, oddly structured code from AI that we'd reject from colleagues? Are we giving AI-generated code a pass on the same rigorous review process we expect for human written code? If so, would we see the same velocity increases from relaxing our code review process amongst ourselves (between human reviewers)?
Doesn't matter, I'm not responsible for maintaining that particular code
The code in my PRs has my name attached, and I'm not trusting any LLM with my name
If you consider that AI code is not code any human needs to read or later modify by hand (AI code is modified by AI), then all you want to do is fully test it; if it all works, it's good. Now you can call into it from your own code.
I'm ultimately still responsible for the code. And unlike AI, library authors put their own and their libraries' reputations on the line.
"A computer can never be held accountable therefore a computer should never make a management decision"
I think we need to go back to this. I think a computer cannot be held accountable so a computer should never make any decision with any kind of real world impact
Libraries are maintained by other humans, who stake their reputation on the quality of the library. If a library gets a reputation of having a lax maintainer, the community will react.
Essentially, a chain of responsibility, where each link in the chain has an incentive to behave well else they be replaced.
Who is accountable for the code that AI writes?
I say we make it the original publishers of the data ingested by the AI during training. Just for the court battles.
Where AI especially excels is helping me do maintenance tickets on software I rarely touch (or sometimes never have touched). It can quickly read the codebase, and together we can quickly arrive at the place where the patch/problem lies and quickly correct it.
I haven't written anything "new" in terms of code in years, so I'm not really learning anything from coding manually but I do love solving problems for my customers.
Is this possible in any way today? Does one need to use Llama or DeepSeek, and do we have to run it on our own hardware to get persistence?
Writing a bunch of orm code feels boring? I make it generate the code and edit. Importing data? I just make it generate inserts. New models are good at reformatting data.
Using a third party Library? I force it to look up every function doc online and it still has errors.
Adding transforms and pivots to sql while keeping to my style? It is a mess. Forget it. I do that by hand.
Commenter Doug asks:
> > what AI coding tools have you utilized
Miguel replies:
> I don't use any AI coding tools. Isn't that pretty clear after reading this blog post?
Doug didn't ask what tools you use, Miguel. He asked which tools you have used. And the answer to that question isn't clear. Your post doesn't name the ones you've tried, despite using language that makes clear that you have in fact used them (e.g. "my personal experience with these tools"). Doug's question isn't just reasonable. It's exactly the question an interested, engaged reader will ask, because it's the question your entire post begs.
I can't help but point out the irony here: you write a great deal on the meticulousness and care with which you review other people's code, and criticize users of AI tools for relaxing standards, but the AI-tool user in your comments section has clearly read your lengthy post more carefully and thoughtfully than you read his generous, friendly question.
And I think it's worth pointing out that this isn't the blog post's only head scratcher. Take the opening:
> People keep asking me If I use Generative AI tools for coding and what I think of them, so this is my effort to put my thoughts in writing, so that I can send people here instead of having to repeat myself every time I get the question.
Your post never directly answers either question. Can I infer that you don't use the tools? Sure. But how hard would it be to add a "no?" And as your next paragraph makes clear, your post isn't "anti" or "pro." It's personal -- which means it also doesn't say much of anything about what you actually think of the tools themselves. This post won't help the people who are asking you whether you use the tools or what you think of them, so I don't see why you'd send them here.
> my personal experience with these tools, from a strictly technical point of view
> I hope with this article I've made the technical issues with applying GenAI coding tools to my work clear.
Again, that word: "clear." The post not only fails to make the technical issues clear; it doesn't raise a single concern that I think can properly be described as technical. You even say in your reply to Doug, in essence, that your resistance isn't technical, because for you the quality of an AI assistant's output doesn't matter. Your concerns, rather, are practical, methodological, and to some extent social. These are all perfectly valid reasons for eschewing AI coding assistants. They just aren't technical -- let alone strictly technical.
I write all of this as a programmer who would rather blow his own brains out, or retire, than cede intellectual labor, the thing I love most, to a robot -- let alone line the pockets of some charlatan 'thought leader' who's promising to make a reality of upper management's dirtiest wet dream: in essence, to proletarianize skilled work and finally liberate the owners of capital from the tyranny of labor costs.
I also write all of this, I guess, as someone who thinks commenter Doug seems like a way cool guy, a decent chap who asked a reasonable question in a gracious, open way and got a weirdly dismissive, obtuse reply that belies the smug, sanctimonious hypocrisy of the blog post itself.
Oh, and one more thing: AI tools are poison. I see them as incompatible with love of programming, engineering quality, and the creation of safe, maintainable systems, and I think they should be regarded as a threat to the health and safety of everybody whose lives depend on software (all of us), not because of the dangers of machine super intelligence but because of the dangers of the complete absence of machine intelligence paired with the seductive illusion of understanding.
I'm not sure I get this one. When I'm learning new tech I almost always have questions. I used to google them. If I couldn't find an answer I might try posting on stack overflow. Sometimes as I'm typing the question their search would finally kick in and find the answer (similar questions). Other times I'd post the question, if it didn't get closed, maybe I'd get an answer a few hours or days later.
Now I just ask ChatGPT or Gemini and more often than not it gives me the answer. That alone and nothing else (agent modes, AI editing or generating files) is enough to increase my output. I get answers 10x faster than I used to. I'm not sure what that has to do with the point about learning. Getting answers to those question is learning, regardless of where the answer comes from.
Okay, maybe sometimes the post about the stack trace was in Chinese, but a plain search used to be capable of giving the same answer as an LLM.
It's not that LLMs are better, it's search that got enshittified.
The "plain" Google Search before LLM never had the capability to copy&paste an entire lengthy stack trace (e.g. ~60 frames of verbose text) because long strings like that exceeds Google's UI. Various answers say limit of 32 words and 5784 characters: https://www.google.com/search?q=limit+of+google+search+strin...
Before LLM, the human had to manually visually hunt through the entire stack trace to guess at a relevant smaller substring and paste that into Google the search box. Of course, that's do-able but that's a different workflow than an LLM doing it for you.
To clarify, I'm not arguing that the LLM method is "better". I'm just saying it's different.
But I did it subconsciously. I never thought of it until today.
Another skill that LLM use can kill? :)
I could break most passwords of an internal company application by googling the SHA1 hashes.
It was possible to reliably identify plants or insects by just googling all the random words or sentences that would come to mind describing it.
(None of that works nowadays, not even remotely)
Which is never? Do you often just lie to win arguments? An LLM gives you a synthesized answer; a search engine only returns what already exists. By definition it cannot give you anything that is not a super obvious match.
In my experience it was "a lot". Because my stack traces were mostly hardware related problems on arm linux in that period.
But I suppose your stack traces were much different and superior and no one can have stack traces that are different from yours. The world is composed of just you and your project.
> Do you often just lie to win arguments?
I do not enjoy being accused of lying by someone stuck in their own bubble.
When you said "Which is never" did you lie consciously or subconsciously btw?
Whatever it is specifically, the idea that you could just paste a 600 line stack trace unmodified into google, especially "way before AI" and get pointed to the relevant bit for your exact problem is obviously untrue.
Pasting stack traces and kernel oopses hasn't worked in quite a while, I think. It's very possible that the maximum query was longer in the past.
2000 characters is also more than a double spaced manuscript page as defined by the book industry (which seems to be about 1500). You can fit the top of a stack trace in there. And if you're dealing with talking to hardware, the top can be enough.
And indeed, in the early days the maximum query length was 10 words. So no, you have never been able to paste an entire stack trace into google and magically get a concise summary.
If you are changing the original claim you were responding to into "I can do my job without LLMs if I have Google search", then sure, of course anyone can. But you can't use that to dismiss the fact that some people find it quite convenient to just dump the entire stack trace into a text chat and get a decent summary of what is important without having to read a single part of it.
Very few devs bother to post stack traces (or generally any programming question) online. They only do that when they're stuck so badly.
Most people work out their problem then move on. If no one posts about it your search never hits.
We have a habit of finding efficiencies in our processes, even if the original process did work.
Analyzing crash dumps and figuring out what's going on is a pretty useful skill.
At its least, AI can be extremely useful for autocompleting simple code logic or automatically finding replacements when I'm copying code/config and making small changes.
AI is a search engine that can also remix its results, often to good effect.
I mean yes, current large models are essentially compressing incredible amounts of content into something manageable by a single Accelerator/GPU, and making it available for retrieval through inference.
Which strongly discouraged trying to teach people.
> And ChatGPT never closes your question without answer because it (falsely) thinks it's a duplicate of a different question from 13 years ago
ChatGPT acts exactly opposite to the SO mods.
> But it does give you a ready to copy paste answer instead of a 'teach the man how to fish' answer.
Here it acts exactly like what SO mods like.
The other comments are mostly people thinking this is about ChatGPT...
What do you think will happen when everyone is using the AI tools to answer their questions? We'll be back in the world of Encyclopedias, in which central authorities spent large amounts of money manually collecting information and publishing it. And then they spent a good amount of time finding ways to sell that information to us, which was only fair because they spent all that time collating it. The internet pretty much destroyed that business model, and in some sense the AI "revolution" is trying to bring it back.
Also, he's specifically talking about having a coding tool write the code for you, he's not talking about using an AI tool to answer a question, so that you can go ahead and write the code yourself. These are different things, and he is treating them differently.
I know this isn't true because I work on an API that has no answers on stackoverflow (too new), nor does it have answers anywhere else. Yet, the AI seems to be able to accurately answer many questions about it. To be honest I've been somewhat shocked at this.
That doesn't mean it knows the answer. That means it guessed or hallucinated correctly. Guessing isn't knowing.
edit: people seem to be missing my point, so let me rephrase. Of course AIs don't think, but that wasn't what I was getting at. There is a vast difference between knowing something, and guessing.
Guessing, even in humans, is just the human mind statistically and automatically weighing probabilities and suggesting what may be the answer.
This is akin to what a model might do, without any real information. Yet in both cases, there's zero validation that anything is even remotely correct. It's 100% conjecture.
It therefore doesn't know the answer, it guessed it.
When it comes to being correct about a language or API that there's zero info on, it's just pure happenstance that it got it correct. It's important to know the differences, and not say it "knows" the answer. It doesn't. It guessed.
One of the most massive issues with LLMs is we don't get a probability response back. You ask a human "Do you know how this works", and an honest and helpful human might say "No" or "No, but you should try this. It might work".
That's helpful.
Conversely a human pretending it knows and speaking with deep authority when it doesn't is a liar.
LLMs need more of this type of response, which indicates certainty or not. They're useless without this. But of course, an LLM indicating a lack of certainty, means that customers might use it less, or not trust it as much, so... profits first! Speak with certainty on all things!
You want to say this guy's experience isn't reproducible? That's one thing, but that's probably not the case unless you're assuming they're pretty stupid themselves.
You want to say that it Is reproducible, but that "that doesn't mean AI can think"? Okay, but that's not what the thread was about.
As to 'knows the answer', I don't even know what that means with these tools. All I know is whether it is helpful or not.
The amazing thing about LLMs is that we still don’t know how (or why) they work!
Yes, they’re magic mirrors that regurgitate the corpus of human knowledge.
But as it turns out, most human knowledge is already regurgitation (see: the patent system).
Novelty is rare, and LLMs have an incredible ability to pattern match and see issues in “novel” code, because they’ve seen those same patterns elsewhere.
Do they hallucinate? Absolutely.
Does that mean they’re useless? Or does that mean some bespoke code doesn’t provide the most obvious interface?
Having dealt with humans, the confidence problem isn’t unique to LLMs…
You may want to take a course in machine learning and read a few papers.
LLMs are insanely complex systems and their emergent behavior is not explained by the algorithm alone.
Goodness this is a dim view on the breadth of human knowledge.
But I look down my nose at conceptions that human knowledge is packageable as plain text; our lives, experience, and intelligence are so much more than the cognitive strings we assemble in our heads in order to reason. It's like in that movie Contact when Jodie Foster muses that they should have sent a poet. Our empathy and curiosity and desires are not encoded in UTF8. You might say these are realms other than knowledge, but woe to the engineer who thinks they're building anything superhuman while leaving these dimensions out; they're left with a cold super-rationalist with no impulse to create of its own.
When I built my own programming language and used it to build a unique toy reactivity system and then asked the LLM "what can I improve in this file", you're essentially saying it "only" could help me because it learned how it could improve arbitrary code before in other languages and then it generalized those patterns to help me with novel code and my novel reactivity system.
"It just saw that before on Stack Overflow" is a bad trivialization of that.
It saw what on Stack Overflow? Concrete code examples that it generalized into abstract concepts it could apply to novel applications? Because that's the whole damn point.
* Read the signatures of the functions.
* Use the code correctly.
* Answer questions about the behavior of the underlying API by consulting the code.
Of course they're just guessing if they go beyond what's in their context window, but don't underestimate context window!
"If you're getting answers, it has seen it elsewhere"
The context window is 'elsewhere'.
As they say, it sounds like you're technically correct, which is the best kind of correct. You're correct within the extremely artificial parameters that you created for yourself, but not in any real world context that matters when it comes to real people using these tools.
To anyone who has used these tools in anger, it's remarkable, given they're only trained on large corpora of language and feedback, that they're able to produce what they do. I don't claim they exist outside their weights, that's absurd. But the entire point of non-linear function activations with many layers and parameters is to learn highly complex non-linear relationships. The fact they can be trained as much as they are with as much data as they have without overfitting or gradient explosions means the very nature of language contains immense information in its encoding and structure, and the network by definition of how it works and is trained does -not- just return what it was trained on. It's able to curve fit complex functions that interrelate semantic concepts that are clearly not understood as we understand them, but in some ways it represents an "understanding" that's sometimes perhaps more complex and nuanced than even we can manage.
Anyway the stochastic parrot euphemism misses the point that parrots are incredibly intelligent animals - which is apt since those who use that phrase are missing the point.
It’s silly to say that something LLMs can reliably do is impossible and every time it happens it’s “dumb luck”.
How would you reconcile this with the fact that SOTA models are only a few TB in size? Trained on exabytes of data, yet only a few TB in the end.
Correct answers couldn't be dumb luck either, because otherwise the models would pretty much only hallucinate (the space of wrong answers is many orders of magnitude larger than the space of correct answers), similar to the early proto GPT models.
This is false. You are off by ~4 orders of magnitude by claiming these models are trained on exabytes of data. It is closer to 500TB of more curated data at most. Contrary to popular belief LLMs are not trained on "all of the data on the internet". I responded to another one of your posts that makes this false claim here:
"<the human brain> cannot think, reason, comprehend anything it has not seen before. If you're getting answers, it has seen it elsewhere, or it is literally dumb, statistical luck."
Modern implementations of LLMs can "do research" by performing searches (whose results are fed into the context), or in many code editors/plugins, the editor will index the project codebase/docs and feed relevant parts into the context.
My guess is they either were using the LLM from a code editor, or one of the many LLMs that do web searches automatically (ie. all of the popular ones).
They are answering non-stackoverflow questions every day, already.
This happens all the time via RAG. The model “knows” certain things via its weights, but it can also inject much more concrete post-training data into its context window via RAG (e.g. web searches for documentation), from which it can usefully answer questions about information that may be “not in its training data”.
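For anyone unfamiliar with the mechanics, a rough sketch of that retrieval-then-answer flow looks something like this; `search_docs` and `call_llm` are placeholder functions standing in for a search index and whatever model API you use, not any particular vendor's SDK:

```python
# Minimal RAG-style sketch: the model answers from retrieved text placed into
# its context window, not from its training weights alone.

def search_docs(query: str, k: int = 3) -> list[str]:
    """Return the k most relevant documentation snippets (placeholder)."""
    raise NotImplementedError


def call_llm(prompt: str) -> str:
    """Send the prompt to whatever model you use (placeholder)."""
    raise NotImplementedError


def answer_with_context(question: str) -> str:
    snippets = search_docs(question)
    context = "\n\n".join(snippets)
    prompt = (
        "Answer using only the documentation below, and cite the snippet "
        "you relied on.\n\n"
        f"Documentation:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```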
People don't think that. Especially not the commenter you replied to. You're human-hallucinating.
People think LLMs are trained on raw documents and code besides StackOverflow. Which is very likely true.
Generalisation is something that neural nets are pretty damn good at, and given the complexity of modern LLMs the idea that they cannot generalise the fairly basic logical rules and patterns found in code such that they're able provide answers to inputs unseen in the training data is quite an extreme position.
Models work across programming languages because it turned out programming languages and API are much more similar than one could have expected.
I mean... They also can read actual documentation. If I'm working on any api work or a language I'm not familiar with, I ask the LLM to include the source they got their answer from and use official documentation when possible.
That lowers the hallucination rate significantly and also lets me ensure said function or code actually does what the llm reports it does.
In theory, all stackoverflow answers are just regurgitated documentation, no?
This 100%. I use o3 as my primary search engine now. It is brilliant at finding relevant sources, summarising what is relevant from them, and then also providing the links to those sources so I can go read them myself. The release of o3 was a turning point for me where it felt like these models could finally go and fetch information for themselves. 4o with web search always felt inadequate, but o3 does a very good job.
> In theory, all stackoverflow answers are just regurgitated documentation, no?
This is unfair to StackOverflow. There is a lot of debugging and problem solving that has happened on that platform of undocumented bugs or behaviour.
Obviously this isn’t true. You can easily verify this by inventing and documenting an API and feeding that description to an LLM and asking it how to use it. This works well. LLMs are quite good at reading technical documentation and synthesizing contextual answers from it.
On a related note, I recently learned that you can still subscribe to the Encyclopedia Britannica. It's $9/month, or $75/year.
Considering the declining state of Wikipedia, and the untrustworthiness of A.I., I'm considering it.
I used to be on the Microsoft stack for decades. Windows, Hyper-V, .NET, SQL Server ... .
Got tired of MS's licensing BS and I made the switch.
This meant learning Proxmox, Linux, Pangolin, UV, Python, JS, Bootstrap, Nginx, Plausible, SQLite, Postgres ...
Not all of these were completely new, but I had never dove in seriously.
Without AI, this would have been a long and daunting project. AI made this so much smoother. It never tires of my very basic questions.
It does not always answer 100% correctly the first time (tip: paste in the docs of the specific version of the thing you are trying to figure out, as it sometimes has out-of-date or mixed-version knowledge), but most often it can be nudged and prodded to a very helpful result.
AI is just an undeniably superior teacher than Google or Stack Overflow ever was. You still do the learning, but the AI is great in getting you to learn.
Don't get me wrong, I tried. But even when pasting the documentation in, the amount of times it just hallucinated parameters and arguments that were not even there were such a huge waste of time, I don't see the value in it.
Sometimes, a function doesn't work as advertised or you need to do something tricky, you get a weird error message, etc. For those things, stackoverflow could be great if you could find someone who had a similar problem. But the tutorial level examples on most blogs might solve the immediate problem without actually improving your education.
It would be similar to someone solving your homework problems for you. Sure you finished your homework, but that wasn't really learning. From this perspective, ChatGPT isn't helping you learn.
Sure, there is a chance that one day AI will be smart enough to read an entire codebase and chug out exhaustively comprehensive and accurate documentation. I'm not convinced that is guaranteed to happen before our collective knowledge falls off a cliff.
Thats why AI works for him and not for you.
We both agree. The difference between me and the person I responded to is that I feel I understand the perspective of the OP, and I was trying to help the person it didn't make sense to understand that perspective.
I disabled AI autocomplete and cannot understand how people can use it. It was mostly an extra key press on backspace for me.
That said, learning new languages is possible without searching anything. With a local model, you can do that offline and have a vast library of knowledge at hand.
The Gemini results integrated in Google are very bad as well.
I don't see the main problem to be people just lazily asking AI for how to use the toilet, but that real knowledge bases like stack overflow and similar will vanish because of lacking participation.
Sort of. The process of working through the question is what drives learning. If you just receive the answer with zero effort, you are explicitly bypassing the brain's learning mechanism.
There's huge difference between your workflow and fully Agentic AIs though.
Asking an AI for the answer in the way you describe isn't exactly zero effort. You need to formulate the question and mold the prompt to get your response, and integrate the response back into the project. And in doing so you're learning! So YOUR workflow has learning built in, because you actually use your brain before and after the prompt.
But not so with vibe coding and Agentic LLMs. When you hit submit and get the tokens automatically dumped into your files, there is no learning happening. Considering AI agents are effectively trying to remove any pre-work (ie automating prompt eng) and post-work (ie automating debugging, integrating), we can see Agentic AI as explicitly anti-learning.
Here's my recent vibe coding anecdote to back this up. I was working on an app for an e-ink tablet dashboard and the tech stack of least resistance was C++ with QT SDK and their QML markup language with embedded javascript. Yikes, lots of unfamiliar tech. So I tossed the entire problem at Claude and vibe coded my way to a working application. It works! But could I write a C++/QT/QML app again today - absolutely not. I learned almost nothing. But I got working software!
Vibe-coding is just a stop on the road to a more useful AI and we shouldn't think of it as programming.
There is a sweet spot of situations I know well enough to judge a solution quickly, but not well enough to write code quickly, but that's a rather narrow case.
The author is one who appears unwilling to do so.
I still use them, but more as a support tool than a real assistant.
To me the part I enjoy most is making things. Typing all that nonsense out is completely incidental to what I enjoy about it.
Using them for larger bits of code feels silly as I find subtle bugs or subtle issues in places, so I don't necessarily feel comfortable passing in more things. Also, large bits of code I work with are very business logic specific and well abstracted, so it's hard to try and get ALL that context into the agent.
I guess what I'm trying to ask here is what exactly do you use agents for? I've seen youtube videos but a good chunk of those are people getting a bunch of typescript generated and have some front-end or generate some cobbled together front end that has Stripe added in and everyone is celebrating as if this is some massive breakthrough.
So when people say "regular tasks" or "rote tasks" what do you mean? You can't be bothered to write a db access method/function using some DB access library? You are writing the same regex testing method for the 50th time? You keep running into the same problem and you're still writing the same bit of code over and over again? You can't write some basic sql queries?
Also not sure about others, but I really dislike having to do code reviews when I am unable to really gauge the skill of the dev I'm reviewing. If I know I have a junior with 1-2 years maybe, then I know to focus a lot on logic issues (people can end up cobbling together the previous simple bits of code), and if it's later down the road at 2-5 years then I know I might focus on patterns, or look to ensure that the code meets the standards and look for more discreet or hidden bugs. With agent output it could oscillate wildly between those. It could be a solidly written, well-optimized search function, or it could be a nightmarish SQL query that's impossible to untangle.
Thoughts?
I do have to say I found it good when working on my own to get another set of "eyes" and ask things like "are there more efficient ways to do X" or "can you split this larger method into multiple ones" etc
My company just had internal models that were mediocre at best, but at the beginning this year they finally enabled Copilot for everyone.
At the beginning I was really excited for it, but it’s absolutely useless for work. It just doesn’t work on big old enterprise projects. In an enterprise environment everything is composed of so many moving pieces, knowledge scattered across places, internal terminology, etc. Maybe in the future, with better MCP servers or whatever, it’ll be possible to feed all the context into it to make it spit out something useful, but right now, at work, I just use AI as a search engine (and it’s pretty good at that, when you have the knowledge to detect when it has subtle problems).
Yep, this is pretty much it. However, I honestly feel that AI writes so much better code than me that I seldom need to actually fix much in the review, so it doesn't need to be as thorough. AI always takes more tedious edge-cases into account and applies best practices where I'm much sloppier and take more shortcuts.
Responsability and "AI" marketing are two non intersecting sets.
Best counter claim: Not all code has the same risk. Some code is low risk, so the risk of error does not detract from the speed gained. For example, for proof of concepts or hobby code.
The real problem: Disinformation. Needless extrapolation, poor analogies, over valuing anecdotes.
But there's money to be made. What can we do, sometimes the invisible hand slaps us silly.
Counter counter claim for these use cases: when I do proof of concept, I actually want to increase my understanding of said concept at the same time, learn challenges involved, and in general get a better idea how feasible things are. An AI can be useful for asking questions, asking for reviews, alternative solutions, inspiration etc (it may have something interesting to add or not) but if we are still in the territory "this matters" I would rather not substitute the actual learning experience and deeper understanding with having an AI generate code faster. Similar for hobby projects, do I need that thing to just work or I actually care to learn how it is done? If the learning/understanding is not important in a context, I would say then using AI to generate the code is a great time-saver. Otherwise, I may still use AI but not in the same way.
Revised example: Software where the goal is design experimentation; like with trying out variations of UX ideas.
In my experience it's that they dump the code into a pull request and expect me to review it. So GenAI is great if someone else is doing the real work.
Unlike the author of the article I do get a ton of value from coding agents, but as with all tools they are less than useless when wielded incompetently. This becomes more damaging in an org that already has perverse incentives which reward performative slop over diligent and thoughtful engineering.
Most of my teams have been very allergic to assigning personal blame and management very focused on making sure everyone can do everything and we are always replaceable. So maybe I could phrase it like "X could help me with this" but saying X is responsible for the bug would be a no no.
I don't mind fixing bugs, but I do mind reckless practices that introduce them.
One of the most bizarre experiences I have had over this past year was dealing with a developer who would screen share a ChatGPT session where they were trying to generate a test payload with a given schema, getting something that didn't pass schema validation, and then immediately telling me that there must be a bug in the validator (from Apache foundation). I was truly out of words.
What I personally find is that it's great for helping me solve mundane things. For example, I'm currently working on an agentic system and I'm using LLMs to help me generate Elasticsearch mappings.
There is no part of me that enjoys making JSON mappings; it's not fun, nor does it engage my curiosity as a programmer, and I'm not going to learn much from generating Elasticsearch mappings over and over again. For problems like this, I'm happy to just let the LLM do the job. I throw some JSON at it and I've got a prompt that's good enough that it will spit out results deterministically and reliably.
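To give a feel for the shape of that work, here is an invented example of the input and output involved; the field names, types, and values are made up, and the exact mapping a model produces will vary:

```python
# Hypothetical example of the hand-off: a sample document in, an Elasticsearch
# mapping out. Shown as Python dicts; the real exchange is just JSON.
sample_doc = {
    "order_id": "A-1001",
    "created_at": "2024-05-01T12:30:00Z",
    "total": 49.99,
    "tags": ["gift", "priority"],
    "customer": {"name": "Ada", "email": "ada@example.test"},
}

# The kind of mapping the model typically hands back, ready to review and edit.
generated_mapping = {
    "mappings": {
        "properties": {
            "order_id": {"type": "keyword"},
            "created_at": {"type": "date"},
            "total": {"type": "scaled_float", "scaling_factor": 100},
            "tags": {"type": "keyword"},
            "customer": {
                "properties": {
                    "name": {"type": "text"},
                    "email": {"type": "keyword"},
                }
            },
        }
    }
}
```

It's declarative, easy to eyeball, and cheap to regenerate, which is exactly why it's a good fit for this kind of delegation.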
However if I'm exploring / coding something new, I may try letting the LLM generate something. Most of the time though in these cases I end up hitting 'Reject All' after I've seen what the LLM produces, then I go about it in my own way, because I can do better.
It all really depends on what the problem you are trying to solve. I think for mundane tasks LLMs are just wonderful and helps get out of the way.
If I put myself into the shoes of a beginner programmer, LLMs are amazing. There is so much I could learn from them. Ultimately what I find is that LLMs will help lower the barrier to entry for programming but do not mitigate the need to learn to read / understand / reason about the code. Beginners will be able to go much further on their own before seeking out help.
If you are more experienced you will probably also get some benefits but ultimately you'd probably want to do it your own way since there is no way LLMs will replace experienced programmer (not yet anyway).
I don't think it's wise to completely dismiss LLMs in your workflow, at the same time I would not rely on it 100% either, any code generated needs to be reviewed and understood like the post mentioned.
> The quality of the code these tools produce is not the problem.
So even if an AI could produce code of a quality equal to or surpassing the author's own code quality, they would still be uninterested in using it.
To each their own, but it's hard for me to accept an argument that such an AI would provide no benefit, even if one put priority on maintaining high quality standards. I take the point that the human author is ultimately responsible, but still.
There’s your issue, the skill of programming has changed.
Typing gets fast; so does review once robust tests already prove X, Y, Z correctness properties.
With the invariants green, you get faster at grokking the diff, feed style nits back into the system prompt, and keep tuning the infinite tap to your taste.
The more you deviate from that, the more you have to step in.
But given that I constantly forget how to open a file in Python, I still have a use for it. It basically supplanted Stackoverflow.
AI can write some tests, but it can't design thorough ones. Perhaps the best way to use AI is to have a human writing thorough and well documented tests as part of TDD, asking AI to write code to meet those tests, then thoroughly reviewing that code.
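A small sketch of that division of labour, with a hypothetical `parse_duration` function as the thing under test: the human-authored tests carry the intent and the edge cases, and whatever the AI generates has to satisfy them.

```python
# Human-written spec-as-test (hypothetical example). The AI's job is to make
# this pass; the reviewer's job is to check the tests say what we actually mean.
import pytest

from durations import parse_duration  # hypothetical module the AI will write


@pytest.mark.parametrize(
    "text,seconds",
    [
        ("90s", 90),
        ("2m", 120),
        ("1h30m", 5400),
        ("0s", 0),
    ],
)
def test_parse_duration_happy_path(text, seconds):
    assert parse_duration(text) == seconds


def test_parse_duration_rejects_garbage():
    # Deliberate edge cases: empty input and unknown units must fail loudly,
    # not silently return 0.
    with pytest.raises(ValueError):
        parse_duration("")
    with pytest.raises(ValueError):
        parse_duration("5 parsecs")
```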
AI saves me just a little time by writing boilerplate stuff for me, just one step above how IDEs have been providing generated getters and setters.
Did the author take their own medicine and measure their own productivity?
I set that up to run then do something different. I come back in a couple minutes, scan the diffs which match expectations and move on to the next task.
That’s not everything but those menial tasks where you know what needs to be done and what the final shape should look like are great for AI. Pass it off while you work on more interesting problems.
Either/or fallacy. There exist a varied set of ways to engage with the technology. You can read reference material and ask for summarization. You can use language models to challenge your own understanding.
Are people really this clueless? (Yes, I know the answer, but this is a rhetorical device.)
Think, people. Human intelligence is competing against artificial intelligence, and we need to step it up. Probably a good time to stop talking like we’re in Brad Pitt’s latest movie, Logical Fallacy Club. If we want to prove our value in a competitive world, we need to think and write well.
I sometimes feel like bashing flawed writing is mean, but maybe the feedback will get through. Better to set a quality bar. We should aim to be our best.
> As the reader I had to do too much work to discern your point and what was relevant.
Re-reading, I hope the first ~15 words make my main point:
> Either/or fallacy. There exist a varied set of ways to engage with the technology...
Was this part unclear? Something else?
> you think you're being open-minded...
"Open minded" can mean very different things to different people. I recommend the article "The Proper Use of Humility" by Yudkowsky [1] because it rings true to me. I'm open to hearing other people's points of view, up to a point, given enough time. (Everyone has their limit, whether we admit it or not.) When it comes to assessing truth, I care about good arguments and good evidence, and I heavily discount anything else. If someone says I'm not "open minded" because of what I just wrote, then my reply would be "what do you want me to be more open to?"
There is a gem from in a comment below the above article that deserves repeating:
> People often take open disagreement as a sign of intellectual arrogance, while it is a display of respect and humility; showing respect with the honest acknowledgment of your disagreement, and showing humility in affording the other person a chance to defend themselves and prove you wrong. To say nothing is to treat that person's beliefs dismissively, as if they don't matter, and then assume that discussion was futile because they're incapable of understanding the truth, and of course, couldn't possible have anything to teach you.
> ...but you offered a multiple choice question, where the choices are reductive...
I offered two specific categories (tone or substance) and a third option for "anything else". I'm not following why this feels reductive to you; it leaves space for someone to reply however they like.
> and it comes off as defensive.
I've thought about this word quite a bit. From dictionary.com defensive means "excessively concerned with guarding against the real or imagined threat of criticism, injury to one's ego, or exposure of one's shortcomings." I'm open to criticism and happy to learn. If I'm wrong, I strive to admit it and apologize where needed. At the same time, I am confident enough to push back, stand up for myself, and defend my ideas (which is a different sense of 'defensive').
Here is the backstory to my second comment. The comment I replied to did not strike me as kind, much less well-intentioned. It probably was intended to be an insult, but I replied anyway. I gave the benefit of the doubt while challenging the commenter to give constructive criticism. I strived for clarity and confidence without being defensive or going on a counter-attack. This is a hard balance to strike.
[1] https://www.lesswrong.com/posts/GrDqnMjhqoxiqpQPw/the-proper...
Having a chatbot telling me what to write would not have had the same effect.
It's like having someone tell you the solutions to your homework.
Where I find it genuinely useful is in extremely low-value tasks, like localisation constants for the same thing in other languages, without having to tediously run that through an outside translator. I think that mostly goes in the "fancy inline search" category.
Otherwise, I went back from Cursor to normal VS Code, and mostly have Copilot autocompletions off these days because they're such a noisy distraction and break my thought process. Sometimes they add something of value, sometimes not, but I'd rather not have to confront that question with every keystroke. That's not "10x" at all.
Yes, I've tried the more "agentic" workflow and got down with Claude Code for a while. What I found is that its changes are so invasive and chaotic--and better prompts don't really prevent this--that it has the same implications for maintainability and ownership referred to above. For instance, I have a UIKit-based web application to which I recently asked Claude Code to add dark theme options, and it rather brainlessly injected custom styles into dozens of components and otherwise went to town, in a classic "optimise for maximum paperclip production" kind of way. I spent a lot more time un-F'ing what it did throughout the code base than I would have spent adding the functionality myself in an appropriately conservative fashion. Sure, a better prompt would probably have helped, but that would have required knowing what chaos it was going to wreak in advance, as to ask it to refrain from that as part of the prompt. The possibility of this happening with every prompt is not only daunting, but a rabbit hole of cognitive load that distracts from real work.
I will concede it does a lot better--occasionally, very impressively--with small and narrow tasks, but those tasks at which it most excels are so small that the efficiency benefit of formulating the prompt and reviewing the output is generally doubtful.
There are those who say these tools are just in their infancy, AGI is just around the corner, etc. As far as I can tell from observing the pace of progress in this area (which is undeniably impressive in strictly relative terms), this is hype and overextrapolation. There are some fairly obvious limits to their training and inference, and any programmer would be wise to keep their head down, ignore the hype, use these tools for what they're good at and studiously avoid venturing into "fundamentally new ways of working".
The Codex workflow however really is a game changer imo. It takes the time to ensure changes are consistent with other code and the async workflow is just so much nicer.
Leaving aside the fact that this isn't an LLM problem; we've always had tech debt due to cowboy devs and weak management or "commercial imperatives":
I'd be interested to know if any of the existing LLM ELO style leaderboards mark for code quality in addition to issue fixing?
The former seems a particularly useful benchmark as they become more powerful in surface abilities.
But this is one of the core problems with LLM coding, right? It accelerates an already broken model of software development (worse is better) rather than trying to help fix it.
Most tech companies however tend to operate following a standard enshittification schedule. First they are very cheap, supported by investments and venture capitalists. Then they build a large user base who become completely dependent on them as alternatives disappear (in this case as they lose the institutional knowledge that their employees used to have). Then they seek to make money so the investors can make their profits. In this case I could see the cost of AI rising a lot, after companies have already built it in to their business. AI eventually has to start making money. Just like Amazon had to, and Facebook, and Uber, and Twitter, and Netflix, etc.
From all the talk I see of companies embracing AI wholeheartedly it seems like they aren't looking any further than the next quarter. It only costs so much per month to replace so many man hours of work! I'm sure that won't last once AI is deeply embedded into so many businesses that they can start charging whatever they want to.
Even if that were true for everybody, reviews would still be worth doing, because when the code is reviewed it gets more than one pair of eyes looking at it.
So it's still worth using AI even if it's slower than writing code yourself, because you wouldn't have made the mistakes the AI made, and the AI wouldn't have made the mistakes you would have made.
It still might be personally not worth it for you though if you prefer to write code than to read it. Until you can set up AI as a reviewer for yourself.
One of the biggest problems I see with AI is that it gets people used to NOT thinking. It takes lots of time and energy to learn to program and design complex software. AI doesn’t solve this - humans need to have these skills in order to supervise it. But why would new programmers learn them? AI writes their code! It’s already hard to convince them otherwise. This only leads to bad things.
Technology without proper control and wisdom, destroys human things. We saw this many times already.
StackOverflow makes it easier not think and copy-paste. Autocomplete makes it easier to not think and make typos (Hopefully you have static typing). Package management makes it easier to not think and introduce heavy dependencies. C makes it easier to not think and forget to initialize variables. I make it easier to not think and read without considering evil (What if every word I say has evil intention and effect?)
Abstractions are making you think of different things. They “hide” some detail and allow you to focus on something else. Of course, the abstraction has its price.
This is true for AI too. The price is the problem.
The reality is that this factory is also leaking toxic waste into nature. There are people who already see this and try to warn the rest. Of course, the factory doesn’t care unless it’s forced to.
The toxic waste starts to accumulate. People start to get sick… but nobody cares.
Having said that, for simple ad-hoc code generation (I need a dump function for this data structure for example) AI's work great.
We ran into the same problem when rolling out AI-assisted code reviews and code generation pipelines. What helped us was adopting AppMod.AI's Project Analyzer:
- Memory & context retention: It parses your full repo and builds architecture diagrams and dependency maps, so AI suggestions stay grounded in real code structure.
- Human-in-the-loop chat interface: You can ask clarifying questions like, “Does this function follow our performance pattern?” and get guided explanations from the tool before merging.
- Collaborative refactor tracking: It tracks changes and technical debt over time, making it easy to spot drift or architectural erosion, something pure LLMs miss.
- Prompt-triggered cost and quality metrics: You can see how often you call AI, what it costs, and its success rates in passing your real tests, not just anecdotal gains.
It’s far from perfect, but it shifts the workflow from “LLM writes → you fix” to “LLM assists within your live code context, under your control.” Others have noted similar limitations in Copilot and GPT-4 based tools, where human validation remains essential.
In short: LLMs aren’t going to replace senior devs—they’re tools that need tooling. Blending AI insights with architecture-aware context and built-in human validation feels like the best middle path so far.
jumploops•5mo ago
As someone who uses Claude Code heavily, this is spot on.
LLMs are great, but I find the more I cede control to them, the longer it takes to actually ship the code.
I’ve found that the main benefit for me so far is the reduction of RSI symptoms, whereas the actual time savings are mostly overstated (even if it feels faster in the moment).
jumploops•5mo ago
Not super necessary for small changes, but basically a must have for any larger refactors or feature additions.
I usually use o3 for generating the specs; also helpful for avoiding context pollution with just Claude Code.
cbsmith•5mo ago
There's an old expression: "code as if your work will be read by a psychopath who knows where you live" followed by the joke "they know where you live because it is future you".
Generative AI coding just forces the mindset you should have had all along: start with acceptance criteria, figure out how you're going to rigorously validate correctness (ideally through regression tests more than code reviews), and use the review process to come up with consistent practices (which you then document so that the LLM can refer to it).
It's definitely not always faster, but waking up in the morning to a well documented PR, that's already been reviewed by multiple LLMs, with successfully passing test runs attached to it sure seems like I'm spending more of my time focused on what I should have been focused on all along.
cbsmith•5mo ago
I'm actually curious about the "lose their skills" angle though. In the open source community it's well understood that if anything reviewing a lot of code tends to sharpen your skills.
Terr_•5mo ago
What happens if the reader no longer has enough of that authorial instinct, their own (opinionated) independent understanding?
I think the average experience would drift away from "I thought X was the obvious way but now I see by doing Y you avoid that other problem, cool" and towards "I don't see the LLM doing anything too unusual compared to when I ask it for things, LGTM."
cbsmith•5mo ago
Let's say you're right though, and you lose that authorial instinct. If you've got five different proposals/PRs from five different models, each one critiqued by the other four, the needs for authorial instinct diminish significantly.
jumploops•5mo ago
For context, it’s just a reimplementation of a tool I built.
Let’s just say it’s going a lot slower than the first time I built it by hand :)
hatefulmoron•5mo ago
If you're trying to build something larger, it's not good enough. Even with careful planning and spec building, Claude Code will still paint you into a corner when it comes to architecture. In my experience, it requires a lot of guidance to write code that can be built upon later.
The difference between the AI code and the open source libraries in this case is that you don't expect to be responsible for the third-party code later. Whether you or Claude ends up working on your code later, you'll need it to be in good shape. So, it's important to give Claude good guidance to build something that can be worked on later.
vidarh•5mo ago
I don't know what you mean by "a lot of guidance". Maybe I just naturally do that, but to me there's not been much change in the level of guidance I need to give Claude Code or my own agent vs. what I'd give developers working for me.
Another issue is that as long as you ensure it builds good enough tests, the cost of telling it to just throw out the code it builds later and redo it with additional architectural guidance keeps dropping.
The code is increasingly becoming throwaway.
hatefulmoron•5mo ago
What do you mean? If it were as simple as not letting it do so, I would do as you suggest. I may as well stop letting it be incorrect in general. Lots of guidance helps avoid it.
> Maybe I just naturally do that, but to me there's not been much change in the level of guidance I need to give Claude Code or my own agent vs. what I'd give developers working for me.
Well yeah. You need to give it lots of guidance, like someone who works for you.
> the cost of telling it to just throw out the code it builds later and redo it with additional architectural guidance keeps dropping.
It's a moving target for sure. My confidence with this in more complex scenarios is much smaller.
vidarh•5mo ago
I'm arguing it is as simple as that. Don't accept changes that muddle up the architecture. Take attempts to do so as evidence that you need to add direction. Same as you presumably would - at least I would - with a developer.
jumploops•5mo ago
Years of PT have enabled me to work quite effectively and minimize the flare ups :)