90% of my usage of Copilot is just fancy autocomplete: I know exactly what I want, and as I'm typing out the line of code it finishes it off for me. Or, I have a rough idea of the syntax I need to use a specific package that I use once every few months, and it helps remind me what the syntax is, because once I see it I know it's right. This usage isn't really glamorous, but it does save me tiny bits of time in terms of literal typing, or a simple search I might need to do. Articles like this make me wonder if people who don't like coding tools are trying to copy and paste huge blocks of code; of course it's slower.
I know what function I want to write, start writing it, and then bam! The screen fills with ghost text that may partly be what I want but probably not quite.
Focus shifts from writing to code review. I wrest my attention back to the task at hand, type some more, and bam! New ghost text to distract me.
Ever had the misfortune of having a conversation with a sentence-finisher? Feels like that.
Perhaps I need to bind it to a hotkey instead of using the default always-on setting.
---
I suspect people using the agentic approaches skip this entirely and therefore have a more pleasant experience overall.
Autocomplete is a total focus destroyer for me when it comes to text, e.g. when writing a design document. When I'm editing code, it sometimes trips me up (hitting tab to indent but ending up accepting a suggestion instead), but without destroying my focus.
I believe your reported experience, but mine (and presumably many others') is different.
With unfamiliar syntax, I only need a few minutes and a cheatsheet to get back in the groove. Then typing goes back to that flow state.
Typing code is always semi-unconscious. Just like you don't pay that much attention to every character when you're writing notes on paper.
Editing code is where I focus on it, but I'm also reading docs, running tests,...
Eventually: well, but, the AI coding agent isn't better than a top 10%/5%/1% software developer.
And it'll be that the coding agents can't do narrow X thing better than a top tier specialist at that thing.
The skeptics will forever move the goal posts.
However, assuming we are still having this conversation, that alone is proof to me that the AI is not that capable. We're several years into "replace all devs in six months." We will just have to keep waiting and see how it plays out.
IDEs outperform any “dumb” editor in the full context of work, yet you don’t see any fewer posts about “I use Vim, btw” (and I say this as a Vim user).
Compare to a hand saw. You still see them in specialty work and hobby shops, but you don't see them on construction sites. You see circular saws. Same with hammers. You'll probably still see them in job sites, but with far less usage than nail guns. And in many contexts nail guns have completely replaced hammers. There are still people griping about power tools but the industry doesn't care. I know a fair number of people in the trades and I can't imagine any of them seriously suggesting that you don't need to know how to use power tools.
My argument is that, assuming AI fulfills the expectation of those who hype it (and that assumption has yet to be proven), we will see a similar effect in software. The results will speak for themselves and make the arguments irrelevant. That hasn't happened yet, leaving room for genuine debate.
This. The devs outcompeting others by using AI today are too busy shipping to waste time writing blog posts about what is, ultimately, a skill issue.
It's very possible that AI is literally making us less productive and dumber. Yet these tools are being pushed by subscription-peddling companies as if it were impossible to operate without them. I'm glad some people are calling it out.
[1] https://devops.com/study-finds-no-devops-productivity-gains-...
There are other times when I am building a stand-alone tool and am fine with whatever it wants to do, because it's not something I plan to maintain and its functional correctness is self-evident. In that case I don't even review what it's doing unless it's stuck. This is closer to actual vibe coding. It isn't something I would do for something I am integrating into a larger system, but I will for something like a CLI tool that I use to enhance my workflow.
Is that what you and your buddies talk about at two hour long coffee/smoke breaks while “terrible” programmers work?
I don't send my coworkers lists of micromanaged directions that give me a pretty clear expectation of what their PR is going to look like. I do however, occasionally get tagged on a review for some feature I had no part in designing, in a part of some code base I have almost no experience with.
Reviewing that the components you asked for do what you asked is a much easier scenario.
Maybe if people are asking an LLM to build an entire product from scratch with no guidance it would take a lot more effort to read and understand the output. But I don't think most people do that on a daily basis.
Run three, run five. Prompt with voice annotation. Run them when normally you need a cognitive break. Run them while you watch netflix on another screen. Have them do TDD. Use an orchestrator. So many more options.
I feel like another problem is that, deep down, most developers hate debugging other people's code, and that's effectively what this is at times. It doesn't matter if your Associate ran off and saved you 50k lines of typing, you would still rather do it yourself than debug the code.
I would give you grave warnings, telling you the time is nigh, adapt or die, etc, but it doesn't matter. Eventually these agents will be good enough that the results will surpass you even in simple one task at a time mode.
Closest parallel I can think of is the code-generation-from-UML era, but that explicitly kept the design decisions on the human side, and never really took over the world.
But despite all that, the tools can find problems, get information, and propose solutions so much faster and across such a vast set of challenges that I simply cannot imagine going back to working without them.
This fellow should keep on working without AIs. All the more power to him. And he can ride that horse all the way into retirement, most likely. But it's like ignoring the rise of IDEs, or Google search, or AWS.
None of these things introduced the risk of directly breaking your codebase without very close oversight. If LLMs can surpass that hurdle, then we’ll all be having a different conversation.
And besides, not all LLMs are the same when it comes to breaking existing functions. I've noticed that Claude 3.7 is far better at not breaking things that already work than whatever it is that comes with Cursor by default, for example.
Those are the two sides of the argument. It could only be settled, in principle, if both sides were directly observing each other's work in real-time.
But, I've tried that, too. 20 years ago in a debate between dedicated testers and a group of Agilists who believed all testing should be automated. We worked together for a week on a project, and the last day broke down in chaos. Each side interpreted the events and evidence differently. To this day the same debate continues.
People's lives are literally at stake. If my systems screw up, people can die.
And I will continue to use AI to help get through all that. It doesn't make me any less responsible for the result.
We don't have a theory of LLMs that provides a basis on which to trust them. The people who create them do not test them in a way that passes muster with experts in the field of testing. Numerous articles by people at least as qualified as you cast strong doubt on the reliability of LLMs.
But you say "trust me!"
Stockton Rush assured us that his submersible was safe, despite warnings from experts. He also made noises about being responsible.
The fact there is AI involved doesn't change the nature of the work. Engineers and coders are paid to produce functioning results, and thorough code review is sometimes but not always involved. None of that changes. Software developers make mistakes, regardless of whether there is an AI involved or not. So introducing AI literally changes nothing in terms of the validation chain.
If you're trying to prevent a Stockton Rush type personality from creating larger social problems, then you're talking about regulating the software industry, presumably like how the engineering industry is regulated. Again, though, that doesn't change anything about the tools, only who is responsible and how that responsibility flows.
The concept of why can get nebulous in a corporate setting, but it's nevertheless fun to explore. At the end of the day, someone has a problem and you're the one getting the computer to solve it. The process of getting there is fun in that you learn about what irks someone else (or yourself).
Thinking about the problem and its solution can be augmented with computers (I'm not memorizing the Go standard library). But computers are simple machines with very complex abstractions built on top of them. The thrill is in thinking in terms of two worlds: the real one where the problem occurs and the computing one where the solution will come forth. The analogy may be more understandable to someone who's learned two or more languages and thinks about the nuances of using them to depict the same reality.
Same as the TFA, I'm spending most of my time manipulating a mental model of the solution. When I get to code, it's just a translation. But the mental model is diffuse, so getting it written gives it a firmer existence. LLM generation mostly disrupts that process. The only way they really help is as a more pliable form of Stack Overflow, but I've only ever used Stack Overflow as human-authored annotations of the official docs.
But it's important to realize that AI coding is itself a skill that you can develop. It's not just picking the best tool and letting it go. Managing prompts and managing context has a much higher skill ceiling than many people realize. You might prefer manual coding, but you might just be bad at AI coding, and you might prefer it if you improved at it.
With that said, I'm still very skeptical of letting the AI drive the majority of the software work, despite meeting people who swear it works. I personally am currently preferring "let the AI do most of the grunt work but get good at managing it and shepherding the high level software design".
It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.
ok but how much am I supposed to spend before I supposedly just "get good"? Because based on the free trials and the pocket change I've spent, I don't consider the ROI worth it.
- Employers, not employees, should provide workplace equipment or compensation for equipment. Don't buy bits for the shop, nails for the foreman, or Cursor for the tech lead.
- The workplace is not a meritocracy. People are not defined by their wealth.
- If $1,000 does not represent an appreciable amount of someone's assets, they are doing well in life. Approximately half of US citizens cannot afford rent if they lose a paycheck.
- Sometimes the money needs to go somewhere else. Got kids? Sick and in the hospital? Loan sharks? A pool full of sharks and they need a lot of food?
- Folks can have different priorities and it's as simple as that
We're (my employer) still unsure if new dev tooling is improving productivity. If we find out it was unhelpful, I'll be very glad I didn't lose my own money.
Before, a poor kid with computer access could learn to code nearly for free, but if it costs $1k just to get started with AI, that poor kid will never have that opportunity.
Instead you can get comfortable prompting and managing context with aider.
Or you can use claude code with a pro subscription for a fair amount of usage.
I agree that seeing the tools just waste several dollars to just make a mess you need to discard is frustrating.
While it wasn't the fanciest integration (nor the best of codegen), it was good enough to "get going" (the loop was to ask the LLM to do something, then me do something else in the background, then fix and merge the changes it did - even though i often had to fix stuff[2], sometimes it was less of a hassle than if i had to start from scratch[3]).
It can give you a vague idea that with more dedicated tooling (i.e. something that does automatically what you'd do by hand[4]) you could do more interesting things (combining with some sort of LSP functionality to pass function bodies to the LLM would also help), though personally i'm not a fan of the "dedicated editor" that seems to be used and i think something more LSP-like (especially if it can also work with existing LSPs) would be neat.
IMO it can be useful for a bunch of boilerplate-y or boring work. The biggest issue i can see is that the context is too small to include everything (imagine, e.g., throwing the entire Blender source code in an LLM which i don't think even the largest of cloud-hosted LLMs can handle) so there needs to be some external way to store stuff dynamically but also the LLM to know that external stuff are available, look them up and store stuff if needed. Not sure how exactly that'd work though to the extent where you could -say- open up a random Blender source code file, point to a function, ask the LLM to make a modification, have it reuse any existing functions in the codebase where appropriate (without you pointing them out) and then, if needed, have the LLM also update the code where the function you modified is used (e.g. if you added/removed some argument or changed the semantics of its use).
[0] https://i.imgur.com/FevOm0o.png
[1] https://app.filen.io/#/d/e05ae468-6741-453c-a18d-e83dcc3de92...
[2] e.g. when i asked it to implement a BVH to speed up things it made something that wasn't hierarchical and actually slowed down things
[3] the code it produced for [2] was fixable to do a simple BVH
[4] i tried a larger project and wrote a script that `cat`ed and `xclip`ed a bunch of header files to pass to the LLM so it knows the available functions and each function had a single line comment about what it does - when the LLM wrote new functions it also added that comment. 99% of these oneliner comments were written by the LLM actually.
No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spent setting things up). But it's not like GDB or using UNIX as an IDE, where you need a whole book just to get started.
> It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.
While they share a lot of principles (around composition, poses,...), they are different activities with different output. No one conflates the two. You don't draw and think you're going to capture a moment in time. The intent is to share an observation with the world.
If anything, prompting well is akin to learning a new programming language. What words do you use to explain what you want to achieve? How do you reference files/sections so you don't waste context on meaningless things?
I've been using AI tools to code for the past year and a half (Github Copilot, Cursor, Claude Code, OpenAI APIs) and they all need slightly different things to be successful and they're all better at different things.
AI isn't a panacea, but it can be the right tool for the job.
>I do not agree it is something you can pick up in an hour.
But it's also interesting that the industry is selling the opposite ( with AI anyone can code / write / draw / make music ).
>You have to learn what AI is good at.
More often than not I find you need to learn what the AI is bad at, and that is not a fun experience.
"Write me a server in Go" only gets you so far. What is the auth strategy, what endpoints do you need, do you need to integrate with a library or API, are there any security issues, how easy is the code to extend, how do you get it to follow existing patterns?
I find I need to think AND write more than I would if I was doing it myself because the feedback loop is longer. Like the article says, you have to review the code instead of having implicit knowledge of what was written.
That being said, it is faster for some tasks, like writing tests (if you have good examples) and doing basic scaffolding. It needs quite a bit of hand holding which is why I believe those with more experience get more value from AI code because they have a better bullshit meter.
That is the realm of software engineering, not of using LLMs. You have to answer all of these questions even with traditional coding, because they’re not coding questions, they’re software design questions. And before that, there were software analysis questions, preceded by requirements gathering questions.
A lot of replies around the thread are conflating coding activities with the parent set of software engineering activities.
LLMs can help answer the questions. However, they're not going to necessarily make the correct choices or implementation without significant input from the user.
You can start in a few minutes, sure. (Also you can start using gdb in minutes) But GP is talking about the ceiling. Do you know which models work better for what kind of task? Do you know what format is better for extra files? Do you know when it's beneficial to restart / compress context? Are you using single prompts or multi stage planning trees? How are you managing project-specific expectations? What type of testing gives better results in guiding the model? What kind of issues are more common for which languages?
Correct prompting is what makes the difference these days in tasks like SWE-verified.
For example, I have a custom planning prompt that I will give a paragraph or two of information to, and then it will produce a specification document from that by searching the web and reading the code and documentation. And then I will review that specification document before passing it back to Claude Code to implement the change.
This works because it is a lot easier to review a specification document than it is to review the final code changes. So, if I understand it and guide it towards how I would want the feature to be implemented at the specification stage, that sets me up to have a much easier time reviewing the final result as well. Because it will more closely match my own mental model of the codebase and how things should be implemented.
And it feels like that is barely scratching the surface of setting up the coding environment for Claude Code to work in.
I like a similar workflow where I iterate on the spec, then convert that into a plan, then feed that step by step to the agent, forcing full feature testing after each one.
I've actually been playing around with languages that separate implementation from specification under the theory that it will be better for this sort of stuff, but that leaves an extremely limited number of options (C, C++, Ada... not sure what else).
I've been using C and the various LLMs I've tried seem to have issues with the lack of memory safety there.
My basic initial prompt for that is: "we're creating a markdown specification for (...). I'll start with basic description and at each step you should refine the spec to include the new information and note what information is missing or could use refinement."
For example, it might include: Overview, Database Design (Migration, Schema Updates), Backend Implementation (Model Updates, API updates), Frontend Implementation (Page Updates, Component Design), Implementation Order, Testing Considerations, Security Considerations, Performance Considerations.
It sounds like a lot when I type it out, but it is pretty quick to read through and edit.
The specification document is generated by a planning prompt that tells Claude to analyse the feature description (the couple paragraphs I wrote), research the repository context, research best practices, present a plan, gather specific requirements, perform quality control, and finally generate the planning document.
I'm not sure if this is the best process, but it seems to work pretty well.
The problem with overinvesting in a brand new, developing field is that you get skills that are soon to be redundant. You can hope that the skills are gonna transfer to what will be needed after, but I am not sure that will be the case here. There was a lot of talk about prompting techniques ("prompt engineering") last year, and now most of these are redundant; I really don't think I have learnt anything that is useful enough for the new models, nor have I actually understood something deeper. It's all tips-and-tricks-level, shallow stuff.
I think these skills are just like learning how to use some tools in an IDE. They increase productivity, which is great, but if you have to switch IDEs they may not actually help you with the new things you have to learn in the new environment. Moreover, these are just skills in how to use some tools; they allow you to do things, but we cannot compare learning how to use tools with actually learning and understanding the structure of a program. The former is obviously a shallow form of knowledge/skill, easily replaceable, easily made redundant, and probably not transferable (in the current context). I would rather invest more time in the latter and actually get somewhere.
The things that will change may be prompts or MCP setups or more specific optimisations like subagents. Those may require more consideration of how much you want to invest in setting them up. But the majority of setup you do for Claude Code is not only useful to Claude Code. It is useful to human developers and other agent systems as well.
> There was a lot of talk about prompting techniques ("prompt engineering") last year and now most of these are redundant.
Not true, prompting techniques still matter a lot to a lot of applications. It's just less flashy now. In fact, prompting techniques matter a ton for optimising Claude Code and creating commands like the planning prompt I created. It matters a lot when you are trying to optimise for costs and use cheaper models.
> I think these skills are just like learning how to use some tools in an IDE.
> if you have to switch IDEs they may not actually help you
A lot of the skills you learn in one IDE do transfer to new IDEs. I started using Eclipse and that was a steep learning curve. But later I switched to IntelliJ IDEA and all I had to re-learn were key-bindings and some other minor differences. The core functionality is the same.
Similarly, a lot of these "agent frameworks" like Claude Code are very similar in functionality, and switching between them as the landscape shifts is probably not as large of a cost as you think it is. Often it is just a matter of changing a model parameter or changing the command that you pass your prompt to.
Of course it is a tradeoff, and that tradeoff probably changes a lot depending upon what type of work you do, your level of experience, how old your codebases are, how big your codebases are, the size of your team, etc... it's not a slam dunk that it is definitely worthwhile, but it is at least interesting.
Here's what today's task list looks like for me:
1. Test TRAE/Refact.ai/Zencoder: 70% on SWE verified
2. https://github.com/kbwo/ccmanager: use git tree to manage multiple Claude Code sessions
3. https://github.com/julep-ai/julep/blob/dev/AGENTS.md: Read and implement
4. https://github.com/snagasuri/deebo-prototype: Autonomous debugging agent (MCP)
5. https://github.com/claude-did-this/claude-hub: connects Claude Code to GitHub repositories.
The skill floor is something you can pick up in a few minutes and find it useful, yes. I have been spending dedicated effort toward finding the skill ceiling and haven't found it.
I've picked up lots of skills in my career, some of which were easy, but some of which required dedicated learning, or practice, or experimentation. LLM-assisted coding is probably in the top 3 in terms of effort I've put into learning it.
I'm trying to learn the right patterns to use to keep the LLM on track and keeping the codebase in check. Most importantly, and quite relevant to OP, I'd like to use LLMs to get work done much faster while still becoming an expert in the system that is produced.
Finding the line has been really tough. You can get a LOT done fast without this requirement, but personally I don't want to work anywhere that has a bunch of systems that nobody's an expert in. On the flip side, as in the OP, you can have this requirement and end up slower by using an LLM than by writing the code yourself.
This doesn’t give you any time to experiment with alternative approaches. It’s equivalent to saying that the first approach you try as a beginner will be as good as it possibly gets, that there’s nothing at all to learn.
i.e. continually gambling and praying the model spits something out that works instead of thinking.
But more seriously, in the ideal case refining a prompt based on a misunderstanding of an LLM due to ambiguity in your task description is actually doing the meaningful part of the work in software development. It is exactly about defining the edge cases, and converting into language what is it that you need for a task. Iterating on that is not gambling.
But of course if you are not doing that, and are instead just trying to coax a "smarter" LLM with "prompt engineering" tricks (hopefully a soon-to-be-deprecated field of study), then you are building yourself a skill that can become useless tomorrow.
If the outcome is indistinguishable from using "thinking" as the process rather than brute force, why would the process matter regarding how the outcome was achieved?
Your concept of thinking is the classic rhetoric: as soon as some "AI" manages to achieve something it previously wasn't capable of, it's no longer AI and is just some xyz process. It happened with chess engines, with AlphaGo, and with LLMs. The implication being that human "thinking" is somehow unique and only AI that replicates it can be considered to have "thinking".
From what I see of AI programming tools today, I highly doubt the skills developed are going to transfer to tools we'll see even a year from now.
> a couple Claude instances running in the background chewing through simple yet time consuming tasks.
If you don't mind, I'd love to hear more about this. How exactly are they running in the background? What are they doing? How do you interact with them? Do they have access to your file system? Thank you!
If so, it does seem that AI just replaced me at my job... don't let them know. A significant portion of my projects are writing small business tools.
Maybe not hours, but extended periods of time, yes. Agents are very quick, so they can frequently complete tasks that would have taken me hours in minutes.
> The page says $17 per month. That's unlimited usage?
Each plan has a limited quota; the Pro plan offers you enough to get in and try out Claude Code, but not enough for serious use. The $100 and $200 plans still have quotas, but they're quite generous; people have been able to get orders of magnitude of API-cost-equivalents out of them [0].
> If so, it does seem that AI just replaced me at my job... don't let them know. A significant portion of my projects are writing small business tools.
Perhaps, but for now, you still need to have some degree of vague competence to know what to look out for and what works best. Might I suggest using the tools to get work done faster so that you can relax for the rest of the day? ;)
[0]: https://xcancel.com/HaasOnSaaS/status/1932713637371916341
From what I see of the tools, I think the skills developed largely consist of skills you need to develop as you get more senior anyway, namely writing detail-oriented specs and understanding how to chunk tasks. Those skills aren't going to stop having value.
Detailed specs are certainly a transferable skill, what isn't is the tedious hand holding and defensive prompting. In my entire career I've worked with a lot of people, only one required as much hand holding as AI. That person was using AI to do all their work.
LLM-based[1] coding, at least beyond simple auto-complete enhancements (using it directly & interactively as what it is: Glorified Predictive Text) is more akin to managing a junior or outsourcing your work. You give a definition/prompt, some work is done, you refine the prompt and repeat (or fix any issues yourself), much like you would with an external human. The key differences are turnaround time (in favour of LLMs), reliability (in favour of humans, though that is mitigated largely by the quick turnaround), and (though I suspect this is a limit that will go away with time, possibly not much time) lack of usefulness for "bigger picture" work.
This is one of my (several) objections to using it: I want to deal with and understand the minutia of what I am doing, I got into programming, database bothering, and infrastructure kicking, because I enjoyed it, enjoyed learning it, and wanted to do it. For years I've avoided managing people at all, at the known expense of reduced salary potential, for similar reasons: I want to be a tinkerer, not a manager of tinkerers. Perhaps call me back when you have an AGI that I can work alongside.
--------
[1] Yes, I'm a bit of a stick-in-the-mud about calling these things AI. Next decade they won't generally be considered AI like many things previously called AI are not now. I'll call something AI when it is, or very closely approaches, AGI.
Also if my junior argued back and was wrong repeatedly, that'd be bad. Lucky that has never happened with AIs ...
LLMs absolutely can improve over time.
We all want many things; that doesn't mean someone will pay you for it. You want to tinker? Great, awesome, more power to you, tinker on personal projects to your heart's content. However, if someone pays you to solve a problem, then it is your job to find the best, most efficient way to cleanly do it. Can LLMs do this on their own most of the time? I think not, not right now at least. The combination of a skilled human and an LLM? Most likely, yes.
Maybe I'll retrain for lab work, I know a few people in the area, yeah I'd need a pay cut, but… Heck, I've got the mortgage paid, so I could take quite a cut and not be destitute, especially if I get sensible and keep my savings where they are and building instead of getting tempted to spend them! I don't think it'll get to that point for quite a few years though, and I might have been due to throw the towel in by that point anyway. It might be nice to reclaim tinkering as a hobby rather than a chore!
A million times yes.
And we live in a time in which people want to be called "programmers" because it's oh-so-cool but not doing the work necessary to earn the title.
This is the piece that confuses me about the comparison to a junior or an intern. Humans learn about the business, the code, the history of the system. And then they get better. Of course there’s a world where agents can do that, and some of the readme/doc solutions do that but the limitations are still massive and so much time is spent reexplaining the business context.
*dusts off hands* Problem solved! Man, am I great at management or what?
Hard disagree. It's still way faster to review code than to manually write it. Also the speed at which agents can find files and the right places to add/edit stuff alone is a game changer.
Although tbh, even in the worst case I think I am still faster at reviewing than writing. The only difference is, though, that those reviews will never have had the same depth of thought and consideration as when I write the code myself. So reviews are quicker, but also less thorough/robust than writing, for me.
This strikes me as a tradeoff I'm absolutely not willing to make, not when my name is on the PR
This is a recipe for disaster with AI agents. You have to read every single line carefully, and this is much more difficult for the large majority of people out there than if you had written it yourself. It's like reviewing a Junior's work, except I don't mind reviewing my Junior colleague's work because I know they'll at least learn from the mistakes and they're not a black box that just spews bullshit.
Which is kind of like if AI wrote it: except someone is standing behind those words.
I guess the author is not aware of Cursor rules, AGENTS.md, CLAUDE.md, etc. Task-list oriented rules specifically help with long term context.
Or are you talking about OP not knowing AI tools enough?
The saying is, "You can lead a horse to water, but you can't make him drink." I intended no more profound meaning than that. A quip. Nothing more.
Human experts excel at first-principles thinking precisely because they can strip away assumptions, identify core constraints, and reason forward from fundamental truths. They might recognize that a novel problem requires abandoning conventional approaches entirely. AI, by contrast, often gets anchored to what "looks similar" and applies familiar frameworks even when they're not optimal.
Even when explicitly prompted to use first-principles analysis, AI models can struggle because:
- They lack the intuitive understanding of when to discard prior assumptions
- They don't naturally distinguish between surface-level similarity and deep structural similarity
- They're optimized for confident responses based on pattern recognition rather than uncertain exploration from basics
This is particularly problematic in domains requiring genuine innovation or when dealing with edge cases where conventional wisdom doesn't apply.
Context poisoning, intended or not, is a real problem that humans are able to solve relatively easily while current SotA models struggle.
Humans are also not as susceptible to context poisoning as LLMs are.
I haven't observed any software developers operating at even a slight multiplier from the pre-LLM days at the organisations I've worked at. I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.
I think that's a really elegant way to put it. Google Research tried to measure LLM impacts on productivity in 2024 [1]. They gave their subjects an exam and assigned them different resources (a book versus an LLM). They found that the LLM users actually took more time to finish than those who used a book, and that only novices on the subject material actually improved their scores when using an LLM.
But the participants also perceived that they were more accurate and efficient using the LLM, when that was not the case. The researchers suggested that it was due to "reduced cognitive load" - asking an LLM something is easy and mostly passive. Searching through a book is active and can feel more tiresome. Like you said: people are getting addicted to not having to expend brain energy to solve problems, and mistaking that for productivity.
[1] https://storage.googleapis.com/gweb-research2023-media/pubto...
Personally, I don't know if this is always a win, mostly because I enjoy the creative and problem solving aspect of coding, and reducing that to something that is more about prompting, correcting, and mentoring an AI agent doesn't bring me the same satisfaction and joy.
After doing programming for a decade or two, the actual act of programming is not enough to be ”creative problem solving”, it’s the domain and set of problems you get to apply it to that need to be interesting.
>90% of programming tasks at a company are usually reimplementing things and algorithms that have been done a thousand times before by others, and you’ve done something similar a dozen times. Nothing interesting there. That is exactly what should and can now be automated (to some extent).
In fact solving problems creatively to keep yourself interested, when the problem itself is boring is how you get code that sucks to maintain for the next guy. You should usually be doing the most clear and boring implementation possible. Which is not what ”I love coding” -people usually do (I’m definitely guilty).
To be honest this is why I went back to get a PhD, ”just coding” stuff got boring after a few years of doing it for a living. Now it feels like I’m just doing hobby projects again, because I work exactly on what I think could be interesting for others.
One person might feel like their job is just coding the same CRUD app over and over re-skinned. Where-as I feel my job is to simplify code by figuring out better structures and abstractions to model the problem domain which together solve systemic issues with the delivered system and enables more features to work together without issue and be added to the system, as well as making changes and new features/use-cases delivery faster.
The latter I find a creative exercise, the former I might get bored and wish AI could automate it away.
I think what exactly you are tasked with doing at your job will also determine whether your use of agentic AI actually makes you more productive or not.
I went with OP's hypothesis that you are not faster, you throw things at the wall, wait, and see if it sticks, or re-throw it until it does. This reduces your cognitive load, but might not actually make you more productive.
I'm assuming here that "you are not more productive" already accounted for what you are saying. Like in a 8h day, without AI, you get X done, and with AI you also get X done, likely because during the peak productivity hours of your day you get more done without AI, but when you are mentally tired you get less done, and it evens out with a full day of AI work.
There's no data here, it's all just people's intuition and impression, not actually measuring their productivity in any quantifiable way.
What you hypothesize could also be true: if the mental load is reduced, can you sustain higher productivity for longer? We don't know; maybe.
It's not maybe, it's confirmed fact. Otherwise there wouldn't be a burnout epidemic.
https://www.mayoclinic.org/healthy-lifestyle/adult-health/in...
Of the six general causes listed, four are institutional or social, having to do more with the workplace or coworkers: lack of control, lack of clarity, interpersonal conflicts, lack of support. IME, in tech, these are far more common causes and more deeply tied to the root of the issue than specifics of work.
The remaining two are productivity-related issues: too much/little to do, problems with WLB.
I would note these are tied into lack of control/clarity/support, and conflict. In a healthy work environment, expectations should be clear and at least somewhat flexible depending on employee feedback, and adequate support should be provided by the employer.
That aside, it's unclear, and I would argue unlikely, that AI-related productivity gains will help with workload issues. If you do disproportionately more work in an overworked team/org, you will simply be given more work. If many people see gains in productivity, then either the bar for productivity goes up, or there's layoffs. Even if you manage to squeak by / quiet quit with much reduced cognitive load for coding, and that's most of your job, unless you are fully remote the most likely change is your butt-in-seat time will go from "mentally taxing coding" to "mentally toxic doomscrolling."
AI hits 3, but not the other two. Given the current human condition, this is a dangerous combination! It will win, but at the cost of the other two.
https://news.ycombinator.com/item?id=44297190
Already replied better.
Couldn't this result in being able to work longer for less energy, though? With really hard mentally challenging tasks I find I cap out at around 3-4 hours a day currently
Like imagine if you could walk at running speed. You're not going faster.. but you can do it for way longer so your output goes up if you want it to
The latter is not making any neuron embedding tradeoff when they hand off the slog to agents.
There’s a lot of software development in that latter category.
Apparently models are not doing great on problems out of distribution.
But it’s also faster to read code than to write it. And it’s faster to loop a prompt back into fixed code and re-review it than to write it yourself.
write a stub for a react context based on this section (which will function as a modal):
```
<section>
// a bunch of stuff
</section>
```
Worked great, it created a few files (the hook, the provider component, etc.), and I then added them to my project. I've done this a zillion times, but I don't want to do it again, it's not interesting to me, and I'd have to look up stuff if I messed it up from memory (which I likely would, because provider/context boilerplate sucks).
Now, I can just do `const myModal = useModal(...)` in all my components. Cool. This saved me at least 30 minutes, and 30 minutes of my time is worth way more than 20 bucks a month. (N.B.: All this boilerplate might be a side effect of React being terrible, but that's beside the point.)
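For anyone who hasn't written this particular flavor of boilerplate, the generated files are roughly the shape below. This is a minimal illustrative sketch, not the exact output I got; the `ModalProvider` name and the TypeScript typings are mine, and the real hook takes a few options where this one takes none.
```
import React, { createContext, useCallback, useContext, useMemo, useState } from "react";

type ModalContextValue = {
  open: (content: React.ReactNode) => void;
  close: () => void;
};

const ModalContext = createContext<ModalContextValue | null>(null);

// Wraps the app once; renders whatever content a component asks to show.
export function ModalProvider({ children }: { children: React.ReactNode }) {
  const [content, setContent] = useState<React.ReactNode | null>(null);

  const open = useCallback((c: React.ReactNode) => setContent(c), []);
  const close = useCallback(() => setContent(null), []);
  const value = useMemo(() => ({ open, close }), [open, close]);

  return (
    <ModalContext.Provider value={value}>
      {children}
      {content !== null && (
        <section role="dialog" aria-modal="true">
          {content}
          <button onClick={close}>Close</button>
        </section>
      )}
    </ModalContext.Provider>
  );
}

// The hook consumed as `const myModal = useModal()` in components.
export function useModal(): ModalContextValue {
  const ctx = useContext(ModalContext);
  if (!ctx) throw new Error("useModal must be used inside a ModalProvider");
  return ctx;
}
```
Nothing in there is hard; it's just the kind of rote wiring I'd otherwise have to re-derive from the React docs.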
For harder problems, my experience is that it falls over, although I haven't been refining my LLM skills as much as some do. It seems that the bigger the project, the more it integrates with other things, the worse AI is. And moreover, for those tasks it's important for me or a human to do it because (a) we think about edge cases while we work through the problem intellectually, and (b) it gives us a deep understanding of the system.
That’s an issue I have with generated code. More often, I start with a basic design that evolves based on the project needs. It’s an iterative process that can span the whole timeline. But with generated code, it’s a whole solution that fits the current needs, but it’s a pain to refactor.
Both of these would take longer than 5 minutes. There's also no "lifting" as this case involves both Provider and Context, so you'd have to combine React doc examples.
The only alternative would be knowing it by heart, which you might, but I don't (nor do I particularly care to). There's definitely a force multiplier here, even if just in the boring boilerplate cases.
Also, the auto-complete with tools like Cursor are mind blowing. When I can press tab to have it finish the next 4 lines of a prepared statement, or it just knows the next 5 variables I need to define because I just set up a function that will use them.... that's a huge time saver when you add it all up.
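To make the prepared-statement example concrete, here's a hypothetical sketch (node-postgres, with a made-up table and columns) of the kind of block where I type the first line and tab through the rest:
```
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the usual PG* env vars

// Everything after the function signature is the sort of thing autocomplete fills in.
export async function insertOrder(customerId: number, total: number, note: string) {
  const result = await pool.query(
    `INSERT INTO orders (customer_id, total, note, created_at)
     VALUES ($1, $2, $3, NOW())
     RETURNING id`,
    [customerId, total, note]
  );
  return result.rows[0].id as number;
}
```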
My policy is simple, don't put anything AI creates into production if you don't understand what it's doing. Essentially, I use it for speed and efficiency, not to fill in where I don't know at all what I'm doing.
How much do you believe a programmer needs to layout to “get good”?
I've probably fed $100 in API tokens into the OpenAI and Anthropic consoles over the last two years or so.
I was subscribed to Cursor for a while too, though I'm kinda souring on it and looking at other options.
At one point I had a ChatGPT pro sub, I have found Claude more valuable lately. Same goes for Gemini, I think it's pretty good but I haven't felt compelled to pay for it.
I guess my overall point is you don't have to break the bank to try this stuff out. Shell out the $20 for a month, cancel immediately, and if you miss it when it expires, resub. $20 is frankly a very low bar to clear - if it's making me even 1% more productive, $20 is an easy win.
I think that getting "good" at using AI means that you figure out exactly how to formulate your prompts so that the results are what you are looking for given your code base. It also means knowing when to start new chats, and when to have it focus on very specific pieces of code, and finally, knowing what it's really bad at doing.
For example, if I need to have it take a list of 20 fields and create the HTML view for the form, it can do it in a few seconds, and I know to tell it, for example, to use Bootstrap, Bootstrap icons, Bootstrap modals, responsive rows and columns, and I may want certain fields aligned certain ways, buttons in certain places for later, etc, and then I have a form - and just saved myself probably 30 minutes of typing it out and testing the alignment etc. If I do things like this 8 times a day, that's 4 hours of saved time, which is game changing for me.
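As a rough illustration of the output (hypothetical field names, and only two of the twenty fields), the markup it hands back looks something like this:
```
import React from "react";

// Two-field slice of the generated Bootstrap form; the real thing repeats
// this pattern for every field and wires up the modals and buttons too.
export function CustomerForm() {
  return (
    <form>
      <div className="row g-3">
        <div className="col-md-6">
          <label htmlFor="firstName" className="form-label">First name</label>
          <input id="firstName" type="text" className="form-control" />
        </div>
        <div className="col-md-6">
          <label htmlFor="email" className="form-label">Email</label>
          <input id="email" type="email" className="form-control" />
        </div>
      </div>
      <div className="mt-3 text-end">
        <button type="submit" className="btn btn-primary">
          <i className="bi bi-save me-1"></i>Save
        </button>
      </div>
    </form>
  );
}
```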
Another great example is the power of tabbing with Cursor. If I want to change the parameters of a function in my React app, I can be at one of the functions anywhere on my screen, add a variable that relates to what is being rendered, and I can now quickly tab through to find all the spots in that screen that are also affected, and then it usually helps apply the changes to the function. It's like smart search and replace where I can see every change that needs to be made, but it knows how to make it more intelligently than just replacing a line of code - and I didn't have to write the regex to find it, AND it usually helps get the work done in the function as well to reflect the change. That could save me 3-5 minutes, and I could do that maybe 5 times a day, so another almost half-hour is saved.
The point is, these small things add up SO fast. Now I'm incredibly efficient because the tedious part of programming has been sped up so much.
This truly is shocking. If you are reviewing every single line of every package you intend to use how do you ever write any code?
This remains to be seen. It's still early days, but self-attention scales quadratically. This is a major red flag for the future potential of these systems.
Using a package that hundreds of thousands of other people use is low risk, it is battle tested
It doesn't matter how good AI code gets, a unique solution that no one else has ever touched is always going to be more brittle and risky than an open source package with tons of deployments
And yes, if you are using an Open Source package that has low usage, you should be reviewing it very carefully before you embrace it
Treat AI code as if you were importing from a git repo with 5 installs, not a huge package with Mozilla funding
I had AI create me a k8s device plugin for supporting SR-IOV-only vGPUs. It's something Nvidia calls "vendor specific" and basically offers little to no support for in their public repositories for Linux KVM.
I loaded up a new go project in goland, opened up Junie, typed what I needed and what I have, went to make tea, came back, looked over the code to make sure it wasn't going to destroy my cluster (thankfully most operations were read-only), deployed it with the generated helm chart and it worked (nearly) first try.
Before this I really had no idea how to create device plugins other than knowing what they are and even if I did, it would have easily taken me an hour or more to have something working.
The only thing AI got wrong is that the virtual functions were symlinks and not directories.
The entire project is good enough that I would consider opensourcing it. With 2 more prompts I had configmap parsing to initialize virtual functions on-demand.
That is the mental model I have for the work (computer programming) I like to do and am good at.
Plumbing
In contrast, when I’m trying to do something truly novel, I might spend days with a pen and paper working out exactly what I want to do and maybe under an hour coding up the core logic.
On the latter type of work, I find LLM’s to be high variance with mostly negative ROI. I could probably improve the ROI by developing a better sense of what they are and aren’t good at, but of course that itself is rapidly changing!
The entire code? Not there, but with debuggers, I've even started doing that a bit.
The author writes "reviewing code is actually harder than most people think. It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself". That sounds within an SD of true for me, too, and I had a full-time job close-reading code (for security vulnerabilities) for many years.
But it's important to know that when you're dealing with AI-generated code for simple, tedious, or rote tasks --- what they're currently best at --- you're not on the hook for reading the code that carefully, or at least, not on the same hook. Hold on before you jump on me.
Modern Linux kernels allow almost-arbitrary code to be injected at runtime, via eBPF (which is just a C program compiled to an imaginary virtual RISC). The kernel can mostly reliably keep these programs from crashing the kernel. The reason for that isn't that we've solved the halting problem; it's that eBPF doesn't allow most programs at all --- for instance, it must be easily statically determined that any backwards branch in the program runs for a finite and small number of iterations. eBPF isn't even good at determining that condition holds; it just knows a bunch of patterns in the CFG that it's sure about and rejects anything that doesn't fit.
That's how you should be reviewing agent-generated code, at least at first; not like a human security auditor, but like the eBPF verifier. If I so much as need to blink when reviewing agent output, I just kill the PR.
If you want to tell me that every kind of code you've ever had to review is equally tricky to review, I'll stipulate to that. But that's not true for me. It is in fact very easy to me to look at a rote recitation of an idiomatic Go function and say "yep, that's what that's supposed to be".
This might be the defining line for Gen AI - people who can read code faster will find it useful, and those who write faster than they can read won't use it.
I also haven't found any benefit in aiming for smaller or larger PRs. The aggregate efficiency seems to even out: smaller PRs are easier to weed through, but they are not less likely to be trash.
It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.
It is 100% a function of what you are trying to build, what language and libraries you are building it in, and how sensitive that thing is to factors like performance and getting the architecture just right. I've experienced building functioning systems with hardly any intervention, and repeatedly failing to get code that even compiles after over an hour of effort. There exists small, but popular, subset of programming tasks where gen AI excels, and a massive tail of tasks where it is much less useful.
This will only be resolved out there in the real world. If AI turns a bad developer, or even a non-developer, into somebody that can replace a good developer, the workplace will transform extremely quickly.
So I'll wait for the world to prove me wrong but my expectation, and observation so far, is that AI multiplies the "productivity" of the worst sort of developer: the ones that think they are factory workers who produce a product called "code". I expect that to increase, not decrease, the value of the best sort of developer: the ones who spend the week thinking, then on Friday write 100 lines of code, delete 2000 and leave a system that solves more problems than it did the week before.
I have known and worked with many, many engineers across a wide range of skill levels. Not a single one has ever said or implied this, and in not one case have I ever found it to be true, least of all in my own case.
I don't think it's humanly possible to read and understand code faster than you can write and understand it to the same degree of depth. The brain just doesn't work that way. We learn by doing.
The same goes with shell scripting.
But more importantly, you don’t have to understand code to the same degree and depth. When I read code I understand what the code is doing and whether it looks correct. I’m not going over other design decisions or implementation strategies (unless they’re obvious). If I did that then I’d agree. I’d also stop doing code reviews and just write everything myself.
There is a certain style, let's say, of programming that encourages highly non-reusable code that is at once boring and tedious, and impossible to maintain, and thus not especially worthwhile.
The "rote code" could probably have been expressed, succinctly, in terms that border on "plain text", but with more rigueur de jour, with less overpriced, wasteful, potentially dangerous models in-between.
And yes, machines like the eBPF verifier must follow strict rules to cut out the chaff, of which there is quite a lot, but it neither follows that we should write everything in eBPF, nor does it follow that because something can throw out the proverbial "garbage", that makes it a good model to follow...
Put another way, if it was that rote, you likely didn't need nor benefit from the AI to begin with, a couple well tested library calls probably sufficed.
Important tangential note: the eBPF verifier doesn't "cut out the chaff". It rejects good, valid programs. It does not care that the programs are valid or good; it cares that it is not smart enough to understand them; that's all that matters. That's the point I'm making about reviewing LLM code: you are not on the hook for making it work. If it looks even faintly off, you can't hurt the LLM's feelings by killing it.
Certainly, however:
> That's the point I'm making about reviewing LLM code: you are not on the hook for making it work
The second portion of your statement is either confusing (something unsaid) or untrue (you are still ultimately on the hook).
Agentic AI is just yet another, as you put it way to "get in trouble trying to be clever".
My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code. If your only real use of AI is to replace template systems, congratulations on perpetuating the most over-engineered template system ever. I'll stick with a provable, free template system, or just not write the code at all.
You're missing the point.
tptacek is saying he isn't the one who needs to fix the issue because he can just reject the PR and either have the AI agent refine it or start over. Or ultimately resort to writing the code himself.
He doesn't need to make the AI written code work, and so he doesn't need to spend a lot of time reading the AI written code - he can skim it for any sign it looks even faintly off and just kill it if that's the case instead of spending more time on it.
> My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code.
There's a vast chasm between simple enough that a non-AI code generator can generate it using templates and simple enough that a fast read-through is enough to show that it's okay to run.
As an example, the other day I had my own agent generate a 1kloc API client for an API. The worst case scenario other than failing to work would be that it would do something really stupid, like deleting all my files. Since it passes its tests, skimming it was enough for me to have confidence that nowhere does it do any file manipulation other than reading the files passed in. For that use, that's sufficient since it otherwise passes the tests and I'll be the only user for some time during development of the server it's a client for.
But no template based generator could write that code, even though it's fairly trivial - it involved reading the backend API implementation and rote-implementation of a client that matched the server.
Not true at all; in fact this sort of thing used to happen all the time 10 years ago: tools reading APIs and generating clients...
> He doesn't need to make the AI written code work, and so he doesn't need to spend a lot of time reading the AI written code - he can skim it for any sign it looks even faintly off and just kill it if that's the case instead of spending more time on it.
I think you are missing the point as well, that's still review, that's still being on the hook.
Words like "skim" and "kill" are the problem here, not a solution. They point to a broken process that looks like its working...until it doesn't.
But I hear you say "all software works like that", well, yes, to some degree. The difference being, one you hopefully actually wrote and have some idea what's going wrong, the other one?
Well, you just have to sort of hope it works and when it doesn't, well you said it yourself. Your code was garbage anyways, time to "kill" it and generate some new slop...
Where is this template based code generator that can read my code, understand it, and generate a full client including a CLI, that include knowing how to format the data, and implement the required protocols?
I'm 30 years into development, and I've seen nothing like it.
> I think you are missing the point as well, that's still review, that's still being on the hook.
I don't know if you're being intentionally obtuse, or what, but while, yes, you're on the hook for the final deliverable, you're not on the hook for fixing a specific instance of code, because you can just throw it away and have the AI do it all over.
The point you seem intent on missing is that the cost of throwing out the work of another developer is high, while the cost of throwing out the work of an AI assistant is next to nothing, and so where you need to carefully review a co-workers code because throwing it away and starting over from scratch is rarely an option, with AI generated code you can do that at the slightest whiff of an issue.
> Words like "skim" and "kill" are the problem here, not a solution. They point to a broken process that looks like its working...until it doesn't.
No, they are not a problem at all. They point to a difference in opportunity cost. If the rate at which you kill code is too high, it's a problem irrespective of source. But the point is that this rate can be much higher for AI code than for co-workers before it becomes a problem, because the cost of starting over is orders of magnitude different, and this allows for a very different way of treating code.
> Well, you just have to sort of hope it works and when it doesn't
No, I don't "hope it works" - I have tests.
I'd argue you are quite a bit beyond "rote" code at that point (with the understanding and protocol bits). But generating client code is not hard; there are numerous generators around, e.g. Swagger:
https://swagger.io/ https://swagger.io/tools/swagger-codegen/
In the ten years since, I expect other generators/platforms have appeared too; that's merely the one I'm familiar with.
> you're not on the hook for fixing a specific instance of code, because you can just throw it away and have the AI do it all over.
> ...
> No, I don't "hope it works" - I have tests.
These are contradictory statements. Every instance of that code you are responsible for, or you wouldn't test it and you wouldn't deign to "need" to throw it away.
> They point to a difference in opportunity cost.
Yes, we are all ultimately concerned with this. However, this is not an easy metric to quantify. Clearly you feel your OC (opportunity cost) is lower this way, maybe because you don't work well with other humans, ok whatever; but you are likely overestimating the supposed savings, and underestimating the lost OC of working with other developers, or of simply writing code that doesn't need to be thrown out at all...
I explicitly wasn't trying to persuade anyone that the cost/benefit tradeoff for LLM coding was positive. I obviously believe it is, but reasonable people can disagree.
With an arbitrary PR from a colleague or a security audit, you have to come up with a mental model first, which is the hardest part.
Yes I have been burned. But 99% of the time, with proper test coverage it is not an issue, and the time (money) savings have been enormous.
"Ship it!" - me
I guess it is far removed from the advertised use case. Also, I feel one would be better off having autocomplete powered by an LLM in this case.
I don't think code is ever "obviously right" unless it is trivially simple.
So with this "obviously right" rubric I would wind up rejecting 95% of submissions, which is a waste of my time and energy. How about instead I just write it myself? At least then I know who's responsible for cleaning up after it.
The more I use this, the longer the LLM works before I even look at the output, beyond maybe having it chug along on another screen and glancing over occasionally.
My shortest runs now usually take minutes of the LLM expanding my prompt into a plan, writing the tests, writing the code, linting its code, fixing any issues, and writing a commit message before I even review things.
The alternative where I boil a few small lakes + a few bucks in return for a PR that maybe sometimes hopefully kinda solves the ticket sounds miserable. I simply do not want to work like that, and it doesn't sound even close to efficient or speedier or anything like that, we're just creating extra work and extra waste for literally no reason other than vague marketing promises about efficiency.
But in my experience this is _signal_. If the AI can't get to it with minor back and forth, then something needs work: your understanding, the specification, the tests, your code factoring, etc.
The best case scenario is your agent one-shots the problem. But close behind that is that your agent finds a place where a little cleanup makes life easier for everybody: you, your colleagues, and the bot. And your company is now incentivized to invest in that.
The worst case is you took the time to write 2 prompts that didn’t work.
I have not found it useful for large programming tasks. But for small tasks, a sort of personalised boilerplate, I find it useful.
However, AI code reviewers have been really impressive. We run three separate AI reviewers right now and are considering adding more. One of these reviewers is kind of noisy, so we may drop it, but the others have been great. Sure, they have false positives sometimes and they don't catch everything. But they do catch real issues and prevent customer impact.
The Copilot style inline suggestions are also decent. You can't rely on it for things you don't know about, but it's great at predicting what you were going to type anyway.
That’s fine, but it’s an arbitrary constraint he chooses, and it’s wrong to say AI is not faster. It is. He just won’t let it be faster.
Some won’t like to hear this, but no-one reviews the machine code that a compiler outputs. That’s the future, like it or not.
You can’t say compilers are slow because I add on the time I take to analyse the machine code. That’s you being slow.
That's because compilers are generally pretty trustworthy. They aren't necessarily bug free, and when you do encounter compiler bugs it can be extremely nasty, but mostly they just work
If compilers were wrong as often as LLMs are, we would be reviewing machine code constantly
A stochastic parrot can never be trusted, let alone one that tweaks its model every other night.
I totally get that not all code ever written needs to be correct.
Some throw-away experiments can totally be one-shot by AI, nothing wrong with that. Depending on the industry one works in, people might be on different points of the expectation spectrum for correctness, and so their experience with LLMs vary.
It's the RAD tool discussion of the 2000s, or the "No-Code" tools debate of the last decade, all over again.
The distinction isn't whether code comes from AI or humans, but how we integrate and take responsibility for it. If you're encapsulating AI-generated code behind a well-defined interface and treating it like any third party dependency, then testing that interface for correctness is a reasonable approach.
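As a sketch of what that can look like in practice (the `RateLimiter` protocol, the contract test, and the generated module named in the comment are all hypothetical, not anyone's real API):

```python
# Sketch: treat AI-generated code like a third-party dependency. We own the
# interface and the contract test; the implementation behind it is whatever
# the tool produced. All names here are hypothetical.
from typing import Protocol


class RateLimiter(Protocol):
    """The boundary we define and commit to."""

    def allow(self, key: str) -> bool: ...


def check_contract(limiter: RateLimiter) -> None:
    """Contract test, runnable against any implementation, generated or hand-written."""
    # A fresh key should be allowed at least once.
    assert limiter.allow("user-1") is True
    # Hammering the same key should eventually be refused.
    results = [limiter.allow("user-1") for _ in range(1000)]
    assert False in results, "limiter never limits"


# Usage sketch: plug in the generated implementation and test only the interface.
# from generated.rate_limiter import TokenBucketLimiter  # hypothetical
# check_contract(TokenBucketLimiter(capacity=10, refill_per_second=1.0))
```

The point of the boundary is that the generated internals can be regenerated or swapped without the rest of the codebase noticing.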
The real complexity arises when you have AI help write code you'll commit under your name. In this scenario, code review absolutely matters because you're assuming direct responsibility.
I'm also questioning whether AI truly increases productivity or just reduces cognitive load. Sometimes "easier" feels faster but doesn't translate to actual time savings. And when we do move quicker with AI, we should ask if it's because we've unconsciously lowered our quality bar. Are we accepting verbose, oddly structured code from AI that we'd reject from colleagues? Are we giving AI-generated code a pass on the same rigorous review process we expect for human written code? If so, would we see the same velocity increases from relaxing our code review process amongst ourselves (between human reviewers)?
Doesn't matter, I'm not responsible for maintaining that particular code
The code in my PRs has my name attached, and I'm not trusting any LLM with my name
If you consider that AI code is not code any human needs to read or later modify by hand (AI code is modified by AI), then all you want to do is fully test it; if it all works, it's good. Now you can call into it from your own code.
I'm ultimately still responsible for the code. And unlike AI, library authors put their own and their libraries' reputations on the line.
"A computer can never be held accountable therefore a computer should never make a management decision"
I think we need to go back to this. I think a computer cannot be held accountable so a computer should never make any decision with any kind of real world impact
Libraries are maintained by other humans, who stake their reputation on the quality of the library. If a library gets a reputation of having a lax maintainer, the community will react.
Essentially, a chain of responsibility, where each link in the chain has an incentive to behave well else they be replaced.
Who is accountable for the code that AI writes?
I say we make it the original publishers of the data ingested by the AI during training. Just for the court battles.
Where AI especially excels is helping me do maintenance tickets on software I rarely touch (or sometimes never have touched). It can quickly read the codebase, and together we can quickly arrive at the place where the patch/problem lies and quickly correct it.
I haven't written anything "new" in terms of code in years, so I'm not really learning anything from coding manually but I do love solving problems for my customers.
Is this possible in any way today? Does one need to use Llama or DeepSeek, and do we have to run it on our own hardware to get persistence?
Writing a bunch of orm code feels boring? I make it generate the code and edit. Importing data? I just make it generate inserts. New models are good at reformatting data.
Using a third party Library? I force it to look up every function doc online and it still has errors.
Adding transforms and pivots to sql while keeping to my style? It is a mess. Forget it. I do that by hand.
Commenter Doug asks:
> > what AI coding tools have you utilized
Miguel replies:
> I don't use any AI coding tools. Isn't that pretty clear after reading this blog post?
Doug didn't ask what tools you use, Miguel. He asked which tools you have used. And the answer to that question isn't clear. Your post doesn't name the ones you've tried, despite using language that makes clear that you have in fact used them (e.g. "my personal experience with these tools"). Doug's question isn't just reasonable. It's exactly the question an interested, engaged reader will ask, because it's the question your entire post begs.
I can't help but point out the irony here: you write a great deal on the meticulousness and care with which you review other people's code, and criticize users of AI tools for relaxing standards, but the AI-tool user in your comments section has clearly read your lengthy post more carefully and thoughtfully than you read his generous, friendly question.
And I think it's worth pointing out that this isn't the blog post's only head scratcher. Take the opening:
> People keep asking me If I use Generative AI tools for coding and what I think of them, so this is my effort to put my thoughts in writing, so that I can send people here instead of having to repeat myself every time I get the question.
Your post never directly answers either question. Can I infer that you don't use the tools? Sure. But how hard would it be to add a "no?" And as your next paragraph makes clear, your post isn't "anti" or "pro." It's personal -- which means it also doesn't say much of anything about what you actually think of the tools themselves. This post won't help the people who are asking you whether you use the tools or what you think of them, so I don't see why you'd send them here.
> my personal experience with these tools, from a strictly technical point of view
> I hope with this article I've made the technical issues with applying GenAI coding tools to my work clear.
Again, that word: "clear." The post not only fails to make the technical issues clear; it doesn't raise a single concern that I think can properly be described as technical. You even say in your reply to Doug, in essence, that your resistance isn't technical, because for you the quality of an AI assistant's output doesn't matter. Your concerns, rather, are practical, methodological, and to some extent social. These are all perfectly valid reasons for eschewing AI coding assistants. They just aren't technical -- let alone strictly technical.
I write all of this as a programmer who would rather blow his own brains out, or retire, than cede intellectual labor, the thing I love most, to a robot -- let alone line the pockets of some charlatan 'thought leader' who's promising to make a reality of upper management's dirtiest wet dream: in essence, to proletarianize skilled work and finally liberate the owners of capital from the tyranny of labor costs.
I also write all of this, I guess, as someone who thinks commenter Doug seems like a way cool guy, a decent chap who asked a reasonable question in a gracious, open way and got a weirdly dismissive, obtuse reply that belies the smug, sanctimonious hypocrisy of the blog post itself.
Oh, and one more thing: AI tools are poison. I see them as incompatible with love of programming, engineering quality, and the creation of safe, maintainable systems, and I think they should be regarded as a threat to the health and safety of everybody whose lives depend on software (all of us), not because of the dangers of machine super intelligence but because of the dangers of the complete absence of machine intelligence paired with the seductive illusion of understanding.
I'm not sure I get this one. When I'm learning new tech I almost always have questions. I used to google them. If I couldn't find an answer I might try posting on stack overflow. Sometimes as I'm typing the question their search would finally kick in and find the answer (similar questions). Other times I'd post the question, if it didn't get closed, maybe I'd get an answer a few hours or days later.
Now I just ask ChatGPT or Gemini and more often than not it gives me the answer. That alone and nothing else (agent modes, AI editing or generating files) is enough to increase my output. I get answers 10x faster than I used to. I'm not sure what that has to do with the point about learning. Getting answers to those question is learning, regardless of where the answer comes from.
Okay, maybe sometimes the post about the stack trace was in Chinese, but a plain search used to be capable of giving the same answer as an LLM.
It's not that LLMs are better, it's search that got enshittified.
The "plain" Google Search before LLM never had the capability to copy&paste an entire lengthy stack trace (e.g. ~60 frames of verbose text) because long strings like that exceeds Google's UI. Various answers say limit of 32 words and 5784 characters: https://www.google.com/search?q=limit+of+google+search+strin...
Before LLM, the human had to manually visually hunt through the entire stack trace to guess at a relevant smaller substring and paste that into Google the search box. Of course, that's do-able but that's a different workflow than an LLM doing it for you.
To clarify, I'm not arguing that the LLM method is "better". I'm just saying it's different.
But I did it subconsciously. I never thought of it until today.
Another skill that LLM use can kill? :)
I could break most passwords of an internal company application by googling the SHA1 hashes.
It was possible to reliably identify plants or insects by just googling all the random words or sentences that would come to mind describing it.
(None of that works nowadays, not even remotely)
Which is never? Do you often just lie to win arguments? An LLM gives you a synthesized answer; a search engine only returns what already exists. By definition it cannot give you anything that is not a super obvious match.
In my experience it was "a lot". Because my stack traces were mostly hardware related problems on arm linux in that period.
But I suppose your stack traces were much different and superior and no one can have stack traces that are different from yours. The world is composed of just you and your project.
> Do you often just lie to win arguments?
I do not enjoy being accused of lying by someone stuck in their own bubble.
When you said "Which is never" did you lie consciously or subconsciously btw?
Whatever it is specifically, the idea that you could just paste a 600 line stack trace unmodified into google, especially "way before AI" and get pointed to the relevant bit for your exact problem is obviously untrue.
Pasting stack traces and kernel oopses hasn't worked in quite a while, I think. It's very possible that the maximum query was longer in the past.
2000 characters is also more than a double spaced manuscript page as defined by the book industry (which seems to be about 1500). You can fit the top of a stack trace in there. And if you're dealing with talking to hardware, the top can be enough.
And indeed, in the early days the maximum query length was 10 words. So no, you have never been able to paste an entire stack trace into google and magically get a concise summary.
If you are changing the original claim you were responding to into "I can do my job without LLMs if I have Google search", then sure, of course anyone can. But you can't use that to dismiss the fact that some people find it quite convenient to just dump the entire stack trace into a text chat and get a decent summary of what is important without having to read a single part of it.
Very few devs bother to post stack traces (or generally any programming question) online. They only do that when they're stuck so badly.
Most people work out their problem then move on. If no one posts about it your search never hits.
We have a habit of finding efficiencies in our processes, even if the original process did work.
Analyzing crash dumps and figuring out what's going on is a pretty useful skill.
At its least, AI can be extremely useful for autocompleting simple code logic or automatically finding replacements when I'm copying code/config and making small changes.
AI is a search engine that can also remix its results, often to good effect.
I mean yes, current large models are essentially compressing incredible amounts of content into something manageable by a single Accelerator/GPU, and making it available for retrieval through inference.
Which strongly discouraged trying to teach people.
> And ChatGPT never closes your question without answer because it (falsely) thinks it's a duplicate of a different question from 13 years ago
ChatGPT acts exactly opposite to the SO mods.
> But it does give you a ready to copy paste answer instead of a 'teach the man how to fish' answer.
Here it acts exactly like what SO mods like.
The other comments are mostly people thinking this is about ChatGPT...
What do you think will happen when everyone is using the AI tools to answer their questions? We'll be back in the world of Encyclopedias, in which central authorities spent large amounts of money manually collecting information and publishing it. And then they spent a good amount of time finding ways to sell that information to us, which was only fair because they spent all that time collating it. The internet pretty much destroyed that business model, and in some sense the AI "revolution" is trying to bring it back.
Also, he's specifically talking about having a coding tool write the code for you, he's not talking about using an AI tool to answer a question, so that you can go ahead and write the code yourself. These are different things, and he is treating them differently.
I know this isn't true because I work on an API that has no answers on stackoverflow (too new), nor does it have answers anywhere else. Yet, the AI seems to be able to accurately answer many questions about it. To be honest I've been somewhat shocked at this.
That doesn't mean it knows the answer. That means it guessed or hallucinated correctly. Guessing isn't knowing.
edit: people seem to be missing my point, so let me rephrase. Of course AIs don't think, but that wasn't what I was getting at. There is a vast difference between knowing something, and guessing.
Guessing, even in humans, is just the human mind statistically and automatically weighing probabilities and suggesting what may be the answer.
This is akin to what a model might do, without any real information. Yet in both cases, there's zero validation that anything is even remotely correct. It's 100% conjecture.
It therefore doesn't know the answer, it guessed it.
When it comes to being correct about a language or API that there's zero info on, it's just pure happenstance that it got it correct. It's important to know the differences, and not say it "knows" the answer. It doesn't. It guessed.
One of the most massive issues with LLMs is we don't get a probability response back. You ask a human "Do you know how this works", and an honest and helpful human might say "No" or "No, but you should try this. It might work".
That's helpful.
Conversely a human pretending it knows and speaking with deep authority when it doesn't is a liar.
LLMs need more of this type of response, which indicates certainty or not. They're useless without this. But of course, an LLM indicating a lack of certainty, means that customers might use it less, or not trust it as much, so... profits first! Speak with certainty on all things!
You want to say this guy's experience isn't reproducible? That's one thing, but that's probably not the case unless you're assuming they're pretty stupid themselves.
You want to say that it Is reproducible, but that "that doesn't mean AI can think"? Okay, but that's not what the thread was about.
As to 'knows the answer', I don't even know what that means with these tools. All I know is whether it is helpful or not.
The amazing thing about LLMs is that we still don’t know how (or why) they work!
Yes, they’re magic mirrors that regurgitate the corpus of human knowledge.
But as it turns out, most human knowledge is already regurgitation (see: the patent system).
Novelty is rare, and LLMs have an incredible ability to pattern match and see issues in “novel” code, because they’ve seen those same patterns elsewhere.
Do they hallucinate? Absolutely.
Does that mean they’re useless? Or does that mean some bespoke code doesn’t provide the most obvious interface?
Having dealt with humans, the confidence problem isn’t unique to LLMs…
You may want to take a course in machine learning and read a few papers.
LLMs are insanely complex systems and their emergent behavior is not explained by the algorithm alone.
Goodness this is a dim view on the breadth of human knowledge.
But I look down my nose at conceptions that human knowledge is packageable as plain text; our lives, experience, and intelligence are so much more than the cognitive strings we assemble in our heads in order to reason. It's like in that movie Contact when Jodie Foster muses that they should have sent a poet. Our empathy and curiosity and desires are not encoded in UTF8. You might say these are realms other than knowledge, but woe to the engineer who thinks they're building anything superhuman while leaving these dimensions out; they're left with a cold super-rationalist with no impulse to create of its own.
When I built my own programming language and used it to build a unique toy reactivity system and then asked the LLM "what can I improve in this file", you're essentially saying it "only" could help me because it learned how it could improve arbitrary code before in other languages and then it generalized those patterns to help me with novel code and my novel reactivity system.
"It just saw that before on Stack Overflow" is a bad trivialization of that.
It saw what on Stack Overflow? Concrete code examples that it generalized into abstract concepts it could apply to novel applications? Because that's the whole damn point.
* Read the signatures of the functions.
* Use the code correctly.
* Answer questions about the behavior of the underlying API by consulting the code.
Of course they're just guessing if they go beyond what's in their context window, but don't underestimate context window!
"If you're getting answers, it has seen it elsewhere"
The context window is 'elsewhere'.
As they say, it sounds like you're technically correct, which is the best kind of correct. You're correct within the extremely artificial parameters that you created for yourself, but not in any real world context that matters when it comes to real people using these tools.
To anyone who has used these tools in anger, it's remarkable, given they're only trained on large corpora of language and feedback, that they're able to produce what they do. I don't claim they exist outside their weights, that's absurd. But the entire point of non-linear function activations with many layers and parameters is to learn highly complex non-linear relationships. The fact they can be trained as much as they are with as much data as they have without overfitting or gradient explosions means the very nature of language contains immense information in its encoding and structure, and the network by definition of how it works and is trained does -not- just return what it was trained on. It's able to curve fit complex functions that interrelate semantic concepts that are clearly not understood as we understand them, but in some ways it represents an "understanding" that's sometimes perhaps more complex and nuanced than even we can manage.
Anyway the stochastic parrot euphemism misses the point that parrots are incredibly intelligent animals - which is apt since those who use that phrase are missing the point.
It’s silly to say that something LLMs can reliably do is impossible and every time it happens it’s “dumb luck”.
How would you reconcile this with the fact that SOTA models are only a few TB in size? Trained on exabytes of data, yet only a few TB in the end.
Correct answers couldn't be dumb luck either, because otherwise the models would pretty much only hallucinate (the space of wrong answers is many orders of magnitude larger than the space of correct answers), similar to the early proto GPT models.
This is false. You are off by ~4 orders of magnitude by claiming these models are trained on exabytes of data. It is closer to 500TB of more curated data at most. Contrary to popular belief LLMs are not trained on "all of the data on the internet". I responded to another one of your posts that makes this false claim here:
"<the human brain> cannot think, reason, comprehend anything it has not seen before. If you're getting answers, it has seen it elsewhere, or it is literally dumb, statistical luck."
Modern implementations of LLMs can "do research" by performing searches (whose results are fed into the context), or in many code editors/plugins, the editor will index the project codebase/docs and feed relevant parts into the context.
My guess is they either were using the LLM from a code editor, or one of the many LLMs that do web searches automatically (ie. all of the popular ones).
They are answering non-stackoverflow questions every day, already.
This happens all the time via RAG. The model “knows” certain things via its weights, but it can also inject much more concrete post-training data into its context window via RAG (e.g. web searches for documentation), from which it can usefully answer questions about information that may be “not in its training data”.
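For anyone unfamiliar with the mechanics, a rough sketch of that retrieval-then-answer flow looks something like this; `search_docs` and `call_llm` are placeholder functions standing in for a search index and whatever model API you use, not any particular vendor's SDK:

```python
# Minimal RAG-style sketch: the model answers from retrieved text placed into
# its context window, not from its training weights alone.

def search_docs(query: str, k: int = 3) -> list[str]:
    """Return the k most relevant documentation snippets (placeholder)."""
    raise NotImplementedError


def call_llm(prompt: str) -> str:
    """Send the prompt to whatever model you use (placeholder)."""
    raise NotImplementedError


def answer_with_context(question: str) -> str:
    snippets = search_docs(question)
    context = "\n\n".join(snippets)
    prompt = (
        "Answer using only the documentation below, and cite the snippet "
        "you relied on.\n\n"
        f"Documentation:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```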
People don't think that. Especially not the commenter you replied to. You're human-hallucinating.
People think LLMs are trained on raw documents and code besides StackOverflow. Which is very likely true.
Generalisation is something that neural nets are pretty damn good at, and given the complexity of modern LLMs the idea that they cannot generalise the fairly basic logical rules and patterns found in code such that they're able provide answers to inputs unseen in the training data is quite an extreme position.
Models work across programming languages because it turned out programming languages and API are much more similar than one could have expected.
I mean... They also can read actual documentation. If I'm working on any api work or a language I'm not familiar with, I ask the LLM to include the source they got their answer from and use official documentation when possible.
That lowers the hallucination rate significantly and also lets me ensure said function or code actually does what the llm reports it does.
In theory, all stackoverflow answers are just regurgitated documentation, no?
This 100%. I use o3 as my primary search engine now. It is brilliant at finding relevant sources, summarising what is relevant from them, and then also providing the links to those sources so I can go read them myself. The release of o3 was a turning point for me where it felt like these models could finally go and fetch information for themselves. 4o with web search always felt inadequate, but o3 does a very good job.
> In theory, all stackoverflow answers are just regurgitated documentation, no?
This is unfair to StackOverflow. There is a lot of debugging and problem solving that has happened on that platform of undocumented bugs or behaviour.
Obviously this isn’t true. You can easily verify this by inventing and documenting an API and feeding that description to an LLM and asking it how to use it. This works well. LLMs are quite good at reading technical documentation and synthesizing contextual answers from it.
On a related note, I recently learned that you can still subscribe to the Encyclopedia Britannica. It's $9/month, or $75/year.
Considering the declining state of Wikipedia, and the untrustworthiness of A.I., I'm considering it.
I used to be on the Microsoft stack for decades. Windows, Hyper-V, .NET, SQL Server ... .
Got tired of MS's licensing BS and I made the switch.
This meant learning Proxmox, Linux, Pangolin, UV, Python, JS, Bootstrap, Nginx, Plausible, SQLite, Postgres ...
Not all of these were completely new, but I had never dove in seriously.
Without AI, this would have been a long and daunting project. AI made this so much smoother. It never tires of my very basic questions.
It does not always answer 100% correctly the first time (tip: paste in the docs of the specific version of the thing you are trying to figure out, as it sometimes has out-of-date or mixed-version knowledge), but most often it can be nudged and prodded to a very helpful result.
AI is just an undeniably superior teacher than Google or Stack Overflow ever was. You still do the learning, but the AI is great in getting you to learn.
Don't get me wrong, I tried. But even when pasting the documentation in, the amount of times it just hallucinated parameters and arguments that were not even there were such a huge waste of time, I don't see the value in it.
Sometimes, a function doesn't work as advertised or you need to do something tricky, you get a weird error message, etc. For those things, stackoverflow could be great if you could find someone who had a similar problem. But the tutorial level examples on most blogs might solve the immediate problem without actually improving your education.
It would be similar to someone solving your homework problems for you. Sure you finished your homework, but that wasn't really learning. From this perspective, ChatGPT isn't helping you learn.
Sure, there is a chance that one day AI will be smart enough to read an entire codebase and chug out exhaustively comprehensive and accurate documentation. I'm not convinced that is guaranteed to happen before our collective knowledge falls off a cliff.
Thats why AI works for him and not for you.
We both agree. The difference between me and the person I responded to is that I feel I understand the perspective of the OP, and I was trying to help the person it didn't make sense to understand that perspective.
I disabled AI autocomplete and cannot understand how people can use it. It was mostly an extra key press on backspace for me.
That said, learning new languages is possible without searching anything. With a local model, you can do that offline and have a vast library of knowledge at hand.
The Gemini results integrated in Google are very bad as well.
I don't see the main problem to be people just lazily asking AI for how to use the toilet, but that real knowledge bases like stack overflow and similar will vanish because of lacking participation.
Sort of. The process of working through the question is what drives learning. If you just receive the answer with zero effort, you are explicitly bypassing the brain's learning mechanism.
There's huge difference between your workflow and fully Agentic AIs though.
Asking an AI for the answer in the way you describe isn't exactly zero effort. You need to formulate the question and mold the prompt to get your response, and integrate the response back into the project. And in doing so you're learning! So YOUR workflow has learning built in, because you actually use your brain before and after the prompt.
But not so with vibe coding and Agentic LLMs. When you hit submit and get the tokens automatically dumped into your files, there is no learning happening. Considering AI agents are effectively trying to remove any pre-work (ie automating prompt eng) and post-work (ie automating debugging, integrating), we can see Agentic AI as explicitly anti-learning.
Here's my recent vibe coding anecdote to back this up. I was working on an app for an e-ink tablet dashboard and the tech stack of least resistance was C++ with QT SDK and their QML markup language with embedded javascript. Yikes, lots of unfamiliar tech. So I tossed the entire problem at Claude and vibe coded my way to a working application. It works! But could I write a C++/QT/QML app again today - absolutely not. I learned almost nothing. But I got working software!
Vibe-coding is just a stop on the road to a more useful AI and we shouldn't think of it as programming.
There is a sweet spot of situations I know well enough to judge a solution quickly, but not well enough to write code quickly, but that's a rather narrow case.
The author is one who appears unwilling to do so.
I still use them, but more as a support tool than a real assistant.
To me the part I enjoy most is making things. Typing all that nonsense out is completely incidental to what I enjoy about it.
Using them for larger bits of code feels silly as I find subtle bugs or subtle issues in places, so I don't necessarily feel comfortable passing in more things. Also, large bits of code I work with are very business logic specific and well abstracted, so it's hard to try and get ALL that context into the agent.
I guess what I'm trying to ask here is what exactly do you use agents for? I've seen youtube videos but a good chunk of those are people getting a bunch of typescript generated and have some front-end or generate some cobbled together front end that has Stripe added in and everyone is celebrating as if this is some massive breakthrough.
So when people say "regular tasks" or "rote tasks" what do you mean? You can't be bothered to write a db access method/function using some DB access library? You are writing the same regex testing method for the 50th time? You keep running into the same problem and you're still writing the same bit of code over and over again? You can't write some basic sql queries?
Also not sure about others, but I really dislike having to do code reviews when I am unable to really gauge the skill of the dev I'm reviewing. If I know I have a junior with 1-2 years maybe, then I know to focus a lot on logic issues (people can end up cobbling together the previous simple bits of code), and if it's later down the road at 2-5 years then I know I might focus on patterns, or look to ensure that the code meets the standards and look for more discreet or hidden bugs. With agent output it could oscillate wildly between those. It could be a solidly written, well-optimized search function, or it could be a nightmarish SQL query that's impossible to untangle.
Thoughts?
I do have to say I found it good when working on my own to get another set of "eyes" and ask things like "are there more efficient ways to do X" or "can you split this larger method into multiple ones" etc
My company just had internal models that were mediocre at best, but at the beginning this year they finally enabled Copilot for everyone.
At the beginning I was really excited for it, but it’s absolutely useless for work. It just doesn’t work on big old enterprise projects. In an enterprise environment everything is composed of so many moving pieces, knowledge scattered across places, internal terminology, etc. Maybe in the future, with better MCP servers or whatever, it’ll be possible to feed all the context into it to make it spit out something useful, but right now, at work, I just use AI as a search engine (and it’s pretty good at that, when you have the knowledge to detect when it has subtle problems).
Yep, this is pretty much it. However, I honestly feel that AI writes so much better code than me that I seldom need to actually fix much in the review, so it doesn't need to be as thorough. AI always takes more tedious edge-cases into account and applies best practices where I'm much sloppier and take more shortcuts.
Responsability and "AI" marketing are two non intersecting sets.
Best counter claim: Not all code has the same risk. Some code is low risk, so the risk of error does not detract from the speed gained. For example, for proof of concepts or hobby code.
The real problem: Disinformation. Needless extrapolation, poor analogies, over valuing anecdotes.
But there's money to be made. What can we do, sometimes the invisible hand slaps us silly.
Counter counter claim for these use cases: when I do proof of concept, I actually want to increase my understanding of said concept at the same time, learn challenges involved, and in general get a better idea how feasible things are. An AI can be useful for asking questions, asking for reviews, alternative solutions, inspiration etc (it may have something interesting to add or not) but if we are still in the territory "this matters" I would rather not substitute the actual learning experience and deeper understanding with having an AI generate code faster. Similar for hobby projects, do I need that thing to just work or I actually care to learn how it is done? If the learning/understanding is not important in a context, I would say then using AI to generate the code is a great time-saver. Otherwise, I may still use AI but not in the same way.
Revised example: Software where the goal is design experimentation; like with trying out variations of UX ideas.
In my experience it's that they dump the code into a pull request and expect me to review it. So GenAI is great if someone else is doing the real work.
Unlike the author of the article I do get a ton of value from coding agents, but as with all tools they are less than useless when wielded incompetently. This becomes more damaging in an org that already has perverse incentives which reward performative slop over diligent and thoughtful engineering.
Most of my teams have been very allergic to assigning personal blame and management very focused on making sure everyone can do everything and we are always replaceable. So maybe I could phrase it like "X could help me with this" but saying X is responsible for the bug would be a no no.
I don't mind fixing bugs, but I do mind reckless practices that introduce them.
One of the most bizarre experiences I have had over this past year was dealing with a developer who would screen share a ChatGPT session where they were trying to generate a test payload with a given schema, getting something that didn't pass schema validation, and then immediately telling me that there must be a bug in the validator (from Apache foundation). I was truly out of words.
What I personally find is that it's great for helping me solve mundane things. For example, I'm currently working on an agentic system and I'm using LLMs to help me generate Elasticsearch mappings.
There is no part of me that enjoys making JSON mappings; it's not fun, nor does it engage my curiosity as a programmer, and I'm not going to learn much from generating Elasticsearch mappings over and over again. For problems like this, I'm happy to just let the LLM do the job. I throw some JSON at it and I've got a prompt that's good enough that it will spit out results deterministically and reliably.
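To give a feel for the shape of that work, here is an invented example of the input and output involved; the field names, types, and values are made up, and the exact mapping a model produces will vary:

```python
# Hypothetical example of the hand-off: a sample document in, an Elasticsearch
# mapping out. Shown as Python dicts; the real exchange is just JSON.
sample_doc = {
    "order_id": "A-1001",
    "created_at": "2024-05-01T12:30:00Z",
    "total": 49.99,
    "tags": ["gift", "priority"],
    "customer": {"name": "Ada", "email": "ada@example.test"},
}

# The kind of mapping the model typically hands back, ready to review and edit.
generated_mapping = {
    "mappings": {
        "properties": {
            "order_id": {"type": "keyword"},
            "created_at": {"type": "date"},
            "total": {"type": "scaled_float", "scaling_factor": 100},
            "tags": {"type": "keyword"},
            "customer": {
                "properties": {
                    "name": {"type": "text"},
                    "email": {"type": "keyword"},
                }
            },
        }
    }
}
```

It's declarative, easy to eyeball, and cheap to regenerate, which is exactly why it's a good fit for this kind of delegation.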
However if I'm exploring / coding something new, I may try letting the LLM generate something. Most of the time though in these cases I end up hitting 'Reject All' after I've seen what the LLM produces, then I go about it in my own way, because I can do better.
It all really depends on what the problem you are trying to solve. I think for mundane tasks LLMs are just wonderful and helps get out of the way.
If I put myself into the shoes of a beginner programmer, LLMs are amazing. There is so much I could learn from them. Ultimately what I find is that LLMs will help lower the barrier to entry for programming but do not mitigate the need to learn to read / understand / reason about the code. Beginners will be able to go much further on their own before seeking out help.
If you are more experienced you will probably also get some benefits but ultimately you'd probably want to do it your own way since there is no way LLMs will replace experienced programmer (not yet anyway).
I don't think it's wise to completely dismiss LLMs in your workflow, at the same time I would not rely on it 100% either, any code generated needs to be reviewed and understood like the post mentioned.
> The quality of the code these tools produce is not the problem.
So even if an AI could produce code of a quality equal to or surpassing the author's own code quality, they would still be uninterested in using it.
To each their own, but it's hard for me to accept an argument that such an AI would provide no benefit, even if one put priority on maintaining high quality standards. I take the point that the human author is ultimately responsible, but still.
There’s your issue, the skill of programming has changed.
Typing gets fast; so does review once robust tests already prove X, Y, Z correctness properties.
With the invariants green, you get faster at grokking the diff, feed style nits back into the system prompt, and keep tuning the infinite tap to your taste.
The more you deviate from that, the more you have to step in.
But given that I constantly forget how to open a file in Python, I still have a use for it. It basically supplanted Stackoverflow.
AI can write some tests, but it can't design thorough ones. Perhaps the best way to use AI is to have a human writing thorough and well documented tests as part of TDD, asking AI to write code to meet those tests, then thoroughly reviewing that code.
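A small sketch of that division of labour, with a hypothetical `parse_duration` function as the thing under test: the human-authored tests carry the intent and the edge cases, and whatever the AI generates has to satisfy them.

```python
# Human-written spec-as-test (hypothetical example). The AI's job is to make
# this pass; the reviewer's job is to check the tests say what we actually mean.
import pytest

from durations import parse_duration  # hypothetical module the AI will write


@pytest.mark.parametrize(
    "text,seconds",
    [
        ("90s", 90),
        ("2m", 120),
        ("1h30m", 5400),
        ("0s", 0),
    ],
)
def test_parse_duration_happy_path(text, seconds):
    assert parse_duration(text) == seconds


def test_parse_duration_rejects_garbage():
    # Deliberate edge cases: empty input and unknown units must fail loudly,
    # not silently return 0.
    with pytest.raises(ValueError):
        parse_duration("")
    with pytest.raises(ValueError):
        parse_duration("5 parsecs")
```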
AI saves me just a little time by writing boilerplate stuff for me, just one step above how IDEs have been providing generated getters and setters.
Did the author take their own medicine and measure their own productivity?
I set that up to run then do something different. I come back in a couple minutes, scan the diffs which match expectations and move on to the next task.
That’s not everything but those menial tasks where you know what needs to be done and what the final shape should look like are great for AI. Pass it off while you work on more interesting problems.
Either/or fallacy. There exist a varied set of ways to engage with the technology. You can read reference material and ask for summarization. You can use language models to challenge your own understanding.
Are people really this clueless? (Yes, I know the answer, but this is a rhetorical device.)
Think, people. Human intelligence is competing against artificial intelligence, and we need to step it up. Probably a good time to stop talking like we’re in Brad Pitt’s latest movie, Logical Fallacy Club. If we want to prove our value in a competitive world, we need to think and write well.
I sometimes feel like bashing flawed writing is mean, but maybe the feedback will get through. Better to set a quality bar. We should aim to be our best.
> As the reader I had to do too much work to discern your point and what was relevant.
Re-reading, I hope the first ~15 words make my main point:
> Either/or fallacy. There exist a varied set of ways to engage with the technology...
Was this part unclear? Something else?
> you think you're being open-minded...
"Open minded" can mean very different things to different people. I recommend the article "The Proper Use of Humility" by Yudkowsky [1] because it rings true to me. I'm open to hearing other people's points of view, up to a point, given enough time. (Everyone has their limit, whether we admit it or not.) When it comes to assessing truth, I care about good arguments and good evidence, and I heavily discount anything else. If someone says I'm not "open minded" because of what I just wrote, then my reply would be "what do you want me to be more open to?"
There is a gem from in a comment below the above article that deserves repeating:
> People often take open disagreement as a sign of intellectual arrogance, while it is a display of respect and humility; showing respect with the honest acknowledgment of your disagreement, and showing humility in affording the other person a chance to defend themselves and prove you wrong. To say nothing is to treat that person's beliefs dismissively, as if they don't matter, and then assume that discussion was futile because they're incapable of understanding the truth, and of course, couldn't possible have anything to teach you.
> ...but you offered a multiple choice question, where the choices are reductive...
I offered two specific categories (tone or substance) and a third option for "anything else". I'm not following why this feels reductive to you; it leaves space for someone to reply however they like.
> and it comes off as defensive.
I've thought about this word quite a bit. From dictionary.com defensive means "excessively concerned with guarding against the real or imagined threat of criticism, injury to one's ego, or exposure of one's shortcomings." I'm open to criticism and happy to learn. If I'm wrong, I strive to admit it and apologize where needed. At the same time, I am confident enough to push back, stand up for myself, and defend my ideas (which is a different sense of 'defensive').
Here is the backstory to my second comment. The comment I replied to did not strike me as kind, much less well-intentioned. It probably was intended to be an insult, but I replied anyway. I gave the benefit of the doubt while challenging the commenter to give constructive criticism. I strived for clarity and confidence without being defensive or going on a counter-attack. This is a hard balance to strike.
[1] https://www.lesswrong.com/posts/GrDqnMjhqoxiqpQPw/the-proper...
Having a chatbot telling me what to write would not have had the same effect.
It's like having someone tell you the solutions to your homework.
Where I find it genuinely useful is in extremely low-value tasks, like localisation constants for the same thing in other languages, without having to tediously run that through an outside translator. I think that mostly goes in the "fancy inline search" category.
Otherwise, I went back from Cursor to normal VS Code, and mostly have Copilot autocompletions off these days because they're such a noisy distraction and break my thought process. Sometimes they add something of value, sometimes not, but I'd rather not have to confront that question with every keystroke. That's not "10x" at all.
Yes, I've tried the more "agentic" workflow and got down with Claude Code for a while. What I found is that its changes are so invasive and chaotic--and better prompts don't really prevent this--that it has the same implications for maintainability and ownership referred to above. For instance, I have a UIKit-based web application to which I recently asked Claude Code to add dark theme options, and it rather brainlessly injected custom styles into dozens of components and otherwise went to town, in a classic "optimise for maximum paperclip production" kind of way. I spent a lot more time un-F'ing what it did throughout the code base than I would have spent adding the functionality myself in an appropriately conservative fashion. Sure, a better prompt would probably have helped, but that would have required knowing what chaos it was going to wreak in advance, as to ask it to refrain from that as part of the prompt. The possibility of this happening with every prompt is not only daunting, but a rabbit hole of cognitive load that distracts from real work.
I will concede it does a lot better--occasionally, very impressively--with small and narrow tasks, but those tasks at which it most excels are so small that the efficiency benefit of formulating the prompt and reviewing the output is generally doubtful.
There are those who say these tools are just in their infancy, AGI is just around the corner, etc. As far as I can tell from observing the pace of progress in this area (which is undeniably impressive in strictly relative terms), this is hype and overextrapolation. There are some fairly obvious limits to their training and inference, and any programmer would be wise to keep their head down, ignore the hype, use these tools for what they're good at and studiously avoid venturing into "fundamentally new ways of working".
The Codex workflow however really is a game changer imo. It takes the time to ensure changes are consistent with other code and the async workflow is just so much nicer.
Leaving aside the fact that this isn't an LLM problem; we've always had tech debt due to cowboy devs and weak management or "commercial imperatives":
I'd be interested to know if any of the existing LLM ELO style leaderboards mark for code quality in addition to issue fixing?
The former seems a particularly useful benchmark as they become more powerful in surface abilities.
But this is one of the core problems with LLM coding, right? It accelerates an already broken model of software development (worse is better) rather than trying to help fix it.
Most tech companies however tend to operate following a standard enshittification schedule. First they are very cheap, supported by investments and venture capitalists. Then they build a large user base who become completely dependent on them as alternatives disappear (in this case as they lose the institutional knowledge that their employees used to have). Then they seek to make money so the investors can make their profits. In this case I could see the cost of AI rising a lot, after companies have already built it in to their business. AI eventually has to start making money. Just like Amazon had to, and Facebook, and Uber, and Twitter, and Netflix, etc.
From all the talk I see of companies embracing AI wholeheartedly it seems like they aren't looking any further than the next quarter. It only costs so much per month to replace so many man hours of work! I'm sure that won't last once AI is deeply embedded into so many businesses that they can start charging whatever they want to.
Even if that were true for everybody, reviews would still be worth doing, because when the code is reviewed it gets more than one pair of eyes looking at it.
So it's still worth using AI even if it's slower than writing code yourself, because you wouldn't have made the mistakes the AI made, and the AI wouldn't have made the mistakes you would have made.
It still might be personally not worth it for you though if you prefer to write code than to read it. Until you can set up AI as a reviewer for yourself.
One of the biggest problems I see with AI is that it gets people used to NOT thinking. It takes lots of time and energy to learn to program and design complex software. AI doesn’t solve this - humans need to have these skills in order to supervise it. But why would new programmers learn them? AI writes their code! It’s already hard to convince them otherwise. This only leads to bad things.
Technology without proper control and wisdom, destroys human things. We saw this many times already.
StackOverflow makes it easier not think and copy-paste. Autocomplete makes it easier to not think and make typos (Hopefully you have static typing). Package management makes it easier to not think and introduce heavy dependencies. C makes it easier to not think and forget to initialize variables. I make it easier to not think and read without considering evil (What if every word I say has evil intention and effect?)
Abstractions are making you think of different things. They “hide” some detail and allow you to focus on something else. Of course, the abstraction has its price.
This is true for AI too. The price is the problem.
The reality is that this factory is also leaking toxic waste into nature. There are people who already see this and try to warn the rest. Of course, the factory doesn’t care unless it’s forced to.
The toxic waste starts to accumulate. People start to get sick… but nobody cares.
Having said that, for simple ad-hoc code generation (I need a dump function for this data structure for example) AI's work great.
We ran into the same problem when rolling out AI-assisted code reviews and code generation pipelines. What helped us was adopting AppMod.AI's Project Analyzer:
- Memory & context retention: It parses your full repo and builds architecture diagrams and dependency maps, so AI suggestions stay grounded in real code structure.
- Human-in-the-loop chat interface: You can ask clarifying questions like, “Does this function follow our performance pattern?” and get guided explanations from the tool before merging.
- Collaborative refactor tracking: It tracks changes and technical debt over time, making it easy to spot drift or architectural erosion, something pure LLMs miss.
- Prompt-triggered cost and quality metrics: You can see how often you call AI, what it costs, and its success rates in passing your real tests, not just anecdotal gains.
It’s far from perfect, but it shifts the workflow from “LLM writes → you fix” to “LLM assists within your live code context, under your control.” Others have noted similar limitations in Copilot and GPT-4 based tools, where human validation remains essential.
In short: LLMs aren’t going to replace senior devs—they’re tools that need tooling. Blending AI insights with architecture-aware context and built-in human validation feels like the best middle path so far.
jumploops•5mo ago
As someone who uses Claude Code heavily, this is spot on.
LLMs are great, but I find the more I cede control to them, the longer it takes to actually ship the code.
I’ve found that the main benefit for me so far is the reduction of RSI symptoms, whereas the actual time savings are mostly overstated (even if it feels faster in the moment).
jumploops•5mo ago
Not super necessary for small changes, but basically a must have for any larger refactors or feature additions.
I usually use o3 for generating the specs; also helpful for avoiding context pollution with just Claude Code.
cbsmith•5mo ago
There's an old expression: "code as if your work will be read by a psychopath who knows where you live" followed by the joke "they know where you live because it is future you".
Generative AI coding just forces the mindset you should have had all along: start with acceptance criteria, figure out how you're going to rigorously validate correctness (ideally through regression tests more than code reviews), and use the review process to come up with consistent practices (which you then document so that the LLM can refer to it).
It's definitely not always faster, but waking up in the morning to a well documented PR, that's already been reviewed by multiple LLMs, with successfully passing test runs attached to it sure seems like I'm spending more of my time focused on what I should have been focused on all along.
cbsmith•5mo ago
I'm actually curious about the "lose their skills" angle though. In the open source community it's well understood that if anything reviewing a lot of code tends to sharpen your skills.
Terr_•5mo ago
What happens if the reader no longer has enough of that authorial instinct, their own (opinionated) independent understanding?
I think the average experience would drift away from "I thought X was the obvious way but now I see by doing Y you avoid that other problem, cool" and towards "I don't see the LLM doing anything too unusual compared to when I ask it for things, LGTM."
cbsmith•5mo ago
Let's say you're right though, and you lose that authorial instinct. If you've got five different proposals/PRs from five different models, each one critiqued by the other four, the needs for authorial instinct diminish significantly.
jumploops•5mo ago
For context, it’s just a reimplementation of a tool I built.
Let’s just say it’s going a lot slower than the first time I built it by hand :)
hatefulmoron•5mo ago
If you're trying to build something larger, it's not good enough. Even with careful planning and spec building, Claude Code will still paint you into a corner when it comes to architecture. In my experience, it requires a lot of guidance to write code that can be built upon later.
The difference between the AI code and the open source libraries in this case is that you don't expect to be responsible for the third-party code later. Whether you or Claude ends up working on your code later, you'll need it to be in good shape. So, it's important to give Claude good guidance to build something that can be worked on later.
vidarh•5mo ago
I don't know what you mean by "a lot of guidance". Maybe I just naturally do that, but to me there's not been much change in the level of guidance I need to give Claude Code or my own agent vs. what I'd give developers working for me.
Another issue is that as long as you ensure it builds good enough tests, the cost of telling it to just throw out the code it builds later and redo it with additional architectural guidance keeps dropping.
The code is increasingly becoming throwaway.
hatefulmoron•5mo ago
What do you mean? If it were as simple as not letting it do so, I would do as you suggest. I may as well stop letting it be incorrect in general. Lots of guidance helps avoid it.
> Maybe I just naturally do that, but to me there's not been much change in the level of guidance I need to give Claude Code or my own agent vs. what I'd give developers working for me.
Well yeah. You need to give it lots of guidance, like someone who works for you.
> the cost of telling it to just throw out the code it builds later and redo it with additional architectural guidance keeps dropping.
It's a moving target for sure. My confidence with this in more complex scenarios is much smaller.
vidarh•5mo ago
I'm arguing it is as simple as that. Don't accept changes that muddle up the architecture. Take attempts to do so as evidence that you need to add direction. Same as you presumably would - at least I would - with a developer.
jumploops•5mo ago
Years of PT have enabled me to work quite effectively and minimize the flare ups :)