The credit lies with a more functional style of C++ and TypeScript (the languages I use for hobbies and work, respectively), but Claude has sort of taken me out of the bubble I was brought up in and introduced new ideas to me.
However, I've also noticed that LLM products tend to reinforce your biases. If you don't ask them to critique you or push back, they often tell you what a great job you did and how incredible your code is. You see this with people who have gotten into a kind of psychotic feedback loop with ChatGPT and who now believe they can escape the matrix.
I think LLMs are powerful, but only for a handful of use cases. I think the majority of what they're marketed for right now is techno-solutionism, and there's an impending collapse in VC funding for companies that are plugging ChatGPT APIs into everything from insurance claims to medical advice.
Then unfortunately you're leaving yourself at a serious disadvantage.
Good for you if you're able to live without a calculator, but frankly the automated tool is faster and leaves you less exhausted so you should be taking advantage of it.
I use it similarly to the parent poster when I am working with an unfamiliar API, in that I will ask for simple examples of functionality that I can easily verify are correct and then build upon them quickly.
Also, let me know when your calculator regularly hallucinates. I find it exhausting to have an LLM dump out a "finished" implementation and have to spend more time reviewing it than it would take to complete it myself from scratch.
As a junior I used to think it was OK to spend much less time on the review than on the writing, but unless the author has diligently detailed their entire process, a good review often takes nearly as long. And unsurprisingly enough, working with an AI effectively requires that detail in a format the AI can understand (which often takes longer than just doing it).
Yes, and if it isn't, then in the view of a lot of people you're being overpaid. Step out of the way and let an expert use the keyboard.
How can you spend time writing code but not read and understand it? In that situation, it's bad code.
Source: try working with assembly and binary objects only, which really do require working out what's going on. Code is meant to be human-readable, remember...
Maybe LLMs make you 10x faster at using boilerplate-heavy things like Shadcn/ui or Tanstack.
...which is still only about half as fast as using a sane ecosystem.
IMO this is why there are so many diverging opinions about the productivity of AI tools.
Plus there are use cases for LLMs that go beyond augmenting your ability to produce code, especially for learning new technologies. The yield depends on the distribution of tasks you have in your role. For example, if you are in lots of meetings, or have lots of administrative overhead to push code, LLMs will help less. (Although I think applying LLMs to pull-request workflows, commit cleanup, and reordering will come soon.)
To summarize, LLM agents are not the silver bullet those promoting them suggest they are. The headline is all that was needed.
That aside: I still think complaining about "hallucination" is a pretty big "tell".
To be clear, I did not classify "all the AI-supporters" as being in those three categories, I specifically said the people posting that they are getting 10x improvements thanks to AI.
Can you tell me about what you've done to no longer have any hallucinations? I notice them particularly in a language like Terraform, where the LLMs add properties that do not exist. They are less common in languages like JavaScript, but they still happen when you import libraries that are less common (e.g. DrizzleORM).
Your article does not specifically say 10x, but it does say this:
> Kids today don’t just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They’ve got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged.
> “I’m sipping rocket fuel right now,” a friend tells me. “The folks on my team who aren’t embracing AI? It’s like they’re standing still.” He’s not bullshitting me. He doesn’t work in SFBA. He’s got no reason to lie.
That's not quantifying it specifically enough to say "10x", but it is saying in no uncertain terms that AI engineers are moving fast and everyone else is standing still by comparison. Your article was indeed one of the ones I specifically wanted to respond to, as the language directly contributed to the anxiety I described here. It made me worry that maybe I was standing still. To me, the engineer you described as sipping rocket fuel is an example both of the "degrees of separation" concept (it confuses me that you are pointing to a third party and calling them trustworthy; why not simply describe your own workflow?) and of the idea that a quick burst of productivity can feel huge but, in my experience, just doesn't scale.
Again, can you tell me about what you've done to no longer have any hallucinations? I'm fully open to learning here. As I stated in the article, I did my best to give full AI agent coding a try, I'm open to being proven wrong and adjusting my approach.
I distinctly did not say that. I said your article was one of the ones that made me feel anxious. And it's one of the ones that spurred me to write this article. I demonstrated how your language implies a massive productivity boost from AI. Does it not? Is this not the entire point of what you wrote? That engineers who aren't using AI are crazy (literally the title) because they are missing out on all this "rocket fuel" productivity? The difference between rocket fuel and standing still has to be a pretty big improvement.
The points I make here still apply, there is not some secret well of super-productivity sitting out in the open that luddites are just too grumpy to pick up and use. Those who feel they have gotten massive productivity boosts are being tricked by occasional, rare boosts in productivity.
You said you solved hallucinations, could you share some of how you did that?
The article in question[0] has the literal tag line:
> My AI Skeptic Friends Are All Nuts
How much saner is someone who isn't nuts than someone who is nuts? 10x saner? What do the specific numbers matter given you're not writing a paper?
You're enjoying the clickbait benefits of using strong language and then acting offended when someone calls you out on it. Yes, maybe you didn't literally say "10x", but you said or quoted things in exactly that same ballpark, and it's worthy of a counterpoint like the one the OP has provided. They're both interesting articles with strong opinions that make the world a more interesting place, so idk why you're trying to disown the strength with which you wrote your article.
I'm not offended at all. I'm saying: no, I'm not a valid cite for that idea. If the author wants to come back and say "10x developer", a term they used twenty-five times in this piece, was just a rhetorical flourish, something they conjured up themselves in their head, that's great! That would resolve this small dispute neatly. Unfortunately: you can't speak for them.
They used it 25 times in their piece, and in your piece you stated that being interested in "the craft" is something people should do in their own time from now on. Strongly implying, if not outright stating, that the processes and practices we've refined over the past 70 years of software engineering need to move aside for the next hotness that has only been out for 6 months. Sure, you never said "10x", but to me it read entirely like you're doing the "10x" dance. It was a good article and it definitely has inspired me to check it out.
However, there is a bit of irony in that you're happy to point out my defensiveness as a potential flaw while you're getting hung up on nailing down the "10x" claim with precision. As an enjoyer of both articles I think this one is a fair retort to yours, so I think it's a little disappointing to get distracted by the specifics.
If only we could accurately measure 1x developer productivity, I imagine the truth might be a lot clearer.
I'm trying to write a piece to comfort those that feel anxious about the wave of articles telling them they aren't good enough, that they are "standing still", as you say in your article. That they are crazy. Your article may not say the word 10x, but it makes something extremely clear: you believe some developers are sitting still and others are sipping rocket fuel. You believe AI skeptics are crazy. Thus, your article is extremely natural to cite when talking about the origin of this post.
You can keep being mad at me for not providing a detailed target list; I said several times that that's not the point of this. You can keep refusing to actually elaborate on how you use AI day to day and solve its problems. That's fine. I don't care. I care a lot more about talking to the people who are actually engaging with me (such as your friend) and helping me understand what they are doing. Right now, if you're going to keep not actually contributing to the conversation, you're just kinda being a salty guy with an almost unfathomable 408,000 karma going through every HN thread every single day and making hot takes.
I _never_ made the claim that you could call that a 10x productivity improvement. I'm hesitant to categorize productivity in software in numeric terms, as it's such a nuanced concept.
But I’ll stand by my impression that a developer using ai tools will generate code at a perceptibly faster pace than one who isn’t.
I mentioned in another comment that the major flaw in your productivity calculation is that you aren't accounting for the work that wouldn't have gotten done otherwise. That's where my improvements are almost universally coming from. I can improve the codebase in ways that weren't justifiable before, in places that do not suffer from the coordination costs you rightly point out.
I no longer feel like my peers are standing still, because they've nearly uniformly adopted AI tools. And, as you rightly point out, there isn't much of a learning curve. If you could develop software before these tools, you can figure out how to improve with them. I found it easier than learning vim.
As for hallucinations, I don't experience them effectively _ever_. And I do let agents mess with Terraform code (in codebases where I can prevent state manipulation or infrastructure changes outside of the agent's control).
I don’t have any hints on how. I’m using a pretty vanilla Claude Code setup. But I’m not sure how an agent that can write and run compile/test loops could hallucinate.
And this is the problem.
Masterful developers are the ones you pay to reduce lines of code, not create them.
Perhaps start from the assumption that I have in fact spent a fair bit of time doing this job at a high level. Where does that mental exercise take you with regard to your own position on AI tools?
In fact, you don’t have to assume I’m qualified to speak on the subject. Your retort assumes that _everyone_ who gets improvement is bad at this. Assume any random proponent isn’t.
One of the most valuable qualities of humans is laziness.
We're constantly seeking efficiency gains, because who wants to carry buckets of water, or take laundry down to the river?
Skilled developers excel at this. They are "lazy" when they code - they plan for the future, they construct code in a way that will make their life better, and easier.
LLMs don't have this motivation. They will gleefully spit out 1000 lines of code when 10 will do.
It's a fundamental flaw.
> I mentioned in another comment that the major flaw in your productivity calculation is that you aren't accounting for the work that wouldn't have gotten done otherwise. That's where my improvements are almost universally coming from. I can improve the codebase in ways that weren't justifiable before, in places that do not suffer from the coordination costs you rightly point out.
I'm a bit confused by this. There is work that apparently is unlocking big productivity boosts but was somehow not justified before? Are you referring to places like my ESLint rule example, where eliminating the startup costs of learning how to write one allows you to do things you wouldn't have previously bothered with? If so, I feel like I covered this pretty well in the article, and we probably largely agree on the value of that productivity boost. My point still stands that it doesn't scale. If this is not what you mean, feel free to correct me.
Appreciate your thoughts on hallucinations. My guess is the difference between what we're experiencing is that in your code hallucinations are still happening but getting corrected after tests are run, whereas my agents typically get stuck in these write-and-test loops and can't figure out how to solve the problem, or it "solves" it by deleting the tests or something like that. I've seen videos and viewed open source AI PRs which end up in similar loops as to what I've experienced, so I think what I see is common.
Perhaps that's an indication that we're trying to solve different problems with agents, or using different languages/libraries, and that explains the divergence of experiences. Either way, I still contend that this kind of productivity boost is likely going to be hard to scale and will get tougher to realize as time goes on. If you keep seeing it, I'd really love to hear more about your methods to see what I'm missing. One thing that has been frustrating me is that people rarely share their workflows after making big claims. This is unlike previous hype cycles, where people would share descriptions of exactly what they did ("we rewrote in Rust, here's how we did it", etc.). Feel free to email me at the address on my about page[1] or send me a request on LinkedIn or whatever. I'm being 100% genuine that I'd love to learn from you!
This may be a definition problem, then. I don’t think “the agent did a dumb thing that it can’t reason out of” is a hallucination. To me a hallucination is a pretty specific failure mode: it invents something that doesn’t exist. Models still do that for me, but the build/test loop sets them aright on that nearly perfectly. So I guess the model is still hallucinating but the agent isn’t, so the output is unimpacted. So I don’t care.
For the agent-is-dumb scenario, I aggressively delete and reprompt. This is something I’ve actually gotten much better at with time and experience, both so it doesn’t happen often and so I can course-correct quickly. I find it works nearly as well for teaching me about the problem domain as my own mistakes do, but is much faster to get to.
But if I were going to be pithy: aggressively deleting work output from an agent is part of their value proposition. They don’t get offended and they don’t need explanations why. Of course they don’t learn well either; that’s on you.
Deleting and re-prompting is fine. I do that too. But even one cycle of that often means the whole prompting exercise takes me longer than if I just wrote the code myself.
A lot of the advantage is that it can make forward progress when I can’t. I can check to see if an agent is stuck, and sometimes reprompt it, in the downtime between meetings or after lunch before I start whatever deep thinking session I need to do. That’s pure time recovered for me. I wouldn’t have finished _any_ work with that time previously.
I don’t need to optimize my time around babysitting the agent. I can do that in the margins. Watching the agents is low-context work. That adds the capability to generate working solutions during time that was previously barred from producing any.
Either way, I'm happy that you are getting so much out of the tools. Perhaps I need to prompt harder, or the codebase I work on has just deviated too much from the stuff the LLMs like and simply isn't a good candidate. Either way, appreciate talking to you!
Good luck ever getting that. I've asked that about a dozen times on here from people making these claims and have never received a response. And I'm genuinely curious as well, so I will continue asking.
What people aren't doing is proving to you that their workflows work as well as they say they do. If you want proof, you can DM people for their rate card and see what that costs.
> As of March, 2025, this library is very new, prerelease software.
I'm not looking for personal proof that their workflows work as well as they say they do.
I just want an example of a project in production with active users depending on the service for business functions that has been written 1.5/2/5/10/whatever x faster than it otherwise would have without AI.
Anyone can vibe code a side project with 10 users or a demo meant to generate hype/sales interest. But I want someone to actually have put their money where their mouth is and give an example of a project that would have legal, security, or monetary consequences if bad code was put in production. Because those are the types of projects that matter to me when trying to evaluate people's claims (since those are what my paycheck actually depends on).
Do you have any examples like that?
At some point you have to accept that no amount of proof will convince someone who refuses to be swayed. It's very frustrating because, while these are wonderful tools already, it's clear that the biggest thing that makes a positive difference is people using and improving them. They're still in relative infancy.
I want to have the kind of conversations we had back at the beginning of web development, when people were delighted at what was possible despite everything being relatively awful.
Since my day job is creating systems that need to be operational and predictable for paying clients - examples of front end mockups, demos, apps with no users, etc don't really matter that much at the end of the day. It's like the difference between being a great speaker in a group of 3 friends vs standing up in front of a 30 person audience with your job on the line.
If you have some examples, I'd love to hear about them because I am genuinely curious.
I rolled out a PR yesterday that was a one-shot change to our fundamental storage layer on our hot path. It was part of a large codebase, and that file has existed for four years. It hadn’t been touched in two. I literally didn’t touch a text editor on that change.
I have first hand experience watching devs do this with payment processing code that handles over a billion dollars on a given day.
When you say you didn't touch a text editor, do you mean you didn't review the code change or did you just look at the diff in the terminal/git?
Because I was the instigator of that change a second code owner was required to approve the PR as well. That PR didn't require any changes, which is uncommon but not particularly rare.
It is _common_ for me to only give feedback to the agents via the GitHub GUI, the same way I do with humans. Occasionally I have to pull the PR down locally and use the full powers of my dev environment to review, but I don't think that is any more common than with people. If anything it's less common: given the tasks the agents typically get, they either do well or I kill the PR without much review.
I spent probably a day building prompts and tests and getting an example of the failing behavior in Python, and then I wrote pseudocode and had it implement and write comprehensive unit tests in Rust. About three passes and manual review of every line. I also have an MCP that calls out to o3 as a second-opinion code review and passes the result back in.
Very fun stuff
1. Would have legal, security, or monetary consequences if bad code was put in production
2. Was developed using an AI/LLM/Agent/etc that made the development many times faster than it otherwise would have (as so many people claim)
I would love to hear an example like "I used Claude to develop this hosting/ecommerce/analytics/inventory management service that is used in production by 50 paying companies. Using an LLM we deployed the project in 4 weeks where it would normally take us 4 months." Or "We updated an out-of-date code base for a client in half the time it would normally take and have not seen any issues since launch."
At the end of the day I code to get paid. And it would really help to be able to point to actual cases where both money and negative consequences of failure are on the line.
So if you have any examples please share. But the more people deflect the more skeptical I get about their claims.
I mean it's pretty simple - there are a lot of big claims that I read but very few tangible examples that people share where the project has consequences for failure. Someone else replied with some helpful examples in another thread. If you want to add another one feel free, if not that's cool too.
That code tptacek linked you to? It's part of our (Cloudflare's) MCP framework. Which means all of the companies mentioned in this blog post are using this code in production today: https://blog.cloudflare.com/mcp-demo-day/
There you go. This is what you are looking for. Why are you refusing to believe it?
(OK fine. I guess I should probably update the readme to remove that "prerelease" line.)
I never look at my own readmes so they tend to get outdated. :/
Fixing: https://github.com/cloudflare/workers-oauth-provider/pull/59
That seemed to me be to be the author's point.
His article resonated with me. After 30 years of development and dealing with hype cycles, offshoring, no-code "platforms", endless framework churn (this next version will make everything better!), coder tribes ("if you don't do TypeScript, you're incompetent and should be fired"), endless bickering, improper tech adoption following the FANGs (your startup with 0 users needs Kubernetes?), and a gazillion other annoyances we're all familiar with, this AI stuff might be the thing that makes me retire.
To be clear: it's not AI that I have a problem with. I'm actually deeply interested in it and actively researching it from the math up.
I'm also a big believer in it, I've implemented it in a few different projects that have had remarkable efficiency gains for my users, things like automatically extracting values from a PDF to create a structured record. It is a wonderful way to eliminate a whole class of drudgery based tasks.
No, the thing that has me on the verge of throwing in the towel is the wholesale rush towards devaluing human expertise.
I'm not just talking about developers, I'm talking about healthcare providers, artists, lawyers, etc...
Highly skilled professionals that have, in some cases, spent their entire lives developing mastery of their craft. They demand a compensation rate commensurate to that value, and in response society gleefully says "meh, I think you can be replaced with this gizmo for a fraction of the cost."
It's an insult. It would be one thing if it were true; my objection could then safely be dismissed as the grumbling of a buggy-whip manufacturer. However, this is objectively, measurably wrong.
Most of the energy of the people pushing the AI hype goes towards obscuring this. When objective reality is presented to them in irrefutable ways, the response is inevitably: "but the next version will!"
It won't. Not with the current approach. The stochastic parrot will never learn to think.
That doesn't mean it's not useful. It demonstrably is, it's an incredibly valuable tool for entire classes of problems, but using it as a cheap replacement for skilled professionals is madness.
What will the world be left with when we drive those professionals out?
Do you want an AI deciding your healthcare? Do you want a codebase that you've invested your life savings into written by an AI that can't think?
How will we innovate? Who will be able to do fundamental research and create new things? Why would you bother going into the profession at all? So we're left with AIs training on increasingly polluted data, and relying on them to push us forward. It's a farce.
I've been seriously considering hanging up my spurs and munching popcorn through the inevitable chaos that will come if we don't course correct.
The conversation around LLMs is so polarized. Either they’re dismissed as entirely useless, or they’re framed as an imminent replacement for software developers altogether.
Hallucinations are worth talking about! Just yesterday, for example, Claude 4 Sonnet confidently told me Godbolt was wrong wrt how clang would compile something (it wasn’t). That doesn’t mean I didn’t benefit heavily from the session, just that it’s not a replacement for your own critical thinking.
Like any transformative tool, LLMs can offer a major productivity boost but only if the user can be realistic about the outcome. Hallucinations are real and a reason to be skeptical about what you get back; they don’t make LLMs useless.
To be clear, I’m not suggesting you specifically are blind to this fact. But sometimes it’s warranted to complain about hallucinations!
Anyway, I still see hallucinations in all languages, even JavaScript, attempting to use libraries or APIs that do not exist. Could you elaborate on how you have solved this problem?
Gemini CLI (it's free and I'm cheap) will run the build process after making changes. If an error occurs, it will interpret it and fix it. That takes care of it using functions that don't exist.
It can get stuck in a loop, but in general it'll get somewhere.
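For the curious, here is roughly what that kind of loop amounts to. This is a minimal TypeScript sketch, not Gemini CLI's actual implementation; `askModelForFix` is a hypothetical stand-in for whatever model call applies edits to the workspace.

```typescript
import { execSync } from "node:child_process";

// Hypothetical stub: in a real agent this would send the compiler output to
// the model and apply its suggested edits to the workspace.
async function askModelForFix(compilerOutput: string): Promise<void> {
  console.log("Would send compiler output to the model:\n", compilerOutput);
}

async function buildLoop(maxAttempts = 5): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Type-check the project; execSync throws if the compiler exits non-zero.
      execSync("npx tsc --noEmit", { stdio: "pipe" });
      console.log(`Build clean after ${attempt} attempt(s)`);
      return;
    } catch (err) {
      const output =
        (err as { stdout?: Buffer }).stdout?.toString() ?? String(err);
      // Feed the compiler errors back so the model can repair, for example,
      // calls to functions that don't exist.
      await askModelForFix(output);
    }
  }
  throw new Error("Still failing after max attempts; hand it to a human");
}

buildLoop().catch(console.error);
```

The point of the sketch is only that calls to nonexistent functions get caught by the compiler before the code is ever handed back.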
It's a pretty obvious rhetorical tactic: everybody associates "hallucination" with something distinctively weird and bad that LLMs do. Fair enough! But then they smuggle more meaning into the word, so that any time an LLM produces anything imperfect, it has "hallucinated". No. "Hallucination" means that an LLM has produced code that calls into nonexistent APIs. Compilers can and do in fact foreclose on that problem.
If, according to you, LLMs are so good at avoiding hallucinations these days, then maybe we should ask an LLM what hallucinations are. Claude, "in the context of generative AI, what is a hallucination?"
Claude responds with a much broader definition of the term than you have imagined -- one that matches my experiences with the term. (It also seemingly matches many other people's experiences; even you admit that "everybody" associates hallucination with imperfection or inaccuracy.)
Claude's full response:
"In generative AI, a hallucination refers to when an AI model generates information that appears plausible and confident but is actually incorrect, fabricated, or not grounded in its training data or the provided context.
"There are several types of hallucinations:
"Factual hallucinations - The model states false information as if it were true, such as claiming a historical event happened on the wrong date or attributing a quote to the wrong person.
"Source hallucinations - The model cites non-existent sources, papers, or references that sound legitimate but don't actually exist.
"Contextual hallucinations - The model generates content that contradicts or ignores information provided in the conversation or prompt.
"Logical hallucinations - The model makes reasoning errors or draws conclusions that don't follow from the premises.
"Hallucinations occur because language models are trained to predict the most likely next words based on patterns in their training data, rather than to verify factual accuracy. They can generate very convincing-sounding text even when "filling in gaps" with invented information.
"This is why it's important to verify information from AI systems, especially for factual claims, citations, or when accuracy is critical. Many AI systems now include warnings about this limitation and encourage users to double-check important information from authoritative sources."
Right across this thread we have the author of the post saying that when they said "hallucinate", they meant that if they watched they could see their async agent getting caught in loops trying to call nonexistent APIs, failing, and trying again. And? The point isn't that foundation models themselves don't hallucinate; it's that agent systems don't hand off code with hallucinations in it, because they compile before they hand the code off.
Usually, such a loop just works. In the cases where it doesn't, often it's because the LLM decided that it would be convenient if some method existed, and therefore that method exists, and then the LLM tries to call that method and fails in the linting step, decides that it is the linter that is wrong, and changes the linter configuration (or fails in the test step, and updates the tests). If in this loop I automatically revert all test and linter config changes before running tests, the LLM will receive the test output and report that the tests passed, and end the loop if it has control (or get caught in a failure spiral if the scaffold automatically continues until tests pass).
It's not an extremely common failure mode, as it generally only happens when you give the LLM a problem where it's both automatically verifiable and too hard for that LLM. But it does happen, and I do think "hallucination" is an adequate term for the phenomenon (though perhaps "confabulation" would be better).
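For concreteness, a minimal sketch of that kind of revert-before-test guard, assuming a git checkout and npm scripts; the protected paths are purely illustrative.

```typescript
import { execSync } from "node:child_process";

// Files the agent is not allowed to weaken; adjust per repo (illustrative).
const PROTECTED_PATHS = ["test/", "eslint.config.js", "tsconfig.json"];

function runTestsWithGuard(): string {
  // Throw away any edits the agent made to the tests or linter config,
  // so it can't "pass" by changing the rules instead of the code.
  execSync(`git checkout -- ${PROTECTED_PATHS.join(" ")}`, { stdio: "inherit" });
  try {
    return execSync("npm test", { stdio: "pipe" }).toString();
  } catch (err) {
    // Hand the real failure output back to the agent loop.
    return (err as { stdout?: Buffer }).stdout?.toString() ?? String(err);
  }
}

console.log(runTestsWithGuard());
```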
Aside:
> I can't imagine an agent being given permission to iterate Terraform
Localstack is great and I have absolutely given an LLM free rein over terraform config pointed at localstack. It has generally worked fine and written the same tf I would have written, but much faster.
https://www.windowscentral.com/software-apps/sam-altman-ai-w...
https://brianchristner.io/how-cursor-ai-can-make-developers-...
https://thenewstack.io/the-future-belongs-to-ai-augmented-10...
And I think that sentence is a pretty big tell, so ...
The amount of product ideation, story point negotiation, bugfixing, code review, waiting for deployments, testing, and QA that goes into what was traditionally 3 months of work is now getting done in 7 work days? For that to happen, each and every one of these bottlenecks has to have also seen 10x productivity gains.
For Terraform, specifically, Claude 4 can get thrown into infinite recursive loops trying to solve certain issues within the bounds of the language. Claude still tries to add completely invalid procedures into things like templates.
It does seem to work a bit better for standard application programming tasks.
I wonder if that's all it is, or if the lack of context you mention is a more fundamental issue.
It's like discussing in a gaming guild how to reach the next level. It isn't real.
Internally we expected 15%-25%. A big-3 consultancy told senior leadership "35%-50%" (and then tried to upsell an AI Adoption project). And indeed we are seeing 15%-35% depending on which part of the org you look and how you measure the gains.
It's not a ground-breaking app, it's CRUD and background jobs and CSV/XLSX exports and reporting, but I found that I was able to "wireframe" with real code and thus come up with unanswered questions, new requirements, etc. extremely early in the project.
Does that make me a 10x engineer? Idk. If I wasn't confident working with CC, I would have pushed back on the project in the first place unless management was willing to devote significant resources to it. I.e., "is this really a P1 project or just a nice-to-have?" If these tools didn't exist I would have written specs and Excalidraw or Sketch/Figma wireframes that would have taken me at least the same amount of time or more, but there'd be less functional code for my team to use as a resource.
It reads like this project would have taken your company 9 weeks before, and now will take the company 9 weeks.
Except it also blurs the lines and sets incorrect expectations.
Management often see code being developed quickly (without full understanding of the fine line between PoC and production ready) and soon they expect it to be done with CC in 1/2 the time or less.
Figma on the other hand makes it very clear it is not code.
I sort of want to get back to that... it was really good at getting ideas across.
I was surprised that with Claude Code I was able to get a few complex things done that I had anticipated would take a few weeks to uncover, stitch together, and get moving.
Instead I pushed Claude to consistently present a correct understanding of the problem, structure, and approach to solving things, and only after that was OK was it allowed to propose changes.
True to its shiny-things corpus, it will overcomplicate things because it hasn't learned that less is more. Maybe that reflects the corpus of the average code.
Looking at how folks are setting up their claude.md and agents can go a long way if you haven't had a chance yet.
I find it impossible to work out who to trust on the subject, given that I'm not working directly with them, so remain entirely on the fence.
What you need is just boring project management. Have a proper spec, architecture and tasks split into manageable chunks with enough information to implement them.
Then you just start watching TV and say "implement github issue #42" to Claude and it'll get on with it.
But if you say "build me facebook" and expect a shippable product, you'll have a bad time.
The problem is that AI needs to be spoon-fed overly detailed dos and don'ts, and even then the output can't be trusted without carefully checking it. It's easy to reach a point where breaking down the problem into pieces small enough for the AI to understand takes more work than just writing the code.
AI may save time when it generates the right thing on the first try, but that's a gamble. The code may need multiple rounds of fixups, or end up needing a manual rewrite anyway, after wasting time and effort on instructing the AI. The ceiling of AI capabilities is very uneven and unpredictable.
Even worse, the AI can confidently generate code that looks superficially correct, but has subtle bugs/omissions/misinterpretations that end up costing way more time and effort than the AI saved. It has uncanny ability to write nicely structured, well-commented code that is just wrong.
[In fact you can sometimes find that 10x bigger diff leads to decreased productivity down the line...]
LLMs make writing code quick, that's it. There's nothing more to this. LLMs aren't solutioning nor are they smart. If you know what you want to build, you can build quick. Not good, quick.
That said, if managers don't care about code quality (because customers don't care either), then who am I to judge them. I don't care.
I'm on the edge of just blacklisting the word AI from my feed.
When I use Claude Code on my personal projects, it's like it can read my mind. As if my project is coding itself. It's very succinct and consistent. I just write my prompt and then I'm just tapping the enter key; yes, yes, yes, yes.
I also used Claude Code on someone else's code and it was not the same experience. It kept trying to implement dirty hacks to fix stuff but couldn't get very far with that approach. I had to keep reminding it "Please address the root cause" or "No hacks" or "Please take a step back and think harder about this problem." There was a lot of back-and-forth where I had to ask it to undo stuff and I had to step in and manually make certain changes.
I think part of the issue is that LLMs are better at adding complexity than at removing it. When I was working on the bad codebase, the times I had to manually intervene, the solution usually involved deleting some code or CSS. Sometimes the solution was really simple and just a matter of deleting a couple of lines of CSS but it couldn't figure it out no matter how I wrote the prompt or even if I hinted at the solution; it kept trying to solve problems by adding more code on top.
That means that good developers are more productive, and bad developers create more work for everyone else at an very rapid pace.
I'm a pretty huge proponent for AI-assisted development, but I've never found those 10x claims convincing. I've estimated that LLMs make me 2-5x more productive on the parts of my job which involve typing code into a computer, which is itself a small portion of that I do as a software engineer.
That's not too far from this article's assumptions. From the article:
> I wouldn't be surprised to learn AI helps many engineers do certain tasks 20-50% faster, but the nature of software bottlenecks mean this doesn't translate to a 20% productivity increase and certainly not a 10x increase.
I think that's an under-estimation - I suspect engineers that really know how to use this stuff effectively will get more than a 0.2x increase - but I do think all of the other stuff involved in building software makes the 10x thing unrealistic in most cases.
https://www.construx.com/blog/productivity-variations-among-...
Depending on the environment, I can imagine the worst devs being net negative.
Thinking about it personally, a 10X label means I'm supposedly the smartest person in the room and that I'm earning 1/10th what I should be. Both of those are huge negatives.
This is particularly true for headlines like this one which stand alone as statements.
Again, appreciate your thoughts, I have a huge amount of respect for your work. I hope you have a good one!
Well, the people who quote from TFA have usually at least read the part they quoted ;)
The suggestions were always unusably bad. The /fix suggestions were always obviously and straight-up false unless it was a super silly issue.
Claude Code with Opus model on the other hand was mind-blowing to me and made me change my mind on almost everything wrt my opinion of LLMs for coding.
You still need to grow the skill of how to build the context and formulate the prompt, but the built-in execution loop is a complete game changer, and I didn't realize that until I actually used it effectively on a toy project myself.
MCP in particular was another thing I always thought was massively overhyped, until I actually started to use some in the same toy project.
Frankly, the building blocks already exist at this point to make the vast majority of jobs redundant (and I'm thinking about all grunt-work office jobs, not coding in particular). The tooling still needs to be created, so I'm not seeing a short-term realization (<2 yrs), but medium term (5+ yrs)?
You should expect most companies to let people go in staggering numbers, with only small numbers of highly skilled people left to administer the agents.
Toy-project viability does not connect with making people redundant in the process (ever, really), at least not for me. Care to elaborate on where you draw the optimism from?
I called it a toy project because I'm not earning money with it - hence it's a toy.
It does have medium complexity with roughly 100k loc though.
And I think I need to repeat myself, because you seem to read something into my comment that I didn't say: the fact that the building blocks exist doesn't mean that today's tooling is sufficient for this to play out today.
I very explicitly set a time horizon of 5 yrs.
I'm gonna pivot to building bomb shelters maybe
Or stockpiling munitions to sell during the troubles
Maybe some kind of protest support saas. Molotov deliveries as a service, you still have to light them and throw them but I guarantee next day delivery and they will be ready to deploy into any data center you want to burn down
What I'm trying to say is that "companies letting people go in staggering numbers" is a societal failure state, not an ideal.
I don't buy that. The linked article makes a solid argument for why that's not likely to happen: agentic loop coding tools like Claude Code can speed up the "writing code and getting it working" piece, but the software development lifecycle has so much other work before you get to the "and now we let Claude Code go brrrrrrr" phase.
These are exactly the people that are going to stay, medium term.
Let's explore a fictional example that somewhat resembles my, and I suspect a lot of people's, current day job.
A microservice architecture: each team administers 5-10 services, and the whole application, which is once again only a small part of the platform as a whole, is developed by maybe 100-200 devs. So something like ~200 microservices.
The application architects are gonna be completely safe in their jobs. And so are the lead devs in each team, at least from my perspective. Anyone else? I suspect MBAs in 5 yrs will not see their value anymore. That's gonna be the vast majority of all devs, so it's likely going to cost 50% of the devs their jobs. And middle management will be slimmed down just as quickly, because you suddenly need a lot fewer managers.
tl;dr: in the future when vibe coding works 100% of the time, logically the only companies that will exist are the ones that have processes that AI can’t do, because all the other parts of the supply chain can all be done in-house
It's conceivable that that's going to happen, eventually, but that'd likely require models a lot more advanced than what we have now.
The agent approach, with lead devs administering and merging the code the agents made, is feasible with today's models. The missing part is the tooling around the models and the development practices that standardize this workflow.
That's what I'd expect to take around 5 yrs to settle.
There are so many flaws in your plan. I have no doubt that "AI" will ruin some companies that try to replace humans with a "tin can". LLMs are being inserted loosey-goosey into too many places by people who don't really understand the liability problems they create. Because the LLM doesn't think, it doesn't have a job to protect, it doesn't have a family to feed. It can be gamed. It simply won't care.
The flaws in "AI" are already pretty obvious to anyone paying attention. It will only get more obvious the more LLMs get pushed into places they really do not belong.
That's the key right there. Try to use it in a project that handles PII, needs data to be exact, or has many dependencies/libraries and needs to not break for critical business functions.
This seems to be the current consensus.
A very similar quote from another recent AI article:
One host compares AI chatbots to “a very smart assistant who has a dozen Ph.D.s but is also high on ketamine like 30 percent of the time.”
https://lithub.com/what-happened-when-i-tried-to-replace-mys...
If an AI assistant was the equivalent of “a dozen PhDs” at any of the places I’ve worked you would see an 80-95% productivity reduction by using it.
they are the equivalent.
there is already an 80-95% productivity reduction by just reading about them on Hacker News.
There's the old trope that systems programmers are smarter than applications programmers, but SWE-Bench puts the lie to that. Sure, SWE-Bench problems are all in the language of software; applications programmers take badly specified tickets in the language of product managers, testers, and end users and have to turn that into the language of SWE-Bench to get things done. I am not that impressed with 65% performance on SWE-Bench because those are not the kind of tickets that I have to resolve at work; rather, at work, if I want to use AI to help maintain a large codebase, I need to break the work down into that kind of ticket.
I don’t think models are doing that. They certainly can retrieve a huge amount of information that would otherwise only be available to specialists such as people with PhDs… but I’m not convinced the models have the same level of understanding as a human PhD.
It’s easy to test, though: the models simply have to write and defend a dissertation!
To my knowledge, this has not yet been done.
https://en.m.wikipedia.org/wiki/Ketamine
Because of its hallucinogenic properties?
The best way to think of chatbot "AI" is as the compendium of human intelligence as recorded in books and online media available to it. It is not intelligent at all on its own, and its judgement can't be better than its human sources because it has no biological drive to synthesize and excel. It's best to think of AI as a librarian of human knowledge, or an interactive Wikipedia, which is designed to seem like an intelligent agent but is actually not.
It's funny: GitHub Copilot puts these models in the 'bargain bin' (they are free in 'ask' mode, whereas the other models count against your monthly limit of premium requests), and it's pretty clear why; they seem downright nerfed. They're tolerable for basic questions, but you wouldn't use them if price weren't a concern.
Brandwise, I don't think it does OpenAI any favors to have their models be priced as 'worthless' compared to the other models on premium request limits.
If you're not specific enough, it will definitely spit out a half-baked pseudocode file where it expects you to fill in the rest. If you don't specify certain libraries, it'll use whatever is featured in the most blogspam. And if you're in an ecosystem that isn't publicly well-documented, it's near useless.
This alone is where I get a lot of my value. Otherwise, I'm using Cursor to actively solve smaller problems in whatever files I'm currently focused on. Being able to refactor things with only a couple sentences is remarkably fast.
The more you know about your language's features (and their precise names), and about higher-level programming patterns, the better time you'll have with LLMs, because it matches up with real documentation and examples with more precision.
We use a Team plan ($500 /mo), which includes 250 ACUs per month. Each bug or small task consumes anywhere between 1-3 ACUs, and fewer units are consumed if you're more precise with your prompt upfront. A larger prompt will usually use fewer ACUs because follow-up prompts cause Devin to run more checks to validate its work. Since it can run scripts, compilers, linters, etc. in its own VM -- all of that contributes to usage. It can also run E2E tests in a browser instance, and validate UI changes visually.
They recommend most tasks should stay under 5 ACUs before it becomes inefficient. I've managed to give it some fairly complex tasks while staying under that threshold.
So anywhere between $2-6 per task usually.
I'm curious, this is js/ts? Asking because depending on the lang, good old machine refactoring is either amazeballs (Java + IDE) or non-existent (Haskell).
I'm not js/ts so I don't know what the state of machine refactoring is in VS code ... But if it's as good as Java then "a couple of sentences" is quite slow compared to a keystroke or a quick dialog box with completion of symbol names.
It's not always right, but I find it helpful when it finds related changes that I should be making anyway, but may have overlooked.
Another example: selecting a block that I need to wrap (or unwrap) with tedious syntax, say I need to memoize a value with a React `useMemo` hook. I can select the value, open Quick Chat, type "memoize this", and within milliseconds it's correctly wrapped and saved me lots of fiddling on the keyboard. Scale this to hundreds of changes like these over a week, it adds up to valuable time-savings.
Even more powerful: selecting 5, 10, 20 separate values and typing: "memoize all of these" and watching it blast through each one in record time with pinpoint accuracy.
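For anyone who hasn't watched this kind of edit happen, the result looks roughly like the sketch below; the component and field names are hypothetical, not from the comment.

```tsx
import { useMemo } from "react";

type Row = { id: string; status: string; createdAt: number };

// Hypothetical component; "memoize this" turns a bare expression into a
// useMemo call with the values it reads listed as dependencies.
function RowList({ rows, filter }: { rows: Row[]; filter: string }) {
  const visibleRows = useMemo(
    () =>
      rows
        .filter((r) => r.status === filter)
        .sort((a, b) => a.createdAt - b.createdAt),
    [rows, filter]
  );

  return (
    <ul>
      {visibleRows.map((r) => (
        <li key={r.id}>{r.status}</li>
      ))}
    </ul>
  );
}

export default RowList;
```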
Best analogy I've ever heard and it's completely accurate. Now, back to work debugging and finishing a vibe coded application I'm being paid to work on.
If I want to throw a shuriken abiding by some artificial, magic Magnus force like in the movie Wanted, both ChatGPT and Claude let me down, using pygame. What if I wanted C-level performance, or if I wanted to use Zig? Burp.
It works like the average Microsoft employee, like some doped version of an orange-wig wearer who gets votes because his daddies kept the population as dumb as it gets after the dotcom x Facebook era. In essence, the ones to be disappointed by are the Chan-Zuckerbergs of our time. There was a chance, but there also was what they were primed for.
(1) for my day job, it doesn't make me super productive with creation, but it does help with discovery, learning, getting myself unstuck, and writing tedious code.
(2) however, the biggest unlock is that it makes working on side projects __immensely__ easier. Before AI I was always too tired to spend significant time on side projects. Now I can see my ideas come to life (albeit with shittier code) with much less mental effort. I also get to improve my AI engineering skills without the constraints of deadlines, data privacy, tooling, etc.
I hear this take a lot but does it really make that much of an improvement over what we already had with search engines, online documentation and online Q&A sites?
I know that a whole bunch of people will respond with the exact set of words that will make it show up right away on Google, but that's not the point: I couldn't remember what language it used, or any other detail beyond what I wrote and that it had been shared on Hacker News at some point, and the first couple Google searches returned a million other similar but incorrect things. With an LLM I found it right away.
The training cutoff comes into play here a bit, but 95% of the time I'm fuzzy searching like that I'm happy with projects that have been around for a few years and hence are both more mature and happen to fall into the training data.
Me, typing into a search engine, a few years ago: "Postgres CTE tutorial"
Me, typing into any AI engine, in 2025: "Here is my schema and query; optimize the query using CTEs and anything else you think might improve performance and readability"
This sort of implies you are not reading and deeply understanding your LLM output, doesn't it?
I am pretty strongly against that behavior
That 20 minutes, repeated over and over over the course of a career, is the difference between being a master versus being an amateur
You should value it, even if your employer doesn't.
Your employer would likely churn you into ground beef if there was a financial incentive to, never forget that
If you try it yourself you'll soon find out that the answer is a very obvious yes.
You don't need a paid plan to benefit from that kind of assistance, either.
At this point I am close to deciding to fully boycott it yes
> If you try it yourself you'll soon find out that the answer is a very obvious yes
I have tried plenty over the years, every time a new model releases and the hype cycle fires up again I look in to see if it is any better
I try to use it a couple of weeks, decide it is overrated and stop. Yes it is improving. No it is not good enough for me to trust
How have you found it not to be significantly better for those purposes?
The "not good enough for you to trust" is a strange claim. No matter what source of info you use, outside of official documentation, you have to assess its quality and correctness. LLM output is no different.
Not even remotely
> LLM output is no different
It is different
A search result might take me to the wrong answer but an LLM might just invent nonsense answers
This is a fundamentally different thing and is more difficult to detect imo
> This is a fundamentally different thing and is more difficult to detect imo
99% of the time it's not. You validate and correct/accept like you would any other suggestion.
Please be sure to put this on your CV so I never hire you by mistake.
This can't be a serious question? 5 minutes of testing will prove to you that it's not just better, it's a totally new paradigm. I'm relatively skeptical of AI as a general purpose tool, but in terms of learning and asking questions on well documented areas like programming language spec, APIs etc it's not even close. Google is dead to me in this use case.
Being able to sit down after a long day of work and ask an AI model to fix some bug or implement some feature on something while you relax and _not_ type code is a major boon. It is able to immediately get context and be productive even when you are not.
For 20 a month I can get my stupid tool and utility ideas from "it would be cool if I could..." to actual "works well enough for me" -tools in an evening - while I watch my shows at the same time.
After a day at work I don't have the energy to start digging through, say, OpenWeather's latest 3.0 API and its nuances and how I can refactor my old code to use the new API.
Claude did it in maybe one episode of What We Do in the Shadows :D I have a hook that makes my computer beep when Claude is done or pauses for a question, so I can get back, check what it did and poke it forward.
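For flavor, a minimal sketch of the kind of call such a refactor targets; the One Call 3.0 endpoint and response fields shown here are assumptions to check against OpenWeather's current docs, and the API key is read from the environment.

```typescript
// Assumed OpenWeather One Call API 3.0 endpoint and response shape; verify
// against the official documentation before relying on this.
type OneCallResponse = {
  current?: { temp: number; weather?: { description: string }[] };
};

async function currentWeather(lat: number, lon: number): Promise<string> {
  const url = new URL("https://api.openweathermap.org/data/3.0/onecall");
  url.searchParams.set("lat", String(lat));
  url.searchParams.set("lon", String(lon));
  url.searchParams.set("units", "metric");
  url.searchParams.set("appid", process.env.OPENWEATHER_API_KEY ?? "");

  const res = await fetch(url);
  if (!res.ok) throw new Error(`OpenWeather request failed: ${res.status}`);
  const data = (await res.json()) as OneCallResponse;
  const description = data.current?.weather?.[0]?.description ?? "unknown";
  return `${data.current?.temp ?? "?"}°C, ${description}`;
}

currentWeather(60.17, 24.94).then(console.log).catch(console.error);
```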
If I'm using it to remember the syntax or library for something I used to know how to do, it's great.
If I'm using it to explore something I haven't done before, it makes me faster, but sometimes it lies to me. Which was also true of Stack Overflow.
But when I ask it to do something fairly complex on its own, it usually tips over. I've tried a bunch of tests with a bunch of models, and it never quite gets it right. Sometimes it's minor stuff that I can fix if I bang on it long enough, and sometimes it's a steaming pile that I end up tossing in the garbage.
For example, I've asked it to code me a web-based calculator, or a 3D model of the solar system using WebGL, and none of the models I've tried have been able to do either.
[And to those saying we're using it wrong... well I can't argue with something that's not falsifiable]
I am not allowed to use LLMs at work for work code so I can't tell what claims are real. Just my 80s game reimplementations of Snake and Asteroids.
Who's making these claims?
It didn’t.
Now when I'm designing software there are all sorts of things where I'm much less likely to think "nah, that will take too long to type the code for".
I have found for myself it helps motivate me, resulting in net productivity gain from that alone. Even when it generates bad ideas, it can get me out of a rut and give me a bias towards action. It also keeps me from procrastinating on icky legacy codebases.
The smartest programmer I know is so impressive mainly for two reasons: first, he seems to have just an otherworldly memory and seems to kind of have absolutely every little feature and detail of the programming languages he uses memorized. Second, his real power is really in cognitive ability, or the ability to always quickly and creatively come up with the smartest and most efficient yet elegant and clean solution to any given problem. Of course somewhat opinionated but in a good way. Funnily he often wouldn't know the academic/common name for some algorithm he arrived at but it just happened to be what made sense to him and he arrived at it independently. Like a talented musician with perfect pitch who can't read notation or doesn't know theory yet is 10x more talented than someone who has studied it all.
When I pair program with him, it's evident that the current iteration of AI tools is not as quick or as sharp. You could arrive at similar solutions but you would have to iterate for a very long time. It would actually slow that person down significantly.
However, there is such a big spectrum of ability in this field that I could actually see this increasing for example my productivity by 10x. My background/profession is not in software engineering but when I do it in my free time the perfectionist tendencies make me work very slowly. So for me these AI tools are actually cool for generating the first crappy proof of concepts for my side projects/ideas, just to get something working quickly.
If I'm writing a series of very similar test cases (something like the table-driven sketch below), it's great for spamming them out quickly, but I still need to make sure they're actually right. It's easier to spot errors because I didn't type them out.
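A Jest-style table is the shape I mean; the module under test and the expected strings here are hypothetical.

```typescript
import { formatDuration } from "./format"; // hypothetical module under test

// Table-driven cases keep LLM-generated tests easy to scan, which is the
// part that still needs a human eye. Expected values are illustrative.
describe("formatDuration", () => {
  test.each([
    [0, "0s"],
    [61, "1m 1s"],
    [3600, "1h 0m 0s"],
    [86399, "23h 59m 59s"],
  ])("formats %i seconds as %s", (seconds, expected) => {
    expect(formatDuration(seconds)).toBe(expected);
  });
});
```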
It's also decent for writing various bits of boilerplate for list / dict comprehensions, log messages (although they're usually half wrong, but close enough to what I was thinking), time formatting, that kind of thing. All very standard stuff that I've done a million times but I may be a little rusty on. Basically StackOverflow question fodder.
But for anything complex and domain-specific, it's more wrong than it's right.
but the principle is the same: if the human isn’t doing theory-building, then no one is
It helps me be lazy because I have a rough expectation of what the outcome should be, and I can directly spot any corner cases or other issues the AI-proposed solution has, and can either prompt it to fix those, or (more often) fix those parts myself.
The bottom 20% may not have enough skill to spot that, and they'll produce superficially working code that'll then break in interesting ways. If you're in an organization that tolerates copy and pasting from stack overflow that might be good enough - otherwise the result is not only useless, but as it provides the illusion of providing complete solution you're also closing the path of training junior developers.
Pretty much all AI attributed firings were doing just that: Get rid of the juniors. That'll catch up with us in a decade or so. I shouldn't complain, though - that's probably a nice earning boost just before retirement for me.
I was watching to learn how other devs are using Claude Code, as my first attempt I pretty quickly ran into a huge mess and was specifically looking for how to debug better with MCP.
The most striking thing is she keeps on having to stop it doing really stupid things. She glosses over those points a little by saying things like "I roughly know what this should look like, and that's not quite right" or "I know that's the old way of installing TailwindCSS, I'll just show you how to install Context7", etc.
But in each 10-minute episode (which has time skips while CC thinks) it happens at least twice. She has to bring her senior dev skills in, and it's only due to her skill that she can spot the problem in seconds flat.
And after watching much of it, though I skipped a few episodes at the end, I'm pretty certain I could have coded the same app quicker than she did without agentic AI, just using the old chat window AIs to bash out the React boilerplate and help me quickly scan the documentation for getting it working offline. The initial estimate of 18 days the AI came up with in the plan phase would only hold true if you had to do it "properly".
I'm also certain she could have too.
[1] https://www.youtube.com/watch?v=erKHnjVQD1k
It's worth a watch if you're not doing agentic coding yet. There were points I was impressed with what she got it to do. The TDD section was quite impressive in many ways, though it immediately tried to cheat and she had to tell it to do it properly.
It's like WordPress all over again but with people even less able to code. There's going to be a vast number of opportunities for people to get into the industry via this route, but it's not going to be a very nice route for many of them. Lots of people who understand software even less than the C-suite holding the purse strings.
People keep focusing on general-intelligence-style capabilities, but that is the holy grail. The world could go through multiple revolutions before finding that holy grail, but even before then everything would have changed beyond recognition.
So write an integration over the API docs I just copy-pasted.
But of course that’s ridiculous.
10x is intended to symbolize a multiplier. As Microsoft fired that guy, 10 × 0 is still 0.
1.2x increase
I guess this is still the "caveat" that can keep the hype hopes going. But I've found at a team velocity level, with our teams, where everyone is actively using agentic coding like Claude Code on the daily, we actually didn't see an increase in team velocity yet.
I'm curious to hear anecdotes from other teams: has your team seen velocity increase since it adopted agentic AI?
What will happen is over time this will become the new baseline for developing software.
It will mean we can deliver software faster. Maybe more so than other advances, but it won't fundamentally change the fact that software takes real effort and that effort will not go away, since that effort is much more than just coding this or that function.
I could create a huge list of things that have made developing and deploying quality software easier: linters, static type checkers, code formatters, hot reload, intelligent code completion, distributed version control (i.e., Git), unit testing frameworks, inference schema tools, code from schema, etc. I'm sure others can add dozens of items to that list. And yet there seems to be an unending amount of software to be built, limited only by the people available to build it and an organization's funding to hire those people.
In my personal work, I've found AI-assisted development to make me faster (not sure I have a good estimate for how much faster.) What I've also found is that it makes it much easier to tackle novel problems within an existing solution base. And I believe this is likely to be a big part of the dev productivity gain.
Just an example: let's say we want to use the strangler pattern as part of our modernization approach for a legacy enterprise app that has seen better days. Unless you have some senior devs who are both experienced with that pattern AND experienced with your code base, it can take a lot of trial and error to figure out how to make it work. (As you said, most of our work isn't actually typing code.)
This is where an AI/LLM tool can go to work on understanding the code base and understanding the pattern to create a reference implementation approach and tests. That can save a team of devs many weeks of trial & error (and stress) not to mention guidance on where they will run into roadblocks deep into the code base.
And, in my opinion, this is where a huge portion of the AI-assisted dev savings will come from - not so much writing the code (although that's helpful) but helping devs get to the details of a solution much faster.
It's that googling has always gotten us to generic references and AI gets us those references fit for our solution.
The hardest part of my job is actually understanding the problem space and making sure we're applying the correct solution. Actual coding is probably about 30% of my job.
That means, I'm only looking at something like 30% productivity gain by being 5x as effective at coding.
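A quick sanity check of that figure (my arithmetic, treating coding as 30% of the work and assuming the 5x speedup applies cleanly to it):

    0.7 + \frac{0.3}{5} = 0.76, \qquad \frac{1}{0.76} \approx 1.32

i.e. roughly the ~30% overall gain mentioned above.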
I'm not sure it is and I'll take it a step further:
Over the course of development, efficiency gains trend towards zero.
AI has a better case for increasing surface area (what an engineer is capable of working on) and effectiveness, but efficiency is a mirage.
This feels exactly right and is what I’ve thought since this all began.
But it also makes me think maybe there are those that A.I. helps 10x, but more because that code input is actually a very large part of their job. Some coders aren’t doing much design or engineering, just assembly.
I don't think I've encountered a programmer like that in my own career, but I guess they might exist somewhere!
This article thinks that most people who say 10x productivity are claiming 10x speedup on end-to-end delivering features. If that's indeed what someone is saying, they're most of the time quite simply wrong (or lying).
But I think some people (like me) aren't claiming that. Of course the end to end product process includes a lot more work than just the pure coding aspect, and indeed none of those other parts are getting a 10x speedup right now.
That said, there are a few cases where this 10x end-to-end is possible. E.g. when working alone, especially on new things but not only - you're skipping a lot of this overhead. That's why smaller teams, even solo teams, are suddenly super interesting - because they are getting a bigger speedup comparatively speaking, and possibly enough of one to be able to rival larger teams.
And we're not seeing that at all. The companies whose software I use that did announce big AI initiatives 6 months ago, if they really had gotten 10x productivity gain, that'd be 60 months—5 years—worth of "productivity". And yet somehow all of their software has gotten worse.
- solo projects
- startups with few engineers doing very little intense code review if any at all
- people who don't know how to code themselves.
Nobody else is realistically able to get 10x multipliers. But that doesn't mean you can't get a 1.5-2x multiplier. I'd say even myself at a large company that moves slow have been able to realize this type of multiplier on my work using cursor/claude code. But as mentioned in the article the real bottleneck becomes processes and reviews. These have not gotten any faster - so in real terms time to ship/deliver isn't much different than before.
The only attempt we should make at minimizing review times is by making them a higher priority than development itself. Technically this should already be the case, but in my experience almost no engineer outside of really disciplined companies or FAANG actually makes reviews a high priority, because unfortunately code reviews are not usually part of someone's performance review and they slow down your own projects. And usually your project manager couldn't give two shits about someone else's work being slow.
Processes are where we can make the biggest dent. Most companies as they get large have processes that get in the way of forward velocity. AI first companies will minimize anything that slows time to ship. Companies simply utilizing AI and expecting 10x engineers without actually putting in the work to rally around AI as a first class citizen will fall behind.
Overall it feels negligible to me in its current state.
Things like: build a settings system with org, user, and project level settings, and the UI to edit them.
A task like that doesn’t require a lot of thinking and planning, and is well within most developers’ abilities, but it can still take significant time. Maybe you need to create like 10 new files across backend and frontend, choose a couple libraries to help with different aspects, style components for the UI and spend some time getting the UX smooth, make some changes to the webpack config, and so on. None of it is difficult, per se, but it all takes time, and you can run into little problems along the way.
A task like that is like 10-20% planning, and 80-90% going through the motions to implement a lot of unoriginal functionality. In my experience, these kinds of tasks are very common, and the speedup LLMs can bring to them, when prompted well, is pretty dramatic.
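To make the "unoriginal functionality" part concrete, the heart of such a settings system is often just a small precedence rule; here is a minimal, hypothetical TypeScript sketch (the SettingsLayer and resolveSettings names are made up, not from any particular codebase). The real time sink is everything around it: the ten files, the UI polish, the webpack config.

    // Hypothetical sketch: layered settings where project overrides user overrides org.
    type SettingsScope = "org" | "user" | "project";

    interface SettingsLayer {
      scope: SettingsScope;
      values: Record<string, unknown>;
    }

    // Later (more specific) layers win over earlier (more general) ones.
    function resolveSettings(layers: SettingsLayer[]): Record<string, unknown> {
      const precedence: SettingsScope[] = ["org", "user", "project"];
      return layers
        .slice()
        .sort((a, b) => precedence.indexOf(a.scope) - precedence.indexOf(b.scope))
        .reduce((merged, layer) => ({ ...merged, ...layer.values }), {} as Record<string, unknown>);
    }

    // Example: the project-level value wins, org-only keys fall through.
    const effective = resolveSettings([
      { scope: "user", values: { theme: "dark" } },
      { scope: "org", values: { theme: "light", locale: "en-US" } },
      { scope: "project", values: { theme: "solarized" } },
    ]); // => { theme: "solarized", locale: "en-US" }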
This is where I have found LLMs to be most useful. I have never been able to figure out how to get it to write code that isn't a complete unusable disaster zone. But if you throw your problem at it, it can offer great direction in plain English.
I have decades of research, planning, and figuring things out under my belt, though. That may give me an advantage in guiding it just the right way, whereas the junior might not be able to get anything practical from it, and thus that might explain their focus on code generation instead?
One thing that AI has helped me with is finding pesky bugs. I mainly work on numerical simulations. At one point I was stuck for almost a week trying to figure out why my simulation was acting so strange. Finally I pulled up chatgpt, put some of my files into the context and wrote a prompt explaining the strange behavior and what I thought might be happening. In a few seconds it figured out that I had improperly scaled one of my equations. It came down to a couple missing parentheses, and once I fixed it the simulation ran perfectly.
This has happened a few times where AI was easily able to see something I was overlooking. Am I a 10x developer now that I use AI? No... but when used well, AI can have a hugely positive impact on what I am able to get done.
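For illustration only (I haven't seen the actual simulation code): the class of bug described, where a missing pair of parentheses silently changes how an equation is scaled, can look as innocent as this.

    const force = 10.0, mass = 2.0, dt = 0.01;

    // Intended scaling: F / (m * dt)
    const intended = force / (mass * dt); // 500
    // With the parentheses missing, division and multiplication associate
    // left to right, i.e. (F / m) * dt
    const buggy = force / mass * dt;      // 0.05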
It’s a rubber duck that’s pretty educated and talks back.
You have to change the organization.
- no peer code review; you review the AI output and that's enough
- devs need authority to change code anywhere in the company. No more team A owns service A and team B owns service B
- every dev and ops person needs to be colocated, no more waiting for timezones
- PMs and engineers are the same role now
Will it work for every company? No. If you are building a pacemaker, don't use AI. Will things break? Yes, sometimes, but you can roll back.
Will things be somewhat chaotic? Yes, somewhat, but what did you think going 10x would feel like?
Now I don't want to sound like a doomsayer, but it appears to me that application programming and the corresponding software companies are likely to disappear within the next 10 years or so. We're now in a transitional phase where companies who can afford enough AI compute time have an advantage. However, this phase won't last long.
Unless there is a fundamental obstacle to further advances in AI programming, not just simple functions but whole apps will be created with a prompt. However, this is not where it is going to stop. Soon, there will be no need for apps in the traditional sense. End users will use AI to manipulate and visualize data, and operating systems will integrate the AI services needed for this. "Apps" can be created on the fly and are constantly adjusted to the users' needs.
Creating apps will not remain a profitable business. If there is an app X someone likes, they can prompt their AI to create an app with the same features, but perhaps with these or those small changes, and the AI will create it for them, including thorough tests and quality assurance.
Right now, in the transitional phase, senior engineers might feel they are safe because someone has to monitor and check the AI output. But there is no reason why humans would be needed for that step in the long run. It's cheaper to have 3 AIs quality-test and improve the outputs of one generating AI. I'm sure many companies are already experimenting with this, and at some point the output of such iterative design procedures will have far fewer bugs than any code produced by humans. Only safety-critical essential systems such as operating systems and banking will continue to be supervised by humans, though perhaps mostly for legal reasons.
Although I hope it isn't so, to me the end of software development seems like a logical long-term consequence of current AI development. Perhaps I've missed something; I'd be interested in hearing from people who disagree.
It's ironic because in my great wisdom I chose to quit my day job in academia recently to fulfill my lifelong dream of bootstrapping a software company. I'll see if I can find a niche, maybe some people appreciate hand-crafted software in the future for its quirks and originality...
Consider a fully loaded cost of $200k for an engineer, or $16,666 per month. They only have to be a >1.012x engineer for the "AI" to be worth it. Of course that $200 per month is probably VC-subsidized right now, but there is lots of money on the table for a <2x improvement.
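Spelling that arithmetic out:

    \$200{,}000 / 12 \approx \$16{,}667 \text{ per month}, \qquad \$200 / \$16{,}667 \approx 1.2\%

so the $200/month tool breaks even as soon as it makes the engineer a bit more than 1.012x as productive.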
Or the data showing something else... possibly, a company starts telling engineers to use AI, then RIFs a huge portion, and expects the remaining engineers to pick up the slack. They now claim "we're more efficient!" when they've just asked their employees to work more weekends.
Then it came time to make a change to one of the charts. Team members were asking me questions about it. "How can we make this axis display only for existing data rather than range?" I'm scrolling through code in a screenshare that I absolutely reviewed, I remember doing it, I remember clicking the green arrow in Cursor, but I'm panicking because this doesn't look like code I've ever seen, and I'm seeing gaping mistakes and stupid patterns and a ton of duplicated code. Yeah I reviewed it, but bit by bit, never really all at once. I'd never grokked the entire file. They're asking me questions to which I don't have answers, for code "I'd just written." Man it was embarrassing!
And then to make the change, the AI completely failed at it. Plotly.js's type definitions are super out of date and the Python library is more fleshed out, so the AI started hallucinating things that exist on Python and not in JS - so now I gotta head to the docs anyway. I had to get much more manual, and the autocomplete of cursor was nice while doing so, but sometimes I'd spend more time tab/backspacing after realizing the thing it recommended was actually wrong, than I'd have spent just quickly typing the entire whatever thing.
And just like a hit, now I'm chasing the dragon. I'd love to get that feeling back of entering a new era of programming, where I'm hugely augmented. I'm trying out all the different AI tools, and desperately wishing there was an autocomplete as fast, as multi-line, and as good at jumping around as Cursor's, available in nvim. But they all let me down. Now that I'm paying more attention, I'm realizing the code really isn't good at all. I think it's still very useful to have Claude generate a lot of boilerplate, or come in and make some tedious changes for me, or just write all my tests, but beyond that, I don't know. I think it's improved my productivity maybe 20%, all things considered. Still amazing! I just wish it was as good as I thought it was when I first tried it.
The part I disagree about: I've never worked at a company that has a 3-month cycle from code-written to code-review-complete. That sounds insane and dysfunctional. AI won't fix an organization like that.
The better argument is that software engineers spend a lot of time doing things that aren't writing code and aren't being accelerated by any AI code assistant.
But if your system records internal state in English and generates code while handling requests, complex systems can become much simpler. You can build things that were impossible before.
Where CC has excelled:
- New well-defined feature built upon existing conventions (10x+ boost)
- Performing similar mid-level changes across multiple files (10x+ boost)
- Quickly performing large refactors or architecture changes (10x+ boost)
- Performing analysis of existing codebases to help build my personal understanding (10x+ boost)
- Correctly configuring UI layouts (makes sense: this is still pattern-matching, but the required patterns can get more complex than a lot of humans can quickly intuit)
Where CC has floundered or wasted time:
- Anything involving temporal glitches in UI or logic. The feedback loop just can't be accomplished yet with normal tooling.
- Fixing state issues in general. Again, the feedback loop is too immature for CC to even understand what to fix unless your tooling or descriptive ability is stellar.
- Solving classes of smallish problems that require a lot of trial-and-error, aren't covered by automated tests, or require a steady flow of subjective feedback. Sometimes it's just not worth setting up the context for CC to succeed.
- Adhering to unusual or poorly-documented coding/architecture conventions. It's going to fight you the whole way, because it's been trained on conventional approaches.
Productivity hacks:
- These agents are automated, meaning you can literally have work being performed in parallel. Actual multitasking. This is actually more mentally exhausting, but I've seen my perceived productivity gains increase due to having 2+ projects going at once. CC may not beat a single engineer for many tasks, but it can literally do multiple things at once. I think this is where the real potential comes into play. Monitoring multiple projects and maintaining your own human mental context for each? That's a real challenge.
- Invest in good context documents as early as possible, and don't hesitate to ask CC to insert new info and insights in its documents as you go. This is how you can help CC "learn" from its mistakes: document the right way and the wrong way when a mistake occurs.
Background: I'm a 16yoe senior fullstack engineer at a startup, working with React/Remix, native iOS (UIKit), native Android (Jetpack Compose), backends in TypeScript/Node, and lots of GraphQL and Postgres. I've also had success using Claude Code to generate Elixir code for my personal projects.
The full year is just more of the above.
It makes everyone “produce more code” but your worst dev producing 10X the code is not 10X more productive.
There's also a bit of a Dunning-Kruger effect where the most careless people are the most likely to YOLO thousands of lines of vibecode into prod. While a more meticulous engineer might take a lot more time to read the changes, figure out where the AI is wrong, and remove unnecessary code. But the second engineer would be seen as much, much less productive than the first in this case.
Now for senior developers, AI has been tremendous. Example: I'm building a project where I hit the backend in liveview, and internally I have to make N requests to different APIs in parallel and present the results back. My initial version to test the idea had no loading state, waiting for all requests to finish before sending back.
I knew that I could use Phoenix Channels, and Elixir Tasks, and websockets to push the results as they came in. But I didn't want to write all that code. I could already taste it and explain it. Why couldn't I just snap my fingers?
Well AI did just that. I wrote what I wanted in depth, and bada bing, the solution I would have written is there.
Vibe coders are not gonna make it.
Engineers are having the time of their lives. It's freeing!
So true, a lot of value and gains are had when tech leads can effectively negotiate and creatively offer less costly solutions to all aspects of a feature.
One of our EMs did this this week. He did a lot of homework: spoke to quite a few experts and pretty soon realised this task was too hard for his team to ever accomplish, if it was even possible. He lobbied the PM, a VP, and a C-level, and managed to stop a lot of wasted work from being done.
Sometimes the most important language to know as a dev is English*
s/English/YourLanguageOfChoice/g
What's your experience? And what do the "kids" use these days to indicate alternative options (as above — though for that, I use bash {} syntax too) or to signal "I changed my mind" or "let me fix that for you"?
They could have just said "the most important language [...] is spoken language".
The co-founder of a company I worked at was one for a period (he is not a 10xer anymore - I don't think someone can maintain that output forever with life constraints). He literally wrote the bulk of a multi-million line system, most of the code is still running today without much change and powering a unicorn level business.
I literally wouldn't believe it, but I was there for it when it happened.
Ran into one more who I thought might be one, but he left the company too early to really tell.
I don't think AI is going to produce any 10x engineers because what made that co-founder so great was he had some kind of sixth sense for architecture, that for most of us mortals we need to take more time or learn by trial and error how to do. For him, he was just writing code and writing code and it came out right on the first try, so to speak. Truly something unique. AI can produce well specified code, but it can't do the specifying very well today, and it can't reason about large architectures and keep that reasoning in its context through the implementation of hundreds of features.
I've been a bit of that engineer (though not at the same scale), like say wrote 70% of a 50k+ LOC greenfield service. But I'm not sure it really means I'm 10x. Sometimes this comes from just being the person allowed to do it, who doesn't get questioned on their design choices or decisions about how to structure and write the code, and who doesn't get any pushback on having massive PRs that others almost just rubber-stamp.
And you can really only do this at the greenfield phase, when things are not yet in production, and there's so much baseline stuff that's needed in the code.
But it ends up being the 80/20 rule, I did the 80% of the work in 20% of the time it'll take to go to prod, because that 20% remaining will eat up 80% of the time.
Junior: 100 total lines of code a day
Senior: 10,000 total lines of code a day
Guru: -100 total lines of code a day
I guess this leaves open question about the distribution of productivity across programmers and the difference between the min and the mean. Is productivity normally distributed? Log normal? Some kind of power law?
What I've seen with AI is that it does not save my coworkers from the pain of overcomplicating simple things that they don't really think through clearly. AI does not seem to solve this.
Using AI will change nothing in this context.
So always aim for outcomes, not output :)
At my company, we did promote people quickly enough that they are now close to double their salaries when they started a year or so ago, due to their added value as engineers in the team. It gets tougher as they get into senior roles, but even there, there's quite a bit of room for differentiation.
Additionally, since this is a market, you should not even expect to be paid twice for 2x value provided — then it makes no difference to a company if they get two 1x engineers instead, and you are really not that special if you are double the cost. So really, the "fair" value is somewhere in between: 1.5x to equally reward both parties, or leaning one way or the other :)
This has never been the case in any company I've ever worked at. Even if you can finish your day's work in, say, 4 hours, you can't just dip out for the other 4 hours of the day.
Managers and teammates expect you to be available at the drop of a hat for meetings, incidents, random questions, "emergencies", etc.
Most jobs I've worked at eventually devolve into something like "Well, I've finished what I wanted to finish today. I could either stare at my monitor for the rest of the day waiting for something to happen, or I could go find some other work to do. Guess I'll go find some other work to do since that's slightly less miserable".
You also have to delicately "hide" the fact that you can finish your work significantly faster than expected. Otherwise the expectations of you change and you just get assigned more work to do.
Literally unwinnable scenarios. Only way to succeed is to just sit your ass in the chair. Almost no manager actually cares about your actual output - they all care about presentation and appearances.
Uh, no?
I had a task to do a semi-complex UI addition, the whole week was allocated for that.
I sicced the corp-approved GitHub Copilot with 4o and Claude 3.7 on it and it was done in an afternoon. It's ~95% functionally complete, but ugly as sin. (The model didn't understand our specific Tailwind classes)
Now I can spend the rest of the week on polish.
Not really? That's defining productivity as latency, but it's at least as valid to define productivity as throughput.
And then all the examples that are just about time spent waiting become irrelevant. When blocked waiting on something external, you just work on other things.
My point around waiting for things like code review is that it creates a natural time floor, the context switching takes time and slows down other work. If you have 10x as much stuff to get reviewed, all the time loss to context switching is multiplied by 10x.
I believe his original thesis remains true: "There is no single development, in either technology or management technique, which by itself promises even one order-of-magnitude improvement within a decade in productivity, in reliability, in simplicity."
Over the years this has been misrepresented or misinterpreted to suggest it's false, but it sure feels like "Agentic Coding" is a single development promising a massive multiplier in improvement that, once again, is just another accidental tool that can be helpful but is definitely not a silver bullet.
The problem is...
1. the enormous investment in $$$ produces a too-big-to-fail scenario where extravagant claims will be made regardless
2. leadership has made big promises around productivity and velocity for eng
The end result of this is going to be a lot of squinting at the problem, ignoring reality, and declaring victory. These AI tools are very useful in automating chore and grunt tasks.
Context: EACL is an embedded, SpiceDB-compatible ReBAC authorization library, built in Clojure and backed by Datomic.
Perfectly put. I've been using a lot of AI for shell scripting. Granted I should probably have better knowledge of shell but frankly I think it's a terrible language and only use it because it enjoys wide system support and is important for pipelining. I prefer TS (and will try to write scripts and such in it if I can) and for that I don't use AI almost at all.
In some cases, LLMs can be a real speed boost. Most of the time, that has to do with writing boilerplate and prototyping a new "thing" I want to try out.
Inevitably, if I like the prototype, I end up re-writing large swaths of it to make it even halfway productizable. Fundamentally, LLMs are bad at keeping an end goal in mind while working on a specific feature, and they're terrible at holding enough context to avoid code duplication and spaghetti.
I'd like to see them get better and better, but they really are limited to whatever code they can ingest on the internet. A LOT of important code is just not open for consumption in sufficient quantities for them to learn from. For this reason, I suspect LLMs will really never be all that good for non-web-based engineering. Where's all the training data gonna come from?
Any codebase that's difficult for me to read would be way too large to use an LLM on.
Where I see major productivity gains are on small, tech debt like tasks, that I could not justify before. Things that I can start with an async agent, let sit until I’ve got some downtime on my main tasks (the ones that involve all that coordination). Then I can take the time to clean them up and shepherd them through.
The very best case of these are things where I can move a class of problem from manually verified to automatically verified as that kick starts a virtuous cycle that makes the ai system more productive.
But many of them are boring refactors that are just beyond what a traditional refactoring tool can do.
I doubt that's the commonly desired outcome, but it is what I want! If AI gets too expensive overnight (say 100x), then I'll be able to keep chugging along. I would miss it (claude-code), but I'm betting that by then a second tier AI would fit my process nearly as well.
I think the same class of programmers that yak shave about their editor, will also yak shave about their AI. For me, it's just augmenting how I like to work, which is probably different than most other people like to work. IMO just make it fit your personal work style... although I guess that's problematic for a large team... look, even more reasons not to have a large team!
Much of production software engineering is writing boilerplate, building out test matrices and harnesses, scaffolding structure. And often, it's for very similarly shaped problems at their core regardless of the company, organization, or product.
AI lets me get a lot of that out of the way and focus on more interesting work.
One might argue that’s a failure of tools or even my own technique. That might be true, but it doesn’t change the fact that I’m less bored than I used to be.
> Oh, and this exact argument works in reverse. If you feel good doing AI coding, just do it. If you feel so excited that you code more than ever before, that's awesome. I want everyone to feel that way, regardless of how they get there.
I enjoyed the article, fwiw. Twitter was insufferable before Elon bought it, but the AI bro scene is just...wow. An entire scene who only communicate in histrionics.
Not my experience.
You can instruct Claude Code to respect standards and practices of your codebase.
In fact I noticed that Claude Code has forced me to do a few genuinely important things, like documenting more, writing more E2E tests, and tracking architectural and style changes.
Not only am I forcing myself into a consistent (and well-thought-out) style, but I also need it later to feed to the AI itself.
Seriously, I don't want to offend anyone, but if you believe that AI doesn't make you more productive, you've got skill issues in adopting new tools and using them at what they are good at.
Ingesting legacy code, understanding it, looking at potential ways to rework it, and then putting in place the axioms to first work with it yourself, and then for others to join in, has gone from months down to weeks and days.
For developing greenfield from scratch, statically typed languages seem to work a bit better than not.
Putting enough information around the requirements, and how to structure and undertake them, is critical, or it can turn into cowboy coding pretty easily; by default the AI leans towards the average of its corpus, not the best. That's where the developer comes in.
Here's what the 5x to 10x flow looks like:
1. Plan out the tasks (maybe with the help of AI)
2. Open a Git worktree, launch Claude Code in the worktree, give it the task, let it work. It gets instructions to push to a Github pull request when it's done. Claude gets to work. It has access to a whole bunch of local tools, test suites, and lots of documentation.
3. While that terminal is running, I go start more tasks. Ideally there are 3 to 5 tasks running at a time.
4. Periodically check on the tabs to make sure they're not stuck or lost their minds.
5. Finally, review the finished pull requests and merge them when they are ready. If they have issues then go back to the related chat and tell it to work on it some more.
With that flow it's reasonable to merge 10 to 20 pull requests every day. I'm sure someone will respond "oh just because there are a lot of pull requests, doesn't mean you are productive!" I don't know how to prove to you that the PRs are productive other than just say that they are each basically equivalent to what one human does in one small PR.
A few notes about the flow:
- For the AI to work independently, it really needs tasks that are easy to medium difficulty. There are definitely 'hard' tasks that need a lot of human attention in order to get done successfully.
- This does take a lot of initial investment in tooling and documentation. Basically every "best practice" or code pattern that you want to use in the project must be written down. And the tests must be as extensive as possible.
Anyway the linked article talks about the time it takes to review pull requests. I don't think it needs to take that long, because you can automate a lot:
- Code style issues are fully automated by the linter.
- Other checks like unit test coverage can be checked in the PR as well.
- When you have a ton of automated tests that are checked in the PR, that also reduces how much you need to worry about as a code reviewer.
With all those checks in place, I think it can be pretty fast to review a PR. As the human you just need to scan for really bad code patterns, and maybe zoom in on highly critical areas, but most of the code can be eyeballed pretty quickly.
Maybe I just don't have a great imagination, but it's very hard for me to see how you basically automate the review process on anything that is business critical or has legal risks.
On the security layer, I wrote that code mostly by hand, with some 'pair programming' with Claude to get the Oauth handling working.
When I have the agent working on tasks independently, it's usually working on feature-specific business logic in the API and frontend. For that work it has a lot of standard helper functions to read/write data for the current authenticated user. With that scaffolding it's harder (not impossible) for the bot to mess up.
It's definitely a concern though, I've been brainstorming some creative ways to add extra tests and more auditing to look out for security issues. Overall I think the key for extremely fast development is to have an extremely good testing strategy.
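For what it's worth, a minimal sketch of what that kind of user-scoped scaffolding can look like (hypothetical TypeScript names, not the parent's actual helpers): the agent-written feature code goes through a helper that already filters by the authenticated user, so it has to work hard to leak someone else's data.

    // Hypothetical illustration only.
    interface Db {
      query<T>(sql: string, params: unknown[]): Promise<T[]>;
    }

    interface AuthenticatedUser {
      id: string;
    }

    // Only tables that actually carry a user_id column are listed here.
    type UserScopedTable = "projects" | "settings" | "notes";

    // Agent-generated business logic reads through this helper instead of writing
    // raw queries, so every read is scoped to the current user by construction.
    async function listForUser<T>(
      db: Db,
      user: AuthenticatedUser,
      table: UserScopedTable,
    ): Promise<T[]> {
      return db.query<T>(`SELECT * FROM ${table} WHERE user_id = $1`, [user.id]);
    }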
I think where I've become very hesitant is a lot of the programs that I touch has customer data belonging to clients with pretty hard-nosed legal teams. So it's quite difficult for me to imagine not reviewing the production code by hand.
The key isn't how much you can speed up the scalable/parallelizable portions, it's how limited you are by the non-scalable/parallelizable aspects.
So it's not like I'm delivering features in one day that would have taken two weeks. But I am delivering features in two weeks that have a bunch of extra niceties attached to them. Reality being what it is, we often release things before they are perfect. Now things are a bit closer to perfect when they are released.
I hope some of that extra work that's done reduces future bug-finding sessions.
What I'm about to discuss is about me, not you. I have no idea what kind of systems you build, what your codebase looks like, use case, business requirements etc. etc. etc. So it is possible writing tests is a great application for LLMs for you.
In my day to day work... I wish that developers where I work would stop using LLMs to write tests.
The most typical problem with LLM-generated tests on the codebase where I work is that the test code is almost always extremely tightly coupled to the implementation code. Heavy use of test spies is a common anti-pattern. The result is a test suite that is testing implementation details rather than "user-facing" behaviour (user could be a code-level consumer of the thing you are testing).
The problem with that type of test is that it is fragile. One of the key benefits of automated tests is that they give you a safety net to refactor implementation to your heart's content without fear of having broken something. If you change an implementation detail and the "user-facing" behaviour does not change, your tests should pass. When tests are tightly coupled to implementation, they will fail, and now your tests, in the worst of cases, might actually be creating negative value for you... since every code change now requires you to keep tests up to date even when what you actually care about testing ("is this thing working correctly?") hasn't changed.
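To illustrate the difference with a contrived TypeScript example (hypothetical function, vitest-style APIs, not code from the codebase I'm describing): the first test pins down how the result is produced and breaks under harmless refactors such as adding a cache; the second asserts only the behaviour a consumer can observe.

    import { describe, it, expect, vi } from "vitest";

    // Hypothetical unit under test.
    interface UserRepo {
      findById(id: string): Promise<{ first: string; last: string } | null>;
    }

    async function displayName(repo: UserRepo, id: string): Promise<string> {
      const user = await repo.findById(id);
      return user ? `${user.first} ${user.last}` : "Unknown user";
    }

    describe("displayName", () => {
      // Implementation-coupled: pins down which collaborator methods were called and
      // how often. Adding a cache or batching lookups breaks it even though the
      // returned value is identical.
      it("calls findById exactly once (spy-coupled)", async () => {
        const repo = { findById: vi.fn(async () => ({ first: "Ada", last: "Lovelace" })) };
        await displayName(repo, "u1");
        expect(repo.findById).toHaveBeenCalledTimes(1);
        expect(repo.findById).toHaveBeenCalledWith("u1");
      });

      // Behaviour-focused: a plain fake plus an assertion on the observable output.
      it("formats the user's full name (behaviour-focused)", async () => {
        const repo: UserRepo = { findById: async () => ({ first: "Ada", last: "Lovelace" }) };
        await expect(displayName(repo, "u1")).resolves.toBe("Ada Lovelace");
      });
    });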
The root of this problem isn't even the LLM; it's just that the LLM makes it a million times worse. Developers often feel like writing tests is a menial chore that needs to be done after the fact to satisfy a code coverage policy. Few developers, at many organizations, have ever truly worked TDD or learned testing best practices, how to write easy-to-test implementation code, etc.
That problem statement is:
- Not all tests add value
- Some tests can even create dis-value (ex: slow to run, thus increasing CI bills for the business without actually testing anything important)
- Few developers understand what good automated testing looks like
- Developers are incentivized to write tests just to satisfy code coverage metrics
- Therefore writing tests is a chore and an afterthought
- So they reach for an LLM because it solves what they perceive as a problem
- The tests run and pass, and they are completely oblivious to the anti-patterns just introduced and the problems those will create over time
- The LLMs are generating hundreds, if not thousands, of these problems
So yeah, the problem is 100% the developers who don't understand how to evaluate the output of a tool that they are using.
But unlike functional code, these tests are - in many cases - arguably creating disvalue for the business. At least the functional code is a) more likely to be reviewed and code quality problems addressed and b) even if not, it's still providing features for the end user and thus adding some value.
Forcing the discussion of invariants, and property-based testing, seems to improve on the issues you're mentioning (when using e.g. Opus 4), especially when combined with "use the public API" or interface abstractions.
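A minimal sketch of that style, assuming the fast-check library and a hypothetical normalizeTags function: the properties state invariants against the public API only, so they keep passing when the internals are refactored.

    import fc from "fast-check";

    // Hypothetical function under test.
    function normalizeTags(tags: string[]): string[] {
      return Array.from(new Set(tags.map((t) => t.trim().toLowerCase()))).sort();
    }

    // Invariant: normalizing twice is the same as normalizing once (idempotence).
    fc.assert(
      fc.property(fc.array(fc.string()), (tags) => {
        const once = normalizeTags(tags);
        return JSON.stringify(normalizeTags(once)) === JSON.stringify(once);
      }),
    );

    // Invariant: the output is sorted and free of duplicates.
    fc.assert(
      fc.property(fc.array(fc.string()), (tags) => {
        const out = normalizeTags(tags);
        return out.every((t, i) => i === 0 || out[i - 1] < t);
      }),
    );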
For much of what I build with AI, I'm not saving two weeks. I'm saving infinity weeks — if LLMs didn't exist I would have never built this tool in the first place.
So two groups are talking past one another. Someone has a completely new idea, starts with nothing and vibe codes a barely working MVP. They claim they were able to go from 0 to MVP ~10x faster than if they had written the code themselves.
Then some seasoned programmer hears that claim, scoffs and takes the agent into a legacy code base. They run `/init` and make 0 changes to the auto-generated CLAUDE.md. They add no additional context files or rules about the project. They ask completely unstructured questions and prompt the first thing that comes into their minds. After 1 or 2 days of getting terrible results they don't change their usage or try to find a better way, they instead write a long blog post claiming AI hype is unfounded.
What they ignore is that even the maximalists are stating: 30%-50% improvement on legacy code bases. And that is if you use the tool well.
This author gets terrible results and then says: "Dark warnings that if I didn't start using AI now I'd be hopelessly behind proved unfounded. Using AI to code is not hard to learn." How sure is the author that they actually learned to use it? "A competent engineer will figure this stuff out in less than a week of moderate AI usage." One of the most interesting things about learning are those things that are easy to learn and hard to master. You can teach a child chess, it is easy to learn but it is hard to master.
As in, it's now completely preventing you from doing things you could have before?
This is not to disagree with the OP, but to point out that, even for engineers, the speedups might not appear where you expect. [EDIT I see like 4 other comments making the same point :)]
I find that getting from zero to 80-90% functionality on just about anything software these days is exceedingly easy. So, I wonder if AI just rides that wave. Software development is maturing now such that making software with or without AI feels 10-100x faster. I suspect it is partially due to the profound leap that has been made with collaborative tools, compilers, languages, and open source methodology, etc..
Actually writing software was only like 15-20% of my time though so the efficiency wins from having an LLM write the code is somewhat limited. It’s still another tool that makes me more productive but I’ve not figured out a way to really multiplicatively increase my productivity.
Also, one underestimated aspect is that LLMs don’t get writer’s block or get tired (so long as you can pay to keep the tokens flowing).
Also, one of the more useful benefits of coding with LLMs is that you are explicitly defining the requirements/specs in English before coding. This effectively means LLM-first code is likely written via Behavior Driven Development, so it is easier to review, troubleshoot, upgrade. This leads to lower total cost of ownership compared to code which is just cowboyed/YOLOed into existence.
What about just noticing that coworkers are repeatedly doing something that could easily be automated?
Interesting observation. I am inclined to agree with this myself. I'm more of a 10^0 kind of developer though.
It is not making us 10x productive. It is making it 10x easier.
There is no secret herbal medicine that prevents all disease sitting out in the open if you just follow the right Facebook groups. There is no AI coding revolution available if you just start vibing. You are not missing anything. Trust yourself. You are enough.
Oh, and don't scroll LinkedIn. Or Twitter. Ever.
https://arxiv.org/abs/2507.09089
Obviously it depends on what you are using the AI to do, and how good a job you do of creating/providing all the context to give it the best chance of being successful in what you are asking.
Maybe a bit like someone using a leaf blower to blow a couple of leaves back and forth across the driveway for 30 sec rather than just bending down to pick them up.... It seems people find LLMs interesting, and want to report success in using them, so they'll spend a ton of time trying over and over to tweak the context and fix up what the AI generated, then report how great it was, even though it'd have been quicker to do it themselves.
I think agentic AI may also lead to this illusion of, or reported, AI productivity ... you task an agent to do something and it goes off and 30 min later creates what you could have done in 20 min while you are chilling and talking to your workmates about how amazing this new AI is ...
But maybe another thing is not considered - while things may take longer, they ease cognitive load. If you have to write a lot of boilerplate or you have a task to do, but there are too many ways to do it, you can ask AI to play it out for you.
What benefit I can see the most is that I no longer use Google and things like Stack Overflow, but actual books and LLMs instead.
1) The junior developer is able to learn from experience and feedback, and has a whole brain to use for this purpose. You may have to provide multiple pointers, and it may take them a while to settle into the team and get productive, but sooner or later they will get it, and at least provide a workable solution if not what you may have come up with yourself (how much that matters depends on how wisely you've delegated tasks to them). The LLM can't learn from one day to the next - it's Groundhog Day every day, and if you have to give up with the LLM after 20 attempts it'd be the exact same thing tomorrow if you were so foolish as to try again. Companies like Anthropic apparently aren't even addressing the need for continual learning, since they think that a larger context with context compression will work as an alternative, which it won't ... memory isn't the same thing as learning to do a task (learning to predict the actions that will lead to a given outcome).
2) The junior developer, even if they are only marginally useful to begin with, will learn and become proficient, and the next generation of senior developer. It's a good investment training junior developers, both for your own team and for the industry in general.
Now let's say you use Claude code, or whatever, and you're able to create the same web app over a weekend. You spend 6 hours a day on Saturday and Sunday, in total 12 hours.
That's a 10x increase in productivity right there. Did it make you a 10x better programmer? Nope, probably not. But your productivity went up tenfold.
And at least to me, that's sort of how it has worked. Things I didn't have motivation or energy to get into before, I can get into over a weekend.
For me it's 50-50 reading other people's code and getting a feel for the patterns and actually writing the code.
So no, imho people with no app dev skills cannot just build something over a weekend, at least something that won't break when the first user logs in.
That being said, I am a generalist with 10+ years of experience and can spot the good parts from bad parts and can wear many hats. Sure, I do not know everything, but, hey did I know everything when AI was not there? I took help from SO, Reddit and other places. Now, I go to AI, see if it makes sense, apply the fix, learn and move on.
https://www.businessinsider.com/ai-coding-tools-may-decrease...
1. googling stuff about how APIs work
2. writing boilerplate
3. typing syntax correctly
These three things combined make up a huge amount of programming. But when real cognition is required I find I'm still thinking just as hard in basically the same ways I've always thought about programming: identifying appropriate abstractions, minimizing dependencies between things, pulling pieces together towards a long term goal. As far as I can tell, AI still isn't really capable of helping much with this. It can even get in the way, because writing a lot of code before key abstractions are clearly understood can be counterproductive and AI tends to have a monolithic rather than decoupled understanding of how to program. But if you use it right it can make certain tasks less boring and maybe a little faster.
But is a 10x engineer going to become a 100x one?
This is all you have to takeaway from this article. Social media is a cesspool of engagement farmers dropping BS takes to get you to engage out of FOMO or anger. Every time I'm on there, I am instantly reminded why I quit going there. It's not genuine and it's designed to capture your attention away from more important things.
I've been using LLMs on my own for the past few years and we just recently started our own first party model that we can now use for work. I'm starting to get into agentic actions where I can integrate with confluence, github, jira, etc. It's a learning curve for sure but I can see where it will lead to some productivity gains but the road blocks are still real, especially when working with other teams. Whether you're waiting for feedback or a ticket to be worked on, the LLM might speed run you to a solution but you better be ready with the next thing and the next thing while you're waiting.
A lot of senior engineers in the big tech companies spend most of their time in meetings. They're still brilliant. For instance, they read papers and map out the core ideas, but they haven't been in the weeds for a long time. They don't necessarily know all the day-to-day stuff anymore.
Things like: which config service is standard now? What's the right Terraform template to use? How do I write that gnarly PromQL query? How do I spin up a new service that talks to 20 different systems? Or in general, how do I map my idea to deployable and testable code in the company's environment?
They used to have to grab a junior engineer to handle all that boilerplate and operational work. Now, they can just use an AI to bridge that gap and build it themselves.
If your organization is routinely spending 3 months on a code review, it sounds like there's probably a 10 to 100x improvement you can extract from fixing your process before you even start using AI.
But every company is going to enshittify everything they can to pigeonhole AI use to justify the grifters' costs.
I look forward to a few years out, when these companies trying to save money at any cost have to pay senior developers to rip all this garbage out.
Totally agree, IMO there's a lot of potential for these tools to help with code understanding and not just generation. Shameless plug for a code understanding tool we've been working on that helps with this: https://github.com/sourcebot-dev/sourcebot
This assumes the acceleration happens on all tasks. Amdahl's law states that the overall acceleration is constrained by the portion of the accelerated work. Probably it's just unclear if the "engineer" or "productivity" means the programming part or the overall process.
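For reference, the usual statement of Amdahl's law, with p the fraction of the work that is accelerated and s the speedup on that fraction:

    S_{\text{overall}} = \frac{1}{(1 - p) + p/s}

Even as s grows without bound, the overall gain is capped at 1/(1 - p); if, say, only 30% of the end-to-end process is coding, no coding speedup can deliver more than about a 1.43x improvement overall.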
- vibe coding is fun, but not production-ready software engineering
- LLMs like CC today moderately boost your performance. A lot of attention is still needed.
- some TDD style is needed for the AI tool to converge
- based on the growth of the last few months, it is quite likely that these tools will increase IC productivity substantially
- fully autonomous agentic coding will take more time as the error rate needs to decline significantly
> It tends to struggle with languages like Terraform
The language is called HCL (HashiCorp Configuration Language).
1. Tech companies should be able to accelerate and supplant the FAANGs of this world. Even if 10x were discounted to 5x, it would mean that 10 human-years of work could be shrunk down to 2 to make multi-billion-dollar companies. This is not happening right now. If it does not start happening with the current series of models, Murphy's law (e.g. an interest rate spike at some point) or just damn show-me-the-money brutal questions will tell people whether it is "working".
2. I think Anthropic's honcho did a back-of-the-envelope number of $600 for every human in the US (I think it was just the US) being necessary to justify Nvidia's market cap. This should play out by the end of this year or in the Q3 report.
My use case is not for a 10x engineer but instead for *cognitive load sharing*. I use AI in a "non-linear" fashion. Do you? Here is what that means:
1. Brainstorm an idea and write down detailed enough plan. Like tell me how I might implement something or here is what I am thinking can you critique and compare it with other approaches. Then I quickly meet with 2 more devs and make a design decision for which one to use.
2. Start manual coding and let AI "fill the gaps": write these tests for my code, or follow this already-existing API and create the routes from this new spec. This is non-linear because I would complete 50-75% of the feature and let the rest be completed by AI.
3. I am tired and about to end my shift and there is this last bug. I go read the docs, but I also ask AI to read my screen and come up with some hypotheses. I decide which hypotheses are most promising after some reading and then ask the AI to just test that (not fix it in auto mode).
4. Voice mode: I have a shortcut that triggers claude code and uses it like a quick "lookup/search" in my code base. This avoids context switching.
When you're not sure if what someone says makes sense, trust common sense, your own experience, and your thinking.
If I can write blue-sky / greenfield code, brand new code in a new repo, no rules, just write code, I can write tons of code. What bogs me down are things like tests. It can take more time to write tests than the code itself in my actual work project. Of course I know the tests are important, and maybe the LLM can help here; I'm just saying that they slow me down. Waiting for code reviews slows me down. Again, they're super useful, but coming from a place where for the first 20-25 years of my career I didn't have them, they are a drag on my performance. Another is just the size of the project I'm on: > 500 programmers on my current large project. Assume it's an OS. It's just hard to make progress on such a large project compared to a small one. And yet another, which is part of the first, is other people's code. If I write the whole thing or most of it, then I know exactly what to change. I've written features in days, in code I know, that I believe would have taken someone unfamiliar with the code months. But me editing someone else's code without the entire state of the code base in my head is 10x slower.
That's a long way of saying, many 10xers might just be in the right circumstance to provide 10x. You're then compared against them but you're not in the same circumstance so you get different results.
I used to not really believe people like that existed but it turned out they're just rare enough that I hadn't worked with any yet. You could definitely go a whole career without ever working with any 10x engineers.
And also it's not like they're actually necessary for a project to succeed. They're very good but it's extremely unlikely that a project will succeed on the back of one or two very good engineers. The project I worked with them on failed for reasons nothing to do with us.
For newer languages, packages, and hardware-specific code, I have yet to use a single frontier model that has not slowed me down by 50%. It is clear to me that LLMs are regurgitating machines, and no amount of thinking will save the fact that the transformer architecture (all ML really) poorly extrapolates beyond what is in the training canon.
However, on zero-to-one projects that are unconstrained by my mag-seven employer, I am absolutely 10x faster. I can churn through boilerplate code, have faster iterations across system design, and generally move extremely fast. I don't use agentic coding tools as I have had bad experiences in how the complexity scales, but it is clear to me that startups will be able to move at lightning pace relative to the large tech behemoths.
Linear was a very early-stage product I tested a few months after their launch where I was genuinely blown away by the polish and experience relative to their team size. That was in 2020, pre-LLMs.
I have yet to see an equally polished and impressive early-stage product in the past few years, despite claims of 10x productivity.
Now that LLMs have actually fulfilled that dream — albeit by totally different means — many devs feel anxious, even threatened. Why? Because LLMs don’t just autocomplete. They generate. And in doing so, they challenge our identity, not just our workflows.
I think Colton’s article nails the emotional side of this: imposter syndrome isn’t about the actual 10x productivity (which mostly isn't real), it’s about the perception that you’re falling behind. Meanwhile, this perception is fueled by a shift in what “software engineering” looks like.
LLMs are effectively the ultimate CASE tools — but they arrived faster, messier, and more disruptively than expected. They don’t require formal models or diagrams. They leap straight from natural language to executable code. That’s exciting and unnerving. It collapses the old rites of passage. It gives power to people who don’t speak the “sacred language” of software. And it forces a lot of engineers to ask: What am I actually doing now?
Nor do they produce those (do they?). That is what I would like to see. Formal models and diagrams are not needed to produce code. Their point is that they allow us to understand code and to formalize what we want it to do. That's what I'm hoping AI could do for me.
Now I can always switch to a different model, increase the context, prompt better etc. but I still feel that actual good quality AI code is just out of arms reach, or when something clicks, and the AI magically starts producing exactly what I want, that magic doesn't last.
Like with stable diffusion, people who don't care as much or aren't knowledgeable enough to know better, just don't get what's wrong with this.
A week ago, I received a bug ticket claiming one of the internal libs I wrote didn't work. I checked out the reporter's code, which was full of weird issues (like the debugger not working and the TypeScript being full of red squiggles), and my lib crashed somewhere in the middle, in some esoteric minified JS.
When I asked the guy who wrote it what's going on, he admitted he vibe coded the entire project.
And the knock-on effect is that there is less menial work. Artists are commissioned less for the local fair, their friend's D&D character portrait, etc. Programmers find less work building websites for small businesses, fixing broken widgets, etc.
I wonder if this will result in fewer experts, or less capable ones. As we lose the jobs that were previously used to hone our skills, will people go out of their way to train themselves for free, or will we just regress?
A schematic of a useless amplifier that oscillates looks just as pretty as one of a correct amplifier. If we just want to use it as a repeated print for the wallpaper of an electronic lab, it doesn't matter.
> Why? Because LLMs don’t just autocomplete. They generate. And in doing so, they challenge our identity, not just our workflows.
is what raised flags in my head. Rather than explain the difference between glorified autocompletion and generation, the post assumes there is a difference, then uses florid prose to hammer home a point it never proved.
I've heard the paragraph "Why? Because X. Which is not Y. And abcdefg" a hundred times. DeepSeek uses it on me every time I ask a question.
Which came first...
There are many jobs that could be eliminated with software but haven't been, because managers don't want to hire SWEs without proven value. I don't think HN realizes how big that market is.
With AI, the managers will replace their employees with a bunch of code they don't understand, watch that code fail in 3 years, and have to hire SWEs to fix it.
I'd bet those jobs will outnumber the ones initially eliminated by having non-technical people deliver the first iteration.
Many of those jobs will be high-skill/impact because they are necessarily focused on fixing stuff AI can't understand.
Even if LLMs worked perfectly without hallucinations (they don't and might never), a conscientious developer must still comprehend every line before shipping it. You can't review and understand code 10x faster just because an LLM generated it.
In fact, reviewing generated code often takes longer because you're reverse-engineering implicit assumptions rather than implementing explicit intentions.
The "10x productivity" narrative only works if you either:
- Are not actually reviewing the output properly
or
- Are working on trivial code where correctness doesn't matter.
Real software engineering, where bugs have consequences, remains bottlenecked by human cognitive bandwidth, not code generation speed. LLMs shifted the work from writing to reviewing, and that's often a net negative for productivity.
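To make the "implicit assumptions" point concrete, here's a hypothetical sketch (TypeScript, names invented) of generated code that looks fine at a glance but still forces a reviewer to reverse-engineer what it quietly assumes:

    // Generated-looking helper: compiles, reads cleanly, even has a comment.
    // What it never states, and what the reviewer has to dig out:
    //  - it assumes `prices` is non-empty (reduce with no initial value throws on [])
    //  - it assumes all amounts are already in the same currency
    //  - it silently decides on rounding to cents, which callers may not expect
    export function averageOrderValue(prices: number[]): number {
      const total = prices.reduce((sum, p) => sum + p);
      return Math.round((total / prices.length) * 100) / 100;
    }

None of that is necessarily wrong, but verifying it means reconstructing intent nobody wrote down, and that's exactly where the review time goes.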
This seems excessive to me. Do you comprehend the machine code output of a compiler?
I must comprehend code at the abstraction level I am working at. If I write Python, I am responsible for understanding the Python code. If I write Assembly, I must understand the Assembly.
The difference is that compilers are deterministic with formal specs; I can trust their translation. LLMs are probabilistic generators with no guarantees. When an LLM generates Python code, that becomes my Python code that I must fully comprehend, because I am shipping it.
That is why productivity is capped at review speed, you can't ship what you don't understand, regardless of who or what wrote it.
- it’s not just X, it’s Y
- emdashes everywhere
And while I don't categorically object to AI tools, I think you're selling objections to them short.
It's completely legitimate to want an explainable/comprehendable/limited-and-defined tool rather than a "it just works" tool. Ideally, this puts one in an "I know its right" position rather than a "I scanned it and it looks generally right and seems to work" position.
Ironically, when I listen to vinyl instead of streaming, I listen to less music.
If I'm in the zone, I'll often go minutes before flipping the record or choosing another one, even though my record player is right next to me.
* if my GitHub Actions ran 10x faster, so I wouldn't start reading about "ai" on Hacker News while waiting to test my deployment and then not notice the workflow was done an hour ago
* if the Google cloud console deployment page had 1 instead of 10 vertical scroll bars and wasn't so slow and janky in Firefox
* if people started answering my peculiar but well-researched stackoverflow questions instead of nitpicking and discussing whether they belong on superuser vs unix vs ubuntu vs hermeneutics vs serverfault
* if MS Teams died
anyway, nice to see others having the same feeling about LLMs
What makes an excellent engineer is risk mitigation and designing systems under a variety of possible constraints. This design is performed using models of the domains involved and understanding when and where these models hold and break down. There's no "10x". There is just being accountable for designing excellent systems to perform as desired.
If there is a "10x" software engineer, such an engineer would prevent data breaches from occurring, which is a common failure mode in software to the detriment of society. I want to see 10x less of that.
>What makes an excellent engineer is risk mitigation and designing systems under a variety of possible constraints.
I take it that those fields also don't live by the "move fast and break things" motto?
Any tool can be shown to increase performance in closed conditions and within specific environments, but when you try to generalize, things do not behave consistently.
Regardless, I would always argue that trying new tech / tools / workflows is better than being set in your ways, regardless of the productivity results. I do like holding off on new things until they mature a bit before trying them, though.
jf22•10h ago
With enough rules and good prompting this is not true. The code I generate is usually better than what I'd do by hand.
The reason the code is better is that all the extra polish and gold plating is essentially free.
Everything I generate comes out commented, with great error handling, logging, SOLID design, and unit tests that follow established patterns in the code base.
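To give a flavor of what I mean, here's a made-up TypeScript example (not from my actual codebase): the kind of thing that comes back with validation, logging and a test attached, where by hand I'd often have stopped at the happy path.

    import assert from "node:assert";
    import test from "node:test";

    /** Parse a TCP port from an environment-style string, with a safe fallback. */
    export function parsePort(value: string | undefined): number {
      if (value === undefined || value.trim() === "") {
        console.warn("parsePort: no value provided, falling back to 8080");
        return 8080;
      }
      const port = Number(value);
      if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new RangeError(`parsePort: "${value}" is not a valid TCP port`);
      }
      return port;
    }

    test("parsePort falls back when no value is set", () => {
      assert.strictEqual(parsePort(undefined), 8080);
    });

    test("parsePort rejects out-of-range values", () => {
      assert.throws(() => parsePort("70000"), RangeError);
    });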
micromacrofoot•9h ago
I read and understand 100% of the code it outputs, so I'm not so worried about falling too far astray...
being too prescriptive about it (like prompting "don't write comments") makes the output worse in my experience
rob_c•10h ago
I prefer to push for self documenting code anyway, never saw the need for docs other than for an API when I'm calling something like a black box.
jf22•8h ago
How often do you use coding LLMs?
wglb•5h ago
What's particularly useful are its comments explaining the reasoning behind new code added at my request.
reactordev•10h ago
Let’s boil this down to an easy set of reproducible steps any engineer can take to wrangle some sense from their AI trip.
rob_c•10h ago
It's not about lines of code or quality; it's about solving a problem. If the solution creates another problem, then it's bad code. If it solves the problem without causing a new one, great: move on to the next problem.
jakelazaroff•10h ago
In other words, it matters whether the AI is creating technical debt.
rob_c•4h ago
That has nothing to do with AI/LLMs.
If you can't understand what the tool spits out, then either learn, throw it away, or get it to make something you can understand.
NitpickLawyer•10h ago
Weren't there 2 or 3 dating apps launched before the "vibecoding" craze that got extremely popular and then got badly hacked weeks or months in? I also distinctly remember a social network shipping global Firebase tokens on the client side, also a few years ago.
NitpickLawyer•10h ago
We went from "this thing is a stochastic parrot that gives you poems and famous people styled text, but not much else" to "here's a fullstack app, it may have some security issues but otherwise it mainly works" in 2.5 years. People expect perfection, and move the goalposts. Give it a second. Learn what it can do today, adapt, prepare for what it can do tomorrow.
bpt3•7h ago
LLMs are still stochastic parrots, though highly impressive and occasionally useful ones. LLMs are not going to solve problems like "what is the correct security model for this application given this use case".
AI might get there at some point, but it won't be solely based on LLMs.
rob_c•4h ago
Frankly, I've seen LLMs answer better than people trained in security theatre, so be very careful where you draw the line.
If you're trying to say they struggle with what they've not seen before: yes, provided that what's new isn't within the phase space they've been trained over. Remember, there are no photographs of cats riding dinosaurs, but SD models can generate them.
rob_c•4h ago
Repeat after me, token prediction is not intelligence.
mwigdahl•9h ago
I have experimented with vibe coding. With Claude Code I could produce a useful and usable small React/TS application, but it was hard to maintain and extend beyond a fairly low level of complexity. I totally agree that vibe coding (at the moment) is producing a lot of slop code, I just don't think Tea is an example of it from what I understand.
micahscopes•10h ago
https://github.com/micahscopes/radix_immutable
I took an existing MIT-licensed prefix tree crate and had Claude+Gemini rewrite it to support immutable, quickly comparable views. The execution took about one day's work, following two or three weeks of thinking about the problem part-time. I scoured the prefix tree libraries available in Rust, as well as the various existing immutable collections libraries, and found that nothing like this existed. I wanted O(1) comparable views into a prefix tree. This implementation has decently comprehensive tests and benchmarks.
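The core idea, roughly (an illustrative TypeScript sketch of the concept only, not the crate's actual Rust API): every immutable node caches a structural hash when it's built, so a "view" rooted at any prefix can be compared to another view without walking either subtree.

    // Sketch of the concept, not the radix_immutable API.
    interface TrieNode {
      readonly value?: string;
      readonly children: ReadonlyMap<string, TrieNode>;
      readonly hash: number; // precomputed from value + children at construction time
    }

    function hashString(s: string): number {
      let h = 0;
      for (let i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) | 0;
      return h;
    }

    function makeNode(value: string | undefined, children: ReadonlyMap<string, TrieNode>): TrieNode {
      let hash = value !== undefined ? hashString(value) : 0;
      for (const [edge, child] of children) {
        hash = (hash * 31 + hashString(edge) + child.hash) | 0;
      }
      return { value, children, hash };
    }

    // A "view" is just a reference to the node at some prefix; comparing two
    // views never walks the subtrees.
    function viewsEqual(a: TrieNode | undefined, b: TrieNode | undefined): boolean {
      if (a === b) return true;   // shared structure: trivially equal
      if (!a || !b) return false;
      return a.hash === b.hash;   // O(1), modulo hash collisions
    }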
No code for the next two but definitely results...
Tabu search guided graph layout:
https://bsky.app/profile/micahscopes.bsky.social/post/3luh4d...
https://bsky.app/profile/micahscopes.bsky.social/post/3luh4s...
Fast Gaussian blue noise with wgpu:
https://bsky.app/profile/micahscopes.bsky.social/post/3ls3bz...
In both of these examples, I leaned on Claude to set up the boilerplate, the GUI, etc., which gave me more mental budget for playing with the challenging aspects of the problem. For example, the tabu graph layout is inspired by several papers, but I was able to iterate really quickly with Claude on new ideas from my own creative engagement with the problem. A few of them actually turned out really well.
wglb•5h ago
(edit)
I asked it to generate a changelog: https://github.com/wglb/gemini-chat/blob/main/CHANGELOG.md
echelon•10h ago
I use "tab-tab" auto complete to speed through refactorings and adding new fields / plumbing.
It's easily a 3x productivity gain. On a good day it might be 10x.
It gets me through boring tedium. It gets strings and method names right for languages that aren't statically typed. For languages that are statically typed, it's still better than the best IDE AST understanding.
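Concretely, the kind of plumbing I mean (an invented TypeScript example): add one field to a type, and tab completion fills in the same mechanical edit everywhere it has to be threaded through.

    // Invented example: after `traceId` is added to the type, the remaining
    // edits (constructor, serializer, log line) are pure plumbing that
    // tab completion fills in.
    interface RequestContext {
      userId: string;
      locale: string;
      traceId: string; // <- the newly added field
    }

    function makeContext(userId: string, locale: string, traceId: string): RequestContext {
      return { userId, locale, traceId };
    }

    function serializeContext(ctx: RequestContext): Record<string, string> {
      return { user_id: ctx.userId, locale: ctx.locale, trace_id: ctx.traceId };
    }

    function logContext(ctx: RequestContext): void {
      console.log(`user=${ctx.userId} locale=${ctx.locale} trace=${ctx.traceId}`);
    }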
It won't replace the design and engineering work I do to scope out active-active systems of record, but it'll help me when time comes to build.
recursive•9h ago
The 5% is an increase in straight-ahead code speed. I spend a small fraction of my time typing code. Smaller than I'd like.
And it very well might be an economically rational subscription. For me personally, I'm subscription averse based on the overhead of remembering that I have a subscription and managing it.
neutronicus•10h ago
This is emphatically NOT my experience with a large C++ codebase.
echelon•10h ago
It expands match blocks against highly complex enums from different crates, then tab completes test cases after I write the first one. Sometimes even before that.
neutronicus•4h ago
Just by virtue of Rust being relatively short-lived I would guess that your code base is modular enough to live inside reasonable context limits, and written following mostly standard practice.
One of the main files I work on is ~40k lines of code, and one of the main proprietary API headers I consume is ~40k lines of code.
My attempts at getting the models available to Copilot to author functions for me have often failed spectacularly, as in I can't even get it to generate edits at prescribed places in the source code or follow examples from prescribed places. And the hallucination issue is EXTREME when trying to use the big C API I alluded to.
That said Claude Code (which I don't have access to at work) has been pretty impressive (although not what I would call "magical") on personal C++ projects. I don't have Opus, though.
AnotherGoodName•10h ago
Prompts are especially good for building the structural template for a new code module, or basic boilerplate in the more verbose environments. E.g., Android Java programming can be a mess: huge amounts of code for something simple like an efficient scrolling view. AI takes care of this; it's obvious code, no thought required, but it's still over 100 lines scattered across XML (the view definitions), resources, and multiple Java files.
Do you really want to be copying boilerplate like this across many different files? Prompts that are well integrated into the IDE (they give a diff to add the code) are great (also, old-style Android before Jetpack sucked) https://stackoverflow.com/questions/40584424/simple-android-...
mrbungie•10h ago
I'm always baffled by this. If you can't do it that well by hand, how can you discriminate its quality so confidently?
I get that there is an artist/art-consumer analogy to be made (i.e. you can see a piece is good without knowing how to paint), but I'm not convinced it transfers to code.
Also, not really my experience when dealing with IaC or (complex) data related code.
mrbungie•10h ago
Well-written bullshit in perfect prose is still bullshit.
wahnfrieden•10h ago
Related: agentic LLMs may be slow to produce output, but they are parallelizable by an individual, unlike hand-written work.
jf22•10h ago
With AI the extra quality and polish is basically free and instantaneous.
mrbungie•10h ago
Point still remains for junior and semi-senior devs though, or any dev trying to leap over a knowledge barrier with LLMs. Emphasis on good pipelines and human (eventually maybe also LLM based) peer-reviews will be very important in the years to come.
micromacrofoot•10h ago
It may change in the future, but AI is without a doubt improving our codebase right now. Maybe not 10X but it can easily 2X as long as you actually understand your codebase enough to explain it in writing.
intended•10h ago
These conversations on AI code good, vs AI code bad constantly keep cropping up.
I feel we need to build a cultural norm of sharing examples of where it succeeded and where it failed, so that we can get to some sort of comparison and categorization.
The sharing also has to be made non-contentious, so that we get a multitude of examples. Otherwise we’d get nerd-sniped into arguing the specifics of a single case.
zoeysmithe•10h ago
I think it's only a matter of time until our roles are commoditized and vibe coding becomes the norm in most industries.
Vibe coding being a dismissive term for what is really a new skill set. For example, we'll be doing more planning and testing and such instead of writing code. The same way, say, sysadmins just spin up k8s instead of racking servers, or car mechanics read diagnostic codes from readers and often just replace an electrical part instead of hand-tuning carbs or gapping spark plugs. That is to say, a level of skill is being abstracted away.
I think we just have to see this, most likely, as how things will get done going forward.
tovej•10h ago
This reads like empty hype to me, and there's more than one claim like this in these threads, where AI magically creates an app, but any description of the app itself is always conspicuously missing.
zoeysmithe•9h ago
I also have never used godot before, and I was surprised at how well it navigated and taught me the interface as well.
At least the horror stories about "all the code is broken and hallucinations" haven't really been true for me and my uses so far. If LLMs are going to succeed anywhere, it will be in the overly logical and predictable world of programming languages; that's just a guess on my part, but thus far, whenever I've reached out for code from LLMs, it's been a fairly positive experience.
tovej•8h ago
I do still disagree with your assessment. I think the syntactic tokens in programming languages have a kind of impedance mismatch with the tokens that LLMs work with, and that the formal semantics of programming languages are a bad fit for fuzzy, statistical LLMs. I firmly believe that increased LLM usage will drive software safety and quality down, simply because a) no semblance of semantic reasoning or formal verification has been applied to the code, and b) a software developer will have an incomplete understanding of code not written by themselves.
But our opinions can co-exist, good luck in your game development journey!
zoeysmithe•8h ago
As far as QA goes, we then circle back to the tool itself being the cure for the problems the tool brings in, which is typical in technology. The same way agile/'break things' programming's solution to QA was to fire the 'hands on' QA department and then programmatically do QA. Mostly for cost savings, but partly because manual QA couldn't keep up.
I think like all artifacts in capitalism, this is 'good enough,' and as such the market will accept it. The same way my laggy buggy Windows computer would be laughable to some in the past. I know if you gave me this Win11 computer when I was big into low-footprint GUI linux desktop, I would have been very unimpressed, but now I'm used to it. Funny enough, I'm migrating back to kubuntu because Windows has become unfun and bloaty and every windows update feels a bit like gambling. But that's me. I'm not the typical market.
I think your concerns are real and correct factually and ideologically, but in terms of a capitalist market will not really matter in the end, and AI code is probably here to stay because it serves the capital owning class (lower labor costs/faster product = more profit for them). How the working class fares or if the consumer product isn't as good as it was will not matter either unless there's a huge pushback, which thus far hasn't happened (coders arent unionizing, consumers seem to accept bloaty buggy software as the norm). If anything the right-wing drift of STEM workers and the 'break things' ideology of development has primed the market for lower-quality AI products and AI-based workforces.
nurettin•10h ago
Me: Here's the relevant part of the code, add this simple feature.
Opus: here's the modified code blah blah bs bs
Me: Will this work?
Opus: There's a fundamental flaw in blah bleh bs bs here's the fix, but I only generate part of the code, go hunt for the lines to make the changes yourself.
Me: did you change anything from the original logic?
Opus: I added this part, do you want me to leave it as it was?
Me: closes chat
NitpickLawyer•10h ago
Coding in a chat interface, and expecting the same results as with dedicated tools is ... 1-1.5 years old at this point. It might work, but your results will be subpar.
apwell23•10h ago
There are at least 10 posts on HN these days with the same discussion going in circles:
1. AI sucks at code
2. You're not using my magic prompting technique