I'm still at the bargaining phase, personally.
I mean, at some point it was true.
I remember that around 2023, when I first encountered colleagues trying to use ChatGPT for coding, I thought "by the time you are done with your back-and-forth to correct all the errors, I would have already written this code manually".
That was true then, but not anymore.
I'm not a proper software engineer, but I do a lot of scripting, and most of my attempts to let a model speed up a menial task (e.g. a small bash or Python script for some data parsing or chaining together other tools) end up with me doing extensive rewrites because the model is completely inconsistent in naming conventions, pattern reuse, etc.
But for the interesting stuff, where you don't understand the problem yet, it doesn't make things quicker, because then the bottleneck is my understanding. Things take time. And sleep. They require hands-on experience. It doesn't matter how fast LLMs can churn out code. There's a limit to how fast I can understand things. Unless, of course, I'm happy shipping code I don't understand, which I'm not.
So I think LLMs have moved the effort that used to be spent on the fun part (coding) into the boring part (assessment and evaluation), which is also now a lot bigger.
You can also set up way more elaborate verification systems. Don't just do a static analysis of the code, but actually deploy it and let the LLM hammer at it with all kinds of creative paths. Then let it debug why it's broken. It's relentless at debugging - I've found issues in external tools I normally would've let go (maybe created an issue for), that I can now debug and even propose a fix for, without much effort from my side.
So yeah, I agree that the boring part has become the more important part right now (speccing well and letting it build what you want is pretty much solved), but let's then automate that. Because if anything, that's what I love about this job: I get to automate work, so that my users (often myself) can be lazy and focus on stuff that's more valuable/enjoyable/satisfying.
Once the tools outperform humans at the tasks to which they were applied (and they will), you don't need to be involved at all, except to give direction and final acceptance. The tools will write, and verify, the code at each step.
I don't get why some people are so convinced that this is inevitable. It's possible, yes, but it very well might be the case that models cannot be stopped from randomly doing stupid things, cannot be made more trustworthy, cannot be made more verifiable, and will have to be relegated to the role of brainstorming aids.
Someone once said that it is hard to make a man understand something if his profit depends on him not understanding it...
Many are still in denial that you can do work that is as good as before, quicker, using coding agents. A lot of people think there has to be some catch, but there really doesn’t have to be. If you continue to put effort in, reviewing results, caring about testing and architecture, working to understand your codebase, then you can do better work. You can think through more edge cases, run more experiments, and iterate faster to a better end result.
To all of you I can only say, you were utterly wrong and I hope you realize how unreliable your judgements all are. Remember I'm saying this to roughly 50% of HN, an internet community that's supposedly more rational and intelligent than other places on the internet. For this community to be so wrong about something so obvious... That's saying something.
They weren't wrong though. It objectively is just a next token predictor and doesn't understand code. That is how the thing works.
Don’t make me cite Geoffrey Hinton or other preeminent experts to show you how wrong you all are.
Use your brain. It is changing the industry from the ground up. It understands.
You gonna give some predictable answer about next token prediction and probability or some useless exposition on transformers while completely avoiding the fact that we don’t understand the black box emergent properties that make a next token predictor have properties indistinguishable from intelligence?
I don’t have any questions about LLMs. At least not any more than say an LLM researcher at Anthropic working on model interpretability.
My conclusion is that at this point, LLMs are not capable of making good decisions supported by deep reasoning. They're capable of mimicking that, yes, and it takes some skill to see through them.
We don’t have to accept things.
— George Bernard Shaw
The antidote to runaway hype is for someone to push back, not to just relent and accept your fate. Who cares about affording to. We need more people with ideals stronger than the desire to make a lot of money.
When I say AI, I mean specifically LLMs. There isn't a single future position where all the risks are suitably managed, there is a return on investment and there is not a net loss to society. Faith, hope, lies, fraud and inflated expectations don't cut it and that is what the whole shebang is built on. On top of that, we are entering a time of serious geopolitical instability. Creating more dependencies on large amounts of capital and regional control is totally unacceptable and puts us all at risk.
My integrity is worth more than sucking this teat.
He doesn't have our baggage. He doesn't feel the anxiety the purists feel.
He just pipes all errors right back into his task flow. He does periodic refactoring. He tests everything and also refactors the tests. He does automated penetration testing.
There are great tools for everything he does and they are improving at breakneck speeds.
He creates stuff that is levels above what I ever made and I spent years building it.
I accepted months ago: adapt or die.
There is plenty of code that requires proof of correctness and solid guarantees, as in aviation, space, and so on. Torvalds mentioned in a recent interview how little of the code he receives is generated, despite kernel code being readily available for training.
Yeah I dread the software landscape in 10 years, when people will have generated terabytes of unmaintainable slop code that I need to fix.
“He automated his job so well the company doesn’t need him anymore.”
How is that measured? Is his stuff maintainable? Is it fast? Are good architectural decisions baked in that won't prevent him from adding a critical new feature?
I don't understand where this masochism comes from. I'm a software developer, I'm an intelligent and flexible person. The LLM jockey might be the same kind of person, but I have years of actual development experience and NOTHING preventing me from stepping down to that level and doing the same thing, starting tomorrow. I've built some nice and complicated stuff in my life, I'm perfectly capable of running an LLM in a loop. Most of the stuff that people like to call prompt/agentic/frontier or whatever engineering is ridiculously simple, and the only reason I'm not spending much time on it is that I don't think it leads to the kind of results my employer expects from me.
A programmer who is not delighted by programming cannot be very good at it. So the same people who are "delighted" by using an LLM are the exact same people who should not be using it.
It would be like putting a person who doesn't know how to drive in the driver's seat of a semi-autonomous vehicle.
I'm able to pay rent just fine without one...
If that's not delusional thinking I don't know what is.
I mean, if anything, I would expect it to help bring structure to medicine, which is an often sloppy profession killing somewhere between tens of thousands and hundreds of thousands of people a year through mistakes and out-of-date practices.
Medicine is currently very subjective. As a scientific field in the realm of physical sciences, it shouldn't be.
Just basic stuff like smart dictation that listens to the conversation the practitioner is having and auto-creates the medical notes, letters, prescriptions, etc., saving them the time and effort of typing it all up themselves. They were saying that obviously they have to check everything, but it was (and I quote) "scarily perfectly accurate". That frees up a bunch of their time to actually be with the patient instead of typing.
I was building a tool to do exploratory data analysis. The data is manufacturing stuff (data from 10s of factories, having low level sensor data, human enrichments, all the way up to pre-aggregated OEE performance KPIs). I didn't even have to give it any documentation on how the factories work - it just knew from the data what it was dealing with and it is very accurate to the extent I can evaluate. People who actually know the domain are raving about it.
But the next step for many is championing acceptance. E.g. "that the same kind of success is available outside the world of highly structured language"... it actually is visible when you engage with people. I'm myself going through this transition.
simonw•6h ago
That's why they hated it. Approving every change is the most frustrating way of using these tools.
I genuinely think that one of the biggest differences between people who enjoy coding agents and people who hate them is whether or not they run in YOLO mode (aka dangerously-skip-permissions). YOLO mode feels like a whole different product.
I get the desire not to do that because you want to verify everything they do, but you can still do that by reviewing the code later on without the pain of step-by-step approvals.
samlinnfer•6h ago
I found that Claude likes to leave some real gems in there if you get lazy and don't check. Gently sprinkled in between 100 lines of otherwise fine looking code that sows doubt into all of the other lines it's written. Sometimes it makes a horrific architectural decision and if it doesn't get caught right there it's catastrophic for the rest of the session.
txtsd•6h ago
qsera•6h ago
cornel_io•5h ago
qsera•5h ago
samlinnfer•6h ago
bitwize•6h ago
Toutouxc•5h ago
qsera•5h ago
You mean, let the LLM hallucinate about the HOW...
Toutouxc•6h ago
fodkodrasz•5h ago
tetraodonpuffer•22m ago
For simple scripts and simple self-contained problems, fully agenting in YOLO mostly works, but as soon as it's an existing codebase or plans get more complex I find I have to handhold Claude a lot more, and if I leave it to its own devices I find things later. I have also found that if I have it update the plan with what it did AND afterwards review the plan, it will still find deviations in the codebase.
Like the other day I had in the plan to refactor something due to data model changes, specifying very clearly this was an intentional breaking change (greenfield project under development), and it left behind all the existing code to preserve backwards compatibility, and actually it had many code contortions to make that happen, so much so I had to redo the whole thing.
Sometimes it does feel that Anthropic turns up/down the intelligence (I always run Opus in high reasoning) but sometimes it seems it's just the nature of things, it is not deterministic, and sometimes it will just go off and do what it thinks is best whether or not you prompt it not to (if you ask it later why it did that it will apologize with some variation of "well, it made sense at the time").
neonstatic•6h ago
evnp•6h ago
sdenton4•6h ago
NitpickLawyer•6h ago
Especially when the harness loop works if you let it work. First pass might have syntax issues. The loop will catch it, edit the file, and the next thing pops up. Linter issues. Runtime issues. And so on. Approving every small edit and reading it might lead to frustrations that aren't there if you just look at the final product (that's what you care about, anyway).
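To make the loop described above concrete, here is a minimal sketch in Python. It is not how any particular harness is implemented: `toy_fix` is a hypothetical stand-in for the model call, and the "linter" is a deliberately tiny trailing-whitespace check. The only point is the shape: run the checks, feed the first error back, repeat until everything passes.

```python
from typing import Callable, List, Optional, Tuple

# A check takes the current source and returns an error message, or None if it passes.
Check = Callable[[str], Optional[str]]


def syntax_check(source: str) -> Optional[str]:
    """Use Python's own compiler to detect syntax errors."""
    try:
        compile(source, "<agent-edit>", "exec")
    except SyntaxError as exc:
        return f"syntax: {exc.msg}"
    return None


def lint_check(source: str) -> Optional[str]:
    """Toy linter (assumption): only flags trailing whitespace."""
    for number, line in enumerate(source.splitlines(), start=1):
        if line.rstrip() != line:
            return f"lint: trailing whitespace on line {number}"
    return None


def toy_fix(source: str, error: str) -> str:
    """Hypothetical stand-in for the model: hard-coded repairs for this demo only."""
    if error.startswith("syntax"):
        return source.replace("def add(a, b)\n", "def add(a, b):\n")
    if error.startswith("lint"):
        return "\n".join(line.rstrip() for line in source.splitlines()) + "\n"
    return source


def run_loop(
    source: str,
    checks: List[Check],
    fix: Callable[[str, str], str],
    max_rounds: int = 10,
) -> Tuple[str, List[str]]:
    """Run checks in order; on the first failure, ask `fix` for a new version and retry."""
    errors_seen: List[str] = []
    for _ in range(max_rounds):
        error = next((e for e in (check(source) for check in checks) if e), None)
        if error is None:
            return source, errors_seen
        errors_seen.append(error)
        source = fix(source, error)
    raise RuntimeError(f"still failing after {max_rounds} rounds: {errors_seen[-1]}")


if __name__ == "__main__":
    broken = "def add(a, b)\n    return a + b   \n"
    final, errors = run_loop(broken, [syntax_check, lint_check], toy_fix)
    print(errors)
    print(final)
```

If you only look at `final`, the intermediate syntax and lint failures never reach you, which is the argument for not approving every small edit.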
vova_hn2•6h ago
If you launch it in YOLO mode in a separate branch in a separate worktree (or, preferably, in total isolation), you can instead spend time reviewing changes from previous tasks or refining requirements for new tasks.
dcre•6h ago
vova_hn2•6h ago
It would be better if an LLM coding harness just helped you set up a proper sandbox for itself (containers, VMs etc.) and then run inside the isolated environment unconstrained.
In setup mode, the only tool accessible to the agent should be running shell scripts, and each script should be reviewed before running.
Inside an isolated environment, there should be no permission system at all.
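A rough sketch of what the "run unconstrained inside a sandbox" half could look like, assuming Docker as the isolation layer. The image name and agent command are hypothetical placeholders, not a real product's CLI; the Docker flags used (`--rm`, `--network none`, `-v`, `-w`) are standard `docker run` options. The function only builds the command so it can be reviewed before anything runs.

```python
from pathlib import PurePosixPath
from typing import List


def sandbox_command(
    project_dir: PurePosixPath,
    image: str,
    agent_cmd: List[str],
    allow_network: bool = False,
) -> List[str]:
    """Build a `docker run` command that confines the agent to one mounted directory.

    The image and agent_cmd are supplied by the caller (hypothetical values in the
    example below). Only project_dir is visible inside the container, as /work.
    """
    if not project_dir.is_absolute():
        raise ValueError("project_dir must be an absolute path")

    cmd = ["docker", "run", "--rm"]
    if not allow_network:
        # Assumption: the agent can work offline. Agents that call a hosted
        # model need network access, so this is a trade-off, not a default
        # that fits every setup.
        cmd += ["--network", "none"]
    cmd += ["-v", f"{project_dir}:/work", "-w", "/work", image]
    return cmd + agent_cmd


if __name__ == "__main__":
    command = sandbox_command(
        PurePosixPath("/srv/project"),
        "agent-image:latest",               # hypothetical image name
        ["agent", "--task", "refactor"],    # hypothetical agent CLI
    )
    print(" ".join(command))
```

Printing the command first fits the setup-mode idea: you review one short script, then the agent runs without a permission system inside the container.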
MattGaiser•6h ago
dcre•6h ago
With auto-accept edits plus a decent allowlist for common commands you know are safe, the permission prompts you still get are much more tolerable. This does prevent you from using too many parallel agents at a time, since you do have to keep an eye on them, but I am skeptical of people using more than 3-5 anyway. Or at least, I'm sure there is work amenable to many agents but I don't think most software engineering is like that.
All that said, I am reaching the point where I'm ready to try running CC in a VM so I can go full YOLO.
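For what it's worth, the allowlist idea can be sketched in a few lines of Python. This is an illustration of the concept, not how CC's permission system actually works; the `SAFE_PREFIXES` entries are hypothetical examples. Anything containing shell metacharacters is sent back for manual approval, since something like `ls; rm -rf /` would otherwise slip through a naive prefix match.

```python
import shlex
from typing import Iterable, Tuple

# Hypothetical allowlist: command prefixes considered safe to auto-approve.
SAFE_PREFIXES: Tuple[Tuple[str, ...], ...] = (
    ("git", "status"),
    ("git", "diff"),
    ("ls",),
    ("pytest",),
)

# Characters that can chain, redirect, or substitute commands in a shell.
SHELL_META = set(";|&`$><()\n\r")


def is_auto_approved(
    command: str,
    allowlist: Iterable[Tuple[str, ...]] = SAFE_PREFIXES,
) -> bool:
    """Return True only if the command is a single, plain, allowlisted invocation."""
    if any(ch in SHELL_META for ch in command):
        return False  # could hide a second command; ask the human
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes and similar: do not guess
    return any(tuple(tokens[: len(prefix)]) == prefix for prefix in allowlist)


if __name__ == "__main__":
    for candidate in ["git status", "git diff --stat", "git push origin main", "ls; rm -rf /"]:
        verdict = "auto-approve" if is_auto_approved(candidate) else "ask"
        print(f"{candidate!r}: {verdict}")
```

Matching on prefixes rather than exact strings is what keeps the remaining prompts tolerable: `git diff --stat` passes without a new rule, while `git push` still asks.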
Sophira•5h ago
It's a well-known truth in software development that programmers hate having to maintain code written by someone else. We see all the ways in which they wrote terrible code, that we obviously would never write. (In turn, the programmers after us will do the same thing to our code.)
Having to get into the mindset of the person writing the code is difficult and tiring, but it's necessary in order to realise why they wrote things the way they did - which in turn helps you understand the problems they were solving, and why the code they wrote actually isn't as terrible in context as it looked at first glance.
I think it makes sense that this would also apply to the use of generative AI when programming - reviewing the entire codebase after it's already been written is probably more error-prone and difficult than following along with each individual step that went into it, especially when you consider that there's no singular "mindset" you can really identify from AI-generated output. That code could have come from anywhere...
seba_dos1•4h ago