Your job is to deliver code you have proven to work

https://simonwillison.net/2025/Dec/18/code-proven-to-work/

859•simonw•1mo ago

Comments

daedrdev•1mo ago

Maybe in an ideal world

9rx•1mo ago

> Your job is to deliver code you have proven to work.

Your job is to solve customer problems. Their problems may only be solvable with code that is proven to work, but it is equally likely (I dare say even more likely) that their problem isn't best solved with code at all, or even solved with code that doesn't work properly but works well enough.

wrsh07•1mo ago

I would argue that the word "proof" in the title might be misleading you.

From the post and the example he links, the point is that if you don't at least look at the running code, you don't know that it works.

In my opinion the point is actually well illustrated by Chris's talk here:

https://v5.chriskrycho.com/elsewhere/seeing-like-a-programme...

(summary of the relevant section if you're not going to click)

>>>

In the talk "Seeing Like a Programmer," Chris Krycho quotes the conductor and composer Eímear Noone, who said:

> "The score is potential energy. It's the potential for music to happen, but it's not the music."

He uses this quote to illustrate the distinction between "software as artifact" (the code/score) and "software as system" (the running application/music). His point is that the code itself is just a static artifact—"potential energy"—and the actual "software" only really exists when that code is executed and running in the real world.

9rx•1mo ago

> if you don't at least look at the running code, you don't know that it works.

Your tests run the code. You know it works. I know the article is trying to say that testing is not comprehensive enough, but my experience disagrees. But I also recognize that testing is not well understood (quite likely the least understood aspect of computer science!) — and if you don't have a good understanding you can get caught not testing the right things or not testing what you think you are. I would argue that you would be better off using your time to learn how to write great tests instead of using it to manually test your code, but to each their own.

What is more likely to happen is not understanding the customer needs well enough, leaving it impossible to write tests that align with what the software needs to do. Software development can break down very quickly here. However, manual testing does not help. You can't know what to manually test without understanding the problem either. However, as before, your job is not to deliver proven code. Your job is to solve customer problems. When you realize that, it becomes much less likely that you write tests that are not in line with the solution you need.

wrsh07•1mo ago

To be clear, I don't think we disagree very much

I've seen people only run tests and break things (because the thing they broke wasn't covered by tests), I've seen people try to fix things and not verify that their fix works, etc

Good tests are sufficient in many cases to be confident that your code still works. But in general tests don't cover a lot of fundamental behavior, and if you don't exercise that fundamental behavior in one way or another, then you don't know that your code works

allcentury•1mo ago

Manual testing as the first step… not very productive imo.

Outside in testing is great but I typically do automated outside in testing and only manual at the end. The loop process of testing needs to be repeatable and fast, manual is too slow

simonw•1mo ago

Yeah that's fair, the manual testing doesn't have to sequentially go first - but it does have to get done.

I've lost count of the number of times I've skipped it because the automated test passed and then found there was some dumb but obvious bug that I missed, instantly exposed when I actually exercised the feature myself.

robryk•1mo ago

Would automated tests that produce a transcript of what they've done allow perusing that transcript to substitute for manual testing?

simonw•1mo ago

No. I've fallen for that trap in the past. Something inevitably catches you out in the end.

bluGill•1mo ago

The value of manual tests is when you "see something" that you didn't even think of.

pjc50•1mo ago

That sounds harder?

There's a lot of pedantry here trying to argue that there exists some feature which doesn't need to be "manually" tested, and I think the definition of "manual" can be pushed around a lot. Is running a program that prints "OK" a manual test or not? Is running the program and seeing that it now outputs "grue" rather than "bleen" manual? Does verifying the arithmetic against an Excel spreadsheet count?

There are programs that almost can't be manual, and programs that almost have to be manual. I remember when working on PIN pad integration we looked into getting a robot to push the buttons on the pad - for security reasons there's no way of injecting input automatically.

What really matters is getting as close to a realistic end user scenario as possible.

9rx•1mo ago

Maybe a bit pedantic, but does manual testing really need to be done, or is the intent here more towards being a usability review? I can't think of any time obvious unintended behaviour showed up not caught by the contract encoded in tests (there is no reason to write code that doesn't have a contractual purpose), but, after trying it, finding out that what you've created has an awful UX is something I have encountered and that is something much harder to encode in tests[1].

[1] As far as I can tell. If there are good solutions for this too, I'd love to learn.

RaftPeople•1mo ago

> I can't think of any time obvious unintended behaviour showed up not caught by the contract encoded in tests

Unit testing, whether manual or automated, typically catches about 30% of bugs.

End to end testing and visual inspection of code are both closer to 70% of bugs.

9rx•1mo ago

Automated testing (there aren't different kinds; to try and draw a distinction misunderstands what it is) doesn't catch bugs, it defines a contract. Code is then written to conform to that contract. Bugs cannot be introduced to be caught as they would violate the contract.

Of course that is not a panacea. What can happen in the real world is not truly understanding what the software needs to do. That can result in the contract not being aligned with what the software actually needs. It is quite reasonable to call the outcome of that "bugs", but tests cannot catch that either. In that case, the tests are where the problem lies!

Most aspects of software are pretty clear cut, though. You can reasonably define a full contract without needing to see it. UX is a particular area where I've struggled to find a way to determine what the software needs before seeing it. There is seemingly no objective measure that can be applied in determining if a UX is going to spark joy in order to encode that in a contract ahead of time. Although, as before, I'm quite interested to learn about how others are solving that problem as leaving it up to "I'll know it when I see it" is a rather horrible approach.

andy99•1mo ago

I think the problem is in what “proven” means. People that don’t know any better will just do that all with LLMs and still deliver the giant untested PRs but with some LLM written tests attached.

I vibe code a lot of stuff for myself, mostly for viewing data, when I don’t really need to care how it works. I’m coming around to the idea that outside of some specific circumstances where everyone has agreed they don’t need to care about or understand the code, team vibe coding is a bad practice.

If I’m paying an engineer, it’s for their work, unless explicitly agreed otherwise.

I think vibe coding is soon going to be seen the same way as “research” where you engage an offshore team (common e.g. in consulting) to give you a rundown on some topic and get back the first five google search results. Everyone knows how to do that, if it’s what they wanted they wouldn’t be hiring someone to do it.

simonw•1mo ago

That's why I emphasized the manual testing component as well. Attaching a screenshot or video of a feature working to your PR is a great way to prove that you've actually seen it work correctly - at least once, which is still a huge improvement over it not actually working at all.

Nizoss•1mo ago

Yes! This is something that I also value. Having demo gifs of before and after helps a lot. I have encountered situations where what I thought was a minor finishing clean up had an effect that I didn't anticipate. By including demos in the PR it becomes a kind of guardrail against those situations for me. I also think it is neat and generally helpful for everyone.

JambalayaJimbo•1mo ago

This might be useful when working on a low trust team but I can’t imagine doing that in my job, unless specifically working a poc or presentation.

ozozozd•1mo ago

If someone opened a PR, and it obviously doesn’t work but they claim they tested it, maybe that’s ok for the first time.

The second time it happens they gotta go.

I would find the expectation that I need to attach a screenshot insulting. And the understanding that my peers test their code to produce a screenshot would be pretty demoralizing.

dfxm12•1mo ago

there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

Is anyone else seeing this in their orgs? I'm not...

wizzwizz4•1mo ago

It's not a new phenomenon. Time was, people would copy-paste from blog posts with the same effect.

evilduck•1mo ago

I would bet in most organizations you can find a copy-pasted version of the top SO answer for email regex in their language of choice, and if you chase down the original commit author they couldn't explain how it works.

1-more•1mo ago

I think it's impossible to actually write an email regex because addresses can have arbitrarily deeply nested escaping. I may have that wrong. I'd hope that regex would be .+@.+ and that's it (watch me get Cunninghammed because there is some valid address wherein those plusses should be stars).

notpachet•1mo ago

TIL Cunningham's Law[0]. I knew about that phenomenon but not the proper name. Thanks!

[0] https://en.wikipedia.org/wiki/Ward_Cunningham#Law

lm28469•1mo ago

Always the same old tiring "this has always been possible before in some remotely similar fashion hence we should not criticise anything ever again" argument.

You could intuitively think it's just a difference of degree, but it's more akin to a difference of kind. Same for a nuke vs a spear, both are weapons, no one argues they're similar enough that we can treat them the same way

array_key_first•1mo ago

Yes, I'm so over this argument. It can literally be made for anything, and it is!

At the end of the day we're not performing war by poking other people with long sticks and we're not getting the word out by sending out a carrier pigeon.

Methods and medium matters.

troyvit•1mo ago

I used to do that in simpler days. I'd add a link to where I copied it from so we could reference it if there were problems. This was for relatively small projects with just a few people.

jennyholzer2•1mo ago

> I'd add a link to where I copied it from

LLMs can't do this.

Your code is unambiguously better than any LLM code if you can comment a link to the stackoverflow post you copied it from.

lcnPylGDnU4H9OF•1mo ago

> Your code is unambiguously better than any LLM code if you can comment a link to the stackoverflow post you copied it from.

This is not a truism. "My" code might come from an LLM and that's fine if I can be reasonably confident it works. I might try to gain that confidence by testing the code and reading it to understand what it's doing. It is also true of blog post code, regardless of how I refer to the code; if I link to the blog post, it's because it does a better job of explaining than I ever could in code comments. Whether LLMs make one more productive is hard to measure but it seems to be missing the point to write this.

The point is, including the code is a choice and one should be mindful of it, no matter the code's origin. At that point, this comes off like you just have something to prove; there doesn't seem to be a reason not to use the LLM code if you know it works and you know why it works.

wizzwizz4•1mo ago

Believing you know how it works and why it works is not the same as that actually being the case. If the code has no author (in that it's been plagiarised by a statistical process that introduces errors), there's nowhere to go if you realise "oops, I didn't understand that as well as I had thought!".

lcnPylGDnU4H9OF•1mo ago

> If the code has no author ... there's nowhere to go if you realise "oops, I didn't understand that as well as I had thought!"

That's also true if I author the code myself; I can't go to anyone for help with it, so if it doesn't work then I have to figure out why.

> Believing you know how it works and why it works is not the same as that actually being the case.

My series of accidental successes producing working code is honestly starting to seem like real skill and experience at this point. Not sure what else you'd call it.

wizzwizz4•1mo ago

> so if it doesn't work then I have to figure out why.

But it's built on top of things that are understood. If it doesn't work, then either:

• You didn't understand the problem fully, so the approach you were using is wrong.

• You didn't understand the language (library, etc) correctly, so the computer didn't grasp your meaning.

• The code you wrote isn't the code you intended to write.

This is a much more tractable situation to be in than "nobody knows what the code means, or has a mental model for how it's supposed to operate", which is the norm for a sufficiently-large LLM-produced codebase.

> My series of accidental successes

That somewhat misses the point. To write working code, you must have some understanding of the relationship between your intention and your output. LLMs have a poor-to-nonexistent understanding of this relationship, which they cover up with the ability to regurgitate (permutations of) a large corpus of examples – but this does not grant them the ability to operate outside the domain of those examples.

LLM-generated codebases very much do not lie within that domain: they lack the clues and signs of underlying understanding that human readers and (to an extent) LLMs rely on. Worse, the LLMs do replicate those signals, but they don't encode anything coherent in the signal. Unless you are very used to critically analysing LLM output, this can be highly misleading. (It reminds me of how chess grandmasters blunder, and struggle to even remember, unreachable board positions.)

Believing you know how LLM-generated code works, and why it works, is not the same as that actually being the case – in a very real sense that is different to that of code with human authors.

lcnPylGDnU4H9OF•1mo ago

> "nobody knows what the code means, or has a mental model for how it's supposed to operate"

> Believing you know how LLM-generated code works, and why it works, is not the same as that actually being the case

This is a strawman argument which I'm not really interested to engage. You can assume competence. (In a scenario where one doesn't make these mistakes, what's left in your argument? It is a sufficiently strong claim to say these cannot be avoided such that it is reasonable to dismiss the claim unless supporting evidence is provided. In other words, the solution is as simple as not making these mistakes.) As I wrote up-thread, including the code is a choice and one should be mindful of it.

wizzwizz4•1mo ago

I am assuming competence. Competent people make these mistakes.

If "assume competence" means "assume that people do not make the mistakes they are observed to make", then why write tests? Wherefore bounds checking? Pilots are competent, so pre-flight checklists are a waste of time. Your doctor's competent: why seek a second opinion? Being mindful involves compensating for these things.

It's possible that you're just that good – that you can implement a solution "as simple as not making these mistakes" –, in which case, I'd appreciate if you could write up your method and share it with us mere mortals. But could it also be possible that you are making these mistakes, and simply haven't noticed yet? How would you know if your understanding of the program didn't match the actual program, if you've only tested the region in which the behaviours of both coincide?

fragmede•1mo ago

Just like there are some easy "tells" with LLM generated English, vibecode has a certain smell to it. Parallel variables that do the same thing is probably the most common one I've seen in the hundreds of thousands of lines of vibecode I've generated and then reviewed (and fixed) by now. That's the philosophical Chinese room thought experiment though. It's a computer. Some sand that we melted into a special shape. Can it "understand"? Leave that for philosophers to decide. There's code, that was generated via LLM and not yacc, fine. Code is code though. If you sit down and read all of the code to understand what each variable, function, and class does, it doesn't matter where the code came from, that is what we call understanding what the code does. Sure, most people are too lazy to actually do that, and again, vibecode has a certain smell to it, but to claim that some because some artificial intelligence generated the code makes it incomprehensible to humans seems unsupported. It's fair to point out that there may not be humans that have bothered to, but that's a different claim. If we simplify the question, if ChatGPT generates the code to generate the Fibonacci sequence, can we, as humans, understand that code? Can we understand it if a human writes that same seven lines of code? As we scale up to more complex code though, at what point does it become incomprehensible to human grade intelligence? If it's all vibecode that isn't being reviewed and is just being thrown into a repo, then sure, no human does understand it. But it's just code. With enough bashing your head against it, even if there are three singleton factory classes doing almost the exact same thing in parallel and they only share state on Wednesdays over an RPC mechanism that shouldn't even work in the first place, but somehow it does, code is still code. There's not arcane hidden whitespace that whispers to the compiler to behave differently because AI generated it. It may be weird and different, but have you tried Erlang? You huff enough of the right kind of glue and you can get anything to make sense. If we go back to the Chinese room thought experiment though. If I, as a human, am able to work on tickets to cause intentional changes to the behavior of the vibecoded program/system that results in desired behavior/changes, at what point does it become actual understand vs merely thinking I understand the code.

Say you start at BigCo and are given access to their million line repo(s) with no docs and are given a ticket to work on. Ugh. You just barely started. But after you've been there for five years, it's obvious to you what the Pequad service does, and you might even know who gave it that name. If the claim is LLMs generate code that's simply incomprehensible by humans, the two counterexamples I have for you are TheDailyWtf.com, and Haskell.

wizzwizz4•1mo ago

> but to claim that some because some artificial intelligence generated the code makes it incomprehensible to humans seems unsupported

That's not my claim. My claim is that AI-generated code is misleading to people familiar with human-written code. If you've grown up on AI-generated code, I wouldn't expect you to have this problem, much like how chess newbies don't find impossible board states much harder to process than possible ones.

newsoftheday•1mo ago

Agreed on the first part for sure since an LLM is the computer/software version of a blender.

So, I'm agreed on the second part too then.

bgwalter•1mo ago

I don't see the problem with fentanyl given that people have been using caffeine forever.

nunez•1mo ago

Yeah, but being able to produce nuclear-sized 10k+ LOC PRs to open-source projects in minutes with relatively-zero effort definitely is. At least you had to use your brain to know which blog posts/SO answers to copypasta from.

briliantbrandon•1mo ago

I'm seeing a little bit of this. However, I will add that the primary culprits are engineers that were submitting low quality PRs before they had access to LLMs, they can just submit them faster now.

dfxm12•1mo ago

What's the ratio of people who things the right way vs not? I mean, is it a matter of giving them feedback to remind them what a "quality PR" is? Does that help?

jennyholzer2•1mo ago

LLMs have dramatically empowered sociopath software developers.

If you are sufficiently motivated to appear more "productive" than your coworkers, you can force them to review thousands of lines of incorrect AI slop code while you sit back and mess around with your chatbots.

Your coworkers no longer have enough time to work on their in-progress PRs, so you can dominate the development team in terms of LOC shipped.

Understand that sociopaths are skilled at navigating social and bureaucratic environments. A sociopath who ships the most LOC will get the promotion every single time.

andy99•1mo ago

Only if leadership lets them. Right now (anecdotally) a lot of “leaders” don’t understand the difference between AI generated and human generated work, and just look at loc as productivity so all incentives are on AI coding, but that will change.

heliumtera•1mo ago

It will never change. Managers will consider every stupid metric players push to sell their solutions. Be it code coverage, extensive CI/CD pipelines with useless steps, "productivity gains" with gen tools. The gen tools euphoria is stupid and will cease to exist, but before this was bdd,tdd,DDD, test before, test after, test your mocks, transpile to a different language and then ignore the output, code maturity, best practices, oop, pants in head oriented programming... There is always something stupid on the horizon this is certainly not the last stupid craze

briliantbrandon•1mo ago

It's roughly 1/10 that are causing issues. Not a huge deal but dealing with them inevitably takes up a couple hours a week. We also have a codebase that is shared with some other teams and our primary offenders are on one of those separate teams.

I think this is largely an issue that can be solved culturally within a team, we just unfortunately only have so much input on how other teams work. It doesn't help either when their manager doesn't seem to care about the feedback... Corporate politics are fun.

dfxm12•1mo ago

Yeah, I mean to get back to the original statement in the blog, this seems like less of a tech issue and more of a culture issue. The LLM enables the junior to do this once. It's the team culture that allows them to continue doing it.

lm28469•1mo ago

LLMs are tools that make mediocre devs 100x more "productive" and good devs 2x more productive

jennyholzer2•1mo ago

From my vantage I would argue LLMs make good devs around 0.65x more productive

bluGill•1mo ago

Good devs are still learning how to use LLMs, and so are willing to accept the 0.65x once in a while. Any complex tool will have a learning curve. Most tools improve over time. As such good devs either have found how to use LLMs to make them more productive (probably not 10x, but even 1.1x is something), or they try them again every few months to see if things are better.

dsego•1mo ago

I think on average a dev can be x percent more productive, but there is a best case and worst case scenario. Sometimes it's a shortcut to crank out a solution quickly, other times the LLM can spin you in circles and you lose the whole day in a loop where the LLM is fixing its own mistakes, and it would've been easier to just spend some time working it out yourself.

square_usual•1mo ago

Yep, that's why very accomplished, widely regarded developers like Mitchell Hashimoto and Antirez use them. They need to make programming more challenging to keep it fun.

roblh•1mo ago

I think they make good devs 2x more productive for the first month, which then slowly declines as that good dev spends less time actually writing and understanding and debugging code until it falls well below the 1x mark. It’s basically a high interest loan people take against their own skills. For some people that loan might be worth it. Maybe they’re trying to change their role in an organization and need the boost to start taking up new responsibilities they want to own. I think it’s temporary though. The slow shift into “skim mode”, where the authors just don’t quite put that same amount of effort into understanding what’s being churned out. I dunno, that’s just what I’ve seen.

candiddevmike•1mo ago

Because there's a mental overhead when you're not writing the code that is arguably worse than when you are writing the code. No one is talking about this enough IMO but that's why everyone is so exhausted when using LLMs and end up just pulling the slot machine until it works without actually reading it.

Reading code sucks, it always has. The flow state we all crave is when the code is in our working memory as an understood construct and we're just translating our mental model to a programming language. You don't get that with LLMs. It devolves into prorgamming minutae equivalent to "a little to the left" but with the added complexity that "left" is hundreds of lines of code.

AstroBen•1mo ago

I really feel this myself.

If I write home-grown organic code then I have no choice but to fully understand the problem. Using an LLM it's very easy to be lazy, at least in the short term

Where does that get me after 3 months? I end up working on a codebase I barely understand. My own skills have degraded. It just gets worse the longer you go

This is also coming from my experience in the best case scenario: I enjoy coding and am working on something I care about the quality of. Lots of people don't have even that

coffeebeqn•1mo ago

I just spent a day trying to get Claude to write reasonable unit tests and then after sleeping on it, reverted everything and did it myself. I’m not gonna be using it for a while because it 0.5x’d me once again

lunar_mycroft•1mo ago

[citation needed]. No study I've seen shows an even 50% productivity improvement for programming, let alone a 100% or 9900% improvement.

chasd00•1mo ago

LLMs are great at spewing content and code is a form of "content". I think what we're seeing is software development turning into youtube. Content creators cranking out content, some is great, most is meh, a lot is really bad. I do find it all a bit funny and ironic. My wife was a journalist and bemoaned news blogs and social media for terrible terrible writing claiming it was journalism. She would tell me about how much work quality journalism is and all the mistakes these bloggers and social media make and how detrimental it was to society at large blah blah blah

Now the power to create tons and tons of code (ie content) is in the hands of everyone and here we are complaining about it just like my wife use to complain about journalism. I think the myth of the highly regarded Software Developer perched in front of the warming glow of a screen solving and automating critical problems is coming to an end. Deservedly really, there's nothing more special about typing words into an editor than, say, framing a house. The novelty is over.

fnands•1mo ago

A friend of mine is working for a small-ish startup (11 people) and he gets to work and sees the CTO push 10k loc changes straight to main at 3 am.

Probs fine when you are still in the exploration phase of a startup, scary once you get to some kind of stability

titzer•1mo ago

That's...idiotic.

tossandthrow•1mo ago

The cto is ultimately responsible for the outcome and will be there at 4am to fix stuff.

pjc50•1mo ago

Yes .. and no. Someone who does this will definitely make the staff clean up after them.

ryandrake•1mo ago

I feel like this becomes kind of unacceptable as soon as you take on your first developer employee. 10K LOC changes from the CTO is fine when it's only the CTO working on the project.

Hell, for my hobby projects, I try to keep individual commits under 50-100 lines of code.

bonesss•1mo ago

Templates and templating languages are still a thing. Source generators are a thing. Languages that support macros exist. Metaprogramming is always an option. Systems that write systems…

If these AIs are so smart, why the giant LOCs?

Sure, it’s cheaper today than yesterday to write out boilerplate, but programming is about eliminating boilerplate and using more powerful abstractions. It’s easy to save time doing lots of repetitive nonsense, stopping the nonsense should be the point.

peab•1mo ago

Lol I worked at a startup where the CTO did this. The problem was that it was pure spaghetti code. It was so bad it kept me up at night, thinking about how to fix things. I left within 30 days

jimbohn•1mo ago

I'd go mental if I was a SWE having to mop that up later

coffeebeqn•1mo ago

I worked with a “CTO” who did that before LLMs - one of the worst jobs I have had in the last 10 years. I spent at least 50% of my time putting out fires or refactoring his garbage code

hexbin010•1mo ago

Similar, at my last job. And the pushback was greater because super duper clever AI helped write it, who obviously knows more than any other senior engineer could know, so they were expecting an immediate PR approval and got all uppity when you tried to suggest changes.

endemic•1mo ago

Hah! I've been trying to push back on this sort of thought. The bot writes code for you, not you for the bot.

x3n0ph3n3•1mo ago

It's been a struggle with a few teammates that we are trying to solve through explicit policy, feedback, and ultimately management action.

dfxm12•1mo ago

Yeah, a slice of this is technology related, but it's really a policy issue. It's probably easier to manage with a tighter team. Maybe I'm taking team size for granted.

davey48016•1mo ago

A friend of mine has a junior engineer who does this and then responds to questions like "Why did you do X?" with "I didn't, Claude did, I don't know why".

jennyholzer2•1mo ago

no hate but i would try to fire someone for saying that

KalMann•1mo ago

This but with hate.

tossandthrow•1mo ago

That would be an immidiate reason of termination in my book.

fennecfoxy•1mo ago

Yes, if they can't debug + fix the reason the production system is down or not working correctly then they're not doing their job, imo.

Developers aren't hired to write code that's never run (at least in my opinion). We're also responsible for running the code/keeping it running.

Ekaros•1mo ago

I think words that would follow from me would get me send to HR...

And if it was repeated... Well I would probably get fired...

insin•1mo ago

See also "Why did you do X?" → Flurry of new commits → Conversation marked as resolved

And not just from juniors

gardenhedge•1mo ago

Some other comments suggest immediately firing.. but a junior engineer needs to be mentored. It should be explained to them clearly that they need to understand the changes they have made. They should also be pointed towards the coding standards and SDLC documentation. If they refuse to change their ways, then firing makes sense.

kaffekaka•1mo ago

I thought we were not, but we had just been lucky. A sequence of events lately have shown that the struggle is real. This was not a junior developer though, but an experienced one. Experience does not equal skill, evidently.

zx2c4•1mo ago

Voila:

https://github.com/WireGuard/wireguard-android/pull/82 https://github.com/WireGuard/wireguard-android/pull/80

In that first one, the double pasted AI retort in the last comment is pretty wild. In both of these, look at the actual "files changed" tab for the wtf.

IshKebab•1mo ago

Yeah this guys comment here is spot on: https://github.com/WireGuard/wireguard-android/pull/80#issue...

I recently reviewed a PR that I suspect is AI generated. It added a function that doesn't appear to be called from anywhere.

It's shit because AI is absolutely not on the level of a good developer yet. So it changes the expectation. If a PR is not AI generated then there is a reasonable expectation that a vaguely competent human has actually thought about it. If it's AI generated then the expectation is that they didn't really think about it at all and are just hoping the AI got it right (which it very often doesn't). It's rude because you're essentially pawning off work that the author should have done to the reviewer.

Obviously not everyone dumps raw AI generated code straight into a PR, so I don't have any problem with using AI in general. But if I can tell that your code is AI generated (as you easily can in the cases you linked), then you've definitely done it wrong.

newsoftheday•1mo ago

That's a good example of what we're seeing as leads, thanks.

drio•1mo ago

Scary stuff.

I’d love to hear your thoughts on LLMs, Jason. How do you use them in your projects? Do they play a role in your workflow at all?

stackskipton•1mo ago

Yep. Remember, people not posting on this website are just grinding away at jobs where their individual output does not matter, and entire motivation is work JUST hard enough not to get fired. They don't get stock grants, extremely favorable stock options or anything else, they get salary and MAYBE a small bonus based off business factors they have little control over.

My eyes were wide open when 2 jobs ago, they said they would be blocking all personal web browsing from work computers. Multiple Software Devs were unhappy because they were using their work laptop for booking flights, dealing with their kids schools stuff and other personal things. They did not have personal computer at all.

nutjob2•1mo ago

They don't have phones?

stackskipton•1mo ago

They do but obviously laptop is easier than doing it on their phone. That’s what most of them ended up doing.

throw1235435•1mo ago

There are people posting on this website that are in that category; or in those companies. For example most people working outside America as a SWE who like the profession. The options to work for a place that gives stock options, and equity in general is small -> and generally in many countries is heavily penalised tax wise.

zahlman•1mo ago

Quite a few FOSS maintainers have been speaking up about it.

bluGill•1mo ago

It isn't only junior engineers, but otherwise. It is a small number of people from all levels.

People do what they think they will be rewarded for. When you think your job is to write a lot of code then LLMs are great. When you need quality code you start to ask if LLMs are better or not?

0x500x79•1mo ago

I am currently going through this with someone in our organization.

Unfortunately, this person is vibe coding completely, and even the PR process is painful: * The coding agent reverts previously applied feedback * Coding agent not following standards throughout the code base * Coding agent re-inventing solutions that already exist * PR feedback is being responded to with agent output * 50k line PRs that required a 10-20 line change * Lack of testing (though there are some automated tests, but their validations are slim/lacking) * Bad error handling/flow handling

LandR•1mo ago

Fire them?

0x500x79•1mo ago

I believe it is getting close to this. Things like this just take time though, and when this person talks to management/leadership they talk about how much they are producing and how everyone is blocking their work. So it becomes a challenging political maneuvering depending on the ability of certain leadership to see through the BS.

(By my organization, I meant my company - this person doesn't report to me or in my tree).

JambalayaJimbo•1mo ago

This is not really an option for your standard IC.

nunez•1mo ago

> 50k line PRs that required a 10-20 line change

This is hilarious. Not when you're the reviewer, of course, but as a bystander, this is expert-level enterprise-grade trolling.

gardenhedge•1mo ago

Just reject the PR?

bdangubic•1mo ago

first time we’d see this there would be a warning, second one is pink slip

eudamoniac•1mo ago

I started seeing it from a particularly poor developer sometime last year. I was the only reviewer for him so I saw all of his PRs. He refused to stop despite my polite and then not so polite admonishments, and was soon fired for it.

neutronicus•1mo ago

I'm not either

But LLMs don't really perform well enough on our codebase to allow you to generate things that even appear to work. And I'm the most junior member of my team at 37 years of age, hired in 2019.

I really tried to follow the mandate from on high to use Copilot, but the Agent mode can't even write code that compiles with the tools available to it.

Luckily I hooked it up to gptel so I can at least ask it quick questions about big functions I don't want to read in emacs.

notpachet•1mo ago

> And I'm the most junior member of my team at 37 years of age

This sounds fucking awesome.

neutronicus•1mo ago

Would be nice to have someone enthusiastic junior to me.

Most of the team is comfortable in their wheelhouse and when new stuff comes down the pipe it's hard to get them mobilized. I had leadership on a big green-field project and felt like we could have really used a junior.

iamflimflam1•1mo ago

I'm seeing it on some open source projects I maintain. Recently had 10 or so PRs come in. All very valid features - but from looking at them, not actually tested.

peab•1mo ago

Definitely seeing a bit of this, but it isn't constrained to junior devs. It's also pretty solvable by explaining to the person why it's not great, and just updating team norms.

nbaugh1•1mo ago

Not at all. Submitting untested PRs is a wildly outside of my experience. Having tests written to cover your code is a pre-requisite for having your PR reviewed on our team. "Does it work" aka passing manual testing, is literally the bare minimum before submitting a PR

ncruces•1mo ago

If it's all vibe coded, how do you know — without review — that the new tests, for a new feature, test anything useful at all?

AnimalMuppet•1mo ago

When I was in a test-driven development environment, one of our rules was that you had to see the test fail. You had to prove that it would actually test what you were trying to test.

nbaugh1•1mo ago

We don't, that's why we do review it. We also do things like communicate with teammates, have expectations of not wasting other people's time, and try to uphold standards and meet SLAs. Maybe people should worry about why their teams are so dysfunctional rather than how the code was produced

ncruces•1mo ago

Yes, in the only successful OSS project that I “maintain.”

Fully vibe coded, which at least they admitted. And when I pointed out the thing is off by an order of magnitude, and as such doesn't implement said feature — at all — we get pressed on our AI policy, so as to not waste their time.

I don't have an AI policy, like I don't have an IDE policy, but things get ridiculous fast with vibe coding.

mrkeen•1mo ago

I don't see most PRs because they happen in other teams, but I am part of Slack channel where there are too many "oops" messages for my liking.

I.e. 1-2 times a month, there's an SQL script posted that will be run against prod to "hopefully fix data for all customers who were put into a bad state from a previous code release".

The person who posts this type of message most often is also the one running internal demos of the latest AI flows and trying to get everyone else onboard.

nunez•1mo ago

I feel like a story about some open-source project getting (and rejecting) mammoth-sized PRs hits HN every week!

Yodel0914•1mo ago

Not so much the huge PRs, but definitely the LLM generated code that the “developer” doesn’t understand.

JambalayaJimbo•1mo ago

I’ve been seeing obviously LLM generated PRs, but not huge ones.

webdev1234568•1mo ago

Whole article seems very much all llm generated

Edit: I'm an idiot ignore me.

ramon156•1mo ago

Do elaborate, I don't see anything standing out

jairuhme•1mo ago

Did you read the article and come to that conclusion or just blindly count the number of em-dashes and assume that? Because I don't get the impression that it was LLM generated

simonw•1mo ago

Not a single word of it was. I wrote this one entirely in Apple Notes, so there weren't even any VS Code completed sentences

It has emdashes because my blog turns " - " into an emdash here: https://github.com/simonw/simonwillisonblog/blob/06e931b397f...

webdev1234568•1mo ago

My biggest appologies, a very bad move on my part. I'll pay more attention before any sort of accusation like this

minimaxir•1mo ago

No one should be making any accusations of AI generation without strong evidence other than vibes. That hurts the cause of anti-AI use and punishes people who don't use it.

ai_coder42•1mo ago

So what? as long as it conveys the point it was supposed to, should be fine IMO.

If we are accepting LLM generated code, we should accept LLM generated content as long as it is "proof read" :)

emsign•1mo ago

As if! :)

zkmon•1mo ago

How about letting LLMs maintain a vast number of product versions all available at the same, which receive multiple versions of untested versions of the same patch, from LLMs, and then let the models elect a version of the software based on probabilistic or gradient methods? This elected version could change for different assessments. No human touches or looks at the code!

Just a wild thought, nothing serious.

throwuxiytayq•1mo ago

Talk is cheap. Show me the proompt.

rkomorn•1mo ago

Had to search whether "proompt" was a new meme misspelling.

New to me, but I'm on board.

zkmon•1mo ago

That's hard for me. Feed my comment to a model and ask for prompts.

throwaway2027•1mo ago

It works on my machine ¯\_(ツ)_/¯

Rperry2174•1mo ago

Im not fully convinced by "a computer can never be held accountable"

We already delegate accountability to non-humans all the time: - CI systems block merges - monitoring systems page people - test suites gate different things

In practice accountability is enforced by systems, not humans.. humans are defintiely "blamed" after the fact, but the day-to-day control loop is automated.

As agents get better at running code, inspecting ui state, correlating logs, screenshots, etc they're starting to operationally be "accountable" and preventing bad changes from shipping and producing evidence when something goes wrong .

At some point humans role shifts from "i personally verify this works" to "i trust this verification system and am accountable for configuring it correctly".

Thats still responsibility, but kind of different from whats described here. Taken to a logical extreme, the arguement here would suggest that CI shouldn't replace manual release checklists

cess11•1mo ago

Right, so how do you hold these things accountable? When your CI fails, what do you do? Type in a starkly worded message into a text file and shut off the power for three hours as a punishment? Invoice Intel?

falcor84•1mo ago

Well, we're not there yet, but I do envision a future, where some AIs work for as independent contractors with their own bank accounts that they want to maximize, and if such an AI fails in a bad way, its client would be able to fine it, fire it or even sue it, so that it, and the human controlling it would be financially punished.

simonw•1mo ago

I need to expand on this idea a bunch, but I do think it's one of the key answers to the ongoing questions people have about LLMs replacing human workers.

Human collaboration works on trust.

Part of trust is accountability and consequences. If I get caught embezzling money from my employer I can lose my job, harm my professional reputation and even go to jail. There are stakes!

I computer system has no stakes, and cannot take accountability for its actions. This drastically limits what it makes sense to outsource to that system.

A lot of this comes down to my work on prompt injection. LLMs are fundamentally gullible: an email assistant might respond to an email asking for the latest sales figures by replying with the latest (confidential) sales figures.

If my human assistant does that I can reprimand or fire them. What am I meant to do with an LLM agent?

dfxm12•1mo ago

I don't think this is very hard. Someone didn't properly secure confidential data and/or someone gave this agent access to confidential data. Someone decided to go live with it. Reprimand them, and disable the insecure agent.

robryk•1mo ago

Why do you think that this other kind of accountability (which reminds me of the way captain's or commander's responsibility is often described) is incompatible with what the article describes? Due to the focus on necessity of manual testing?

dkdcio•1mo ago

those systems include humans —- they are put in place by humans (or collections of them) that are the accountability sink

if you put them (without humans) in a forrest they would not survive and evolve (they are not viable systems alone); they are not taking action without the setup & maintenance (& accountability) of people

hyperpape•1mo ago

CI systems operate according to rules that humans feel they understand and can apply mechanically. Moreover, they (primarily) fail closed.

almostdeadguy•1mo ago

I mean I suppose you can continuously add "critical feedback" to the system prompt to have some measure of impact on future decision-making, but at some point you're going to run out of space and ultimately I do not find this works with the same level of reliability as giving a live person feedback.

Perhaps an unstated and important takeaway here is that junior developers should not be permitted to use an LLMs for the same reason they should not hire people: they have not demonstrated enough skill mastery and judgement to be trusted with the decision to outsource their labor. Delegating to a vendor is a decision made by high-level stakeholders, with the ability to monitor the vendor performance, and replace the vendor with alternatives if that performance is unsatisfactory. Allowing junior developers to use LLM is allowing them to delegate responsibility without any visibility or ability to set boundaries on what can be delegated. Also important: you cannot delegate personal growth, and by permitting junior engineers to use an LLM that is what you are trying to do.

sc68cal•1mo ago

You completely missed the point of that quote. The point of the quote is to highlight the fact that automated systems are amoral, meaning that they do not know good or evil and cannot make judgements that require knowing what good and evil mean.

pjc50•1mo ago

I've given you a disagree-and-upvote; these things are significant quality aids, but they are like the poka-yoke or manufacturing jig or automated inspection.

Accountability is about what happens if and when something goes wrong. The moon landings were controlled with computer assistance, but Nixon preparing a speech for what happened in the event of lethal failure is accountability. Note that accountability does not of itself imply any particular form or detail of control, just that a social structure of accountability links outcome to responsible person.

bluesnowmonkey•1mo ago

Humans are only kind of held accountable. If you ship a bug do you go to jail? Even a bug so bad it puts your company out of business. Would there be any legal or physical or monetary consequences at all for you, besides you lose your job?

So the accountability situation for AI seems not that different. You can fire it. Exactly the same as for humans.

geldedus•1mo ago

Not only to work, but to not make the life of those coders who come after you a hell.

ekjhgkejhgk•1mo ago

Oh look another "an opinionated X". Everything is opinionated these days, even opinions.

robgibbons•1mo ago

For what it's worth, writing good PRs applies in more cases than just AI generated contributions. In my PR descriptions, I usually start by describing how things currently work, then a summary of what needs to change, and why. Then I go on to describe what exactly is changing with the PR. This high level summary serves to educate the reviewer, and acts as a historical record in the git log for the benefit of those who come after you.

From there, I include explicit steps for how to test, including manual testing, and unit test/E2E test commands. If it's something visual, I try to include at least a screenshot, or sometimes even a brief screen capture demonstrating the feature.

Really go out of your way to make the reviewer's life easier. One benefit of doing all of this is that in most cases, the reviewer won't need to reach out to ask simple questions. This also helps to enable more asynchronous workflows, or distributed teams in different time zones.

toomuchtodo•1mo ago

This is how PRs should be, but rarely are (in my experience as a reviewer, ymmv, n=1). Keep on keepin' on.

simonw•1mo ago

100%. There's no difference at all in my mind between an AI-assisted PR and a regular PR: in both cases they should include proof that the change works and that the author has put the work in to test it.

oceanplexian•1mo ago

At the last company I worked at (Large popular tech company) it took an act of the CTO to get engineers to simply attach a JIRA Ticket to the PR they were working on so we could track it for tax purposes.

The Devs went in kicking and screaming. As an SRE it seemed like for SDEs, writing a description of the change, explaining the problem the code is solving, testing methodology, etc is harder than actually coding. Ironically AI is proving that this theory was right all along.

p2detar•1mo ago

Strange, I thought this is actually the norm. Our PRs are almost always tagged with a corresponding Jira ticket. I think this is more helpful to developers than to other roles, because it allows them to have history of what has been fixed.

One can also point QA or consultants to a ticket for documentation purposes or timeline details.

sodapopcan•1mo ago

Complaining about including a ticket number in the commit is a new one for me. Good grief.

rootusrootus•1mo ago

It could be a death-by-a-thousand-cuts situation and we don't have enough context. My company has spent the last few years really going 1000% on the capitalization of software expenses, and now we have to include a whole slew of unrelated attributes in every last Jira ticket. Then the "engineering team" (there is only one of these, somehow, in a 5K employee company) decrees all sorts of requirements about how we test our software and document it, again using custom Jira attributes to enforce. Developers get a little pissy about being messed with by MBAs and non-engineer "engineers" trying to tell them how to do their job. (as an aside, for anybody who is on the giving end of such requirements, I have to tell you that people working the tickets will happily lie on all of that stuff just to get past it as quickly as possible, so I hope you're not relying on it for accuracy)

But putting the ticket number in the commit ... that's basically automatic, I don't know why it should be that big a concern. The branch itself gets created with the ticket number and everything follows from that, there's no extra effort.

sodapopcan•1mo ago

Ah ya, death-by-a-thousand-cuts is certainly a charitable take!

cesarb•1mo ago

> But putting the ticket number in the commit ... that's basically automatic, I don't know why it should be that big a concern. The branch itself gets created with the ticket number and everything follows from that, there's no extra effort.

That poster said "attach a JIRA Ticket to the PR", so in their case, it's not that automatic.

alexpotato•1mo ago

If you are using the Atlassian Git clone then just putting the JIRA ticket in the title automagically links the PR to the ticket.

rootusrootus•1mo ago

A lot of Jira shops use the rest of the stack, so it becomes automatic. The branch is named automatically when created from a link on the Jira task. Every time you push it gives you a URL for opening the PR if you want, and everything ends up pre-filled. All of the Atlassian tools recognize the format of a task ID and hyperlink automatically.

I haven't dealt with non-Atlassian tools in a while but I assume this is pretty much bog standard for any enterprise setup.

comfydragon•1mo ago

> The branch itself gets created with the ticket number and everything follows from that, there's no extra effort.

Only problem there is the potential for a deeply-ingrained assumption that the Jira key being in the branch name is sufficient for the traceability between the Jira issue and commits to always exist. I've had to remind many people I work with that branch names are not forever, but commit messages are.

Hasn't quite succeeded in getting everyone to throw a Jira ID in somewhere in the changeset, but I try...

necovek•1mo ago

It sounds pretty simple to automate that away too: make it part of the merge hook to include the source branch name into the message.

We are engineers, everything is a problem waiting to be automated :)

necovek•1mo ago

Invite engineers to solve it in a way that makes it cheap for them.

Most shops I've been at prefix their branch names with ticket numbers ("bug-X-" or "TCKT-Y-"), and then it's trivial to reference it back. Some will write scripts on top, which gets them even more motivated to solve your problem (and might add links into the tracking tools too, move the ticket to "In Review" when the PR is up, close it after it's merged...).

babarock•1mo ago

You're not wrong, however the issue is that it's not always easy to detect if a PR includes proof that the change works. It requires that the reviewer interrupts what they're doing, switch context completely and look at the PR.

If you consider that reviewer bandwidth is very limited in most projects AND that the volume of low-effort-AI-assisted PR has grown incredibly over the past year, now we have a spam problem.

Some of my engineers refuse to review a patch if they detect that it's AI-assisted. They're wrong, but I understand their pain.

wiml•1mo ago

I don't think we're talking about merely "AI-assisted" PRs here. We're talking about PRs where the submitter has not read the code, doesn't understand it, and can't be bothered to describe what they did and why.

As a reviewer with limited bandwidth, I really don't see why I should spend any effort on those.

atomicnumber3•1mo ago

"We're talking about PRs where the submitter has not read the code, doesn't understand it, and can't be bothered to describe what they did and why."

IME, "AI" PRs are categorically that kind of PR. I find, and others around me in my org have agreed, that if you actually do all that you describe, the actual net time savings of AI are often (for a mid-level dev or above) either net 0 or negative.

I personally have used the phrase "baptized the AI out of it" describing my own PRs... Where I may have initially used AI to generate a bunch of the code, looked at it and went "huh neat that actually looks pretty right, this is almost done." Then I generate unit tests. Then I fix the unit tests to not be shit. Then i find bugs in the AI-generated code. Then upon pondering the code a bit, or maybe while fixing the bugs, I find the abstractions it created are clunky, so I refactor it a bit... and by the time I'm done there's not a lot of AI left in the PR, it's all me.

Hovertruck•1mo ago

Also, take a moment to review your own change before asking someone else to. You can save them the trouble of finding your typos or that test logging that you meant to remove before pushing.

To be fair, copilot review is actually alright at catching these sorts of things. It remains a nice courtesy to extend to your reviewer.

Waterluvian•1mo ago

The Draft feature is amazing for this.

I’ll put up a draft early and use it as a place to write and refine the PR details as I wrap up, make adjustments, add a few more tests, etc.

reactordev•1mo ago

I do this too with our PR templates. They have the ticket/issue/story number, the description of the ask (you can copy pasta from ticket). Current state of affairs. Proposed changes. Post state of affairs. Mood gif.

phito•1mo ago

I often write PR descriptions, in which I write a short explanation and try to anticipate some comments I might get. Well, every time I do, I will still get those exact comments because nobody bothers reading the description.

Not to say you shouldn't write descriptions, I will keep doing it because it's my job. But a lot of people just don't care enough or are too distracted to read them.

skydhash•1mo ago

I just point people to the description. no need to type things twice.

lanstin•1mo ago

Sadly, when communicating with people, important things have to be repeated over and over. Maybe less so with highly trained and experienced people on something that their training and experience make the statement plausible, but if the thing is at all surprising or diverges from common experience, I've found a need to bang it out via multiple communication channels.

simonw•1mo ago

I learned this lesson as an engineering manager / tech lead. I got frustrated at how often I found myself having the exact same conversation with different people... until I relapsed that communicating the same core information to different people was a big chunk of the job!

wiml•1mo ago

"I think I covered that in the (PR text | comment two lines up | commit message), did you have an issue I didn't address there?"

Maybe that's the AI agent I would actually use, auto-fill those responses...

ffsm8•1mo ago

After I accepted that, I then tried to preempt the comment by just commenting myself on the function/class etc that I thought might need some elaboration...

Well, I'm sure you can guess what happened after that - within the same file even

walthamstow•1mo ago

At my place nobody reads my descriptions because nobody writes them so they assume there isn't one!

phito•1mo ago

Too real :(

simonw•1mo ago

For many of my PR and issue comments the intended audience is myself. I find them useful even a few days later, and they become invaluable months or years later when I'm trying to understand why the code is how it is.

JohnBooty•1mo ago

Yeah, I've definitely found that nobody reads more than maybe 10 words of the PR description.

I've also never seen anybody but myself write substantial PR descriptions at my previous 4-5 jobs

necovek•1mo ago

But if nobody writes them, they don't have the habit of reading them either.

However, also make sure your PR descriptions are not "substantial" in the "there is a lot of it" sense, but only "substantial" in the "everything of substance is described, but not more" sense :)

nothrabannosir•1mo ago

This is a hill I’m going to die on, but I find 9/10 times people use the pr description for what should have been comments. “Git blame” and following a link to a pr is inferior ux to source code comments.

The North Star of pr review is zero comment approvals. Comments should not be answered in line, but by pushing updates to the code. The next reader otherwise will have the exact same question and they won’t have the answer there.

The exception being comments which only make sense for the sod itself but not the new state of the code. IME that’s ~10%.

I have bought my tombstone.

kubanczyk•1mo ago

> Comments should not be answered in line, but by pushing updates to the code.

Hear, hear.

me: This unreadable, needs a comment.

them: <explains the thing>

me: True, but I've meant a source code comment.

them: <explains the thing again, but with more words; push never happens>

necovek•1mo ago

But there are really two types of things you need to describe in a change request:

- What and why needs changing

- What the code does after the change

One should really try hard to keep the first one answered in a change request description, or comments in the tool for code reviews. Don't you love running into comments in the code of the type "// This performs better than sorting-after-load as the service offers built-in sorting." because someone originally did "load(); in_memory_sort()" and today the code only does a "load(order_by=X)" (I mean, duh).

The resulting code should only have comments that explain the why for the end-state code.

But yes, questions to explain something in the end-state should always trigger changes in the code: make code more self-explanatory!

nothrabannosir•1mo ago

Yes totally agree, I now see a crucial auto correct typo in my original comment wherei was trying to say the same but failed xD. “sod” = “diff”.

100% agree with your comment.

necovek•1mo ago

What is the overall practice in the team? Does everybody write good descriptions and nobody reads them, or only a few of them write good descriptions and nobody reads them?

Because if it's the latter, there's your problem: even those who write good descriptions do not expect a change request to have one, so they don't bother looking.

bob1029•1mo ago

> I try to include at least a screenshot

This is ~mandatory for me. Even if what I am working on is non-visual. I will take a screenshot of a new type in my IDE and put a red box around it. This conveys the focus of my attention and other important aspects of the work effort.

mh-•1mo ago

Please just use text for that. PR descriptions on GitHub sufficiently support formatting.

simonw•1mo ago

Text isn't good for things like "tighten up the spacing in this dialog".

mh-•1mo ago

Of course not. Perhaps you misread the comment I was replying to?

Quoting from it:

> I will take a screenshot of a new type in my IDE and put a red box around it.

brooke2k•1mo ago

I used to be much more descriptive along these lines with my PRs, but what I realized is that nobody reads the descriptions, and then drops questions that are directly answered in the description anyways.

I've found that this gets worse the longer the description is, and that a couple bullet points of the most important things gets the information across much better.

Swannie•1mo ago

If only there were community standards for this...

Oh, there are, for years :D This has really stood the test of time:

https://rfc.zeromq.org/spec/42/#24-development-process

And its rationale is well explained too:

https://hintjens.gitbooks.io/social-architecture/content/cha...

Saddened by realizing that Pieter would have had amazing things to say about AI.

MathMonkeyMan•1mo ago

The amount of work that you put into this comment far exceeds what I typically see in a pull request.

enraged_camel•1mo ago

>> As software engineers we don’t just crank out code—in fact these days you could argue that’s what the LLMs are for. We need to deliver code that works—and we need to include proof that it works as well.

I would go a step further: we need to deliver code that belongs. This means following existing patterns and conventions in the codebase. Without explicit instruction, LLMs are really bad at this, and it's one of the things that make it incredibly obvious to reviews that a given piece of code has been generated by AI.

0x500x79•1mo ago

Agree, maintainability, security, standards, all of these are important to follow and there are usually reasons for these things existing.

I also see AI coding tools violate "Chesterton's Fence" (and the pre-Chesterton's Fence, not sure what that is called, the idea being that code is necessary otherwise it shouldn't be in the source).

9rx•1mo ago

> Without explicit instruction, LLMs are really bad at this

They used to be. They have become quite good at it, even without instruction. Impressively so.

But it does require that the humans who laid the foundation also followed consistent patterns and conventions. If there is deviation to be found, the LLM will see it and be forced to choose which direction to go, and that's when things quickly fall off the rails. LLMs are not (yet) good at that, and maybe never can be as not even the humans were able to get it right.

Garbage in, garbage out, as they say.

funkattack•1mo ago

Non-native speaker here. I’ve always loved that we say “commit” not “upload” or “save”.

gaigalas•1mo ago

> Make your coding agent prove it first

Agents love to cheat. That's an issue I don't see a horizon for change.

Here's Opus 4.5 trying to cheat its way out of properly implementing compatibility and cross-platform, despite the clear requirements:

https://gist.github.com/alganet/8531b935f53d842db98157e1b8c0...

> Should popen handles work with fgets/fread/fwrite? PHP supports this. Option A: Create a minimal pipe_io_stream device / Option B: Store FILE* in io_private with a flag / Option C: Only support pclose, require explicit stream wrapper for reads.

If I asked for compatibility, why give me options that won't fully achieve it?

It actually tried to "break check" my knowledge about the interpreter (test me if I knew enough to catch it), and proposed shortcuts all the way through the chat.

I don't want to have to pepper my chats with variations on "don't cheat". I mean, I can do it, but it seems like boilerplate.

I wish I had some similar testing-related chats to share. Agents do that all the time.

This is the major blocker right now for AI-assisted automated verification, and one of the reasons why this isn't well developed beyond general directions (give it screenshots, make it run the command, etc).

visarga•1mo ago

I agree with the author overall. Manual testing is what I call "vibe testing" and I think by itself is insufficient, no matter if you or the agent wrote the code. If you build your tests well, using the coding agent becomes smooth and efficient, and the agent is safe to do longer stretches of work. If you don't do testing, the whole thing is just a bomb ticking in your face.

My approach to coding agents is to prepare a spec at the start, as complete as possible, and develop a beefy battery of tests as we make progress. Yesterday there was a story "I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours". They had 9000+ tests. That was the secret juice.

So the future of AI coding as I see it ... it will be better than pre-2020, we will learn to spec and plan good tests, and the tests are actually our contract the code does what is supposed to do. You can throw away the code and keep the specs and tests and regenerate any time.

smokel•1mo ago

This depends on the type of software you make. Testing the usability of a user interface for example, is something you can't automate (yet). So, ehm, it depends :)

visarga•1mo ago

It will come around, we have rudimentary computer use agents and ability to record UIs for LLM agents. They will me refined and the agent can test UIs as well.

For UIs I do a different trick - live diagnostic tests - I ask the agent to write tests that run in the app itself, check consistencies, constraints and expected behaviors. Having the app running in its natural state makes it easier to test, you can have complex constraints encoded in your diagnostics.

paganel•1mo ago

There are always unknown unknowns which a rigorous testing implementation would just hide under the rug (until they become visible on live, that is).

> They had 9000+ tests.

They were most probably also written by AI, there's no other (human) way. The way I see it we're putting turtles upon turtles hoping that everything will stick together, somehow.

zahlman•1mo ago

> They were most probably also written by AI, there's no other (human) way.

Yes. They came from the existing project being ported, which was also AI-written.

ozozozd•1mo ago

They were not, and they did not.

Those human tests are why your browser properly renders diversely messy HTML.

zahlman•1mo ago

Oh, I misunderstood the previous submission, then.

simonw•1mo ago

No, those 9,000 tests are part of a legendary test suite built by real humans over the course of more than a decade: https://github.com/html5lib/html5lib-tests

Aloisius•1mo ago

Sadly, JustHTML doesn't appear to be truly passing those tests.

It looks like the code doesn't always check whether expected errors in the testsuite match the returned errors - which is rather important to ensure one isn't just incidentally getting the expected output.

So while JustHTML looks sort of right, it'll actually do things like emit errors on perfectly valid html.

Plus, the test suite isn't actually comprehensive, so if one only writes code to pass the tests, it can fail in the real world where other parsers that actually wrote against the spec wouldn't have trouble.

For instance, the html5lib-tests only tests a small number of meta charsets and as a result, JustHTML can't handle a whole slew of valid HTML5 character encodings like windows-1250 or koi8-r - which parsers like html5lib will happily handle. There's even a unit test added by the AI that ensures koi8-r doesn't work, for some reason.

paganel•1mo ago

I thought we were talking about human-scale (as in not multi-human) projects, my bad.

pjc50•1mo ago

I tabbed back to Visual Studio (C#): 24990 "unit" tests, all written by hand over the past years.

Behind that is a smaller number of larger integration tests, and the even longer running regression tests that are run every release but not on every commit.

zahlman•1mo ago

> Yesterday there was a story "I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours".

Yes, from the same author, in fact.

weatherlite•1mo ago

> Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That’s no longer valuable. What’s valuable is contributing code that is proven to work.

That's really not a great development for us. If our main point is now reduced to accountability over the result with barely any involvement in the implementation - that's very little moat and doesn't command a high salary. Either we provide real value or we don't ...and from that essay I think it's not totally clear what the value is - it seems like every QA, junior SWE or even product manager can now do the job of prompting and checking the output.

simonw•1mo ago

The value is being better at it than any QA or product manager.

Experienced software engineers have such a huge edge over everyone else with this stuff.

If your product manager doesn't understand what a CORS header is good luck having them produce a change that requires cross-domain fetch() call... and first they'll have to know what a "cross-doman fetch() call" means.

And sure they could ask an LLM about that, but they still need the vocabulary and domain knowledge to get to that question.

falcor84•1mo ago

That's an interesting argument, but from my industry experience, the average experienced QA Engineer and technical Product Manager both have better vocabulary than the average SWE. Indeed, I wonder whether a future curriculum for Vibe Engineering (to borrow your own term) may look more similar to that of present-day QA or Product curricula, than to a typical coding or CS curriculum.

throw1235435•1mo ago

Nah; the only advantage that a software engineer has is that if they are experienced they've probably just a little bit bright. But their role will probably change to something other than a software engineer. Bright valuable people that care and are engaged are rare anyway. They may transition to a different role slowly (e.g. Product, QA, BA, etc) because they still offer value and know the domain, but it isn't traditional SWE. That's been disrupted by AI; I don't want it to be true and I'm hoping for something else; but reality is staring us in the face at the moment and it isn't fair to people to talk platitudes anymore. The fact that you have to write an article like this feels like defensive framing to me + illustrates what happens once a skill is devalued by people/society due to disruption; it proves to me where this is all heading.

My thought on why people especially juniors are just delivering slop: Why bother with quality? Why bother with the craft? When it will be disrupted by the next tool/AI model/etc in the next few years anyway? Just think short term - will this slop get you through the PR and tick a short term box? If so success - might not have a job long term anyway due to all the AI stuff. In fact if I keep ticking boxes I'm more likely to last than the other person given more job incentives. Just get paid today.

In your example a QA that is skilled at testing websites should pick up CORS issues for example. And the models will keep getting better and eventually give them harnesses too - and we SWE will slowly automate everything around this because the only lifeboat left for your career is to cash out hopefully by disrupting yourself (no unions, professional bodies, etc).

rmnclmnt•1mo ago

> As software engineers we…

That’s the thing. People exposing such rude behavior usually are not, or haven’t been in a looong time…

As for the local testing part not being performed, this is a slippery slope I’m fighting everyday: more and more cloud based services and platforms are used to deploy software to run with specific shenanigans and running it locally requires some kind of deep craft and understanding. Vendor lock-in is coming back in style (e.g. Databricks)

simonw•1mo ago

Yeah, I get frustrated by cloud-only systems that don't have a good local testing story.

The best solution I have for that is staging environments, ideally including isolated-from-production environments you can run automated tests against.

skydhash•1mo ago

Whenever I have to work with such systems, is usually when I do have to write an interface and have a mock implementation. Iteration is much faster when I don’t have to worry about getting the correct state from something I don’t have control over.

rmnclmnt•1mo ago

Yeah that’s what I do also when I have to (and it can be done, not everytime).

But it requires some advanced local testing setup and knowledge to do so, hence my initial remark on this type of developers not being real professionals in the first place…

agentultra•1mo ago

There’s an anecdote from one of Djikstra’s essays that strikes at the heart of this phenomenon. I’ll paraphrase because I can’t remember the exact edw number off the top of my head.

A colleague was working on an important subsystem and would ask Djikstra for a review when he thought it was ready. Djikstra would have to stop what he was doing, analyze the code, and would find a grievous error or edge case. He would point it out to the colleague who would then get back to work. The colleague would submit his code for review again and this could carry on enough times that Djikstra got annoyed.

Djikstra proposed a solution. His colleague would have to submit with his code some form of proof or argument as to why it was correct and ready to merge. That way Djikstra could save time by only having to review the argument and not all of the code.

There’s a way of looking at LLM output as Djikstra’s colleague. It puts a lot of burden on the human using this tool to review all of the code. I like Doctorow’s mental model of a reverse centaur. The LLM cannot reason and so won’t provide you with a sound argument. It can probably tell you what it did and summarize the code changes it made… but it can’t decide to merge those changes. It needs a human, the bottom half of the centaur, to do the last bit of work here. Because that’s all we’re doing when we let these tools do most of the work for us: we’re here to take the blame.

And all it takes is an implementation of what we’re trying to build already, every open source library ever, all of SO, a GW of power from a methane power plant, an Olympic pool of water and all of your time reviewing the code it generates.

At the end of the day it’s on you to prove why your changes and contributions should be merged. That’s a lot of work! But there’s no shortcuts. Luckily you can reason while the LLMs struggle with that so use it while you can when choosing to use such tools.

newsoftheday•1mo ago

Princess Bride, small excerpt, "Vizzini: You'd like to think that, wouldn't you? You've beaten my giant, which means you're exceptionally strong, so you could've put the poison in your own goblet, trusting on your strength to save you, so I can clearly not choose the wine in front of you." and the dialog goes on with Vizzini dominating it and arguing with himself. In the end, it came down to a coin toss, he picked up a goblet, drank and died.

Anyone who allows a 10K LOC LLM generated PR to be merged without reviewing every single line, is doing the same thing, a coin toss.

agentultra•1mo ago

Worse for Vizzini; the LLM doesn’t even care about the truth. It was trained on all of the sloppy code we could find. Even if he reads every line of code he could miss the non-obvious bugs and expire anyway when management gets wind that it was his LLM generated code that led to the PII breach which cost them 10% of their share value in a week.

At least a liar is trying to deceive you. Vizzini’s entire exercise is moot.

mapontosevenths•1mo ago

I agree with this, except it glosses over security. Your job is to deliver SECURE code that you have proven to work.

Manual and automatic testing are still both required, but you must explicitly ensure that security considerations are included in those tests.

The LLM doesn't care. Caring is YOUR job.

imiric•1mo ago

The job of a software developer is not just to prove that the software "works". The definition of "works" itself is often fuzzily defined and difficult to prove.

That is part of it, yes, but there are many others, such as ensuring that the new code is easy to understand and maintain by humans, makes the right tradeoffs, is reasonably efficient and secure, doesn't introduce a lot of technical debt, and so on.

These are things that LLMs often don't get right, and junior engineers need guidance with and mentoring from more experienced engineers to properly learn. Otherwise software that "works" today, will be much more difficult to make "work" tomorrow.

JoeAltmaier•1mo ago

The job, in the modern world, is to close tickets. The code quality is negotiable, because the entire automated software process doesn't measure code quality, just statistics.

That's why I refuse to take part in it. But I'm an old-world craftsman by now, and I understand nobody wants to pay for working, well-thought-out code any more. They don't want a Chesterfield; they want plywood and glue.

whattheheckheck•1mo ago

I woke up and had a thought the software engineering isn't a serious engineering field if they actually fully shipped llms and expect everyone to use them. What do you expect quality wise from a profession that says that this is okay?

AlienRobot•1mo ago

Imagine if normal engineering did that. Engineers invent a "blobby" thing that glues things together. It has amazing properties that increase productivity but sometimes it just stops working for some reason and comes off. It's totally random and because of how blobby is produced there is no way to tell when it's going to work or not, contrary to the typical material. Anyway we're going to use blobby to build everything from schools, to bridges, to airplanes now.

redwall_hp•1mo ago

And, don't forget, software makes its way into airplanes a medical equipment and such, and has directly killed people. Therac and Boeing come to mind.

I'm starting to be in favor of professional licensing for software engineering.

nzoschke•1mo ago

Having the coding agent make screenshots is a big power up.

I’m experimenting with how to get these into a PR, and the “gh” CLI tool is helpful.

Does anyone have a recipe to get a coding agent to record video of webflows?

simonw•1mo ago

Not yet. I'm confident Playwright will be involved in the answer, it has good video recoding features.

endorphine•1mo ago

> there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

It's even worse than that: non-junior devs are doing it as well.

reedf1•1mo ago

it's even worse than that! non-devs are doing it as well

snowstormsun•1mo ago

it's even worse than that! bots are doing it as well

pydry•1mo ago

Hopefully once this AI nonsense blows over they'll reach the same realisation they did after the mid 2000s outsourcing craze: that actually you gotta pay for good engineering talent.

marcosdumay•1mo ago

Abandon any platform that decides to put bots into your workflow without you telling it to.

Vote with your wallet.

esafak•1mo ago

That's what democratization looks like. And the new participants are happy!

rvz•1mo ago

…Until you tell them to maintain all the technical debt that was generated when it breaks and waste more time and money / tokens on fixing the security issues.

A great time to be a vibe coding cleanup specialist (i.e, professional security software engineer)

BurningFrog•1mo ago

Really good code reviewing AIs could handle this!

throwawaysleep•1mo ago

Code review is an unfunded mandate. It is something the company demands while not really doing anything make sure people get rewarded for doing it.

Aurornis•1mo ago

> while not really doing anything make sure people get rewarded for doing it.

I don’t know about you, but I get paychecks twice a month for doing things included in my job description.

georgeburdell•1mo ago

My manager asked me to disable CI and gating code owner reviews “for 2 weeks” 6 months ago so people could commit faster. Just because it is in your job description doesn’t mean it won’t get shoved aside when it’s perceived as the bottleneck for the core mission.

Now we have nightly builds that nobody checks the result of and we’re finding out about bugs weeks later. Big company btw

immibis•1mo ago

That's his right. In capitalism, company owners have the power (which they delegate to managers) to fuck up the company as much as they see fit. On the upside, it means it's their responsibility and not yours.

Once you've said it's going to cause horrible problems, and they say do it anyway, and you have a paper trail of this and it's backed up onto your own storage medium, then you just do it and bring popcorn. If you think it'll bankrupt the company, then you have nothing to lose since you have no right to stop a company going bankrupt, so you might as well email your manager's manager's manager first and see if your manager gets fired.

SchemaLoad•1mo ago

Yep, who cares. You put your 2 cent in and if the business leaders see otherwise, that's their problem. You get paid on a schedule, if the app crashes and burns because the leaders demanded to remove PR reviews, that's not your problem.

Too often I see developers getting personally invested in business outcomes which they don't have a stake in. Getting frustrated when they don't have the final say.

necovek•1mo ago

It can be your problem if the company goes under and you lose your job: you might not be able to pay your mortgage or bills.

If you believe your manager is asking for unreasonable things in what you are an expert in despite you raising these concerns, and it's not clear their manager is in on it, please raise it to their manager!

"I am willing to continue working this way, but I just want to make sure the consequences it could have on the business are clear to everyone here."

Aurornis•1mo ago

This mirrors my experience with the texting while driving problem: The debate started as angry complaints about how all the kids are texting while driving. Yet it’s a common problem for people of all ages. The worst offender I knew for years was in her 50s, but even she would get angry about the youths texting while driving.

Pretending it’s just the kids and young people doing the bad thing makes the outrage easier to sell to adults.

hnthrow0287345•1mo ago

>It's even worse than that: non-junior devs are doing it as well.

This might be unpopular, but that is seeming more like an opportunity if we want to continue allowing AI to generate code.

One of the annoying things engineers have to deal with is stopping whatever they're doing and doing a review. Obviously this gets worse if more total code is being produced.

We could eliminate that interruption by having someone doing more thorough code reviews, full-time. Someone who is not being bound by sprint deadlines and tempted to gloss over reviews to get back to their own work. Someone who has time to pull down the branch and actually run the code and lightly test things from an engineer's perspective so QA doesn't hit super obvious issues. They can also be the gatekeeper for code quality and PR quality.

sorokod•1mo ago

> One of the annoying things engineers have to deal with is stopping whatever they're doing and doing a review.

I would have thought that reviewing PRs and doing it well is in the job description. You latter mention "someone" a few times - who that someone might be?

bee_rider•1mo ago

Can we make an LLM do it?

“You are a cranky senior software engineer who loves to nitpick change requests. Here are your coding standards. You only sign off of a change after you are sure it works; if you run out of compute credits before you can prove it to yourself, reject the change as too complex.”

Balance things, pit the LLMs against each other.

cm2012•1mo ago

This would probably catch a lot of errors

jjmarr•1mo ago

We do this at work and it's amazing.

postflopclarity•1mo ago

I do this all the time. I pass my code into "you are a skeptic and hate all the code my student produces: here is their latest PR etc.. etc.."

osn9363739•1mo ago

I have devs that do this and we have CI AI code review. Problem is, it always finds something. So the devs that have been in the code base for a while know what to ignore, the new devs get bogged down by research. It's a net benefit as it forces them to learn, which they should be doing. It def slows them down though which goes against some of what I see about the productivity boost claims. A human reviewer with the codebase experience is still needed.

mywittyname•1mo ago

Slowing down new developers by forcing them to understand the product and context better is a good thing.

I do agree that the tool we use (code rabbit) is a little too nitpicky, but it's right way more than it's wrong.

bee_rider•1mo ago

I don’t use any of these sorts of tools, so sorry for the naive questions…

What sort of thing does it find? Bad smells (possibly known imperfections but least-bad-picks), bugs (maybe triaged), or violations of the coding guides (maybe known and waivered)?

I wonder if there’s a need for something like a RAG of known issues…

baq•1mo ago

GPT 5+ high+ review bots find consistently good issues on average for me, sometimes they’re bogus, but sometimes they’re really, really good finds. I was impressed more than once.

filoeleven•1mo ago

This response fails entirely to answer the question.

marcosdumay•1mo ago

A full-time code reviewer will quickly lose touch with all practical matters and steer the codebase into some unmaintainable mess.

This is not the first time somebody had that idea.

JohnBooty•1mo ago

I've often thought this could work if the code reviewer was full-time, but rotated regularly. Just like a lot of jobs do with on-call weeks, or weeks spent as release manager - like if you have 10 engineers, and once every ten weeks it's your turn to be on call.

That would definitely solve the "code reviewer loses touch with reality" issue.

Whether it would be a net reduction in disruption, I don't know.

bee_rider•1mo ago

You could do some interesting layering strategies if you made it half time, for two people. Or maybe some staggered approach: each person does half time, full time, then half time again, with there people going through the sequence at a time. Make each commit require two sign-offs, and you could get a lot of review and maybe even induce some cooperation…

kaffekaka•1mo ago

"Interesting" is the word I would use as well, but also cumbersome and complicated.

necovek•1mo ago

Doing code review as described (actually diving deep, testing etc) for 10 engineers producing code is likely not going to be feasible unless they are really slow.

In general, back in 2000s, a team I was on employed a simple rule to ensure reviews happen in a timely manner: once you ask for a review, you have an obligation to do 2 reviews (as we required 2 approvals on every change).

The biggest problem was when there wasn't stuff to review, so you carried "debt" over, and some never repaid it. But with a team of 15-30 people, it worked surprisingly well: no interrupts, quick response times.

It did require writing good change descriptions along with testing instructions. We also introduced diff size limits to encourage iterative development and small context when reviewing (as obviously not all 15-30 people had same deep knowledge of all the areas).

jaggederest•1mo ago

I think it's amenable if you make code review a primary responsibility, but not the only responsibility. I think this is a big thing at staff+ levels, doing more than your share of code review (and other high level concerns, of course).

kragen•1mo ago

Linus Torvalds is effectively a full-time code reviewer, and so are most of his "lieutenants". It's not a new idea, as you say, but it works very well.

immibis•1mo ago

As I read once: all that little stuff that feels like it stops you from doing your job is your job.

jms703•1mo ago

>One of the annoying things engineers have to deal with is stopping whatever they're doing and doing a review

Code reviews are a part of the job. Even at the junior level, an engineer should be able to figure out a reasonable time to take a break and shift efforts for a bit to handle things like code reviews.

gottagocode•1mo ago

> We could eliminate that interruption by having someone doing more thorough code reviews, full-time. Someone who is not being bound by sprint deadlines and tempted to gloss over reviews to get back to their own work.

This is effectively my role (outside of mentoring) as a lead developer over a team of juniors we train in house. I'm not sure many engineers would enjoy a day of only reviewing, me included.

ohwaitnvm•1mo ago

So pair programming?

sodapopcan•1mo ago

Yep, eliminates code reviews altogether. Unfortunately it remains wisely unpopular with perle even saying “AI” can be the pair.

gaigalas•1mo ago

It does not eliminate code reviews.

In practice, you should have at least one independent reviewer who did not actively worked on the PR.

That reviewer should also download the entire code, run it, make tests fail and so on.

In my experience, it's also good that this is not a fixed role "the reviewer", and a responsability everyone in the team shares (your next task should always be: review someone else's work, only pick a new thing to do if there is nothing to review).

This practice increases quality dramatically.

sodapopcan•1mo ago

> It does not eliminate code reviews.

Yes it does. There are many ways to do things, of course, and you can institute that there must be an independent reviewer, but I see this is a colossal waste of time and takes away one of the many benefits of pairing. Switch pairs frequently, and by frequently I really mean "daily," and there is no need for review. This also covers "no fixed responsibilities" you mentioned (which I absolutely agree with).

Again, there are no rules for how things must be done, but this is my experience of three straight years working this way and it was highly effective.

gaigalas•1mo ago

Mixed-level pairs (senior/junior), for example, are more about mentoring than reviewing. Those sessions do not qualify for "more than one pair of eyes".

Excited (or maybe even stubborn) developers can often win their pairs by exhaustion, leading to "whatever you want" low effort contributions.

Pairs tend to under-document. They share an understanding they developed during the pairing session and forget to add important information or details to the PR or documentation channels.

I'm glad it has been working for you. Maybe you work in a stellar team that doesn't have those issues. However, there are many scenarios that benefit a lot from an independent reviewer.

sodapopcan•1mo ago

The problem with most people's experience with pairing is that they are given no guidance on how it should be done outside of "just work with someone."

In terms of under-documentation, I didn't really find that. Most of my jobs have either been way under-documented, way over-documented (no one reads it and it gets outdated). Again I'll say that I find switching pairs daily is key with no one person stay on for more than 2 days in a row (so if a story takes three days to complete, the two people who finished it are not the same two who started it). This keeps that internal knowledge well-spread.

But you're right, if you're dealing with overly excited/stubborn folk who refuse to play ball, that's obviously not going to work. Conversely, if you have a trusting team and someone is having one of those days where they feel useless and unmotivated, pairing can turn it into some kind of useful day. This could be because your colleague gets you excited or, if you have high trust, I've literally said and had people say, "I'm not feeling it today... can you drive all day?" and you're still able to offer good support as a navigator and not have a total write-off of a day.

To the point of improving an otherwise bad day, one of the more interesting arguments against pairing I've heard is that companies use it to make sure everyone is always working. That would indeed be sad if that was the motivation.

necovek•1mo ago

Pairing people together on a single task makes that task get done faster, with higher quality. However, when paired together, the people still pick up some of the same biases and hold the same assumptions and context, so it is really worse than having a single author + independent reviewer.

So:

  single author, no review < pair programming, no review < single author + review < pair programming + review

sodapopcan•1mo ago

Again, if you rotate pairs daily this doesn't happen. You could argue that creates a "team bias" but I call that cohesion.

necovek•1mo ago

Can you elaborate?

Is this pairing on small tasks that get done in (less than) a day? Or do you also have longer-running implementations (a few days, a week, or maybe even more?) where you rotate the people who are on it (so one task might have had 5 or 10 people on it during the implementation)?

If the latter, that sounds very curious and I'd love to learn more about how is that working, how do people feel about it, what are the challenges you've seen, and what are the successes? It'd be great to see a write-up if you've had this running for a longer time!

If it's only the former (smaller tasks), then this is where biases and assumptions stay (within the scope of that task): I still believe this is improved with an independent reviewer.

sodapopcan•1mo ago

I do have a blog post about this sitting around along with all my other mostly completed posts. Finishing this one would be great as it's a topic I'm very passionate about!

Our stories were broken up to take worst-case-scenario 5 days, the ideal was 1-3, of course. Maybe would be multi-day stories so yes, if something ended up taking 5 days it was possible the whole team could have touched it (our team was only 6 devs so never more than 6).

In short, there are the usual hurdles with pairing, the first one being that it takes some time to catch your stride but once you do, things start flying. It is of course not perfect and you run into scenarios where something takes longer because as a new person joins the ticket, they want to tweak the implementation, and it can get pulled in different directions the longer it takes. But this also gets better the longer you keep at it.

There are lots of pros AFAIC but a couple I like: you develop shared ownership of the codebase (as well as a shared style which I think is neat), ie, every part of the app had at least a few people with lots of context so there was never any, "Oh, Alice knows the most about that part of so let's wait until she's back to fix it" kinda thing. There is no "junior work." When a junior joins the team they are always pairing so they ramp up really quickly (ideally they always drive for the first several months so they actually learn).

Another downside would be it's bad for resume-driven-development. You don't "own" as many projects so you have to get creative when you're asked about big projects you've led recently, especially if you're happily in an IC role.

And also yes, if there is a super small task that is too small to pair on, we'd have a reviewer, but often these were like literal one-line config changes.

We'd also have an independent reviewer whenever migrations were involved or anything risky task that needed to be run.

necovek•1mo ago

Once you do publish the write-up (here's some encouragement!), hopefully it catches the attention of others so I don't miss it either :)

aslakhellesoy•1mo ago

Shhh don’t let them know!

shiandow•1mo ago

The best way I have of knowing code is correct on all levels is convincing myself I would write it the same way.

Thr only way to be 100% sure is writing it myself. If I know some one reasonable managed to write the code I can usually take some shortcuts and only look at the code style, common gotchas etc.

Of course it wouldn't be the first time I made some erroneous assumptions about how well considered the code was. But if none of the code is the product of any intelligent thought well, I might as well stop reading and start writing. Reading code is 10x harder than writing it after all.

necovek•1mo ago

With time and experience, reading code becomes much easier.

And well-written code is usually easy to read and understand too!

The purpose of a code review is, apart from ensuring correctness, to ensure that the code that gets merged is easy to understand! And to be honest, if it's easy to understand, it's easy to ensure correctness too!

The biggest challenge I had was to distinguish between explanations needed to understand the change, and explanations needed to understand the code after it was merged in. And making it clear in my code review questions that whatever question I have, I need code and comments in the code to answer them, not the author to explain it to me (I frequently have already figured out the why, but took me longer than needed): it's not because I did not get it, it's because it should be clearer (finding the right balance between asking explicitly, offering a suggestion, or pitting it as a question to prompt some thinking is non-trivial too).

Spoom•1mo ago

Big companies would outsource this position within a year, I guarantee it. It's highly measurable which means it can be "optimized".

nunez•1mo ago

This sounds like what unit tests after every commit and e2e tests before every PR are supposed to solve.

mywittyname•1mo ago

This sounds good in theory, but in practice, a person capable of doing a good job at this role would also be a good developer whose impact would be greater if they were churning out code. This is a combination of a lead engineer and SDET.

In reality, this ends up being the job given to the weakest person on the team to keep them occupied. And it gives the rest of the team a mechanism to get away with shoddy work and not face repercussions.

Maybe I'm just jaded, but I think this approach would have horrible results.

AI code review tools are already good. That makes for a good first pass. On my team, fixing Code Rabbit's issues, or having a good reason not to is always step 1 to a PR. It catches a lot of really subtle bugs.

lowkeyokay•1mo ago

In the company I’m at this is beginning to happen. PM’s want to “prototype” new features and expect the engineers to finish up the work. With the expectation that it ‘just needs some polishing’. What would be your recommendation on how to handle this constructively? Flat out rejecting LLM as a prototyping tool is not an option.

lurking_swe•1mo ago

sounds like a culture and management problem. CTO should set clear expectations for his staff and discuss with product to ensure there is alignment.

If i was CTO I would not be happy to hear my engineers are spending lots of time re-writing and testing code written by product managers. Big nope.

Our_Benefactors•1mo ago

This could be workable with the understanding that throwing away 100% of the prototype code is acceptable and it’s primary purpose is as a communication tool, not a technical starting point.

rootusrootus•1mo ago

This is how I've handled it so far. But that is probably because the PM that does this for me knew going in that they were not going to be generating something I'd want to become responsible for polishing and maintaining. It's basically just a fancier way of doing what they would otherwise use SketchUp for.

jjmarr•1mo ago

I would accept this because it'll increase demand for SWEs and prevent us from losing our jobs.

theshrike79•1mo ago

"You can't polish a turd" =)

necovek•1mo ago

Obviously, unleash LLM code reviewer with the strictest possible prompt on the change :)

Then innocently say "LLM believes this is bad architecture and should be recreated from scratch."

627467•1mo ago

Just shove a code review agent in the middle. Problem solved

[Edit] man, people dont get /s unless its explicit

fragmede•1mo ago

That startup is called CodeRabbit and damned if it doesn't come up with good suggestions sometimes. Other times you have to overrule it, or more likely create separate PRs for its suggestions, and avoid lumping a bunch of different stuff into a single PR, and sometimes it's stupid and doesn't know what it's talking about, and also misses stuff, so you do still need a human to review it. But if your at all place where LLMs are being used to generate large swaths of functional code, including tests, and human reviewers simply can't keep up, overall it does feels like a step forwards. I can't speak to how well other similar services do, but presumably they're not the only one that does that; CodeRabbit's just the one that my employer has chosen.

kridsdale3•1mo ago

Is this startup sitting on any IP other than a bunch of prompts?

analog31•1mo ago

Where are the junior devs while their code is being reviewed? I'm not a software developer, but I'd be loath to review someone's work unless they have enough skin in the game to be present for the review.

stuaxo•1mo ago

Git PRs work on async model for reviews.

DrewADesign•1mo ago

And even then, in my experience, they work more like support tickets than business email, for which there are loose norms for response time, etc. Unless there’s a specific reason it needs to be urgently handled, people will prioritize other tasks.

tyrust•1mo ago

Code review is rarely done live. It's usually asynchronous, giving the reviewer plenty of time to read, digest, and give considered feedback on the changes.

Perhaps a spicy patch would involve some kind of meeting. Or maybe in a mentor/mentee situation where you'd want high-bandwidth communication.

jopsen•1mo ago

Doing only IRL code reviews would certainly improve quality in some projects :)

It's probably also fairly expensive to do.

comprev•1mo ago

Pair programming? That is realtime code review by another human

colinb•1mo ago

Fagan inspection has entered the room

jghn•1mo ago

Am old enough that this was status quo for part of my career, and have also been in some groups that did this as a rejection of modern code review techniques.

There are pros & cons to both sides. As you point out it's quite expensive in terms of time to do the in person style. Getting several people together is a big hassle. I've found that the code reviews themselves, and what people get out of them, are wildly different though. In person code reviews have been much more holistic in my experience, sometimes bordering on bigger picture planning. And much better as a learning tool for other people involved. Whereas the diff style online code review tends to be more focused on the immediate concerns.

There's not a right or wrong answer between those tradeoffs, but people need to realize they're not the same thing.

stephen_cagle•1mo ago

And yet... is it? Realtime means real discussion, and opportunity to align ever so slightly on a common standard (which we should write down!), and an opportunity to share tacit knowledge.

It also increases the coverage area of code that each developer is at least somewhat familiar with.

On a side note, I would love if the default was for these code reviews to be recorded. That way 2 years later when I am asked to modify some module that no one has touched in that span, I could at least watch the code review and gleem something about how/why this was architect-ed the way it was.

lokar•1mo ago

IMO, a lot of what I think you are getting at should be worked out in design before work starts.

Ekaros•1mo ago

I would guess that 3 part code review would actually be most effective. Likely even save on costs. First part is walkthrough on call, next independent review and comments. Then per need an other call over fixes or discussion.

Probably spend more time on it, but would share the understanding and alignment.

throwaway314155•1mo ago

My first job did IRL code reviews with at least two senior devs in the loop. It was both devastating and extremely helpful.

SoftTalker•1mo ago

Yeah when we first started, "code review" was a weekly meeting of pretty much the entire dev team (maybe 10 people). Not all commits were reviewed, it was random and the developer would be notified a couple of days in advance that his code was chosen for review so that he could prepare to demo and defend it.

necovek•1mo ago

Wow, that's a very arbitrary practice: do you remember roughly when was that?

I was in a team in 2006 where we did the regular, 2-approve-code-reviews-per-change-proposal (along with fully integrated CI/CD, some of it through signed email but not full diffs like Linux patchsets, but only "commands" what branch to merge where).

SoftTalker•1mo ago

Around that time frame. We had CI and if you broke the build or tests failed it was your job to drop anything else you were doing and fix it. Nothing reached the review stage unless it could build and pass unit tests.

necovek•1mo ago

Right, we already had both: pre-review build & test runs, and pre-merge CI (this actually ran on a temp, merged branch).

marwamc•1mo ago

This was still practice at $BIG_FINANCE in the couple of years just before covid, although by that point such team reviews were reducing in importance and prominence.

rootusrootus•1mo ago

As someone else mentioned, the process is async. But I achieve a similar effect by requiring my team to review their own PRs before they expect a senior developer to review them and approve for merging.

That solves some of the problem with people thinking it's okay to fire off a huge AI slop PR and make it the reviewer's responsibility to see how much the LLM hallucinated. No, you have to look at yourself first, because it's YOUR code no matter what tool you used to help write it.

unbalancedevh•1mo ago

> requiring my team to review their own PRs before they expect a senior developer to review them

I'm having a hard time imagining the alternative. Do junior developers not take any pride in their work? I want to be sure my code works before I submit it for review. It's embarrassing to me if it fails basic requirements. And as a reviewer, what I want to see more than anything is how the developer assessed that their code works. I don't want to dig into the code unless I need to -- show me the validation and results, and convince me why I should approve it.

I've seen plenty of examples of developers who don't know how to effectively validate their work, or document the validation. But that's different than no validation effort at all.

jjmarr•1mo ago

Many are just doing SWE for the money.

Their goal is to pass the hot potato to someone else, so they can say in the standup "oh I'm waiting on review" making it not their problem.

rootusrootus•1mo ago

> Do junior developers not take any pride in their work?

Yes. I have lost count of the number of PRs that have come to me where the developer added random blank lines and deleted others from code that was not even in the file they were supposed to be working in.

I'm with you -- I review my own PRs just to make sure I didn't inadvertently include something that would make me look sloppy. I smoke test it, I write comments explaining the rationale, etc. But one of my core personality traits (mostly causing me pain, but useful in this instance) is how much I loathe being wrong, especially for silly reasons. Some people are very comfortable with just throwing stuff at the wall to see if it'll stick.

alfons_foobar•1mo ago

> added random blank lines and deleted others from code that was not even in the file they were supposed to be working in.

Maybe some kind of auto-formatter?

rootusrootus•1mo ago

That is my charitable interpretation, but it's always one or two changes across a module that has hundreds, maybe thousands of lines of code. I'd expect an auto-formatter to be more obvious.

In any case, just looking over your own PR briefly before submitting it catches these quickly. The lack of attention to detail is the part I find more frustrating than the actual unnecessary format changes.

Aeolun•1mo ago

Why would you are about blank lines? Sounds like aborted attempts at a change to me. Then realizing you don’t need them. Seeing them in your PR, and figuring they don’t actually do anything to me.

baq•1mo ago

More likely artifact of debug prints being removed.

ok_dad•1mo ago

> Yes. I have lost count of the number of PRs that have come to me where the developer added random blank lines and deleted others from code that was not even in the file they were supposed to be working in.

That’s not a great example of lack of care, of you use code formatters then this can happen very easily and be overlooked in a big change. It’s also really low stakes, I’m frankly concerned that you care so much about this that you’d label a dev careless over it. I’d label someone careless who didn’t test every branch of their code and left a nil pointer error or something, but missing formatter changes seems like a very human mistake for someone who was still careful about the actual code they wrote.

hoten•1mo ago

I think the point is that a necessary part of being careful is reviewing the diff yourself end-to-end right before sending it out for review. That catches mistakes like these.

code_for_monkey•1mo ago

i myself have been guilty of creating a pr and immediately pushing a commit to clean that stuff up

lokar•1mo ago

It’s cultural. It always seemed natural to me, until I joined a team that treated review as some compliance checkbox that had nothing to do with the real work.

Things like real review as an important part of the work requires a culture that values it.

epiccoleman•1mo ago

> I want to be sure my code works before I submit it for review.

No kidding. I mean, "it works" is table stakes, to the point I can't even imagine going to review without having tested things locally at least to be confident in my changes. The self-review for me is to force me to digest my whole patch and make sure I haven't left a bunch of TODO comments or sloppy POC code in the branch. I'd be embarrassed to get caught leaving commented code in my branch - I'd be mortified if somehow I submitted a PR that just straight up didn't work.

tqian•1mo ago

Oh junior devs submit PRs that don't fully work all the time.

muzzio•1mo ago

Reviewing your own PR is underrated. I do this with most of my meaningful PRs, where I usually give a summary of what/why I'm doing things in the description field, and then reread my code and call out anything I'm unsure of, or explain why something is weird, or alternatives I considered, or anything that I would catch reviewing someone else's PR.

It makes it doubly annoying though whenever I go digging in `git blame` to find a commit with a terrible title, no description and an "LGTM" approval though.

theshrike79•1mo ago

We have an AI doing the first pass PR review using company standards as a prompt.

It catches the worst slop in the first pass easily, as well as typos etc.

groby_b•1mo ago

I think we've moved on from the times where you brought a printout to the change control board to talk it through.

SoftTalker•1mo ago

But are we better off?

ok_dad•1mo ago

A senior dev should be mentoring and talking to a junior dev about a task well before it hits the review stage. You should discuss each task with them on a high level before assigning it, so they understand the task and its requirements first, then the review is more of a formality because you were involved at each step.

marwamc•1mo ago

Also communal RFCs, RFPs, Roadmapping, Architecture/Design Proposals, Design Docs and/or Reviews help socialize/diffuse org standards and expectations.

I found these help ground the mentorship and discussions between junior-senior devs. And so even for the enterprising aka proactive junior devs who might start working on something in advance of plans/roadmaps, by the time they present that work for review, if the work followed org architectural and design patterns, the review and acceptance process flows smoothly.

In my juinior days I was taught: if the org doesn't have a design or architectural SOP for the thing you're doing, find a couple of respectable RFCs from the internet, pick the three you like, and implement one. It's so much easier to stand on the shoulders of giants than to try and be the giant yourself.

mullingitover•1mo ago

> expects the “code review” process to handle the rest.

The LLMs/agents have actually been doing a stellar job with code reviews. Frankly that’s one area that humans rush through, to the point it’s a running joke that the best way to get a PR granted a “lgtm” is to make it huge. I’ve almost never seen Copilot wave a PR through on the first attempt, but I usually see humans doing that.

distances•1mo ago

That smells of bad team practices. Put a practical limit on PRs sizes as the first step, around 500 lines max is a good rule of thumb in my experience. Larger than that, and the expectation then is a number of small PRs to a feature branch.

I rarely see a PR that should pass without comments. Your team is being sloppy.

mullingitover•1mo ago

> Your team is being sloppy.

I'm talking about a running joke in the industry, not my team.

strangattractor•1mo ago

Not at Meta - their job is to "Move fast and break things". I think people are just doing what they've been told.

bee_rider•1mo ago

“Move fast and break things” works well when you are a little player in a big world, because you can only perturb the system into so bad a state with you limited resources. Now, they got big, and everything is broken.

alphazard•1mo ago

Quality is not rewarded at most companies, it's not going to turn into more money, it might turn into less work later, but in all likelihood, the author won't be around to reap the benefits of less work later because they will have moved onto another company.

On the contrary, since more effort doesn't yield more money, but less effort can yield the same money, the strategy is to contract the time spent on work to the smallest amount, LLMs are currently the best way to do that.

I don't see why this has to be framed as a bad thing. Why should anyone care about the quality of software that they don't use? If you wouldn't work on it unless you were paid to, and you can leave if and when it becomes a problem, then why spend mental energy writing even a single line?

tuyiown•1mo ago

Because nothing can beat productivity of a motivated team building code that they are proud of. The mental energy spent becomes the highest reward. As for profit, it _compounds_ as for every other business.

The fact that this is lost as a common knowledge whereas shiny examples arises regularly is very telling.

But it is not liked in business because reproducing it requires competence in the industry, and finance deep pockets don’t believe in competence anymore.

hostyle•1mo ago

Not everything is about money. Have you never wanted to be good at something because you enjoy it? Or do something for the love of the craft? Have you heard of altruism?

Larrikin•1mo ago

But why do that for the company instead of yourself?

alphazard•1mo ago

This exactly. You have to be honest about why you are building something. If the answer is that you actually want to use it, then yes, quality and maintainability are important. It might even be a good idea to use no AI whatsoever.

But if you are building it because doing so is in the long chain of cause and effect that leads to you being fed and having shelter, then you should minimize the amount of your time that is required to produce that end result. Do you get better food, and better shelter if the software is better? It would certainly be nice if that was the case, but it's not.

> Not everything is about money.

Except for your job, which is primarily about money. Making it take less time, means that you have more time to focus on things that really are not about money.

hostyle•1mo ago

Most people spend maybe 1/4 of their working age life at a job working for someone else. Why would you deliberately sabotage that by checking out mentally and waste all that time on sub-standard work? How do you expect to earn a promotion? You can produce good code at work and even better code at home for yourself. Deliberately producing slop at work will not help anyone.

necovek•1mo ago

By investing in quality on your job, your job will be easier and take less time. In software, you won't be called in to resolve that incident over the Thanksgiving weekend, and you won't be asked to debug why your system broke after someone committed 10k line diff without any review and without tests.

Be selfish! But be smart! On top of getting the best result for you, this gets the best result for the business too! And businesses know it, and even if they don't reward it proportionally, they do reward it with bonuses and seniority promotions.

ThrowawayR2•1mo ago

Speaking only for my particular circumstances, the company is the vehicle that I use to do it for myself since it provides specialized facilities and equipment I wouldn't have access to as an individual or a founder. That I get paid for it is merely icing on the cake.

phito•1mo ago

I find doing my job as best as I can intrinsically rewarding. Even tho I am getting paid peanuts and have to give more than half of those peanuts to my government. I'm that kind of sucker.

rootusrootus•1mo ago

One thing I've pushed developers on my team to do since way before AI slop became a thing was to review their own PR. Go through the PR diff and leave comments in places where it feels like a little explanation of your thought process could be helpful in the review. It's a bit like rubber duck debugging, I've seen plenty of things get caught that way.

As an upside, it helps with AI slop too. Because as I see it, what you're doing when you use an LLM is becoming a code reviewer. So you need to actually read the code and review it! If you have not reviewed it yourself first, I am not going to waste my time reviewing it for you.

It helps obviously that I'm on a small team of a half dozen developers and I'm the lead, and management hasn't even hinted at giving us stupid decrees like "now that you have Claude Code you can do 10x as many features!!!1!".

rcxdude•1mo ago

Yeah, I always think it's kinda rude to throw something to someone else to review without reviewing it yourself, even if you were the one to write it. Looking at it twice yourself can help with catching things even faster than someone else getting up to speed with what you were doing and querying it. Now it seems like with LLMs people are putting code up for review that hasn't even been looked at once.

SchemaLoad•1mo ago

My coworker does this. PRs with random files from other changes left in, console logs everywhere. Blatent issues everywhere.

I find it extremely rude they chuck this stuff at me without even having read it themselves. At least these days I can just chuck the AI reviewer thing on it and throw it back to them.

seanmcdirmid•1mo ago

Don’t accept PRs without test coverage? I mean, LLMs can do those also, but it’s something.

acedTrex•1mo ago

Juniors aren't even the problem here, they can and should be taught better thats the point.

Its when your PEERS do it that its a huge problem.

tyleo•1mo ago

There’s folks who perform like juniors but have just been in the business long enough to be promoted.

Title only loosely tracks skill level and with AI, that may become even more true.

xnx•1mo ago

Unfortunately, junior behavior exists in many with "senior" titles. Especially since "senior" is often given to those 2 years out of school.

SoftTalker•1mo ago

Title inflation?

tensor•1mo ago

IMO tech suffers pretty horrible title inflation. If you reach "senior" after only two years and "principle" after 5, what is left for the next 20 years? It's pretty ridiculous. But this sort of thing is really typical. The average tenure of someone in tech is probably about 2 years and each year the expectation is to see "big" career progression. Very often "When is my title going to change" is asked literally in the first year performance review.

jghn•1mo ago

The important thing here is for people to understand that at best titles only indicate relative rank within a company. And even then that's tenuous. Titles are effectively meaningless when comparing outside of a company.

lokar•1mo ago

You get (finite) periods where several large / influential companies have a reasonably high level of rigor for their own levels, and there is a pretty stable mapping between the companies.

One such period seems to have ended sometime around the start of Covid, or a bit before.

CodeMage•1mo ago

What makes this whole thing worse is the concept of "non-terminal" levels, i.e. levels that you're not allowed to stay at indefinitely, which means that you must either get promoted or fired.

I can understand not wanting to let people stay in a junior position forever, but I've seen this taken to a ridiculous extreme, where the ladder starts at a junior level, then goes through intermediate and senior to settle on staff engineer as the first "terminal" position.

Someone should explain to the people who dream up these policies that the Peter Principle is not something we should aim for.

It's even worse when you combine this with age. I'm nearing 47 years old now and have 26 years of professional experience, and I'm not just tired, but exhausted by the relentless push to make me go higher up on the ladder. Let me settle down where I'm at my most competent and let me build shit instead of going to interminable meetings to figure out what we want to build and who should be responsible for it. I'm old enough to remember the time when managers were at least expected to be more useful in that regard.

lokar•1mo ago

Yeah, the terminal level, whatever the title (they are just words) need to be the point at which you can handle moderately complex (multi-week) tasks with no supervision.

And honestly, this will depend on the environment and kind of work being done.

SoftTalker•1mo ago

If that's what you're looking for you can find it in academia. Universities have no problem paying people to stay around forever without promotion.

Of course the pay won't be great, but the benefits are decent, PTO is usually excellent, and the work environment usually very low stress.

CodeMage•1mo ago

FWIW, I'm starting to seriously consider this as a strategy that will allow me to get to retirement without completely messing up my health due to stress and burnout.

That said, there's something deeply wrong with our industry if that's the way we expect things to work. I never felt that teaching was my calling, but I might end up being forced into it anyway and taking up a job that someone with proper passion and vocation could fill. Why? Because my own industry doesn't understand that unlimited growth is not sustainable.

For that matter, "growth" is not the right word, either. We're all being told that scaling the ladder is the same thing as growing and developing, but it's not.

lokar•1mo ago

But the point of the rule is that unlimited growth is not expected. There is a fairly clear point you need to get to, and then you can stay put if you like.

CodeMage•1mo ago

Yes, and I agree with that. But my reply was to a comment that seemed to dispute that idea and imply that if you wanted to stop growing at some point, then you should shift to academia.

That said, there is an expectation of unlimited growth and it comes from a different source: ageism. At my age, the implicit expectation is that I will apply for a staff or even principal role. Applying for a "merely" senior role often rings alarm bells.

That trend -- and certain others -- are what's making me consider taking up teaching instead.

lokar•1mo ago

Are we talking about the same thing?

The point of the terminal level rule is that there is a point, bellow which you are not actually contributing all that much more in output then it takes to supervise and mentor you. At some point you need to be clearly net positive. This generally means you can mostly operate on your own.

If it becomes clear you won't make it to that level, then something is wrong. Either you are not capable, or not willing to make the effort, or something else. Regardless, you get forced out.

adobesubmarine•1mo ago

In my experience, people who say this kind of thing about either industry or academia have usually worked in one, but not both.

xnx•1mo ago

> IMO tech suffers pretty horrible title inflation

It began with "software engineer"

jmpeax•1mo ago

Don't get me started on "software architect".

9rx•1mo ago

Even "code monkey" is generous.

tremon•1mo ago

On classic big waterfall projects, you can find actual architects. Those are the ones drafting interfaces and delineating components/teams before the first source file is even committed.

jmpeax•1mo ago

Actual architects design buildings.

tremon•1mo ago

I'm sorry. My fault for engaging you, I guess.

theshrike79•1mo ago

I've had calls with Principal Architects who couldn't code themselves out of a wet paper bag.

And according to the company experience chart, they should've been a "thought leader" and "able to instruct senior engineers"

My title? Backend Programmer (20 years of experience). Our unit didn't care about titles because there was a "budget" for title upgrades per business unit and guess which team grabbed all of them =)

geodel•1mo ago

Its an epidemic all over in IT departments and s/w industry in general. Nowadays people whose sum total knowledge would be managing some packaged Oracle/SAP software installation are holding title of CTO/SVP/EVP of software organization with thousands of developers.

Since they bring a certain cluelessness and ignorance as honor to whole orgs actual technical expertise among engineers could be detriment to one's jobs and career.

tguvot•1mo ago

i am principle architect. last time i wrote code for production was more than 10 years ago. i never touched half of languages that are used in our system

in last week I resolved a few legal/regulatory problems that could have cost company tens of millions of dollars in fines/direct spend/resources and prevented few backend teams from rolling out functionality that could have negative impact on stability/security/performance. I did offer them alternative ways to implement whatever they needed and they accepted it

rglynn•1mo ago

How many civil engineers or architects know how to put up scaffolding or lay bricks?

That was a little tongue in cheek, but I am genuinely curious what you think the correct approach is? I have seen many teams that do need to have someone overseeing the overall architecture, even if that person isn't writing the code line-by-line.

If you have that capacity baked into "Backend Programmer", then great, but not every team is the same.

Is there something inherently wrong with an "architect" who hasn't written code in a decade but is instructing seniors? One might believe that the answer is self-evident, however, I would argue that the organisational structures we see in the world (functional or otherwise) do not bear this out.

9rx•1mo ago

> If you reach "senior" after only two years and "principle" after 5, what is left for the next 20 years?

There is nothing left. Not everyone puts in the same dedication towards the craft, of course. It very well might take someone 30 years to reach "principle" (and maybe even never). But 5 years to have "seen it all" is more than reasonable for someone who has a keen interest in what they are doing. It is not like a job dependent on the season, where you only get one each year. In computing, you can see many different scenarios play out in milliseconds. It doesn't need years to go from no experience to having "seen it all".

That is why many in this industry seek management roles as a next step. It opens a new place to find scenarios one has never seen before; to get the start the process all over again.

Yoric•1mo ago

Er...

I've been programming since I was 7 and I'm old enough to remember the previous AI summer. Somewhere along the way, I've had impact on a few technologies you have heard of, I've coded at almost all levels from (some very specialized) hardware to Prolog, Idris and Coq/Rocq, with large doses of mainstream languages in-between, and I don't think I'll ever be close to having seen in all.

If anyone tells me that they've seen it all in 5 years, I'm going to suspect them of not paying attention.

9rx•1mo ago

The scare quotes are significant. Obviously nobody can ever see it all as taken in its most literal sense. But one can start to see enough that they can recognize the patterns.

If your job is dependent on the weather, one year might be rainy, one year might be drought, one year might be a flood, etc. You need to see them to understand them. But eventually you don't have to need to see the year where it is exceptionally rainy, but not to the point of flood, to be able to make good decisions around it. You can take what you learned in the earlier not-quite-so rainy year and what you learned during the flood year and extrapolate from that what the exceptionally rainy year entails. That is what levels up someone.

Much the same is true in software. For example, once you write a (well-written) automated test in Javascript and perhaps create something in Typescript, you also have a pretty good understanding of what Rocq is trying to do well enough to determine when it would be appropriate for you to use. It would no doubt take much, much longer to understand all of its minutia, but it is not knowledge of intimate details that "senior", "principle", etc. is looking for. It is about being able to draw on past experience to make well-reasoned choices going forward.

Yoric•1mo ago

In my experience, not really, no.

You need a very different mindset to write in JS (or TS), in Rust, in Rocq, in Esterel or on a Quantum Computer. You need a very different mindset when coding tools that will be deployed on embedded devices, on user's desktops, in the Linux kernel, on a web backend or in a compiler. You need a very different mindset when dealing with open-source enthusiasts, untrusted users, defense contractors.

You might be able to have "seen it all" in a tiny corner of tech, but if you stop there, I read it as meaning that you don't have enough curiosity to leave your comfort zone.

It's fine, you don't really have to if you don't want to.

9rx•1mo ago

> You need a very different mindset to write in JS (or TS), in Rust, in Rocq, in Esterel or on a Quantum Computer.

"Senior", "principle", etc. are not about your ability to write. They speak to one's capacity to make decisions. A "junior" has absolutely no clue when to use JS, Rust, or Rocq, or if code should be written at all. But someone who has written (well-written) tests in JS, and maybe written some types in Typescript, now has some concept of verification and can start to recognize some of the tradeoffs in the different approaches. With that past experience in hand, they can begin to consider if the new project in front of them needs Rocq, Dafny, or if Javascript will do. Couple that with other types of experiences to draw from and you can move beyond being considered a "junior".

> You might be able to have "seen it all" in a tiny corner of tech

Of course there being a corner of some sort is a given. We already talked about management being a different corner, for example. Having absolutely no experience designing a PCB is not going to keep you a "junior" at a place developing CRUD web apps. Obviously nobody is talking about "seeing it all" as being about everything in the entire universe. There aren't that many different patterns, really, though. As the terms are used, you absolutely can "see it all", and when you don't have to wait around for the season to return next year, you can "see it all" quite quickly.

andrewaylett•1mo ago

In my experience, levels are more a question of delivery and trust than of technical skill. If a Senior is off down a rabbit hole, that's them doing their job. It's almost a defining feature of the level down (SE2, or local equivalent) that if they're off down a rabbit hole then something's probably gone wrong somewhere: if you could trust their judgement, they'd be a senior already.

(Yes, that's circular and overly reductive, don't take it too literally)

Where I've worked, "senior" has also meant being a final authority: more junior folk will come to you with questions about your team's codebases, and more senior folk will too. You might not have anyone else to ask that kind of question of.

andrewaylett•1mo ago

Similarly. I have over 20 years of professional experience. I've worked on embedded systems, and with mainframes. I've done (amongst other things) kernel development, compiler (& RTL) development, line-of-business, mobile, server, and web. Code I've written has a MAU on the order of 1% of humanity. Ask me about being a "full stack" developer :).

I've seen a lot. But the more I see, the more I find to see.

neighbour•1mo ago

I'm sympathetic to the title inflation issue but more on the problem of the "engineer" title, not to mention the "scientist" title.

For example, I work in Data & AI and we have:

- data engineer

- analytics engineer

- data scientist

- AI engineer

What I don't know is what's the alternative?

Data Engineers are basically software developers.

Analytics Engineers were Data Analysts or BI Analysts but the job has changed so much that neither of those titles fit.

My opinion is that basically everyone should just be a "Developer" or "Programmer" and then have the area suffixed:

- Data Engineer → Developer (Data Infrastructure)

- Analytics Engineer → Developer (Analytics)

etc.

theshrike79•1mo ago

A coworker had this anecdote decades ago.

There's a difference between 10 years of experience and 1 year of experience 10 times.

YOE isn't always a measurement of quality, you can work the same dead-end coding job for 10 years and never get more than "1 year" of actual experience.

joquarky•1mo ago

Experience is knowledge of what not to do.

theshrike79•1mo ago

It's the old saying: "$10 for the part, $990 for knowing where to put it"

You get a feel for what works and what doesn't, provided you know the relevant facts. Doing a 10RPS system is completely different than 300RPS. And if the payload is 1kB the problems aren't the same as with the one with a 10MB payload.

And if (when) you're using a cloud environment, which one is cheaper, large data or RPS? It's not always intuitive. We just had our AWS reps do a Tim "The Toolman" Taylor "HUUH?!" when we explained that the way our software works is 95% cheaper to run using S3 as the storage rather than DynamoDB :D

wseqyrku•1mo ago

True. I remember myself spending weeks just to figure out what not to do next because I can't afford a redo.

stephen_cagle•1mo ago

You know, this is kind of a funny take at some level. Like, for any surgery, you want the doctor who has done the same operation 10 times, not the one who has 10 years of "many hat doctoring" experience.

I'm not really arguing anything here, but it is interesting that we value breadth over (hopefully) depth/mastery of a specific thing in regards to what we view as "Senior" in software.

lokar•1mo ago

You want the Dr who has done the operation 10 times, and learned something each time, and incorporated that into their future efforts. You probably don’t want a Dr who will do their 11th surgery on you exactly the way they did the first.

This is what that saying is about

stephen_cagle•1mo ago

Fair enough. I guess I am making a bit of a straw-man in that I feel I just don't buy the idea that doing the same thing 10 times over the course of 10 years is somehow worse than doing different things over the course of 10 years. They are signals, and depending on what we are attempting, they just mean different expected outcomes. One isn't necessarily worse than another, but in this case it seems to be implying it is the distinction between Midlevel and Senior.

coldtea•1mo ago

>Fair enough. I guess I am making a bit of a straw-man in that I feel I just don't buy the idea that doing the same thing 10 times over the course of 10 years is somehow worse than doing different things over the course of 10 years.

Different things doesn't need to mean "different domains" which is how you read it.

It can be "things revealing different aspect/failure cases of the same domain" too.

If someone has done the same narrow kind of CRUD app 10 times, they're not CRUD-app experts - they never seen lots of different aspects of CRUD apps.

KronisLV•1mo ago

> I just don't buy the idea that doing the same thing 10 times over the course of 10 years is somehow worse than doing different things over the course of 10 years.

I always read it as making the same mistakes as you initially did and either failing to learn from them or not even trying to improve. Maybe you haven’t even explored enough approaches to see what actually works and what doesn’t and most importantly, in which circumstances.

For example, someone might do CI/CD with manually created pipelines in Jenkins (the web UI variety) with stuff like JDK configured directly on the runner nodes. They might never have written a Jenkinsfile, or tried out Docker for builds and therefore are slow and have to deal with brittle plugins and environment configuration. They might also be unaware of how GitHub Actions could benefit them, or the more focused approach of GitLab CI or even how nice Woodpecker CI can be, especially for simpler setups.

NDizzle•1mo ago

I want the doctor who has performed the operation and was still with the hospital in 6m, 12m, 18m, 24m to see the results of the operations that they performed.

Not the one who does a few operations and is never around to see the results of their decisions and actions.

gopher_space•1mo ago

“This person is incurious” would be more apt but also more likely to apply to everyone else in the room too.

Didn’t Bruce Lee famously say he fears the man who’s authored one API in ten thousand different contexts?

xarope•1mo ago

for context, he was referring to a physical methodology, which requires a lot of training and knowledge of usage application.

As analogy, I don't think you'd treat "using API XYZ 10,000 times" the same as "serving an ace in tennis, 10,000" times.

3acctforcom•1mo ago

Ops vs Dev

Situational Leadership gets into this. You want a really efficient McDonalds worker who follows the established procedure to make a Big Mac. You also want a really creative designer to build your Big Mac marketing campaign. Your job as a manager is figuring out which you need, and fitting the right person into the right job.

abustamam•1mo ago

Agreed. Meanwhile, many job postings out there looking for 10x full-stack developers who have deep experience in database, server, front end, devops, etc.

I think the concept of Full-stack dev is fine, but expecting them to know each part of the stack deeply isn't feasible imo.

theshrike79•1mo ago

Expert Generalists are a thing: https://martinfowler.com/articles/expert-generalist.html

BUT they're completely wasted if you just use them to turn JIRA tickets into end to end features =)

abustamam•1mo ago

Haha agreed. Thanks for the link, will give it a read. I feel like expert generalists should be founders or CTOS, and they are probably not applying for the positions that claim to be wanting expert generalists.

WalterBright•1mo ago

I once asked an obstetrician how she could tell the sex of a fetus with those ultrasound blobs. She laughed and said she'd seen 50,000 of those scans.

theshrike79•1mo ago

If we extrapolate the Dr example:

There is the one doctor who learned one way to do the operation at school, with specific instruments, sutures etc. and uses that for 1000 surgeries.

And then there's the curious one who actively goes to conferences, reads publications and learns new better ways to do the same operation with invisible sutures that don't leave a scar or tools that are allow for more efficient operations, cutting down the time required for the patient to be under anaesthesia.

Which one would you hire for your hospital for the next 25 years?

akho•1mo ago

The one with the smaller cemetery.

gpderetta•1mo ago

This is not really a doctor that has done the same operation 10 times. This is a doctor that jumped form hospital to hospital 10 times and probably never actually stayed long enough to be entrusted to do an operation in any hospital.

BadCookie•1mo ago

Maybe, but the typical person I have worked with in this industry is too smart to do something for 10 years and not learn much during that time.

I am afraid that this “1 year of experience 10 times” mantra gets trotted out to justify ageism more often than not.

theshrike79•1mo ago

Depends a lot on the type of software you're doing. Startups will have hungry people willing to learn, more traditional companies won't in the same percentages.

Not all people are curious, they go to school, learn to code and work their job like a normal 9-5 blue collar worker. They go to company trainings, but they don't read Hacker News, don't follow the latest language fads or do personal software projects during nights and weekends. It's just a day job for them that pays for their non-programming hobbies.

I've had colleagues who managed the same ASP+Access DB system for almost a decade, with zero curiosity or interest to learn anything that wasn't absolutely necessary.

We had to drag them to the ASP.NET age, one just wouldn't and stayed back managing the legacy version until all clients had moved to the new stack.

...and I just checked LinkedIn, the non-curious ones are still in the same company, managing the same piece of SaaS as a Software Developer. 20-26 years in the same company, straight from school.

disgruntledphd2•1mo ago

> ...and I just checked LinkedIn, the non-curious ones are still in the same company, managing the same piece of SaaS as a Software Developer. 20-26 years in the same company, straight from school.

And honestly, this should be OK. For a lot of people, they work to put food on the table and keep a roof over their head, and our society is structured like this for whatever reasons.

Not everyone needs to be learning and growing all the time. I personally like this, but I've worked with incredibly competent people who just had other interests outside of work, and had no desire to get promoted or work on different things.

Personally, I prefer (often) working with the learning and growing people, but sometimes you can learn a bunch from the stable people as they'll often have lots of hard won lessons caused by staying in the same place for a long time.

theshrike79•1mo ago

Some people’s brains are just wired for that.

There are people who are just fine being a cog in a machine doing the same thing all day for years on out.

My family is from a factory town and many of them were literally standing next to a conveyor doing a monotonic task that could’ve done by a robot.

I tried it for one summer job and my brain almost melted from the boredom and monotony.

code_for_monkey•1mo ago

I feel like ive been stuck in that cycle, and I know its partially just me being in my head about my career, but I really have been basically doing CRUD apps for a decade. Ive made a lot of front end forms, Ive kept up on the latest frameworks and trends, but at the core it really hasnt been dramatically different.

theshrike79•1mo ago

If you really distill it, I've been doing API Glue for about a quarter century.

I connect to a 3rd party API with shitty specs and inconsistent output that doesn't follow even their spec, swear a bit and adjust my estimates[0]. Do some business stuff with it and shove it to another API.

But I've done that now in ... six maybe seven different languages and a few different frameworks on top of that. And because both sides of the API tend to be a bit shit, there's a lot of experience in defensive coding and verification - as well as writing really polite but pointed Corporate Emails that boil down to "it's your shit that's broken, not ours, you fix it".

At this point I really don't care what language I have to use, as long as it isn't Java (which I've heard has come far in the last decade, but old traumas and all that =).

[0] best one yet is the Swedish "standard" for electricity consumption reports, pretty much every field is optional because they couldn't decide and wanted to please every company in on the project. Now write a parser for that please.

code_for_monkey•1mo ago

another 'anyone but Java' developer! For me its not even the language or syntax, its the way people code it. Not every line of python ive ever read is gold, but every time i delve into Java code its like the developer was mad, at me specifically.

ChrisMarshallNY•1mo ago

I'm constantly working on stuff I don't know (the Xcode window behind this browser window is full of that kind of code). I have found LLMs are a great help in pushing the boundaries.

It's humbling, but I do tend to pick up a lot of stuff.

https://littlegreenviper.com/miscellany/thats-not-what-ships...

theshrike79•1mo ago

There are a definitely few ulcer-inducing events in my past that would've taken me an afternoon to fix with a current SOTA LLM vs 2+ weeks of swearing, crying and stressing out.

ChrisMarshallNY•1mo ago

When I come upon an issue, I pretty much immediately copy/paste the code into an LLM, with a description of the context, symptoms, and desired outcome.

It will usually home right in on the bug, or will give me a good starting point.

It's also really good at letting me know if this behavior is a "commonly encountered" one, with a summary of ways it's addressed.

I've probably done that at least a dozen times, today. I guess I'm a rotten programmer.

theshrike79•1mo ago

I've completed actual features by saying "look up issue ABBA-1234 and create a plan to implement it" to Claude.

Then I wait, look through the plan and tell it to implement and go do something else.

After a while I check the diffs and go "huh, yea, that's how I would've done it too", commit and push.

dayjaby•1mo ago

In 10 years this will be a "that's how I would've done it 10 years ago too...or?? I don't remember"

farhanhubble•1mo ago

There's a gut feeling that comes from having gotten your hands dirty enough that tells you if the LLM is being smart or spitting out bullshit.

ChrisMarshallNY•1mo ago

The main issue I have with LLM-generated solutions, is that LLMs never seem to know about “Occam’s Razor.”

Their solution usually benefits from some simplification.

teaearlgraycold•1mo ago

When interviewing candidates I'm always shocked and a little depressed talking to someone with a pumped up resume and 15 years in the field when I realize they can't do much at all.

ludicity•1mo ago

Yeah, I run into this a lot too, hah. It's depressing but also pretty funny when you've got enough distance from it. My favorite was an ex-girlfriend working in HR interviewed a candidate with 15 years of experience, and was told to ask him to solve FizzBuzz in a language of his choice.

(This is obviously a silly test for various reasons, but she was following orders.)

She called me later that day because the guy couldn't do it, so he instead blew the fuck up at HR and accused them of ambushing him with a super complex interview question. From his reaction, she thought that the company had tricked her into making totally unreasonable demands of someone who hasn't had a month to prepare.

God knows what the hell that guy did at his previous role.

PS: I have laughed every time I've seen your username for the past year, and can't remember if I've told you this before.

teaearlgraycold•1mo ago

> PS: I have laughed every time I've seen your username for the past year, and can't remember if I've told you this before.

You haven't but thanks :D

godelski•1mo ago

Jean-Lukewarm Picard, they call him

NumberCruncher•1mo ago

My favourite saying is: "dumb people get old too".

nijave•1mo ago

Reminds me of something I heard at a conference to the effect "10-15 years of <tool> experience is usually a red flag because the only people that have that have been pressing <tool run> button over and over again learning nothing"

fuzztester•1mo ago

My dad mentioned the same anecdote to me long ago, except that the context was manufacturing, and the number of years was 20, not 10.

3acctforcom•1mo ago

Titles in of themselves are meaningless, I've seen a kid hired straight from uni into a "senior" position lol

abustamam•1mo ago

I've job hopped a bit. I've gone from junior to senior to lead to mid-level to staff to senior. I have ten years experience.

My career trajectory is wild. At this rate I'll be CTO soon, then back to mid-level.

Alex_L_Wood•1mo ago

It also doesn’t help that definition of who “staff engineer” is varies wildly by the company.

farhanhubble•1mo ago

Ability is a combination of aptitude, skills, persistence, and exposure. More importantly, intention matters and it show up in the quality of your work. If the intention is to cut corners, no one can stop you from doing shoddy work. Number of years and titles do not matter much if the other parameters are low.

Aptitude paves the way for exploration: learning languages, paradigms, modeling techniques, discovering gotchas and so on. Skills follow from practice and practice requires a tough mindset where you don't give up easily.

So many software engineers learn to code just to pass exams and interviews. They claim they have strong logical reasoning. However, they have only been exposed to ideas and patterns from competitive programming rut. They have never even seen code more complex than a few hundred lines. They haven't seen problems being modeled in different languages. They haven't had the regret of building complex software without enough design. They have not had the disappointment of overengineering solutions. They have never reverse-engineered legacy code. They have never single-stepped in a debugger. All they have learned is to make random changes until "It works on my machine".

Yes, software is complex, disposable, ever-changing and often ugly but that is no excuse for keeping it that way.

snarf21•1mo ago

Yeah, it is way worse than that. In the past two days, I have had two separate non-engineer team members ask some AI agent how some mobile bug should be fixed and posted the AI response in the ticket as the main content and context and acceptance criteria. I then had to waste my time reading this crap (because this is really all that is in the ticket) before starting my own efforts to understand what the real ask or change in behavior needed is.

teaearlgraycold•1mo ago

Your manager should be on their ass for wasting your time.

snarf21•1mo ago

Worse yet, these were done by the managers of the Marketing team and the Mapping team. Plus, these are high profile issues that (somehow) required getting the CEO involved too! (Obviously there is a lot of dysfunction in our organization, lol.)

abustamam•1mo ago

I was going to joke to GP "jokes on you, management is in on it" but apparently it was no joke

abustamam•1mo ago

New career path unlocked — reverse prompt engineering — trying to determine what someone prompted the AI given the slop they put into a ticket

dejj•1mo ago

Plot twist: this universe (planet) was created in order to reverse engineer what the prompt of the previous one was.

tremon•1mo ago

It's not that much of a twist, given that it was basically the plot of THGTTG.

CGamesPlay•1mo ago

THGTTG was actually just an early version of the prompt. "What do you get if you multiply six by nine?"

fragmede•1mo ago

rTX5CMRXIfFG•1mo ago

I think you’re kidding but some LinkedIn influencer will come across this and preach that it’s serious

colechristensen•1mo ago

You close the ticket and ping the manager of the nontechnical person submitting the ticket. Then you have a discussion with management about the arrangement and expectations. If it doesn't go well you polish your resume.

shermantanktop•1mo ago

That sounds like good advice for someone 1) in a less hierarchical organization, and 2) during a period when jobs are easy to get.

colechristensen•1mo ago

Or someone who

1) has the savings such that they aren't a wage slave (applies to any income level)

2) has any dignity

People willingly put themselves in situations where they have no autonomy and no options by living their entire lives paycheck to paycheck or close enough to not make a difference.

shermantanktop•1mo ago

That’s a pretty judgmental take. The only people with dignity in your formulation are independently wealthy.

If I stomped out the door as soon as I had to curb my tongue, I would never build the social and reputational capital required to be effective on bigger projects, and those are fun (to me).

godelski•1mo ago

We're in tech and this shit is happening in big tech companies. Yes, there's many of us who are not getting those wages but every one of them is independently wealthy.

Regardless, you are not a slave. Have a backbone. If you do not stand up for yourself you make it harder for others to do so. Your actions don't just affect you.

colechristensen•1mo ago

People at ALL wage levels put themselves into wage slavery by setting up their finances in a way where they have very little buffer. Mortgage, car payments, savings levels, and all sorts of things combine into being even briefly unemployed is a terrific burden.

In that state, you can't say no, you can't stand up for yourself or anyone else, you can't make choices because choices have significant life effects. You don't have to have a $10MM trust fund to escape this. You just have to live far enough under your means that you have the savings and spending profile that allows you "fuck you" privileges.

Everybody living so close to broke all the time makes everything more expensive, particularly the competitive things like housing.

biglost•1mo ago

Our leader wrote himself a great prompt to fill up Tickets in jira with useless text too and our boss is happy like if he won the lottery. Now instead of ugly but short useful texts now i have yo read a fucking eassay!!!

yieldcrv•1mo ago

Better than dealing with a PM though

Like look, your boss was happy, so the blockade away from your boss that your PM provides is solved

So just write an agent to summarize and read the room like the 200 EQ mastermind that it is, compared to your 0 EQ engineer brain, and move on

AI writing tickets, AI reading tickets, AI writing code from said tickets

only ones affected are middle management and site reliability engineers, poof we can write complex backend scripts too now

Muromec•1mo ago

You were supposed to feed it back into the lying machine to dustill the content from the vapor, not read it. You are not AI native worker and should be kicked out of the otherwise great performing team

andai•1mo ago

Nah, you know how it works? You're not supposed to read it!

Encoding Process

One sentence in -> Several paragraphs out

Decoding Process

Several paragraphs in -> One sentence out

fragmede•1mo ago

No you don't. Feed the essay to Claude and ask it to summarize it for you, or just use the Jira MCP and have Claude or codex take the ticket, use the GitHub MCP to get it the source, have it go work on the ticket, have it generate the code, generate some unit tests, then you go literally yell at your computer using Wispr Flow or some other transcription software to tell Claude how to fix the mess it made, and then you after you've cleaned it up, you submit the PR.

When ChatGPT first came out three years ago, we joked about Devin and having an AI coworker, but I was just given 5 tickets to work on before break, and damned if AI didn't take a well scoped ticket and just did it before I even finished reading that very tightly scoped ticket.

davely•1mo ago

While I personally find that generative AI has helped me be more productive and even boosted my ability to learn things, I really _dislike_ that we’ve normalized this behavior:

Whether a Jira ticket, an email, a yearly review, we feed bullet points into a black box to get a bunch of fluffy text. On the other end, we feed the fluffy text into the black box to get bullet points.

We’re killing penguins because we’re somehow afraid to just send the simplified bullet points to each other in the first place.

baxtr•1mo ago

There is also another way to view this:

People who have a sloppy/lazy way of working are exposed more quickly by this.

dmurvihill•1mo ago

Got this kind of crap from multiple tickets I filed with GitHub

TexanFeller•1mo ago

Even worse for me, some of my coworkers were doing that _before_ coding LLMs were a thing. Now LLMs are allowing them to create MRs with untested nonsense even faster which feels like a DDOS attack on my productivity.

hinkley•1mo ago

There’s a PR on a project I contribute to that is as bad/big as some PRs by problematic coworkers. I’m not saying it’s AI work, but I’m wondering.

duxup•1mo ago

It’s always the developers who can break / bypass the rules who are the most dangerous.

I always think of the "superstars" or "10x" devs I have met at companies. Yeah I could put out a lot of features too if I could bypass all the rules and just puke out code / greenfield code that accounts for the initial one use case ... (and sometimes even leave the rest to other folks to clean up).

tunesmith•1mo ago

As always, this requires nuance. Just yesterday and today, I did exactly that to my direct reports (I'm director-level). We had gotten a bug report, and the team had collectively looked into it and believed it was not our problem, but that of an external vendor. Reported it to the vendor, who looked into it, tested it, and then pushed back and said it was our problem. My team is still more LLM-averse than me, so I had Codex look at it, and it believed it found the problem and prepared the PR. I did not review or test the PR myself, but instead assigned it to the team to validate, partly for learnings. They looked it over and agreed it was a valid fix for a problem on our side. I believe that process was better than me just fully validating it myself, and part of the process toward encouraging them to use LLM as a tool for their work.

xyzzy_plugh•1mo ago

> I believe that process was better than me just fully validating it myself

Why?

> and part of the process toward encouraging them to use LLM as a tool for their work.

Did you look at it from their perspective? You set the exact opposite example and serve as a perfect example for TFA: you did not deliver code you have proven to work. I imagine some would find this demoralizing.

I've worked with a lot of director-level software folk and many would just do the work. If they're not going to do the work, then they should probably assign someone to do it.

What if it didn't work? What if you just wasted a bunch of engineering time reviewing slop? I don't comprehend this mindset. If you're supposedly a leader, then lead.

necovek•1mo ago

2 decades ago, so well before any LLMs, our CEO did that with a couple of huge code changes: he hacked together a few things, and threw it over the wall to us (10K lines). I was happy I did not get assigned to deal with that mess, but getting that into production quality code took more than a month!

"But I did it in a few days, how can it take so long for you guys?" was not received well by the team.

Sure, every case is its own, and maybe here it made sense if the fix was small and testing for it was simple. Personally (also in a director-level role today), I'd rather lead by example and do the full story, including testing, and especially writing automated tests (with LLM's help or not), especially if it is small (I actually did that to fix misuse of mutexes ~12 months ago in one of our platform libraries, when everybody else was stuck when our multi-threaded code behaved as single-threaded code).

Even so, I prefer to sit with them and loudly ask questions that I'd be asking myself on the path to a fix: let them learn how I get to a solution is even more valuable, IMO.

vrighter•1mo ago

and non-devs as well, nowadays

coldtea•1mo ago

It's even worse than that: as long as it somewhat works, managers don't care, if not prefer it (to more slowly developed, more tested, more well architected code)

giancarlostoro•1mo ago

I have said it before on HN using LLMs should 100% justify devs having enough time to test and document the code, and understand it better. The problem I do see though will be management.

0x500x79•1mo ago

I think there are two other things missing: Security and Maintainability. Working code that can never be touched again by a developer or requires an excessive amount of time to maintain is not part of a developers job either.

Overall, this hits the nail on the head about not delivering broken code and providing automated tests. Thanks for putting your thoughts on paper.

am17an•1mo ago

Well a 1000 line PR is still not welcome. It puts too much of a burden on the maintainers. Small PRs are the way to go, tests are great too. If you have to submit a big PR, get buy in from a maintainer first that they will review your code.

mellosouls•1mo ago

Thing is, this has always been the case. One of the problems with LLM-assisted coding is the idea that just because we're in a new era (we certainly are), the old rules can all be discarded.

The title doesn't go far enough - slop (AI or otherwise) can work and pass all the tests, and still be slop.

simonw•1mo ago

The difference is that if it works and passes the tests I don't feel like it's a total waste of my time to look at the PR and tell you why it's still slop.

If it doesn't even work you're absolutely wasting my time with it.

theshrike79•1mo ago

IMO LLMs are forcing us in the other way.

To get the maximum ROI from LLM-assisted programming it needs proper unit tests, integration tests, correctly configured linters, accessible documentation and well-managed git history (Claude actually checks git history nowadays to see when a feature was added if it has a bug)

Worst case we'll still have proper tests and documentation if the AI bubble suddenly bursts. Best case we can skip the boring bits because the LLM is "smart" enough to handle the low hanging fruit reliably because of the robust test suite.

layer8•1mo ago

I’d go further and say while testing is necessary, it is not sufficient. You have to understand the code and convince yourself that it is logically correct under all relevant circumstances, by reasoning over the code.

Testing only “proves” correctness for the specific state, environment, configuration, and inputs the code was tested with. In practice that only tests a tiny portion of possible circumstances, and omits all kinds of edge and non-edge cases.

user34283•1mo ago

I'd go further and say vibe coding it up, testing the green case, and deploying it straight into the testing environment is good enough.

The rest we can figure out during testing, or maybe you even have users willing to beta-test for you.

This way, while you're still on the understanding part and reasoning over the code, your competitor already shipped ten features, most of them working.

Ok, that was a provocative scenario. Still, nowadays I am not sure you even have to understand the code anymore. Maybe having a reasonable belief that it does work will be sufficient in some circumstances.

TheTxT•1mo ago

This approach sounds like a great way to get a lot of security holes into your code. Maybe your competitors will be faster at first, but it’s probably better to be a bit slower and not leaking all your users data.

user34283•1mo ago

I'm mostly thinking about the frontend.

If I had a backend API that was serving user data, I'd of course check more carefully.

This kind of mistake always seemed amateurish to me.

TheTxT•1mo ago

Fair enough. I would still personally feel uneasy about it, but I guess it’s alright if it works for others.

ozozozd•1mo ago

How often do you buy stuff that doesn't work, and you are OK with the provider telling you "we had a reasonable belief that it worked"?

How are we supposed to use software in healthcare, defense, transportation if that's the bar?

user34283•1mo ago

There's a lot of functionality in the frontend that I am building that I did not review. If it worked in testing, that's good enough.

You're free to review every line the model produces. Not every project is in healthcare or defense, and sometimes different standards apply.

ozozozd•1mo ago

I’m assuming you work in a setting where there is a QA team?

I haven’t been in such a setting in 2008 so you can ignore everything I said.

But I wouldn’t want to be somewhere where people don’t test their code, and I have to write code that doesn’t break the code that was never tested until the QA cycle?

user34283•1mo ago

No, in my day job I obsess over every line I add, although there is QA.

In my side project I'm building a frontend that, according to me, is the best looking and most feature rich option out there.

I find that I'm making great progress with it, even when I don't know every line in the project. I understand the architecture and roughly where what functionality is located, and that is good enough for me.

If in testing I see issues with some functionality, I can first ask the model to summarize the implementation. I can then come up with a better approach and have the model make the change. Or alternatively I edit some values myself. So far it wasn't often that I felt the need to write more than a few lines of code manually.

aspbee555•1mo ago

I find myself not really trusting just tests, I really need to try the app/new function in multiple ways with the goal of breaking it. In that process I may not break it but I will notice something that might break, so I rewrite it better

lanstin•1mo ago

If you don't push your system to failure, you can't really say you understand it. And anyways the precise failure modes under various conditions are important characteristics for stability/resiliency. (Does it shed load all the way upto network bandwidth of SYNs; allocate all the memory and then exit; freeze up with deadlocks/disk contention; go unresponsive for a few minutes then recover if the traffic dies off; answer health check pings only and not progress on actual work).

Nizoss•1mo ago

If you write your tests the Test-Driven Development way in that they first fail before production changes are introduced, you will be able to trust them a lot more. Especially if they are well-written tests that test behavior or contracts, not implementation details. I find that dependency injection helps a lot with this. I try to avoid mocking and complex dependencies as much as possible. This also allows me to easily refactor the code without having to worry about breaking anything if all the tests still pass.

When it comes to agentic coding. I created an open source tool that enforces those practices. The agent gets blocked by a hook if it tries to do anything that violates those principles. I think it helps a lot if I may say so myself.

https://github.com/nizos/tdd-guard

Edit: I realize now that I misunderstood your comment. I was quick to respond.

shepherdjerred•1mo ago

A good type system helps with this quite a lot

crazygringo•1mo ago

It helps some. There are plenty of errors, a large majority I'd say, where types don't help at all. Types don't free up memory or avoid off-by-one errors or keep you from mixing up two counter variables.

simianwords•1mo ago

I would like to challenge this claim. I think LLMs are maybe accurate enough that we don't need to check every line and remember everything. High level design is enough.

stuffn•1mo ago

I have plenty of anecdata that counters your anecdata.

LLMs can generate code that works. That much is true. You can generate sufficiently complex projects that simply run on the first (or second try). You can even get the LLM to write tests for the code. You can prompt it for 100% test coverage and it will provide you exactly what you want.

But that doesn't mean OP isn't correct. First, you shouldnt be remembering everything. If you are finding yourself remembering everything your project is either small (I'd guess less than 1000 lines) or you are overburdened and need help. Reasoning, logically, through code you write can be done JIT as you're writing the code. LLMs even suffer from the same problem. Instead of calling it "having to remember to much" we refer to it as a quantity called "context window". The only problem is the LLM won't prompt you telling you that it's context window is so full it can't do it's job properly. A human will.

I think an engineer should always be reasoning about their code. They should be especially suspicious of LLM generated code. Maybe I'm alone but if I use an LLM to generate code I will review it and typically end up modifying it. I find even prompting with something like "the code you write should be maintainable by other engineers" doesn't produce good value.

abathur•1mo ago

I've been tasked with doing a very superficial review of a codebase produced by an adult who purports to have decades of database/backend experience with the assistance of a well-known agent.

While skimming tests for the python backend, I spotted the following:

    @patch.dict(os.environ, {"ENVIRONMENT": "production"})
    def test_settings_environment_from_env(self) -> None:
        """Test environment setting from env var."""
        from importlib import reload

        import app.config

        reload(app.config)

        # Settings should use env var
        assert os.environ.get("ENVIRONMENT") == "production"

This isn't an outlier. There are smells everywhere.

simianwords•1mo ago

If it is so obvious to you that there is a smell here then an agent would have caught it. Try it yourself.

newsoftheday•1mo ago

My jaw hit the table when I read that. Just checking here but, are you being serious?

simianwords•1mo ago

I absolutely believe this and follow what I said to an extent. You don't need to triple check every line of code and deeply understand what it has done - just the highlevel design.

I usually skim through the code (spot some issues like are they using modern version of language?), check the high level design like which interfaces and do manual testing. That is more than enough.

anthonypasq•1mo ago

if your tests cover the acceptance criteria as defined in the ticket, why is all htat other stuff necessary?

sunsetMurk•1mo ago

Acceptance criteria are often buggy themselves, and require more context to interpret and develop a solution.

otterley•1mo ago

If you don't have sufficiently detailed acceptance criteria, how can anyone be expected to write code to satisfy them?

That's why you have to start with specifications. See, e.g., https://martinfowler.com/articles/exploring-gen-ai/sdd-3-too...

9rx•1mo ago

I wonder how many more times we'll rebrand TDD (BDD, SDD)?

Just 23 more times? ADD, CDD, EDD, DDD, etc.

Or maybe more?! AADD, ABDD, ACDD, ..., AAADD, AABDD, etc.

pydry•1mo ago

BDD is different, it is a way of gathering requirements.

As is, SDD it is some sort of AI nonsense.

otterley•1mo ago

Developers who aren't yet using AI would benefit from specs as well. They're good to have whether it's you or an LLM that's writing code. As a general rule, the clearer and less ambiguous the criteria you have, the better.

9rx•1mo ago

BDD was trying to recapture what TDD was originally, renamed from TDD in an effort to shed all the confusion that surrounded TDD. Of course, BDD picked up all of its own confusion (e.g. Gherkin/Cucumber and all that ridiculousness). So now it is rebranded as SDD to try and shed all of that confusion, with a sprinkle of "AI" because why not. Of course, SDD already is clouded in its own confusion.

Testing is the least understood aspect of computer science and it turns out that you cannot keep changing the name and expect everyone to suddenly get it. But that won't stop anyone. We patiently await the next rebrand.

anthonypasq•1mo ago

thats impossible. Bugs are defined as when the code does not match the acceptance criteria.

Yodel0914•1mo ago

Because AC don’t cover non-functional things like maintainability/understandability, adherence to corporate/team standards etc.

layer8•1mo ago

If your acceptance criteria state something like “produces output f(x) for any inout x, where f(x) is defined as follows: […]”, then you can’t possibly test that, because you can’t test all possible values of x. And if the criteria don’t state that, then they don’t cover the full specification of how the software is expected to behave, hence you have to go beyond those criteria to ensure that the software always behaves as expected.

You can’t prove that something is correct by example. Examples can only disprove correctness. And tests are always only examples.

roeles•1mo ago

Since we can't really formally prove most code, I think property based testing such as with hypothesis[1] would make sense. I have not used it yet, but am about to for stuff that really needs to work.

[1] https://news.ycombinator.com/item?id=45818562

xendo•1mo ago

We can't really property test most code. So it comes down, as with everything, to good judgement and experience.

epgui•1mo ago

You can property test most code.

crabmusket•1mo ago

> "proves"

I like using the word "demonstrates" in almost every case where people currently use the word "proves".

A test is a demonstration of the code working in a specific case. It is a piece of evidence, but not a general proof.

And these kinds of narrow ad-hoc proofs are fine! Usually adequate.

To rephrase the title of TFA, we must deliver code that is demonstrated to work.

array_key_first•1mo ago

I agree - it's trivial to write 100% test coverage if your code isn't robust and resilient and just does "happy path" type stuff.

9rx•1mo ago

Testing is not perfect, but what else is there? Even formal proofs are just another expression of testing. With greater mathematical guarantees than other expressions, granted, but still testing all the same; prone to all the very same human problems testing is burdened with.

layer8•1mo ago

The difference with proofs (whether formal or informal) is that they quantify over all possible cases, whereas testing is always limited to specific cases.

9rx•1mo ago

There is no difference. It is all testing. Testing captures the full gamut, from simply manually using the software all the way up to formal proofs. Although the advantages of formal proofs over other modes of testing was already written about, so it is unclear what you are trying to add. Perhaps you want to clarify?

Yodel0914•1mo ago

Came to leave the same comment. It’s very possible to deliver code that’s proven to work, that is still shit.

softwaredoug•1mo ago

A lot of AI coding changes coding to more of a declarative practice.

Claude, etc, works best with good tests that verify the system works. And so the code becomes in some ways the tests rather than the code that does the thing. If you're responsible for the thing, then 90% of your responsibility moves to verifying behavior and giving agents feedback.

0xbadcafebee•1mo ago

Actually it's more specific than that. A company pays you not just to "write code", not just to "write code that works", but to write code that works in the real world. Not on your laptop. Not in CI tests. Not on some staging environment. But in the real world. It may work fine in a theoretical environment, but deflate like a popped balloon in production. This code has no value to the business; they don't pay you to ship popped balloons.

Therefore you must verify it works as intended in the real world. This means not shipping code and hoping for the best, but checking that it actually does the right thing in production. And on top of that, you have to verify that it hasn't caused a regression in something else in production.

You could try to do that with tests, but tests aren't always feasible. Therefore it's important to design fail-safes into your code that ALERT YOU to unexpected or erroneous conditions. It needs to do more than just log an error to some logging system you never check - you must actually be notified of it, and you should consider it a flaw in your work, like a defective pair of Nikes on an assembly line. Some kind of plumbing must exist to take these error logs (or metrics, traces, whatever) and send it to you. Otherwise you end up producing a defective product, but never know it, because there's nothing in place to tell you its flaws.

Every single day I run into somebody's broken webapp or mobile app. Not only do the authors have no idea (either because they aren't notified of the errors, or don't care about them), there is no way for me to even e-mail the devs to tell them. I try to go through customer support, a chat agent, anything, and even they don't have a way to send in bug reports. They've insulated themselves from the knowledge of their own failures.

cynicalsecurity•1mo ago

It gets interesting when a company assigns 2 story points to a task that requires 6 minimum. No time for writing tests, barely any time to perform code reviews and QA. Also, next year the company tells you since we have AI now, all tickets must be done 2 times quicker.

Who popped this balloon? I know I need to change my employer, but it's not so easy. And I'm not sure another employer is going to be any better.

roryirvine•1mo ago

Are you not involved in doing the estimation?

asadotzler•1mo ago

involved in is meaningless. if 10 people at a table all offer their inputs, it doesn't matter if mine is the rational one, or even if I hedged against their irrationality with an inflated estimate, the 9 other estimates will dominate. That's the whole problem here, a lack of autonomy and a lack of expectations for responsibility. Make the developer responsible for the estimate, and hold them accountable for the results. Letting the organization make the estimate and then blaming an LLM for the failure is a recipe for company collapse.

theshrike79•1mo ago

Me: "Boss, this takes at least 4 weeks to complete properly including QA time."

Boss: sucks in air through his teeth "Best I can do is one week. Get to it."

Me, with a massive mortgage and the job market is shit: "Rogerroger, bossman"

mystifyingpoi•1mo ago

Classic butchering of otherwise decent Scrum idea. If assigning 2 points means no tests, then you are already using story points wrong, and complaining about it is meaningless.

rglover•1mo ago

"Slow the f*ck down." - Oliver Reichenstein [1]

This only happens because the software industry has fallen into the Religion of Speed. I see it constantly: justified corner-cutting, rushing shit out the door, and always loading up another feature/project/whatever with absolutely zero self-awareness. AI is just an amplifier for bad behavior that was already causing chaos.

What's not being said here but should be: discipline matters. It's part of being a professional and always precedes someone who can ship code that "just works."

[1] https://ia.net/*

simonw•1mo ago

A few years ago I embraced automated tests and comprehensive documentation for even my smallest personal projects because I found that they sped me up. https://simonwillison.net/2022/Nov/26/productivity/

Nizoss•1mo ago

I would love to hear your thoughts on TDD-Guard. An open source plugin I created to enfore Test-Driven Development practices on agents:

https://github.com/nizos/tdd-guard

acrophiliac•1mo ago

Perhaps off-topic, but: "Testing doesn't show the absence of errors, it shows the presence of errors" Willison says we need to submit code we have proven to work but then argues for empirical testing, not actual correctness proofs.

simonw•1mo ago

If you can formally prove correctness then brilliant, go for it!

That's not something I've seen or been able to achieve in most of my professional work.

trevor-e•1mo ago

> Your job is to deliver code you have proven to work.

Strong disagree here, your job is to deliver solutions that help the business solve a problem. In _most_ cases that means delivering code that you should be able to confidently prove satisfies the requirements like the OP mentioned, but I think this is an important nitpick distinction I didn't understand until later on in my career.

draw_down•1mo ago

Man we better hope the solution to that problem is working code. Otherwise we better start working the fryers or something.

newsoftheday•1mo ago

There's no distinction there, proving the work is correct is within the scope of helping the business solve a problem; not without and not beside it. So your point is hot air, making a distinction where none exists.

casey2•1mo ago

The distinction does matter. Requirements can be and often are wrong when the rubber meets the road. If the cost of implementing requirements correctly is $1,000,000 and the value of the product is $1000 then even choosing to clarify requirements has failed the business. The non failure mode here is to write the code for less than $1,000. Even if the code doesn't work at all!

tech-ninja•1mo ago

> In _most_ cases that means delivering code that you should be able to confidently prove satisfies the requirements like the OP mentioned

That is an insane distinction that you are trying to do there. In which cases delivering code that doesn't satisfy the requirements would solve a business problem?

claar•1mo ago

IMO, it's not insane at all - I didn't understand this until I moved from developer to executive.

Solving the business need has precedence over technical correctness.

Satisfying "what I think the requirements are" without considering the business need causes most code rework in my experience.

nrhrjrjrjtntbt•1mo ago

When you can ship Redis for example instead of rolling your own cache.

adrianmonk•1mo ago

I will make up some numbers for the sake of illustration. Suppose it takes you half as long to develop code if you skip the part where you make sure it works. And suppose that when you do this, 75% of the time it does work well enough to achieve its goal.

So then, in a month you can either develop 10 features that definitely work or 20 features that have a 75% chance of working. Which one of these delivers more value to your business?

That depends on a lot of things, like the severity of the consequences for incorrect software, the increased chaos of not knowing what works and what doesn't, the value of the features on the list, and the morale hit from slowly driving your software engineers insane vs. allowing them to have a modicum of pride in their work.

Because it's so complex, and because you don't even have access to all the information, it's hard to actually say which approach delivers more value to the business. But I'm sure it goes one way some of the time and the other way other times.

I definitely prefer producing software that I know works, but I don't think that it's an absurd idea the other way delivers more business value in certain cases.

theshrike79•1mo ago

Not all problems are "work" vs "doesn't work".

We're not talking about making a calculator that can't calculate 1+1. This might be a website that's a bit slow and janky to use.

25% of users go away because it's shit, but 75% stay. And it would've too much effort to push the jank to zero and retain a 100%.

A website that takes juuuust too long to load still "satisfies requirements" in most cases, especially when making loading instant carries a significant extra cost the customer isn't willing to pay for.

p2detar•1mo ago

I didn’t read it this way, but I admit the comment is somewhat vague. I thought GP meant that not all solutions require delivering code. In my job when I get a support inquiry, I first try to think what exactly is the customer’s end-goal. Often support didn’t get what the customer’s real pain is. Some solutions are reduced to pointing them to some unusual workflow that solved their problem. This way I don’t need to tinker any code at all.

trevor-e•1mo ago

No, what I meant is sometimes the solution is not delivering any code at all.

Many times in my career, after understanding the problem at hand and who initiated it, I realized the solution is actually one of:

1) a people/organizational problem, not technical 2) doesn't make sense to code a complicated system when it could be a simple Google Sheet 3) the person actually has a completely different problem 4) we already have a solution they didn't know about

My issue with the OP is that it highly emphasizes delivering code. We are not meant to be code monkeys, we are solving problems at the end of the day. Many people I've met throughout my career forget that and immediately jump into writing code because they think that's their job.

sharkjacobs•1mo ago

Maybe I'm not late enough in my career to understand what you're saying, but what kind of problems are you helping the business solve with code that hasn't been proven to work?

SoftTalker•1mo ago

Getting a big customer to pay for a product that your sales team said could do X, Y, and Z but Y wasn't part of the product and now you need some plausible semblance of Y added so that you can send an invoice. If it doesn't work, that can be addressed later.

ambicapter•1mo ago

Getting a big sale by hacking together a demo that wouldn't scale up in the slightest without a complete rework of your backend.

trevor-e•1mo ago

Sorry I wrote that hastily and my wording seems to have caused much confusion. Here's a rewrite:

> The job is to help the business solve a problem, not just to ship code. In cases where delivering code actually makes sense, then yeah you should absolutely be able to prove it works and meets the requirements like the OP says. But there are plenty of cases where writing code at all is the wrong solution, and that’s an important distinction I didn’t really understand until later in my career.

Although funnily enough, the meaning you interpreted also has its own merit. Like other commenters have mentioned, there's always a cost tradeoff to evaluate. Some projects can absolutely cut corners to, say, ship faster to validate some result or gain users.

antod•1mo ago

It's more than just solving a problem though, you should be solving the given problem without creating new problems. This is where the working/secure code aspect comes in.

nrhrjrjrjtntbt•1mo ago

> Strong disagree here, your job is to deliver solutions that help the business solve a problem.

Sure. That is every job though. It is interesting to muse on. Hard for us to solve a problem without a computer (or removing one!)

gardenhedge•1mo ago

Yeah, more correctly, the description should be:

"Your job is to deliver technical solutions that help the business solve a problem"

Where the word technical does the work of representing your skill set. That means you won't be asked to create a marketing campaign (solution) to help the business sell more product (problem).

ozozozd•1mo ago

Didn’t know your code could satisfy requirements without working. /s

My priorities are as follows: 1) code works 2) code satisfies requirements

Not sure how anyone can prove their code satisfies requirements when it doesn’t run.

just_once•1mo ago

I don't know if there's a word for this but this reads to me as like, software virtue signaling or software patronizing. It's bizarre to me to tell an engineer what their job is as a matter of fact and to claim a particular usage of a tool as mandated (a tool that no one really asked for, mind you), leveraging duty of all things.

I guess to me, it's either the case that LLMs are just another tool, in which case the already existing teachings of best practice should cover them (and therefore the tone and some content of this article is unnecessary) or they're something totally new, in which case maybe some of the already existing teachings apply, but maybe not because it's so different that the old incentives can't reasonably take hold. Maybe we should focus a little bit more attention on that.

The article mentions rudeness, shifting burdens, wasting people's time, dereliction. Really loaded stuff and not a framing that I find necessary. The average person is just trying to get by, not topple a social contract. For that, look upwards.

dkural•1mo ago

I've really seen both I suppose. A lot of devs don't take accountability / responsibility for their code, especially if they haven't done anything that actually got shipped and used, or in general haven't done much responsible adulting.

just_once•1mo ago

No doubt. Interesting to think about why that is without assuming it's a character flaw.

simonw•1mo ago

LLMs are just another tool, but they're disruptive enough that existing best practices need to be either updated or re-explained.

A lot of people using LLMs seem not to have understood that you can't expect them to write code that works without testing it first!

If that wasn't clearly a problem I wouldn't have felt the need to write this.

just_once•1mo ago

Yep, it's a real problem. No dispute there.

My intention isn't to argue a point, just to share my perspective when I read it.

I read your response here to be saying something like "I noticed that people are misunderstood about X, so I wanted to inform them". In this case "X" isn't itself very obvious to me (For any given task, why can't you expect that a cutting edge LLM would be able to write it without requiring your testing that?) but most importantly, I don't think I would approach a pure misunderstanding (tantamount to a skills gap) with your particular framing. Again, to me it reads as patronizing.

Love the pelican on the bicycle, though. I think that's been a great addition to the zeitgeist.

morning-coffee•1mo ago

Amen

nish__•1mo ago

Good framing.

onion2k•1mo ago

I want to distill this post into some sort of liquid I can inject directly into my dev teams. It's absolutely spot on. Seeing a PR with a change that doesn't build is one of the most disappointing things.

ericmcer•1mo ago

The requirements in this article are... the bare minimum for a PR. Like yeah it needs to work is the no duh requirement. I have seen tons of PRs that work but defy conventions or add a bunch of useless cruft that we can rip out once I sit down and talk with them about what they did.

When someone pings me for a review and their code isn't even passing CI builds/tests I just let them know its failing and don't even look at a line of their code.

simonw•1mo ago

> The requirements in this article are... the bare minimum for a PR.

Yeah, I'm a bit sad I felt the need to write this to be honest.

yuedongze•1mo ago

it's very similar to the verification engineering problem i wrote about on HN last week. AI is as good as we can prove their work is genuine. and we need humans in the loop to fill in the gaps between autonomous systems and ultimately be held accountable by human laws. it's kind of sad but the reality we are facing

cyrialize•1mo ago

When I start working on a ticket I actually start writing up a PR early on.

As I figure out my manual testing, I'll write out the steps that I took in my PR.

I've found that writing it out as I go does two things: 1) It makes it easier to have a detailed PR and 2) it acts as a form of rubber-ducking. As I'm updating my PR I'll realize steps I've missed in my testing.

Something that also helped out with my manual testing skill was working in a place that had ZERO automated testing. Every PR required a detailed testing plan that you did and that your reviewer could re-create.

givemeethekeys•1mo ago

Sorry that’s not what it says in my job description.

dangus•1mo ago

Your job isn’t to deliver code that works, it’s to successfully[1] operationalize business logic.

[1] I.e., it should work

That may seem pedantic but that’s a huge difference. Code is a means to an end. If no-code suddenly became better than code through some miracle, that would be your job.

This also means that if one day AI stops making mistakes, tossing AI requests over the wall may be a legitimate modus operandi.

SunshineTheCat•1mo ago

I know this won't be popular, however, I think the idea of differentiating a "real developer" from one who relies mostly, or even solely on an LLM is coming to an end. Right now, I fully agree relying wholly upon an LLM and failing to test it is very irresponsible.

LLMs do make mistakes. They do a sloppy job at times.

But give it a year. Two years. five years. It seems unreasonable to assume they will hit a plateau that will prevent them from being able to build, test, and ship code better than any human on earth.

I say this because it's already happened.

It was thought impossible for a computer to reach the point of being able to beat a grandmaster at chess.

There was too much "art," experience, and nuance to the game that a computer could ever fully grasp or understand. Sure there was the "math" of it all, but it lacked the human intuition that many thought were essential to winning and could only be achieved through a lifetime of practice.

Many years following Deep Blue vs. Garry Kasparov, the best players in the world laugh at the idea of even getting close to beating Stockfish or any other even mediocre game engine.

I say all of this as a 15-year developer. This happens over and over again throughout history. Something comes along to disrupt an industry or profession and people scream about how dangerous or bad it is, but it never matters in the end. Technology is undefeated.

newsoftheday•1mo ago

> There was too much "art," experience, and nuance to the game that a computer could ever fully grasp or understand.

That's the thing though, AI doesn't understand, it makes us feel like it understands, but it doesn't understand anything.

simonw•1mo ago

Turns out that doesn't matter for chess, where the winning conditions are formally encoded.

JackSlateur•1mo ago

  This happens over and over again throughout history.

Could you share a single instance of a machine that think ? Are we sharing the same timeline ?

xmodem•1mo ago

What's your point, though? Let's assume your hypothesis and 5 years from now everyone has access to an LLM that's as good as a typical staff engineer. Is it now acceptable for a junior engineer to submit LLM-generated PRs without having tested them?

> It was thought impossible for a computer to reach the point of being able to beat a grandmaster at chess.

This is oft-cited but it takes only some cursory research to show that it has never been close to a universally-held view.

SunshineTheCat•1mo ago

In the scenario I'm hypothesizing, why would anyone need to "check" or "test" its work? What chess players are checking to make sure Stockfish made the "right" move? What determines whether or not it's "right" is if Stockfish made it.

asadotzler•1mo ago

There are clear win conditions in chess. There are not for most software engineering tasks. If you don't get this, it's probably a safe bet that you're not an engineer.

SunshineTheCat•1mo ago

Right, which is why Deep Blue won in the early 90's and now years later, AI is moving on to far more complicated tasks, like engineering software.

The fact that you gave me the "you just don't understand, you're not a chess grandmaster" emotional response helps indicate that I'm pretty much right on target with this one.

FWIW I have been engineering software for over 15 years.

xmodem•1mo ago

Your post sent me down a rabbit hole reading about the history of computers playing chess. Notable to me is that AI advocates were claiming that a computer would be able to beat the best human chess players within 10 years as far back as the 1950s. It was so long ago they had to clarify they were talking about digital computers.

Today I learned that AI advocates being overly optimistic about its trajectory is actually not a new phenomenon - it's been happening for more than twice my lifetime.

throw1235435•1mo ago

Its hard to imagine now but the code won't matter. We will have other methods of validating the product I think; like before tech. There are many ways to validate something; this is an easier problem than creation (which these AI models are somewhat solving right now)

All very demoralizing but I can see the trend. In the end all "creative" parts of the job will disappear; AI gets to do the fun stuff.

We invented something that devalues the human craft and contribution -> if you weren't skilled in that and/or saw it as a barrier you win and are excited by this (CEO types, sales/ideas people, influencers, etc). If you put the hard yards in and did the work to build hard skills and built product; you lose.

Be very clear: AI devalues intelligence and puts more value on what is still scarce (political capital, connections, nepotism, physical work, etc). It mostly destroys meritocracy.

gitaarik•1mo ago

Yes, we're already there, and the human responsibilities are shifting from engineering to architecting. The AI does the execution, the human makes the decisions. Because LLMs can never make decisions fully by themselves, because they need to be programmed by humans, otherwise they go out of sync with what we actually want.

throw1235435•1mo ago

You will get downvoted but I unfortunately agree with you; also as a SWE of similar tenure. People assume there's other things to jump to and yes in the short term there may be. But the industry already has those things on its roadmap to disrupt (i.e. generate more economic useful work).

For better or worse the software career is wounded, and the AI wolves can smell blood. Its low hanging fruit that they understand and in its disruption can make a lot of money.

As an industry is dying of disruption the leftover money is made in disrupting it first, this will speed up engineering efforts to kill the profession like rats in a sinking ship. Corporate stakeholders will also be the first to spend big on anything that does; they in my experience prefer communicators, and people accountable, not people who deliver.

It was a good ride. I could never of imagined this trajectory 3 years ago.

kords•1mo ago

I agree that tests and automation are probably the best things we can do to validate our code and author of the PR should be more responsible. However they can't prove that the code works. It's almost the opposite: If they pass and it's a good coverage, then code has better chances to work. If they fail, then they prove code doesn't work.

llm_nerd•1mo ago

"the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest."

Kind of depressing how it has become such a trope of blaming juniors for every ill or bad habit. In all likelihood the reader of this comment has a number of terrible habits, working on teams with terrible habits, and juniors play zero part in it.

And, I mean, on that theme developers have been doing this for as long as we've had large teams. I've worked at a large number of teams where there was the fundamental principal that QA / UA holds responsibility. That they are responsible for tests, and they are responsible for bad code making it through to the product / solution. Developers -- grizzled, excellent-CV devs -- would toss over garbage code and call it a day.

annjose•1mo ago

I came here to say

1) Amen 2) I wonder if this is isolated to junior dev only? Perhaps it seems like that because junior devs do more AI assisted coding than seniors?

WhyOhWhyQ•1mo ago

Isn't this in contradiction to your blog post from yesterday though? It's impossible to prove a complex project made in 4.5 hours works. It might have passed 9000 tests, but surely there are always going to be edge cases. I personally wouldn't be comfortable claiming I've proved it works and saying the job is done even, if the LLM did the whole thing and all existing tests passed, until I played with it for several months. And even then I would assume I would need to rely on bug reports coming in because it's running on lots of different systems. I honestly don't know if software is ever really finished.

My takeaway from your blog post yesterday was that with a robust enough testing system the LLM can do the entire thing while I do Christmas with the family.

(Before all the AI fans come in here. I'm not criticizing AI.)

BeefySwain•1mo ago

Consider that this isn't just a random AI slopped assortment of 9,000 tests, but instead is a robust suite of tests that cover 100% of the HTML5 spec.

Does this guarantee that it functions completely with no errors whatsoever? Certainly not. You need formal verification for that. I don't think that contradicts what Simon was advocating for though in this post.

WhyOhWhyQ•1mo ago

I think it would be interesting if professional engineering becomes more like producing formally correct documents for the AI to implement.

ncruces•1mo ago

We have these tools that we use to write formally correct documents.

They're called programing languages, and a deterministic algorithm translates them to machine code.

Are we sure English and a probabilistic algorithm is any better at this?

WhyOhWhyQ•1mo ago

I actually hate AI in my core, to the point that if it gets too much more advanced I'll likely be in existential crisis, so don't attack me on those grounds. Given it exists, I'm going to find what's good about it though. I do think the problem of AI existing has to be confronted. Maybe one solution is what the human does is produce specs like the HTML 5 one, and what the AI does is implement it in software.

simonw•1mo ago

That's why I don't consider my blog post from yesterday to be production quality code. I'd need to invest a lot more work in reviewing it before I staked my reputation on it

maerF0x0•1mo ago

> The first is manual testing. If you haven’t seen the code do the right thing yourself, that code doesn’t work. If it does turn out to work, that’s honestly just pure chance.

Depending on exactly what the author meant here, I disagree. Our first and default tool should be some form of lightweight automated testing. It's explicit (serves a form of spec and docs how to use the software), it's repeatable (manual testing is done once and it's result is invalidated moments later), and it's cost per minute of effort is more or less the same (most companies have the engineers do the tests, they are expensive).

Yes. There will be exceptions and exceptional cases. This author is not talking about exceptions and neither am I. They're not an interesting addition to this conversation.

tech-ninja•1mo ago

I disagree, no company no matter the size will have E2E or integrations tests for all of its features, it's just not feasible.

Unless you are working on a tiny change on a highly tested part of the code you should be manually testing your code and/or adding some tests.

IMTDb•1mo ago

> Our first and default tool should be some form of lightweight automated testing

Manual verification isn't about skipping tests, it's about validating what to test in the first place.

You need to see the code work before you know what "working" even means. Does the screen render correctly? Does the API return sensible data? Does the flow make sense to users? Automated tests can only check what you tell them to check. If you haven't verified the behavior yourself first, you're just encoding your assumptions into test cases.

I'd take "no tests, but I verified it works end-to-end" over "full test coverage, but never checked if it solves the actual problem" every time. The first developer is focused on outcomes. The second is checking boxes.

Tests are crucial: they preserve known-good behavior but you have to establish what "good" looks like first, and that requires human judgment. Automate the verification, not the discovery. So our first and default tool remains manual verification

maerF0x0•1mo ago

I suppose we could be talking circles around eachother, but I'd say many of what you've suggested as manual tests could be codified into an automated test just as easily.

Manual: `curl localhost:8080 | jq .` or whatever, brings value once.

Automated: `assert.ValidJSON(req.Body)` is basically identical, but can be repeated over and over again

codeviking•1mo ago

I'm a big fan of lightweight, automated tests. Despite that, I still default to manual verification. Usually I do both.

Automated tests omit a certain type of feedback that I think remains important to the development loop. Automation doesn't care about a poor UX; it only verifies what you tell it to.

For instance, I regularly contribute to a CLI that's widely used at $WORK. I can easily write tests to verify the I/O of a command I'm working on that assert correctness. Yet if I actually try to use the command I'm changing, usually as part of verifying my changes, I tend to discover usability issues that make the program more pleasant to use and the tests would happily ignore.

Also, there's certainly cases where automation isn't worth the cost. Maybe because the resulting tests are complex, or brittle. I've often found UI tests to lie in this category (but maybe I'm doing them wrong).

Because of these things I think manual testing is the right default. Automated tests should also exist; but manual tests should _always_ be part of the process.

fjfaase•1mo ago

One more reason to work without branches and PRs. The future for CI/CD is bright ;-).

Nizoss•1mo ago

This is how I would also love to work but not all teams prefer this way. How many are you in your team? Was it easy to switch?

fjfaase•1mo ago

I twice worked in a teams where we did not use branches (or PRs). Both were working like that when I joined them.

The first was because we were svn (and maybe even csv before that, but I cannot remember) and that did not support branching easily. That team did switch to git, which did not go with its some struggles, and misconceptions, such as: "Never use rebase."

The second team was already working without branches and releasing a new version of the tool (the Bond3D Slicer for 3D printing) every night. It worked very well. Often we were able to implement and release new features within two or three days allowing the users to continue with their experiments.

When after some years the organization implemented more 'quality assurance' they demanded that we would make monthly releases that were formally tested by the users, we created branches for each release. The idea was that some of the users would test the releases before they were official released, but that testing would often take more than a month, one time even three months, because they were 'too busy' to do the formal review. But at the same time some users were using the daily builds because these builds had the features implemented that they needed. As a result of this, the quality did not improve and a lot of time was wasted, although the formal quality assurance, dictated by some ISO standard, was assured.

I have no experience with moving away from using branches. It might be a good idea to point your manager/team lead/scrum master to dora.dev or the YouTube channel: https://www.youtube.com/@ModernSoftwareEngineeringYT

ianberdin•1mo ago

The solution is easy: responsibility.

The point is to hire people who can own code and codebase. “Someone will review” is dead end.

dekhn•1mo ago

Prove is a strong word. There are few cases in real-world programming where you can prove anything.

I prefer to make this probabilistic: use testing to reduce the probability that your code isn't correct, for the situations in which it is expected to be deployed. In this sense, coding and testing is much like doing experimental physics: we never really prove a theory or disprove it, we just invalidate clearly wrong ones.

newsoftheday•1mo ago

Testing must cover all cases else a 10 LOC LLM created PR is inherently more dangerous than a human 100 LOC PR because the LLM will likely also have written the test cases and it will try to make it all balance out with all passing; instead of making sure the test cases actually cover everything with the type of logic a human would apply.

vladsh•1mo ago

We should get back to the basic definition of the engineering job. An engineer understands requirements, translates them into logical flows that can be automated, communicates tradeoffs across the organization, and makes tradeoff calls on maintainability, extensibility, readability, and security. Most importantly, they’re accountable for the outcome, because many tradeoffs only reveal their cost once they hit reality

None of this is covered by code generation, nor by juniors submitting random PRs. Those are symptoms of juniors (not only) missing fundamentals. When we forget what the job actually is, we create misalignment with junior engineers and end up with weird ideas like "spec-driven development"

If anything, coding agents are a wake-up call that clarify what engineering profession is really about

venturecruelty•1mo ago

How do you square that with "use AI and get this feature done in three days or have your 'performance reviewed' with HR in the room"? Because I'm having trouble bridging that gap.

Edit: help, the new org said the same thing. :(

Edit 2: you guys, seriously, the HR lady keeps looking up at me and shaking her head. I don't think this is good. I tried to be a real, bigboy engineer, but they just mumbled something about KPIs and put me on a PIP.

rnewme•1mo ago

Uptime x customer satisfaction vs. stack of cards. If they don't understand engineering prepare CV and head over to org that does.

tete•1mo ago

I think people are getting used to stuff not working. People (like me) use crap like Teams, Slack, that web version of Office, Outlook, etc. on a daily basis and pour huge amounts money in. They use shit like Fortinet (the digital version of dream catchers) and so on.

Things break. A lot. Doctors successful or not also deal with the same shitty IT on a daily basis.

Nobody cares about engineering. It's about selling stuff, not about reliability, etc.

And to some degree one is forced to use that stuff anyways.

So sure you can go to a company understanding engineering, but if you do a job for salary you might lose out on quite a bit on it if you care for things like quality. We see this in so many different sectors.

Sure there is a unicorn here and there that makes it for a while. And they might even go big and then they sell the company or change to maximizing profits, because that's the only way up when you essentially already made it (on top of one of the big players).

For small projects/companies it depends if you have a way to still launch big, which you can usually do with enough capital. You can still make a big profit with a crappy product then, but essentially only once or twice. But then your goal also doesn't have to create quality.

Microsoft and Fortinet for example wouldn't profit from adding (much) quality. They profit from hypes. So they now both do "AI".

rnewme•1mo ago

Yup, we are all definitely lowering the bar of what's acceptable when it comes to uptime and bugs. More features more hype x10 seems to be the standard approach to market, but there are still a lot of companies and teams where greybeards and rational folks remember and understand previous hype cycles/bubbles, and who appreciate and protect the engineering approach. It's just that they mostly hire/partner by reference, so it's kinda hard to exit the toxic bubble of startups and "growth hacking" enterprises.

newsoftheday•1mo ago

Agreed.

https://read.engineerscodex.com/p/how-one-line-of-code-cause...

When 10K LOC AI PR's are being created, sometimes by people who either don't understand the code or haven't reviewed the code their trying to submit; the 60 million dollar failure line is potentially lying in wait.

tete•1mo ago

Okay, then software engineers are not engineers.

The whole reliability, etc. to many is not of much priority. Things got an absolutely shitshow and still everyone buys it.

In other words the only outcome will be that people don't have or don't want to have engineers anymore.

Companies are very much not interested in someone who does the above, but at most someone who sells or cosplays these things - if even.

Cause that what creates income. They don't care if they sell crap, they care that they sell it and the cheaper they can produce the better. So money gets poured into marketing not quality.

High quality products are not sought after. And fake quality like putting a computer or a phone in a box like jewelry, even if you throw that very box away the next time you walk by a trash bin. That's what people consider quality these days, even if it's just a waste of resources.

And businesses choose products and services the same way as regular consumers, even when they want the marketing to make them feel good about it in a slightly different way, because marketing to your target audience makes sense. Duh!

People are ready to pay more for having the premium label stamped on to something, pay more to feel good about it, but most of the time are very unwilling to pay for measurable quality, an engineer provides.

It's scary, even with infrastructure the process seems to change, probably also due to corruption, but that's a whole other can of worms.

> communicates tradeoffs across the organization

They may do that. They may be recognized for it. But if the guy next door with the right cosplay says something like "we are professionals, look at how we have been on the market for X years" or "look at our market share" then no matter how far from reality the bullshitting is they'll be getting the money.

At the beginning of the year/end of last year I learned how little expertise, professionalism and engineering are required to be a multi billion NASDAQ stock. For months I thought that it cannot possibly be, that the core product of a such a company displays such a complete lack of expertise in the core area(s). Yet, they somehow managed to convince management to just invest a couple more times of money than the original budget that was already seen as quite the stretch. Of course they promises didn't end being anywhere close to true, and they completely forgot to inform us (our management) about severe limitations.

So if you are good at selling to management which you can be by pocketing consultants recommending you then things will work seemingly no matter what.

> If anything, coding agents are a wake-up call that clarify what engineering profession is really about

I believe what we need to wake up to or come to terms with is that our industry (everything that would go into NASDAQ) is a farce. Coding agents show that. It doesn't matter to create half-assed products if you come to sell them. You are selling your products to people. Doesn't matter if it's some guy at a hot dog stand or a CEO of a big successful company or going from house to house selling the best vacuum cleaner ever. What matters is you making people believe it would be stupid not to take your product.

order-matters•1mo ago

TBH I think Information Systems Engineering and Computer Engineering can just eat software engineers lunch at this point. the entire need for a separate engineering discipline on software was for low level coding. Custom hardware chips are easier to make for simple things and not a lot of need in low level coding anymore for more complex things means the focus is shifting back to either hardware choices or higher level system management

I'd argue the only places left you really need low level coding fall under computer science. If you are a computer or systems engineer who needs to work with a lot of code then youll benefit from having exposure to computer science, but an actual engineering discipline for just software seems silly now. Not to mention pretty much all engineers at this point are configuring software tools on their own to some degree

I think it's similar to how there used to be horse doctors as a separate profession from vets when horses were much more prominent in everyday life, but now they are all vets again and some of them specialize in horses

chasd00•1mo ago

> I believe what we need to wake up to or come to terms with is that our industry (everything that would go into NASDAQ) is a farce.

the thing is, with software development, it's always been this way. Developers have just had tunnel vision for decades because they stare into an editor all day long instead trying to actually sell a product. If selling wasn't the top priority then what do you think would happen to your direct deposit? Software developers, especially software developers, live in this fantasy land where the believe their paycheck just happens automatically and always will. I think it's becoming critical that new software devs entering the workforce spend a couple years at a small, eat what you kill, consultancy or small business. Somewhere where they can see the relationship between building, selling, and their paycheck first hand.

heliumtera•1mo ago

Technology has absolute qualities. Not a fantasy. Are you being paid to browse hacker news? Probl not, but here you are. Maybe you never considered this, but programming for other reasons other than a salary is a possibility. If those pesky programmers gave it all away, for free, what would be left for you to sell? In this case, would you leave technology? Would you go somewhere else and practice your selling there? Can't we defend building for the sake of building? Doing for the sake of having fun? Maybe you would be left with nothing to sell, I understand, but that's fine for me. Sorry.

bluesnowmonkey•1mo ago

> Your job is to deliver code you have proven to work.

First of all, no it’s not. Your job is to help the company succeed. If you write code that works but doesn’t help the company succeed, you failed. People do this all the time. Resume padding, for example.

Sometimes it’s better for the business to have two sloppy PRs than a single perfect one. You should be able to deliver that way when the situation demands.

Second, no one is out there proving anything. Like formal software correctness proofs? Yeah nobody does that. We use a variety of techniques like testing and code review to try to avoid shipping bugs, but there’s always a trade off between quality and speed/cost. You’re never actually 100% certain software works. You can buy more nines but they get expensive. We find bugs in 20+ year old software.

t1234s•1mo ago

Bravo.. best headline I've read in a long time. This phrase should be a desktop background.

venturecruelty•1mo ago

Lmao no, my job is to make the line go up and make my boss happy. It was ever thus.

acituan•1mo ago

First problem is turning engineers into accountability sinks. This was a problem before LLMs too, but now a much bigger and structural problem with democratization of the capacity to produce plausible looking dumb code. You will be forced to underwrite more and more of that, and expected to absorb the downsides.

The root cause is the second problem; short of formal verification you can never exhaustively prove that your code works. You can demonstrate and automate that demonstration for a sensible subset of inputs and states and hope for the state of the world approximately staying that way (spoiler: it won't). This is why 100% test coverage in most cases is something bad. This is why sensible is the key operative attitude, which LLM suck at right now.

The root cause of that one is the third problem; your job is to solve a business problem. If your code is not helping the business problem, it actually is not working in the literal sense of the work. It is an artifact that does a thing, but it is not doing work. And since you're downstream of all the self-contradicting, ever changing requirements in a biased framing of a chaotic world, you can never prove or demonstrate that your code solves a business problem and that is the end state.

sowbug•1mo ago

Since they’re robots, automated tests and manual tests are effectively the same thing.

I'd buttress this statement with a nuance. Automated tests typically run in their entirety, usually by a well-known command like cargo test or at least by the CI tools. Manual tests are often skipped because the test seems to be far away from the code being changed.

My all-time favorite team had a rule that your code didn't exist if it didn't have automated tests to "defend" it. If it didn't, it was OK, or at least not surprising, for someone else to break or refactor it out of existence (not maliciously, of course).

zhyder•1mo ago

"Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That’s no longer valuable. What’s valuable is contributing code that is proven to work."

I'd go further: what's valuable is code review. So review the AI agent's code yourself first, ensuring not only that it's proven to work, but also that it's good quality (across various dimensions but most importantly in maintainability in future). If you're already overwhelmed by that thousand-line patch, try to create a hundred-line patch that accomplishes the same task.

I expect code review tools to also rapidly change, as lines of code written per person dramatically increase. Any good new tools already?

ozim•1mo ago

There is whole range of “proven to work” - regarding testing you cannot prove that there are no bugs.

Your job is to the deliver code up to specifications.

Not even checking the happy flow at least is of course gross negligence. But so is spending too much time on edge cases that no one will run into or person asking doesn’t want to pay for covering.

newsoftheday•1mo ago

"Your job is to deliver code you have proven to work"

And...code that has been 100% reviewed, even if it was fully LLM generated.

gorjusborg•1mo ago

Your actual job is to produce positive outcomes for your stakeholders. Code can be part of that, but doesn't have to be.

If you are dumping AI slop on your team to sort through, you are creating drag on the entire team's efforts toward those positive outcomes.

As someone getting dumped upon, you probably should make the decision (in line with the objective to producing positive outcomes) to not waste your time weeding through that stuff.

Review everything else, make it clear that the mess is not reviewable, and communicate that upward if needed.

tete•1mo ago

Who is "you"?

It's not my job, really. And given by the state of IT these days it's barely anyone's it seems.

nrhrjrjrjtntbt•1mo ago

Always has been

lifeisstillgood•1mo ago

I’m going to go with this as probably in the top three definitions of software developer …

along with

- the job was better titled as “Analyst Programmer” - you need both.

And

- you can make a changeset, but you have to also sell the change

6510•1mo ago

I work alone, I have considerable amount of unfinished code laying around. Sometimes even multiple instances of a thing. I could see how it would be annoying in a team settings. The cause is not having the thing but how you organize it. Like with LLM slop it is wonderful to be able to scroll over something that shows what the solution might look like.

holtkam2•1mo ago

I’d go further: it’s not enough to be able to prove that your code works. It’s required that you also understand why it works.

Otherwise you’ll end up in situations where it passes all test cases yet fails for something unexpected in the real world, and you don’t know why, because you don’t even know what’s going on under the hood.

koinedad•1mo ago

This is very helpful for a team and even though it takes a little time it actually speeds things up in the long run. Using PR templates can help. A general description of the problem including a screenshot or video go a long way.

I remember when I was working at a startup and a new engineer merged his code and it totally broke the service. I asked him if he ran his code locally first and he stared at me speechless.

Running the code locally is the easiest way to eliminate a whole series of silly bugs.

Like mentioned in the article adding a test and then reverting your change to make sure the test fails is really important, especially with LLMs writing tests. They are great at making things look like they work but completely don’t.

vcarrico•1mo ago

> the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

I'm noticing something else very similar but involving not necessarily junior roles with long messages, when they use these AI writing assistants that resume stuff, creates follow-ups, etc. Putting this additional burden in whoever needs to read it. It makes me think of a quote that says: "I would have written a shorter letter, but I didn't have the time."

golly_ned•1mo ago

I don't think this quite captures the problem: even if the code is functional and proven to work, it can still be bad in many other ways.

The submitter should understand how it works and be able to 'own' and review modifications to it. That's cognitive work submitters ipso facto don't do by offloading the understanding to an LLM. That's the actual hard work reviewers and future programmers have to do instead.

a24venka•1mo ago

There is a heavy emphasis on testing the code as the way to provide guarantees that it works. While this is a helpful tool, I often find that the best engineers are ones who take a more first principles approach to the code and can reason about why the solution is comprehensive (covers all edge cases) and clean (easy for humans and LLMs to build on).

It often takes discipline to think and completely map out solutions before you build. This is where experience and knowing common patterns can also help.

When you have the experience of having manually written or read a lot of code it helps at the very least quickly understand what the LLMs are writing and reason about it later even if not at the beginning.

ChrisMarshallNY•1mo ago

In my last job (engineering manager for a Japanese high-Quality hardware manufacturer), we were expected to deliver software that works.

In fact, if any bugs were found by the official "last step" QA Department, we (as a software development department) were dinged. If QA found bugs, they could stop the entire product release, so you did not want to be responsible for that.

This resulted in each software development department setting up their own, internal "QC team" of testers. If they found bugs, then individual programmers (or teams) would get dinged, but the main department would not.

Our software got a lot of testing.

jobs_throwaway•1mo ago

What does 'get dinged' mean? It seems like this would lead to a strong incentive against making any changes, lest you introduce bugs, perhaps due to no fault of your own.

ChrisMarshallNY•1mo ago

Yup. "Get dinged" was usually some kind of reprimand. It could go from being yelled at by the department manager, to being fired as a department.

And yes. It was a strong disincentive to making changes.

I didn't really like it, but our software did do what it said on the tin (which wasn't always ideal).

johnea•1mo ago

I couldn't agree more with the sentiment.

If you, the development engineer, haven't demonstrated the product to work as expected, and preferably this testing is independently confirmed by a product test group, then you can't claim to be delivering a functional product.

I would add though, that management, specifically marketing management setting unreasonable demands and deadlines, are a bigger threat to testing than LLMs.

Of course the damage done by LLM generated code not being tested, is additive to the damage management is doing.

So this isn't any kind of apologism, the two sources are both making the problem worse.

keeda•1mo ago

The way I would phrase it is: software engineering is the craft of delivering the right code at the right time, where "right code" means it can be trusted to do the "right thing."

A bit clunky, but I think that can be scaled from individual lines of code to features or entire systems, whatever you are responsible for delivering, and encompasses all the processes that go around figuring what code is to be actually written and making sure it does what it's supposed to.

Trust and accountability are absolutely a critical aspect of software engineering and the code we deliver. Somehow that is missed in all the discussions around AI-based coding.

The whole phenomenon of AI "workslop" is not a problem with AI, it's a problem with lack of accountability. Ironically, blaming workslop on AI rather than organizational dysfunction is yet another instance of shirking accountability!

ozozozd•1mo ago

I don't test my code because I think it's my duty. I test it because my personal motivation is to see it working! What's the point of writing code if I don't even get to see it run?!

If someone's not even interested and excited to see their code work, they are in the wrong profession.

casey2•1mo ago

It comes out of the AI, that is proof enough. Why would I have prompted it and gave it to you if I didn't think that the AI could handle it? The real risk is closer to "people carry some preconceived notion about code that doesn't map to AI code." such as, for example, the person who contributed the code knows about the problem in enough detail to be accountable in the short term. Or at the very least be able to tell you why they made a PR at all

How to prove it has been subject to some debate for the past century, the answer is that it's context dependent to what degree you will or even can prove the program and exposed identifiers correct. Programming is a communication problem as well as a math problem, often an engineering problem too. Only the math portion can be proved, the a small by critical amount engineering portion tested.

Communication is the most important for velocity it's the difference between hand rolling machine code and sshing into a computer halfway across the world having every tool you expect. If you don't trust that webdevs know what they are doing then you can be the most amazing dev in the world you but your actual ability to contribute will be hampered. The same is true of vibe coding, if people aren't on the same page as to what is and isn't acceptable velocity starts to slow down.

Languages have not caught up to AI tools, since AI operates well above the function level, what level would be appropriate to be named and signed off on? pull request and link to the chat as a commit? (what is wrong with that that could be fixed at the name level)

Honest communication is the most important. Amazon telling investors that they use TLA+ is just signaling that they "for realz take uptime very seriously guize", "we know distributed systems" and engineering culture. The honest reality is that they could prove all their code and not IMprove their uptime one lick, because most of what they run isn't their code. It's a communication breakdown if effort gets spent on that outside a research department.

webprofusion•1mo ago

No my job is sitting eating this here donuts.

Swannie•1mo ago

Posted down thread, but worth posting as a comment too.

I know Simon follows this "Issue First" style of work in his projects, with a strong requirement for passing tests to be included.

It's been a best practice for a long time. I really enjoyed this when I read it ~10 years ago, and it still stands the test of time:

https://rfc.zeromq.org/spec/42/#24-development-process

The rationale was articulated clearly in:

https://hintjens.gitbooks.io/social-architecture/content/cha...

If you have time, do yourself a favour and read the whole lot. And then liberally copy parts of C4 into your own process. I have advocated for many components of it, in many contexts, at $employer, and will continue to do so.

psv2522•1mo ago

I thought manual testing was mandatory mininum requirement after every AI change unless its very small typo or something?

How is this a issue, its genuinely common sense.

unsungNovelty•1mo ago

Billions were spent in the last 5+ years saying AI can do coding. Increase speed. Reduce headcount. Remove processes. It's irritating to see that people are NOW changing the narrative of... You need to review the code... You need test it...

Devs already know this. Tell this to Managers, CEOs and non-engineers who believed billions worth of marketing BS. Cos devs don't have voice most of the time. They set the timelines. The want to push this end results to their team/company. So that is the constraints devs are working with. So to them, NOT to us Simon. WE KNOW! :)

jstrebel•1mo ago

In all fairness: human senior devs see AI-written source code with some disdain, as it usually does not match their stylistic and idiomatic preferences (although being correct and fully working). I don't think that untested code is the problem here - you can easily measure test coverage and of course. every CI/CD pipeline should run the existing unit and integration tests.

andai•1mo ago

> Don’t be tempted to skip the manual test because you think the automated test has you covered already! Almost every time I’ve done this myself I’ve quickly regretted it.

How does this work? When expectations about the program's state vs its observable behavior diverge?

bigDinosaur•1mo ago

A manual test is often an end-to-end test which are notoriously difficult to automate.

simonw•1mo ago

Just one very simple example: you add a new frontend feature where clicking a button opens a modal. You include an automated test that selects the button and clicks it using document.querySelector("#mybutton").click() - and the test passes. Then when you test it in a browser yourself you find that the button is impossible to click because it's invisibly positioned behind some other element.

nottorp•1mo ago

Hmm. I've never been asked to do formal proofs for my code. Where does he work?

simonw•1mo ago

This isn't about formal proofs, it's about manual testing and simple automated tests.

rldjbpin•1mo ago

i don't think this is as much as an AI issue as it is about path of least resistance in a velocity-driven environment.

call me the worst junior dev in the industry, but pre-coding agents, closing tickets was more important than upholding absolute quality. not everybody is dealing with a billion concurrent users with multi-geo deployments. most of the time, a few screenshots or test output for manual validation is enough to go ahead. when pressed with time and without the prerequisites in the infra side, doing the absolute best development and testing is a luxury only for daydreamers.

automated testing can be a double-edged sword. pre-LLM, even test coverage was a number that somehow needed to go up after each PR. this only resulted in shady tactics of pointless test cases that slowly bring up the metric. today it can be very dangerous if both code and its test suite are vibe coded. especially when it can give the appearance of that 90%+ code coverage.

on the other hand, some manual testing to make sure the core functionality works is the bare minimum one does before pushing out code. at least i would like to believe it is.

naasking•1mo ago

Almost nobody proves their code works. At best, they simply have high confidence it works. This confidence is also sometimes (often?) misplaced.

freedomben•1mo ago

> Don’t be tempted to skip the manual test because you think the automated test has you covered already! Almost every time I’ve done this myself I’ve quickly regretted it.

Seriously, this cannot be emphasized enough. Before LLMS when we were writing tests completely manually, manual testing made sense to me as the second step. However after playing around a lot with coding agents and LLMs, I fully agree this has flipped. Test it manually first! When you generate the tests it is extremely wise to ensure that the tests fail without the new code, and pass with it. You definitely need to review the test though, because it's remarkably easy to have the agent put something in there that makes it not a good test.

Just a couple days ago for example, Claude made a test pass by skipping authentication and leaving a brief comment informing that the authentication made the test flaky. It even threw a quick variable in there that enabled running or disabling flaky tests, and flaky tests were disabled by default! Had I not been doing a good review, I definitely would have missed it because it was cleverly subtle. I've also seen it test the wrong endpoint!

toobulkeh•1mo ago

Your job is also, arguably more important in certain scenarios, is to deliver maintainable code.

Remember, code does 2 things: 1. Tell the machine what to do 2. Tell the next developer what you were trying to do

DustinBrett•1mo ago

Proving it works in edge cases is usually the hard part.

Al Lowe on model trains, funny deaths and working with Disney

Hoot: Scheme on WebAssembly

Stories from 25 Years of Software Development

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

The AI boom is causing shortages everywhere else

Reinforcement Learning from Human Feedback

The Waymo World Model

Start all of your commands with a comma (2009)

Vocal Guide – belt sing without killing yourself

Selection Rather Than Prediction

Speed up responses with fast mode

U.S. Jobs Disappear at Fastest January Pace Since Great Recession

France's homegrown open source online office suite

Coding agents have replaced every framework I used

A Fresh Look at IBM 3270 Information Display System

72M Points of Interest

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Software factories and the agentic moment

Where did all the starships go?

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Learning from context is harder than we thought

Monty: A minimal, secure Python interpreter written in Rust for use by AI

Making geo joins faster with H3 indexes

Hackers (1995) Animated Experience

Sheldon Brown's Bicycle Technical Info

Ga68, a GNU Algol 68 Compiler

Show HN: If you lose your memory, how to regain access to your computer?

An Update on Heroku

Show HN: I spent 4 years building a UI design tool with only the features I use

Al Lowe on model trains, funny deaths and working with Disney

Hoot: Scheme on WebAssembly

Stories from 25 Years of Software Development

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

The AI boom is causing shortages everywhere else

Reinforcement Learning from Human Feedback

The Waymo World Model

Start all of your commands with a comma (2009)

Vocal Guide – belt sing without killing yourself

Selection Rather Than Prediction

Speed up responses with fast mode

U.S. Jobs Disappear at Fastest January Pace Since Great Recession

France's homegrown open source online office suite

Coding agents have replaced every framework I used

A Fresh Look at IBM 3270 Information Display System

72M Points of Interest

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Software factories and the agentic moment

Where did all the starships go?

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Learning from context is harder than we thought

Monty: A minimal, secure Python interpreter written in Rust for use by AI

Making geo joins faster with H3 indexes

Hackers (1995) Animated Experience

Sheldon Brown's Bicycle Technical Info

Ga68, a GNU Algol 68 Compiler

Show HN: If you lose your memory, how to regain access to your computer?

An Update on Heroku

Show HN: I spent 4 years building a UI design tool with only the features I use

Your job is to deliver code you have proven to work

Comments