Now, that being said, a person should feel free to do what they want with their code. It's somewhat tough to justify the work of setting up infrastructure for that on small projects, but AI PRs aren't likely to be a big issue for small projects anyway.
Alas…
Some people will absolutely just run something, let the AI work like a wizard and push it in hopes of getting an "open source contribution".
They need to understand due diligence and reduce the overhead on maintainers, so that maintainers don't have to review things before it's really needed.
It's a hard balance to strike, because you do want to make it easy on new contributors, but this is a great conversation to have.
...that's just scratching the surface.
The problem is that LLMs make mistakes that no single human would make, and coding conventions should never be the focus of a code review anyway; they should usually be enforced by tooling.
E.g. when reading/reviewing other people's code you tune into their brain and thought process - after reading a few lines of (non-trivial) code you subconsciously know what 'programming character' this person has and what types of problems to expect and look for.
With LLM generated code it's like trying to tune into a thousand brains at the same time, since the code is a mishmash of what a thousand people have written and published on the internet. Reading a person's thought process via reading their code doesn't work anymore, because there is no coherent thought process.
Personally I'm very hesitant to merge PRs into my open source projects that are more than small changes of a couple dozen lines at most, unless I know and trust the contributor to not fuck things up. E.g. for the PRs I'm accepting I don't really care if they are vibe-coded or not, because the complexity for accepted PRs is so low that the difference shouldn't matter much.
Like a recognition that there's value there, but we're passing the frothing-at-the-mouth stage of replacing all software engineers?
It feels like people and projects are moving from a pure “get that slop out of here” attitude toward more nuance, more confidence articulating how to integrate the valuable stuff while excluding the lazy stuff.
You and I know that using AI is a metric to consider when judging ability and quality.
The difference is that it's not a judgment but a broadcast, an announcement.
In this case a snotty one from Discourse.
I mention that it lingers because I think that is a real psychological effect that happens.
Small announcements like this carry over into the future and flood any evaluation of yourself, which can be described as torture and sabotage, since it affects the decisions you make, sometimes destroying things.
"Slop" doesn't seem to be Yiddish: https://www.etymonline.com/word/slop, and even if it was, so what?
I really liked the paragraph about LLMs being "alien intelligence"
> Many engineers I know fall into 2 camps, either the camp that finds the new class of LLMs intelligent, groundbreaking and shockingly good. In the other camp are engineers that think of all LLM generated content as “the emperor’s new clothes”, the code they generate is “naked”, fundamentally flawed and poison.

> I like to think of the new systems as neither. I like to think about the new class of intelligence as “Alien Intelligence”. It is both shockingly good and shockingly terrible at the exact same time.

> Framing LLMs as “Super competent interns” or some other type of human analogy is incorrect. These systems are aliens and the sooner we accept this the sooner we will be able to navigate the complexity that injecting alien intelligence into our engineering process leads to.
It's a similitude I find compelling. The way they produce code and the way you have to interact with them really feels "alien", and when you start humanizing them, you get emotions when interacting with it and that's not correct.
I mean, I do get emotional and frustrated even when good old deterministic programs misbehaved and there was some bug to find and squash or work around, but LLM interactions can bring the game to a completely new level. So, we need to remember they are "alien". These new submarines are a lot closer to human swimming than the old ones were, but they're still very different.
I really like the way Discourse uses "levels" to slowly open up features as new people interact with the community, and I wonder if GitHub could build in a way of only allowing people to open PRs after a certain amount of interaction, too (for example, you can only raise a large PR if you have spent enough time raising small PRs).
This could of course be abused and/or lead to unintended restrictions (e.g. a small change in lots of places), but that's also true of Discourse and it seems to work pretty well regardless.
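Roughly, I imagine the gate as a bot checking contribution history before letting a large PR through. A minimal sketch against the GitHub REST API (the token, the thresholds, and the size cap are all made-up choices of mine, not anything GitHub actually offers):

```python
import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": "token ghp_xxx",  # hypothetical bot token
    "Accept": "application/vnd.github+json",
}

def merged_pr_count(repo: str, author: str) -> int:
    """How many PRs this author has already had merged in the repo."""
    q = f"repo:{repo} type:pr author:{author} is:merged"
    r = requests.get(f"{API}/search/issues", headers=HEADERS, params={"q": q})
    r.raise_for_status()
    return r.json()["total_count"]

def pr_size(repo: str, number: int) -> int:
    """Total lines touched by the PR (additions + deletions)."""
    r = requests.get(f"{API}/repos/{repo}/pulls/{number}", headers=HEADERS)
    r.raise_for_status()
    data = r.json()
    return data["additions"] + data["deletions"]

def allowed(repo: str, author: str, number: int) -> bool:
    """Trust-level gate: newcomers may only open small PRs."""
    if merged_pr_count(repo, author) >= 5:  # arbitrary 'regular' threshold
        return True
    return pr_size(repo, number) <= 100    # arbitrary newcomer size cap
```

A bot could then label or auto-close any PR where `allowed(...)` is false, which is more or less how Discourse's trust levels gate features.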
It's not rocket science.
> This feels extremely counterproductive and fundamentally unenforceable to me. Much of the code AI generates is indistinguishable from human code anyway. You can usually tell a prototype that is pretending to be a human PR, but a real PR a human makes with AI assistance can be indistinguishable.
Isn't that exactly the point? Doesn't this achieve exactly what the whole article is arguing for?
A hard "No AI" rule filters out all the slop, and all the actually good stuff (which may or may not have been made with AI) makes it in.
When the AI assisted code is indistinguishable from human code, that's mission accomplished, yeah?
Although I can see two counterarguments. First, it might just be Covert Slop. Slop that goes under the radar.
And second, there might be a lot of baby thrown out with that bathwater. Stuff that was made in conjunction with AI, contains a lot of "obviously AI", but a human did indeed put in the work to review it.
I guess the problem is there's no way of knowing that? Is there a Proof of Work for code review? (And a proof of competence, to boot?)
And from the point of view of the maintainers, it seems a terrible idea to set up rules with the expectation that they will be broken.
In a live setting, you could ask the submitter to explain various parts of the code. Async, that doesn't work, because presumably someone who used AI without disclosing it would do the same for the explanation.
I don't like it but I can hardly blame them.
> This feels extremely counterproductive and fundamentally unenforceable to me.
But it's trivially enforceable. Accept PRs from unverified contributors, look at them for inspiration if you like, but don't ever merge one. It's probably not a satisfying answer, but if you want or need to ensure your project hasn't been infected by AI generated code you need to only accept contributions from people you know and trust.
[pedantry] It bothers me that the photo for "think of prototype PRs as movie sets" is clearly not a movie set but rather the set of the TV show Seinfeld. Anyone who watched the show would immediately recognize Jerry's apartment.
https://nypost.com/2015/06/23/you-can-now-visit-the-iconic-s...
It looks a bit different w.r.t. the stuff on the fridge and the items in the cupboard.
In any case, though, neither one is a movie set.
I don't think we agree. What do you mean by "it"?
> it's not the original set, just something looking very similar.
Your NY Post link is explicitly not the original set but rather a recreation. It says so in that article.
However, the photo in the NY Post is very different from the photo in the submitted blog post. Are you claiming that the photo in the submitted blog post is also not the original set? If not, then what is it, and why would there be multiple recreations?
Will the contributor respond to code-review feedback? Will they follow up on work? Will they work within the code of conduct and learn the contributor guidelines? All great things to figure out on small bugs, rather than after the contributor has done significant feature work.
There are plenty of open source projects where the difficulty of getting up to speed with the intricacies of the architecture limits the ability of talented coders to contribute on a small scale.
There might be merit in having a channel for AI contributions that casual helpers can assess to see if they pass a minimum threshold before passing on to a project maintainer to assess how the change works within the context of the overall architecture.
It would also be fascinating to see how good an AI would be at assessing the quality of a set of AI generated changes absent the instructions that generated them. They may not be able to clearly identify whether the change would work, but can they at least rank a collection of submissions to select the ones most worth looking at?
At the very least, the pile of PRs counts as data about things people wanted to do; even if the code was completely unusable, placing it into a pile somewhere might make it minable for the intentions of would-be contributors.
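As a rough sketch of that ranking step (hypothetical: `ask_llm` stands in for whatever model API you'd actually call, and the prompt and 0-10 scale are invented for illustration):

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whatever LLM API is available."""
    raise NotImplementedError

def score_diff(diff: str) -> float:
    """Ask the model to rate a diff it sees with no other context."""
    prompt = (
        "Below is a pull request diff, with no instructions or context.\n"
        "Rate from 0 to 10 how likely it is to be a coherent, correct,\n"
        "self-contained change. Reply with only a number.\n\n" + diff
    )
    try:
        return float(ask_llm(prompt).strip())
    except ValueError:
        return 0.0  # unparseable reply: sink to the bottom of the pile

def triage(diffs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank submissions so a maintainer reads the best candidates first."""
    scored = [(name, score_diff(d)) for name, d in diffs.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Even if the absolute scores mean little, a stable relative ranking would answer the "which of these is worth a maintainer's time" question.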
It’s an extension of the asymmetric bullshit principle IMO, and I think now all workplaces / projects need norms about this.
colesantiago•2h ago
I am the founder and a product person, so it helps reduce the number of engineers needed at my business. We are currently doing $2.5M ARR and the engineers aren't complaining; in fact, it's the opposite: they are actually more productive.
We still prioritize architecture planning, testing and having a CI, but code is getting less and less important in our team, so we don't need many engineers.
pards•2h ago
That's a bit reductive. Programmers write code; engineers build systems.
I'd argue that you still need engineers for architecture, system design, protocol design, API design, tech stack evaluation & selection, rollout strategies, etc, and most of this has to be unambiguously documented in a format LLMs can understand.
While I agree that the value of code has decreased now that we can generate and regenerate code from specs, we still need a substantial number of experienced engineers to curate all the specs and inputs that we feed into LLMs.
didericis•1h ago
We can (unreliably) write more code in natural English now. At its core it's the same thing: detailed instructions telling the computer what it should do.
colesantiago•2h ago
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".
https://news.ycombinator.com/newsguidelines.html
lawn•1h ago
> the engineers aren't complaining
You're missing a piece of the puzzle here, Mr business person.
wycy•1h ago
More productive isn't the opposite of complaining.
theultdev•1h ago
> code is getting less and less important in our team
> the engineers aren't complaining
Lays off engineers in favor of AI trained on other engineers' code, then says code is less important and the engineers aren't complaining.
colesantiago•1h ago
They can focus on other things that are more impactful for the business rather than just slinging code all day; they can actually look at design and the product!
Maximum headcount for engineers is around 7, no more than that now. I used to have 20, but with AI we don't need that many for our size.
theultdev•1h ago
I don't see how you could think 7 engineers would love the workload of 20 engineers, extra tooling or not.
Have fun with the tech debt in a few years.
BigTTYGothGF•52m ago
If I survived having 65% of my colleagues laid off you'd better believe I wouldn't complain in public.
hansmayer•1h ago
Tells me all I need to know about your ability for sound judgement on technical topics right there.