Some of the software that I maintain is critical to the container ecosystem, and I'm an extremely paranoid developer who starts investigating any GitHub issue within a few minutes of it opening. Some of these AI-slop GitHub issues have a way of "gaslighting" me into thinking that certain code paths are problematic when they actually are not, and lately AI slop in issues and PRs has been taking up a lot of my time.
Maybe it’s only the really popular and buzzword-y repos that are targets?
In my experience, the people trying to leverage LLMs for career advancement are drawn to the most high profile projects and buzzwords, where they think making PRs and getting commits will give them maximum career boost value. I don’t think they spend time playing in the boring repos that aren’t hot projects.
There’s already more human produced content in the world than anyone could ever hope to consume, we don’t need more from AI.
But in general I think most people still remain excessively gullible and naive. Social media image crafting is one of the best examples of this: people create completely fake and idealized lives that naive individuals think are real. Now that AI lets anyone create compelling 'proof' of whatever lie they want, I think people are becoming more suspicious of things that were, in fact, fake all along.
---
Going back to ancient times, many don't know that Socrates literally wrote nothing down. Basically everything we know of him is thanks to other people, his student Plato in particular, writing down what he said. The reason for this was not a lack of literacy; rather, he felt that writing was harmful because written words cannot defend themselves and can be spun into misrepresentations or falsehoods. Basically, the argumentative fallacies that indeed make up most 'internet debates', for instance. Yet now few people are unaware of this issue, and quotes themselves are rarely taken at face value unless they confirm one's biases. People became less naive as writing became ubiquitous, and I think this is probably a recurring theme in technologies that transform our ability to transfer information in one format or another.
It is still a serious problem; I just want that to be abundantly clear. Several thousand people (in the US alone) fall for it every year. I used to browse 419 Eater regularly, and up until a few years ago (when I last really followed this issue) these scams were raking in billions a year. Could be more or less now, but I doubt it's shifted a ton.
There is the hope that, by dumping so much slop so rapidly, it will break the bottom of the bucket. But there is the alternative that the bottom of the bucket will never break; it will just get bigger.
Sadly they also become suspicious of things that were, in fact, true all along.
Video or photo evidence of a crime becomes useless the better AI gets.
This is probably a good thing, because Photoshop and CGI have existed for a very long time, and people shouldn't be able to frame an innocent person for a crime, or even get away with one, just because they pirated some software and put in a few hours watching tutorials on YouTube.
The sooner jurors understand that unverified video/photo evidence is worthless the better.
Absolutely not. All that happened is that most people became aware that "Nigerians offering you money are scammers." But they still fall for other get-rich-quick schemes so long as they diverge a little from that known pattern, and they'll confidently walk into the scam despite being warned, saying, "It's not a scam, dumbass. It's not like that Nigerian prince stuff." If anything, people seem to be becoming more confident that they're in on some secret knowledge and everyone else is the one being scammed.
Even if you think the harms of AI/machine generated content outweigh the good, this is not a winning argument.
People don't just consume arbitrary content for the sake of consuming any existing content; that's rarely the point. People look for all kinds of things that don't exist yet, quite a lot of which is only known or relevant in the given moment or to the given niche audience requesting it. Much of it could likely never exist if it weren't possible to produce it on demand, and it would not be valuable if you had to wait for a human to make it.
Check this profile for the email if you wanna ask for more info or get updates.
Edit: To the people downvoting this, what have you done to help solve the spam problem that will be made worse by LLMs?
The only solution I can see is a hard-no policy. If I think a bug report is AI, whether by content or by reputation, I close it without any investigation. If you want it re-opened, you'll need to prove IRL that it's genuine, with an educated, good-faith approach that involves independent effort to debug.
> "If you put your name on AI slop once, I'll assume anything with your name on it is (ignorable) slop, so consider if that is professionally advantageous".
All comes down to accountability.
I can't imagine that any policy against LLM code would allow this sort of thing, but I also imagine that if I don't say "this was made by a coding agent", that no one would ever know. So, should I just stop contributing, or start lying?
Edit: Getting a lot of hate for this, which I guess is a pretty clear answer. I guess the reason I'm not receiving the "fuck off" clearly is that when I see these threads of people complaining about AI content, it's really clearly low-quality crap that (for example) doesn't even compile and wastes everyone's time.
I feel different from those cases because I did spend my time to resolve the issue for myself, did review the code, did test it, and do stand by what I'm putting under my name. Hmm.
I've also been on the other side of this, receiving some spammy LLM-generated irrelevant "security vulnerabilities", so I also get the desire for some filtering. I hope projects don't adopt blanket hard-line "no AI" policies, which will only be selectively enforced against new contributors where the code "smells like" LLM code, but that's what I'm afraid will happen.
I think it's also important to disclose how rigorously you tested your changes. I would hate to spend my time looking at a change that was never even tested by a human.
It sounds like you do both of these. Judging by the other replies, it seems that other reviewers may take a harsher stance, given the heavily polarized nature of LLMs. Still, if you made the changes and you're up front about your methodology, why not? In the worst case, your PR gets closed and everybody moves on.
Honestly, this is kind of where I see LLM-generated content going: you'll have to pay for ChatGPT 9 to get information, because all the other bots have vandalized all the primary sources.
What's really fascinating is that you need GPUs for LLMs, and most LLM output is, well, garbage. What did you previously need GPUs for? Mining crypto, which is, at least in the case of Bitcoin, pointless work for the sake of pointless work, i.e. garbage.
I can see a future in our lifetimes where a significant amount of our capital expenditure and energy consumption is used, quite simply, to produce garbage.
I do not use Copilot, Claude, etc., although I partially agree with one of the comments there that using an LLM for minor auto-completion is probably OK, as long as you can actually see that the completion is not incorrect (that should apply to other uses of auto-completion too, even if an LLM is not used, but it is even more important to check carefully when an LLM is used). Otherwise, I think it would be better not to accept any LLM-generated stuff. (The author might still use an LLM to assist before submitting, if desired; I don't, but it might help some programmers, e.g. if the LLM finds problems with the code. They will then have to review those findings themselves to check whether they are correct before fixing and submitting; i.e. don't trust the results of the LLM.)
It links to https://github.com/lxc/incus/commit/54c3f05ee438b962e8ac4592... (add .patch on the end of the URL if it is not displayed), and I think the policy described there is good. (However, for my own projects, nobody else can directly modify it anyways; they will have to make their own copy and modify that instead, and then I will review it by myself and can include the changes (possibly with differences from how they did it) or not.)
If it is copying prior work, then you are right that there would be a lot of cross-licensing bleed-through. The opposite is also true: it could take proprietary code structure and liberate it into GPLv3, for instance. Again, what is the legal standing on this?
Years back there was a source code leak of Microsoft Office. Immediately the LibreOffice team put up restrictions to ensure that contributors didn't even look at it, for fear that it would end up in their project and become a leverage point against the whole project. Now, with LLMs, it can be difficult to know where anything comes from.
I guess at some point there will be a massive lawsuit. But so much of the economy is wrapped up in this stuff nowadays that the folks paying for Justice System Premium Edition probably prefer not to have anything solid yet.
https://github.com/umami-software/umami/pull/3678
The goal is "Taiwan" -> "Taiwan, Province of China", but under the premise of updating to the ISO 3166-1 standard (which follows UN naming), which of course does not allow plain "Taiwan".
The comment after was interesting with how reasonable it sounds: "This is the technical specification of the ISO 3166-1 international standard, just like we follow other ISO standards. As an open-source project, it follows international technical standards to ensure data interoperability and professionalism."
The political intent of the PR was masked. Luckily, it was still a bit ham-fisted: the PR incorrectly changed many other things, and the user had stated their political intention in the original PR (the quote above is from a later comment).
The open-source community, generally speaking, is a high-trust society and I'm afraid that LLM abuse may turn it into a low-trust society. The end result will be worse than the status quo for everyone involved.
This is arguably isomorphic to Kernighan's Law:
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
The more code a PR contains, the more thorough knowledge of it the submitter must demonstrate in order to be accepted as a serious contributor.
It doesn't matter if it was written by a human or an LLM. What matters is whether we can have a productive discussion about some potential problem. If an LLM passes the test, well it's a good doggy, no need to kick it out. I'd rather talk with an intelligent LLM than a clueless human who is trying to use github as a support forum.
The responsibility then is for an open-source project to be strict and not be shy about calling out low-quality work, to have good integration tests and linters, and to have guidance like AGENTS.md files that tell coding robots how to be successful in the repo.
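For illustration, here's a minimal sketch of what such an AGENTS.md might contain. The file name comes from the comment above, but the specific targets and rules are hypothetical and would vary per project:

    # Guidance for coding agents (hypothetical example)

    ## Before opening a PR
    - Run `make build`, `make test`, and `make lint` (substitute the project's real targets).
    - Do not open a PR if any of these fail.

    ## Ground rules
    - Keep changes small and single-purpose.
    - State in the PR description what you tested and how.
    - Disclose that the change was generated or assisted by an LLM.
    - Do not file issues you have not reproduced locally.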
And honestly, it's becoming annoying.