So far it reintroduced several security issues and replaced the README.md.
Is this a configuration that's not common and thus not tested?
If people think they can do better, I want to see their forks and them keeping up with it.
https://github.com/RsyncProject/rsync/graphs/contributors?fr...
If you want me to read your analysis, you are going to have to make it not read like Claude wrote it. What does "placement" even mean here?
Also, it wasn't written by Claude FWIW, GLM 5.1.
The use of "regime shift" is what gave it away for me. I've never seen a human write that, but Claude does from time to time.
At least they removed occurrences of "load-bearing".
If you don't want to read the LLM prose, you can just go to the GitHub of my project, grab the scripts, and run the full pipeline. It will gather the data, build the database, and run the analysis from scratch for you, and you can look at the numbers directly. It's all repeatable.
Please, why can't people write stuff by hand themselves any more? It's a good analysis but how can I trust it without reviewing everything myself?!
At this point we're all used to skimming through thousands of AI-generated sentences every working day and constantly thinking "this is likely to be 20% bullshit", it's hard to turn that off even if I try.
- I used GLM 5.1 to help with the coding and math for this.
- However, I explicitly dictated where the data should be pulled from (GitHub, Bugzilla, mailing list), how it should be tagged and grouped, and what data to look at (e.g. bugs instead of regressions)
- Additionally, I consulted with my wife, who has a master's degree in statistics from Penn State University for what sort of statistical methodology would be justified for this very limited data set, while still giving as much information as possible.
- I know the website looks like we stereotypically consider vibe-coded websites to look, but I actually explicitly asked for that. The original HTML design looked like a website from 1995, and I just prefer how this looks. It's pretty!
> A simple distributional analysis of every rsync release with bug data. No model. No assumptions. Just placement.
So the criticism was bad, and that somehow makes it ok to use a bad metric?
Bugs per commit as a metric papers over severity, both in terms of security severity as well as the effect on the user. A mislabeled button has the same weight as the entire app crashing in this framework.
$ apt-cache policy rsync | grep Installed
Installed: 3.4.1+ds1-7ubuntu0.2
$ sudo apt-mark hold rsync
rsync set on hold.I didn't have the time to actually think about any "arguments" at all tbh it's just a knee jerk reaction as I get ready to log off for the weekend. Not actually looking to argument for or against your post at all lol.
As usual, Ubuntu backported fixes and didn't upgrade to a new version. Whether or not they also backported regressions in edge cases that afflict the latest rsync, I don't know. Pinning the Ubuntu package may prevent getting further regressions, but is preventing you getting any future such backported security fixes.
If by fairest you mean to say that this analysis and response is sufficient, then I'm sorry but I have to disagree. We really need to understand if the nature of the bugs are worse from a user's perspective. Even if the rate stayed unchanged, if the result is the perceived quality of the software declined then I would personally consider that worse, especially if I were a project maintainer.
That's not meant to be wholly dismissive either. But in general, I don't think quantitative analysis alone is enough to fully answer this type of question.
Of course this is a bigger problem, as its now harder to distinguish content that is "AI slop" with "content co-authored with AI that is carefully reviewed" with a quick glimpse, and the "AI smell" is quite off-putting. My initial reaction was also negative, but after glimpsing it through and reading the summaries, I found it decent summary, which also... speaks of this thread, of the content of the blog post and everything about the discussion and the strong feelings people have developed around the use of LLMs.
Anyhow, it would be good to disclose the repo with the code for the statistics & use of LLM in the writing right up front. Which model, and why it was used to do the writing, etc. Its enough to say "I think it writes better than I do" or "I was in a hurry, sorry" or what ever, but it really should be disclosed. It reads more honest.
ps. really... that sideways scroll? plz fix it.
The problem I see is that this is indistinguishable to a reader at a glance.
Distancing the writing from the "AI smell" not only improves the quality by dropping the unnecessary ocean of rhetorical devices, it forces the human to have real weight and agency on what's being said.
I think that act of distancing from raw LLM output through refinement is a huge quality leap. Even if you're only doing the refinement with an LLM, it forces the writing to have more voice and ideas from the author.
I can see the work that went into the analysis here but again, as a casual reader, it's impossible to tell that there were any original ideas here expressed by the author.
I am pretty insensitive to AI writing. I have never commented before about something sounding like AI, because mostly I don't notice. But this was so over the top that I spent the whole article trying to decide whether it was an intentional parody of AI writing style.
This article's language is not en-US. It's not en-BR. It's en-SLOP.
Yes, that was my clumsy attempt at AI parody. Here's another: this article doesn't just have AI tells. It is AI tells.
Every sentence is saturated with AI style. Perhaps the author so AI-indoctrinated that they can't see this? It doesn't read as even vaguely plausible human writing. Which is mightily ironic given the thesis of "AI generated stuff is just fine, m'kay?" The writing style does more to defeat its conclusion than the analysis itself.
As for the substance of the analysis, it seems pretty good to me but I see some flaws that weaken it a bit.
The presence of "The Outlier Nobody Noticed" proves nothing and deserves no more than a passing mention. A random release introduced way more bugs than the Claude-containing releases. That provides evidence that Claude doesn't introduce more bugs only if your hypothesis is a very naive "AI is the only thing that can ever increase bug introduction rates."
The whole analysis has very limited data. It's necessarily based off a single pair of releases at the very end of the chronological timeline. You would never be able to reject a null hypothesis based only on that, so it's even less sound to present it as proving the null hypothesis. (By the same token, it would be incorrect for critics to claim that it proves their point. Did anyone claim this, though? The heated complaints seemed more based on priors about AI code.)
"The critics' claim is a simple comparison: did the rate go up?" That's reductive. For one, these releases are known to be in reaction to a flood of (AI-discovered!) security reports, which is a novel situation and in fact is a huge confound to anyone arguing about what those two releases mean -- they're both heavily AI-written, but in response to an unusual situation. When the samples are only drawn from a distinct scenario, statistic analysis can only speak to the quality of code in that scenario.
Also, another reasonable hypothesis could be: AI-written code has bugs of a different flavor that bothers users more. It's optimized for passing tests and convincing people and AIs that security holes are closed, which means other considerations like preserving functionality can more easily be regressed as compared to if humans were doing it. (If true, it still doesn't support the claim that depending on AI code is a catastrophe, fwiw.)
I'm not arguing the conclusion is wrong. I'm saying the analysis proves far less than it claims to. As for whether it's a debacle for rsync to become dependent on AI code generation, I think that's a reasonable debate to have but it's not going to be resolved this reductively.
(Also, I suggest clearly acknowledging where AI was/wasn’t used. I like CuriosityC’s suggestion: https://news.ycombinator.com/item?id=48411968)
This is low-quality--every single day I witness Codex and Claude misunderstand, mislead, and hallucinate responses based on "assumptions" and I have to fact-check them.
If I wanted a statistical analysis and to be the human in the loop, I would ask the LLM myself, and I would definitely NOT read an article that just dumps the LLM output as-is.
You didn't care enough to make a good writeup, why should we believe that you cared enough to make a good analysis?
At the time, I found this a bit irritating, but with a few weeks time I see the merit. The informational content tends to fall into “derivative” territory when LLM’s write stuff. And people are here for novelty and some socialization.
Also LLM prose seems optimized for engagement rather than concise communication. Takes longer to sift through linguistic boilerplate to get to the point. (The quoted bit being a case in point)
wookmaster•1h ago
everdrive•1h ago
"Cars are just a tool. The drivers who piloted the vehicles and weren't careful enough [are responsible for the deaths.]"
Angostura•1h ago
roywiggins•1h ago