Why Claude's Comment Paper Is a Poor Rebuttal

https://victoramartinez.com/posts/why-claudes-comment-paper-is-a-poor-rebuttal/

6•vectorhacker•7h ago

Comments

low_tech_love•14m ago

A fundamental problem that we’re still far from solving is not necessarily that LLMs/LRMs cannot reason the same way we do (which I guess should be clear by now), but that they might not have to. They generate slop so fast that, if one can benefit a little from each output, i.e. if you can find a little bit of use hidden beneath the mountain of meaningless text they create, this might still be more valuable than taking the time to create something more meaningful to begin with. I can’t say for sure what the reward system behind LLM use is in general, but given how much money people are willing to spend on models even in their current, deeply flawed state, I’d say it’s clear that the time savings are outweighing the mistakes and shallowness.

Take the comment paper, for example. Since Claude Opus is the first author, I’m assuming that the human author took a backseat and let the AI build the reasoning and do most of the writing. Unsurprisingly, it is full of errors and contradictions, to the point where it looks like the human author didn’t bother much to check what was being published. One might say that the human author, in trying to build some reputation by showing that their model could answer a scientific criticism, actually did the opposite: they provided more evidence that their model cannot reason deeply, and maybe hurt their reputation even more.

But the real question is, did they really? How much backlash will they possibly get from submitting this to arXiv without checking? Would that backlash keep them from submitting 10 more papers next week with Claude as the first author? If one weighs the amount of slop one can put out (each piece with a slight benefit) against the bad reputation one gets from it, I cannot say that “human thinking” is actually worth it anymore.
