The End of Code Review: Coding Agents Supersede Human Inspection

https://arxiv.org/abs/2606.13175
19•cribwi•1h ago

Comments

sarchertech•44m ago
This is an undergraduate level persuasive essay masquerading as an academic paper.

The title implies some novel research, or a review of existing research, that clearly shows agents are better at code review than humans, but the paper then provides only this single paragraph on the review capabilities of agents:

> Beyond general software engineering, several strands of work speak specifically to the capabilities that code review requires. Pornprasit and Tantithamthavorn evaluate LLM-based automated review in industrial settings and find that agents detect the same categories of defect that human reviewers target: correctness errors, security weaknesses, performance inefficiencies, and style violations [12]. Li et al. demonstrate that CodeReviewer produces actionable inline comments at quality that is at least comparable to those of trained human reviewers on a significant fraction of the evaluation set [11].

cbarnes99•43m ago
This feels like some combination of monumentally stupid and incredibly naive.
SimianSci•43m ago
> "To support the claim that coding agents can displace human code review, we survey evidence of agent capability across three dimensions: benchmark performance on software engineering tasks, review-specific capabilities, and developer productivity with deployed tools"

Not sure I can agree with this premise, especially since there seems to be a complete lack of "real-world results" in this evaluation. This strikes me as being written by a theorist whose only experience with Quality Assurance comes from studies or papers.

devin•42m ago
Big eye roll from me.

"Can't scale due to too many PRs" neglects answering questions like: Are these PRs valuable? Are they just additional PRs to right the wrongs of previous ill-conceived PRs? How much churn is going on here? Is the influx of PRs a permanent state, or something that we'll only live through temporarily because we have a lot of little things we can set our agents upon, but after they're done we'll return to a normal work cadence?

SpicyLemonZest•41m ago
I don't believe that a human being wrote this paper. The "review-specific capabilities" section is obviously the only one that matters to the thesis, and it does not actually point towards any data indicating that coding agents supersede human inspection. An LLM, though, could easily be distracted or prompted into making the leap from "same categories" + "comparable on a significant fraction of the evaluation set" to "superior".
jmuguy•40m ago
This seems to take the view that code review is essentially linting for simple issues. Our team is fairly small so "code review" usually involves QA and everything else you might want to do before something is pushed to production.

But yeah - I can have one LLM check another LLM's work. Kind of a waste of tokens for most PRs.

_se•40m ago
What in the LLM psychosis is this
jpgvm•39m ago
This is honestly the biggest battle with AI driven development right now. You have these extremely potent tools that can output a ton of really great code if they are wielded correctly but there is simply no way to keep up with their output at human review pace (which was already slower than human code creation pace).

I think the only real solution is to add increasingly strict guardrails that can be enforced with a combination of more AI agents and actual executable contracts. The other aspect is using languages and tools that densify correctness, i.e. languages like Rust that have a very rich type system, so both review and design can be focused on a slice that is small by volume: the core types. The other main tools for densifying correctness are formal methods (model checking, etc.), fuzzing/property-based testing, and static analysis.

All of these tools are cheaper to use than they once were because a lot of the minutiae can be handled by AI agents while core invariants can receive heavy human scrutiny.
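To make "core invariants get the human scrutiny" concrete, here is a minimal sketch of a property-based check using only the Python standard library. Everything in it is an assumption for illustration: `merge_intervals` stands in for some agent-written function, and `check_invariants` and `run_property_test` are hypothetical names, not part of any real tool. A dedicated library such as Hypothesis would normally replace the hand-rolled random generator.

```python
import random


def merge_intervals(intervals):
    """Merge overlapping closed intervals given as (start, end) tuples.

    Hypothetical stand-in for code an agent might produce.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def check_invariants(original, merged):
    """The small 'contract' a human reviews once, instead of every diff."""
    # 1. Output intervals are ordered and do not overlap or touch.
    for (_, prev_end), (next_start, _) in zip(merged, merged[1:]):
        assert prev_end < next_start
    # 2. Every input interval is covered by some output interval.
    for start, end in original:
        assert any(s <= start and end <= e for s, e in merged)
    # 3. Output endpoints come from the input; nothing is invented.
    starts = {s for s, _ in original}
    ends = {e for _, e in original}
    for s, e in merged:
        assert s in starts and e in ends


def run_property_test(trials=500, seed=0):
    """Throw random inputs at the function and check the contract each time."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        intervals = []
        for _ in range(rng.randint(0, 8)):
            start = rng.randint(0, 50)
            intervals.append((start, start + rng.randint(0, 10)))
        check_invariants(intervals, merge_intervals(intervals))
    return trials


if __name__ == "__main__":
    print(f"{run_property_test()} random cases passed")
```

The point of the design is that the three invariants are a few lines a human can read carefully, while the implementation under test can be rewritten as often as an agent likes.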

IMO generative AI is here to stay in development, so we may as well get ahead of the game and start using these tools to try to get the best out of it.

coldtea•34m ago
>the naive integration in which agents write code and humans remain the mandatory reviewers is a dead end because it neither provides meaningful assurance nor scales with AI-assisted throughput.

Who said it has to "scale with AI-assisted throughput"? AI can produce code all day; the goal is not to fill storage with AI code, it is to make products, following product tradeoffs, timelines, and decisions.

AncoraImparo•23m ago
Ahhhhhhhhhh hahahahahahahahahahaha. No. We have all seen how bad these LLMs are at coding, let's not have it review its own failed coding attempt, too.
synthesis•18m ago
Here is my take:

- I disagree with the contributions

- I suspect most of the paper was written / edited by "generative AI"

It is a shame that "researchers" have started using generative AI to such a large degree as it now masks the voice of the person. Generative AI tends to claim things that are not true and tends to use words that are unsuitable. This text leaves a bad taste in my mouth.

(Edits: formatting)

synthesis•11m ago
In a way, this discussion here is proof as to why human code review is necessary.

- The paper is garbage, and a human review process would reject it.

__As is the case in this discussion__. The sentiment of the discussion section (at the time of writing) seems to be in favor of rejecting this paper. (Of course, to make it a proper experiment, one would need to also give the paper to a "generative AI" reviewer to see if it would reject it or not, but I cannot be bothered.)

edwinjm•9m ago
It seems they didn’t even do any research. This is not science, it’s just an opinion.
vcarrico•4m ago
Ironically, the number of sentences in bold tells me that an AI wrote this with no human review.
nilirl•4m ago
And here I am huffing and puffing because the latest models keep adding nonsense to my codebase.

Have you seen the decisions LLMs make? They write code like the worst developers I know. They're lazy, short-sighted, and impossible to teach.
