Devin Review: AI to Stop Slop

36•agtestdvn•2w ago

Comments

devin•2w ago

Apropos of nothing: I hate that they used my name for their product.

Y_Y•2w ago

People of the same name should have a union or something. It's apparently fine to annoy everyone in the world called Alexa, probably just because the world's Alexas can't act collectively.

swyx•2w ago

hahahah oh no

blibble•2w ago

so they graciously accept that "AI" agents generate slop (by the very title of the post)

so why would they be any better at reviewing?

Sharlin•2w ago

Easy! You just need a review-reviewer AI to check the review AI's output.

qayxc•2w ago

Bots all the way down...

gaodrew•2w ago

AI reviewing AI does lead to a cycle of ouroboros slop. That’s why Devin Review is actually more of a UI that makes code easier for you to read, unlike other code review tools that try to do all the reviewing themselves.

Disclaimer: I work here

rednafi•2w ago

I’m all in for building more intuitive UIs to make the review process less cumbersome. But with the current capabilities of LLMs, under no circumstances should we allow AI to be the final judge of whether something should be merged into trunk.

Code review is the last line of defense we have against our systems being invaded by the massive amount of slop that’s getting generated left and right.

Instead of trying to automate the code review process, maybe we should spend more energy on making the scaffolding around it better: better diff tools, semantically grouped files (as Devin mentioned), and better UI for large diffs (GitHub’s UI is horrible for anything beyond a thousand lines).

gaodrew•2w ago

Agreed!

ninjha•2w ago

(I work at Cognition, opinions my own etcetc)

True! Devin Review doesn’t make the kind of judgements you mention, it just does its best to find bugs and help you understand the code faster. I managed to review a PR on an airplane (without starlink) with it earlier this week lol

rednafi•2w ago

Yeah, I wasn’t alluding to Devin reviewing and merging the changelog. This was more of a general statement, since a lot of code review tools seem to get this part wrong.

A lot of energy is being spent on making reviews faster, when reviews are intentionally meant to scale sublinearly. The goal should be: how can we make the process more convenient and less error-prone?

illnewsthat•2w ago

> Devin Review is free and available for PRs on regular GitHub repositories (not GitHub Enterprise). Public PRs don’t require a Devin account.

I guess the tokens are cheap enough or their pockets are deep enough, but this still seems surprising. I guess they can chalk it up to a marketing cost.

nl•2w ago

Asynchronous tokens are significantly cheaper than ones where you need them immediately at a high rate.

There are lots of free, high-quality models on OpenRouter too, BTW...

briga•2w ago

I can foresee a future of induced demand, where by making PRs "easier" to review, you will end up with way more PRs to review, leading to PR backlogs as backed up as ever. Except now dev teams will have trust-me-bro LLM reviews convincing them that they don't actually need to do full code reviews on code they're putting into production. What could go wrong?

gaodrew•2w ago

Very good point. So when we designed this we actually had that in mind. Devin Review is not supposed to replace your judgment and “give the answer”. It just organizes the PR in a way that makes it way easier for YOU to understand.

briga•2w ago

I was being partly facetious and I think this is probably the way things are going. I guess it's just hard to stomach that devs will end up relying on these tools more than their own intuition. But I suppose that ship has sailed already for a lot of people.

servercobra•2w ago

Overall I've been really impressed with Devin. IMO it's the best tool for AI generating features if you know what you're looking for, have patterns to follow, etc. I suspect the context they build about your project helps a ton.

I was literally just working on a system, using Devin to do the review no less, to add a bunch of the rules we have that are outside of linting's capability to tackle the same kind of thing. Tools like Copilot and Qodo have very high noise ratios, but do occasionally catch legit bugs. Devin Review could be a great complement, and hopefully they'll make it so we can add our own rules soon.

xnx•2w ago

"Devin" has negative brand value.

hrimfaxi•2w ago

Why?

esafak•2w ago

I think the OP's alluding to the initial hype about Devin replacing software engineers.

gaodrew•2w ago

I work at Cognition, lmk any feedback, will share with the team!

devin•2w ago

I don’t like the name of the product.

mattbergland•2w ago

Hey get back to work on my pr

libraryofbabel•2w ago

Devin? Now that's a name I've not heard in a long time...a lonnng time.

Seriously, in this age of Claude Code and Codex, does anyone use Devin, or even know someone who does? Do they have any users at all?

Ironically, their product has probably got massively better in the last couple of years, because the underlying LLMs got massively better at coding and long-context tasks. But that doth not a successful business model make, and unless you’re Cursor (and even then I’m not so sure) this is a very very hard space to succeed in without owning your own frontier model (i.e., being Anthropic, OpenAI, or Google).

esafak•2w ago

I use their deepwiki often.

ninjha•2w ago

yeah there is apparently not a lot of overlap between hn/twitter users and devin users, and we don’t really do marketing campaigns either

logos on website if you want to see some of our customers lol

Der_Einzige•2w ago

We wrote the actual paper on “stopping slop”

https://arxiv.org/abs/2510.15061

snowmobile•2w ago

> code review—not code generation—is now the bottleneck to shipping great products.

Unsurprising, since a human still needs to understand and verify the code, be that as it's written or as it's reviewed. AI's only managed to move the brainpower required from the fun part to the tedious and boring part.

sjajshha•2w ago

Eh, code review has _always_ been the bottleneck (both for the author and any other reviewers). Pulling the agent slot machine for anything remotely challenging is just inflicting pain for no reason on yourself - if quality matters. If not, let it rip.

Otherwise, you’re gonna have to read every line (including those not in the diff) anyways. Typing it out - or getting the AI to do it at a speed you can comprehend - isn’t a meaningful slowdown at all.

joshstrange•2w ago

I wanted to look into their pricing for Devin+ and I have to say, ACU are entirely too opaque/confusing/complicated. The entire description of them is shrouded in mystery. And this part confuses me even more:

> Aside from the few ACUs required to keep the Devin VM running, Devin will not consume ACUs when:

> Waiting for your response

> Waiting for a test suite to run

> Setting up and cloning repositories

Ok, that kind of makes sense, but what does "the few ACUs required to keep the Devin VM running" mean? These cost $2.50/ea so "a few" means $5+ and on what time scale? Daily? Monthly?

The lowest plan comes with $20 of ACUs but they don't list anywhere how far that gets you or even rough examples. I guess if you want to kick the tires $20 isn't a crazy amount to test it out yourself and maybe I'm just not the target market (I kind of feel like I am though?) but I wish their pricing made sense.

samyok•2w ago

Have been using Devin Review for a little bit, and I think it's the first of the many "code review" LLM-bots that have come out that doesn't actively feel like "slop". Seems like they must have some integrations with codemaps or deepwiki (the Cognition products I use most often) to power the insights.

My favorite feature has been organizing the files by "logical flow" rather than alphabetically, which feels like such a tiny change but it's such a huge QOL upgrade. A lot of the features seem inspired by Graphite, which is also really enjoyable.

nl•2w ago

I just tried this on a production PR and I liked it. It found some things that Claude review missed, but missed some that Codex review found.

I'd rank this the best of the three.

Generally I actually like Gemini reviews a lot (I guess code review tasks stop it going off track drunk like Gemini tends to do when coding?) but at the moment for some reason my Gemini auth is broken and I can't work out how to fix it. Yay Google.
