It's a separation-of-duties argument about code that is written by an AI agent and then reviewed by that same agent.
If you ask Agent1 to write code, the PR diff it produces should really be reviewed by Agent2 (or Agent3, or whatever): the reviewer should be a different system from the code-gen agent.
The core claim is that a model is a poor judge of its own work: it shares the priors and blind spots that produced the bug, so self-review tends to be shallow and to miss exactly those blind spots.
The system that writes the code shouldn't also be the thing that signs off that it's safe to ship.
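To make the rule concrete, here is a rough sketch of what a merge gate enforcing it could look like. Everything in it is made up for illustration (`Review`, `can_ship`, and the agent identifiers are hypothetical and not from any real CI tool). It also assumes that distinct identifiers really do mean distinct underlying systems, which you would have to guarantee some other way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    reviewer: str   # hypothetical identifier of the reviewing agent/system
    approved: bool  # whether that reviewer signed off on the diff


def can_ship(author: str, reviews: list[Review]) -> bool:
    """Return True only if at least one approval comes from a system
    other than the one that wrote the code."""
    return any(r.approved and r.reviewer != author for r in reviews)


# A self-approval does not count; an approval from a different agent does.
can_ship("agent1", [Review("agent1", True)])  # False
can_ship("agent1", [Review("agent2", True)])  # True
```

The point of the design is that the author's own approval is simply ignored rather than treated as an error, so Agent1 can still self-check as much as it likes; it just can't be the one that unblocks the merge.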
yxz123•5m ago
Spot on! You shouldn't be grading your own homework.
smb06•1h ago
If you ask Agent1 to write code, the PR diff it produces should really be reviewed by Agent2 (or Agent3, or whatever): the reviewer should be a different system from the code-gen agent.
The core claim is that a model is a poor judge of its own work: it shares the priors and blind spots that produced the bug, so self-review tends to be shallow and to miss exactly those blind spots.
The system that writes the code shouldn't also be the thing that signs off that it's safe to ship.