Nope, no mention of how they do anything to alleviate overfitting. These benchmarks are getting tiresome.
I know this is focused solely on performance, but cost is a major factor here.
Story as old as time.
Apparently this is in support of their 2.0 release: https://www.qodo.ai/blog/introducing-qodo-2-0-agentic-code-r...
> We believe that code review is not a narrow task; it encompasses many distinct responsibilities that happen at once. [...]
> Qodo 2.0 addresses this with a multi-agent expert review architecture. Instead of treating code review as a single, broad task, Qodo breaks it into focused responsibilities handled by specialized agents. Each agent is optimized for a specific type of analysis and operates with its own dedicated context, rather than competing for attention in a single pass. This allows Qodo to go deeper in each area without slowing reviews down.
> To keep feedback focused, Qodo includes a judge agent that evaluates findings across agents. The judge agent resolves conflicts, removes duplicates, and filters out low-signal results. Only issues that meet a high confidence and relevance threshold make it into the final review.
> Qodo’s agentic PR review extends context beyond the codebase by incorporating pull request history as a first-class signal.
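For what it's worth, the pattern the blog describes (parallel specialized reviewers plus a judge that dedups and thresholds findings) is easy to sketch. This is just a toy illustration of that shape, not their implementation; the agent names, the Finding fields, and the 0.6 cutoff are all made up here:

```python
# Hypothetical sketch of "specialized agents + judge" review; everything below
# (agents, fields, thresholds) is assumed for illustration, not Qodo's code.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    category: str      # e.g. "security", "performance", "style"
    message: str
    confidence: float  # 0.0 - 1.0, as scored by the emitting agent

def security_agent(diff: str) -> list[Finding]:
    # Placeholder: a real agent would prompt a model with a security-focused
    # context and return structured findings.
    return [Finding("app.py", 42, "security", "SQL built via string concat", 0.9)]

def performance_agent(diff: str) -> list[Finding]:
    return [Finding("app.py", 42, "performance", "Query runs inside a loop", 0.7)]

def style_agent(diff: str) -> list[Finding]:
    return [Finding("app.py", 10, "style", "Function name shadows a builtin", 0.4)]

def judge(findings: list[Finding], min_confidence: float = 0.6) -> list[Finding]:
    """Keep the strongest finding per (file, line, category) and drop low-signal ones."""
    best: dict[tuple[str, int, str], Finding] = {}
    for f in findings:
        key = (f.file, f.line, f.category)
        if key not in best or f.confidence > best[key].confidence:
            best[key] = f
    return [f for f in best.values() if f.confidence >= min_confidence]

def review(diff: str, agents: list[Callable[[str], list[Finding]]]) -> list[Finding]:
    # Each agent gets the diff in its own pass (its own "dedicated context"),
    # then the judge filters the merged results.
    all_findings = [f for agent in agents for f in agent(diff)]
    return judge(all_findings)

if __name__ == "__main__":
    for f in review("fake diff", [security_agent, performance_agent, style_agent]):
        print(f"{f.file}:{f.line} [{f.category}] {f.message} ({f.confidence:.0%})")
```

The interesting part is all hidden inside the individual agents and the judge's scoring, of course; the orchestration itself is trivial.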
CuriouslyC•1h ago
Agents are pretty good at suggesting ways to improve a piece of code, though. If you get a bunch of agents to wear different hats and debate improvements to a piece of software, it can produce some very useful insights.
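Something like this, roughly. ask_llm, the hats, and the prompts are all placeholders here, not any particular product's API:

```python
# Rough sketch of the "different hats debating" idea; ask_llm is a stand-in
# for whatever model call you actually use, and the hats are invented.
def ask_llm(prompt: str) -> str:
    # Replace with a real model call (hosted API, local model, etc.).
    return "[model reply would go here]"

HATS = {
    "security reviewer": "Focus on injection, authz, and secrets handling.",
    "performance reviewer": "Focus on complexity, allocations, and I/O in loops.",
    "maintainer": "Focus on readability and long-term maintenance cost.",
}

def debate(code: str, rounds: int = 2) -> list[str]:
    """Each hat comments in turn, seeing the transcript so far, for a few rounds."""
    transcript: list[str] = []
    for _ in range(rounds):
        for hat, brief in HATS.items():
            prior = "\n".join(transcript) or "(no prior comments)"
            reply = ask_llm(
                f"You are the {hat}. {brief}\n"
                f"Code under review:\n{code}\n"
                f"Discussion so far:\n{prior}\n"
                "Add one concrete improvement or rebut an earlier point."
            )
            transcript.append(f"{hat}: {reply}")
    return transcript
```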