Apparently this is in support of their 2.0 release: https://www.qodo.ai/blog/introducing-qodo-2-0-agentic-code-r...
> We believe that code review is not a narrow task; it encompasses many distinct responsibilities that happen at once. [...]
> Qodo 2.0 addresses this with a multi-agent expert review architecture. Instead of treating code review as a single, broad task, Qodo breaks it into focused responsibilities handled by specialized agents. Each agent is optimized for a specific type of analysis and operates with its own dedicated context, rather than competing for attention in a single pass. This allows Qodo to go deeper in each area without slowing reviews down.
> To keep feedback focused, Qodo includes a judge agent that evaluates findings across agents. The judge agent resolves conflicts, removes duplicates, and filters out low-signal results. Only issues that meet a high confidence and relevance threshold make it into the final review.
> Qodo’s agentic PR review extends context beyond the codebase by incorporating pull request history as a first-class signal.
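The judge-agent idea in the quote (dedupe findings across specialist agents, resolve conflicts, keep only high-confidence results) can be sketched in a few lines. This is a minimal illustration, not Qodo's actual implementation; the `Finding` shape, field names, and the 0.8 threshold are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    # Hypothetical schema -- Qodo's internal finding format is not public.
    agent: str        # which specialist agent produced it
    location: str     # e.g. "src/app.py:42"
    issue: str        # normalized issue description
    confidence: float # 0.0-1.0 score from the producing agent

def judge(findings, threshold=0.8):
    """Merge duplicate findings reported by multiple agents, keeping the
    highest-confidence copy per (location, issue), then drop low-signal
    results below the confidence threshold."""
    best = {}
    for f in findings:
        key = (f.location, f.issue)
        if key not in best or f.confidence > best[key].confidence:
            best[key] = f
    return [f for f in best.values() if f.confidence >= threshold]
```

The interesting design question is the key: dedupe on `(location, issue)` assumes two agents describe the same bug the same way, which in practice would need fuzzy matching or an LLM call rather than string equality.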
A lot of this stuff is really new, and we will need to find ways to standardize, but it will take time and consensus.
It took 4 years after the release of the automobile to coin the term mileage to refer to miles driven per unit of gasoline. In due time we will create the same kinds of metrics for AI.
Nope, no mention of how they do anything to alleviate overfitting. These benchmarks are getting tiresome.
The question auditors actually ask isn't "did your tool catch this bug?" It's "can you prove this change was reviewed, by whom, and that the code didn't change between review and merge?"
None of the tools benchmarked here produce verifiable evidence. They produce comments. A green checkmark on a PR tells you someone clicked a button. It doesn't tell you what they saw, whether the diff changed after review, or what risk level the change carried.
We took a different approach: instead of building another AI reviewer, we built a governance layer that wraps whatever review process you already use. Every PR gets a cryptographically sealed evidence bundle -- the exact diff, risk tier (L0-L4), findings, and a SHA-256 hash chain. Verifiable offline with one command. Open source, Apache 2.0.
https://github.com/DNYoussef/codeguard-action
Not a replacement for code review tools. An audit trail for them.
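The sealed-evidence idea is simple to demonstrate: chain each piece of evidence (diff, risk tier, findings) to the previous one with SHA-256, so any later edit breaks every downstream hash. A toy sketch of the concept, not codeguard-action's actual bundle format:

```python
import hashlib
import json

def seal(entries):
    """Chain evidence entries via SHA-256: each link hashes the previous
    digest plus its own payload, so tampering with any entry invalidates
    the rest of the chain."""
    chain, prev = [], "0" * 64
    for entry in entries:  # e.g. the diff, risk tier, review findings
        payload = json.dumps(entry, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        chain.append({"entry": entry, "hash": digest})
        prev = digest
    return chain

def verify(chain):
    """Recompute the chain offline and confirm every link matches."""
    prev = "0" * 64
    for link in chain:
        payload = json.dumps(link["entry"], sort_keys=True)
        if hashlib.sha256((prev + payload).encode()).hexdigest() != link["hash"]:
            return False
        prev = link["hash"]
    return True
```

`verify` needs nothing but the bundle itself, which is what makes the audit trail checkable offline; `sort_keys=True` matters because hash verification requires byte-stable serialization.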
With SOTA missing, it's also a strong indicator that someone like you is not the target audience.
Agents are pretty good at suggesting ways to improve a piece of code, though. If you get a bunch of agents to wear different hats and debate improvements to a piece of software, it can produce some very useful insights.
Still early in development and has a much simpler goal, but I like simple things that work well.
I know this is focused solely on performance, but cost is a major factor here.
Merged PRs being considered good code?
- vX.X.1 releases, where the software was considered done but the author had to ship a fast follow-up fix. Very real bugs with real fixes.
- Reverts. I'm sure anyone doing AI code review pays attention to this already. A sign of bad changes, and just as important a signal.
- PRs that delete a lot of code. A good change is often one that deletes code and makes things simpler.
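The signals above suggest a simple filter for benchmark data: don't treat a merged PR as a "good code" label if it was reverted or needed a fast follow-up touching the same files. A rough heuristic sketch; the PR dict fields (`reverted`, `files`, `hours_after_merge`) and the 48-hour window are made up for illustration.

```python
def looks_like_good_merge(pr, followups, window_hours=48):
    """Heuristic: a merged PR is a weak 'good code' example if it was
    reverted, or if a fast follow-up fix landed on the same files
    shortly after merge (the vX.X.1 pattern)."""
    if pr.get("reverted"):
        return False
    for f in followups:
        overlapping = set(f["files"]) & set(pr["files"])
        if overlapping and f["hours_after_merge"] <= window_hours:
            return False
    # Note: deletion-heavy PRs are deliberately NOT penalized here --
    # removing code and simplifying is often what a good change looks like.
    return True
```

In a real pipeline these fields would come from the Git history and release tags rather than being handed in precomputed.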