Been working with spec-driven development since it first emerged (OpenSpec fan), and kept running into the same gap: specs get written, AI writes the code, but validating the implementation against the spec is still a manual slog.
So I put together a tool that runs multi-agent code reviews inside agentic AI environments (Claude Code, Cursor, Windsurf, etc).
The idea is simple – instead of one reviewer, you get a few different perspectives modeled after the roles on a typical engineering team: someone thinking architecturally, someone focused on code quality, someone with security expertise, and a QA engineer paying close attention to testing. They review changes independently, then discuss before synthesizing their feedback. You can also configure custom reviewers and adjust redundancy if you want multiple sub-agents for a given role.
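To give a sense of what that looks like in practice, here's a rough sketch of setting up custom reviewers and redundancy. The `--reviewers` and `--redundancy` flags below are hypothetical, included only to illustrate the idea – the real options are in the repo's README.

```bash
# Hypothetical flags, for illustration only -- see the repo's README for the actual options.
npx @open-code-review/cli init \
  --reviewers architecture,quality,security,qa \
  --redundancy security=2   # e.g. two independent security sub-agents
```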
I originally built it to complement SDD tools like OpenSpec, but it drops into any codebase within an agentic environment (agentic IDE or agentic terminal).
My team has been running it against some larger codebases (~500k lines) and it's been genuinely useful – not as a replacement for human review, but as a way to catch things earlier. No doubt it has already saved us quite a bit of time.
```bash
# One-time setup in your repo
npx @open-code-review/cli init

# Then kick off a review from inside your agentic environment (slash command)
/ocr-review
```
It works as a pre-push sanity check or can be wired into CI.
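As a rough sketch of the pre-push angle: the `review` subcommand and `--diff` flag below are assumptions for illustration only, so check the repo's README for the actual CLI invocation.

```bash
#!/usr/bin/env bash
# .git/hooks/pre-push -- abort the push if the review surfaces findings.
# NOTE: "review" and "--diff" are hypothetical, shown only to illustrate the wiring;
# the real entry point is documented in the repo.
set -euo pipefail

if ! npx @open-code-review/cli review --diff origin/main...HEAD; then
  echo "open-code-review flagged issues; push aborted." >&2
  exit 1
fi
```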
One warning, though: your token expenditure WILL increase, especially if you add more reviewer roles or crank up their redundancy.
That said, curious what others think – is this useful? What's missing? Happy to hear feedback or ideas.
https://github.com/spencermarx/open-code-review