AI coding agents have made code generation nearly free, and they’ve shifted the bottleneck to code review. Static-only analysis with a fixed set of checkers isn’t enough, and LLM-only review has its own limitations: it’s non-deterministic across runs, has low recall on security issues, is expensive at scale, and tends to get ‘distracted’.
We spent the last six years building a deterministic, static-analysis-only code review product. Earlier this year, we started thinking about this problem from the ground up and realized that static analysis covers the key blind spots of LLM-only review. Over the past six months, we built a new ‘hybrid’ agent loop that uses static analysis and frontier AI agents together to outperform both static-only and LLM-only tools at finding and fixing code quality and security issues. Today, we’re opening it up publicly.
Here’s how the hybrid architecture works:
- Static pass: 5,000+ deterministic checkers (code quality, security, performance) establish a high-precision baseline. A sub-agent suppresses context-specific false positives.
- AI review: The agent reviews the code with static findings as anchors. It has access to the AST, data-flow graphs, control-flow graphs, and import graphs as tools, not just grep and the usual shell commands.
- Remediation: Sub-agents generate fixes, and the static harness validates every edit before emitting a clean git patch.
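To make the loop concrete, here’s a minimal Python sketch of the three stages above. Every name in it (hybrid_review, static_pass, agent_review, and so on) is a hypothetical stand-in for illustration, not our actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    checker: str   # e.g. "hardcoded-secret"
    path: str
    line: int
    message: str

def hybrid_review(
    diff: str,
    static_pass: Callable[[str], list[Finding]],                  # deterministic checkers
    suppress: Callable[[list[Finding], str], list[Finding]],      # sub-agent FP triage
    agent_review: Callable[[str, list[Finding]], list[Finding]],  # LLM review
    propose_fix: Callable[[Finding], str],                        # sub-agent fix generation
    validate: Callable[[str], bool],                              # static harness re-check
) -> list[str]:
    # 1. Static pass: checkers establish a high-precision baseline, then a
    #    sub-agent drops findings that are false positives in this context.
    baseline = suppress(static_pass(diff), diff)

    # 2. AI review: the agent reviews the diff with the static findings as
    #    anchors (structural tools like AST and data-flow graphs omitted here).
    issues = agent_review(diff, baseline)

    # 3. Remediation: fixes are drafted per issue and re-checked by the static
    #    harness; only validated edits make it into the emitted patch.
    fixes = [propose_fix(issue) for issue in issues]
    return [fix for fix in fixes if validate(fix)]
```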
Static analysis addresses the key LLM problems: non-determinism across runs, low recall on security issues (LLMs get distracted by style), and cost (static narrowing reduces prompt size and the number of tool calls).
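As an illustration of the narrowing point (not our actual implementation), one simple form of it is to put only the diff hunks that overlap a deterministic finding into the model’s context:

```python
# Illustrative only: keep just the diff hunks that overlap a static finding,
# so the review prompt stays small and the agent needs fewer tool calls.
def narrow_hunks(
    hunks: list[tuple[str, int, int, str]],   # (path, start_line, end_line, hunk_text)
    flagged: set[tuple[str, int]],            # (path, line) pairs from static findings
) -> list[str]:
    return [
        text
        for path, start, end, text in hunks
        if any((path, line) in flagged for line in range(start, end + 1))
    ]
```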
On the OpenSSF CVE Benchmark [1] (200+ real JS/TS vulnerabilities), we hit 81.2% accuracy and 80.0% F1, versus Cursor Bugbot (74.5% accuracy, 77.42% F1), Claude Code (71.5% accuracy, 62.99% F1), CodeRabbit (59.4% accuracy, 36.19% F1), and Semgrep CE (56.9% accuracy, 38.26% F1). On secrets detection, we hit 92.8% F1, versus Gitleaks (75.6%), detect-secrets (64.1%), and TruffleHog (41.2%); for this we use our open-source classification model. [2]
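For reference, these are the standard definitions of the two metrics quoted above (just the textbook math, not a description of the benchmark harness):

```python
# Accuracy and F1 from per-case detection counts (true/false positives/negatives).
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    return (tp + tn) / (tp + fp + tn + fn)

def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example (made-up counts): f1(tp=80, fp=20, fn=20) == 0.8
```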
Full methodology and how we evaluated each tool: https://autofix.bot/benchmarks
You can use Autofix Bot interactively on any repository through our TUI, as a plugin in Claude Code, or via our MCP server in any compatible AI client (like OpenAI Codex). [3] We’re building specifically for agent-first coding workflows, so you can have your agent run Autofix Bot at every checkpoint autonomously.
Give us a shot today: https://autofix.bot. We’d love to hear any feedback!
---
[1] https://github.com/ossf-cve-benchmark/ossf-cve-benchmark