The idea: instead of a single LLM pass that just leaves comments (like CodeRabbit/Sourcery), triplecheck runs a full loop:
1. Reviewer finds bugs → structured findings with file, line, severity
2. Coder writes actual patches (search/replace diffs, not suggestions)
3. Tests run automatically to catch regressions
4. Loop until no new findings or max rounds
5. Judge scores the final result 0–10
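The loop above can be sketched in a few lines. This is a hypothetical illustration, not triplecheck's actual API: `Finding`, `apply_patches`, and the callable arguments are names I made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    line: int
    severity: str
    message: str

def apply_patches(files, patches):
    # patches: list of (path, search, replace) tuples, i.e. search/replace diffs
    for path, search, replace in patches:
        files[path] = files[path].replace(search, replace)

def review_loop(files, reviewer, coder, run_tests, max_rounds=3):
    """files: dict of path -> source text. Returns unresolved findings."""
    findings = []
    for _ in range(max_rounds):
        findings = reviewer(files)          # structured findings, not prose
        if not findings:
            break                           # clean: stop early
        snapshot = dict(files)
        apply_patches(files, coder(findings))
        if not run_tests(files):            # test gate catches regressions
            files.clear()
            files.update(snapshot)          # revert the bad patch
            break
    return findings
```

The test gate is the important part: a patch that breaks the suite is reverted rather than kept, so hallucinated fixes mostly don't survive the loop.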
The key insight: with local LLMs, compute is effectively free, so you can afford to be thorough. Run five review passes from different angles, vote to filter out noise, let the coder fix everything, and re-review until clean. Try doing that with a $0.03/1K-token API.
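Back-of-envelope arithmetic on why this only pencils out locally (the ~10 tokens/line figure is my rough assumption, not a measurement from the project):

```python
# Cost of five full review passes over the 136K-line codebase via a paid API.
lines = 136_000
tokens_per_line = 10      # rough assumption for source code
price_per_1k = 0.03       # $/1K tokens, from the post
passes = 5

cost = lines * tokens_per_line / 1000 * price_per_1k * passes
print(f"${cost:,.0f}")    # roughly $200 per run, vs. $0 marginal cost locally
```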
What works well:
- Qwen3-Coder on vLLM/Ollama handles both the reviewer and coder roles surprisingly well
- Multi-pass voting genuinely reduces false positives: three passes agreeing beats one pass guessing
- Tree-sitter dependency graph means the reviewer sees related files together, not random batches
- Scanned a 136K-line Go codebase (70 modules) and found real bugs, not just style nits
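The multi-pass vote is essentially a majority filter over finding keys. A minimal sketch, assuming findings are dicts keyed by file and line (the exact dedup key triplecheck uses is my guess):

```python
from collections import Counter

def vote_filter(passes, min_votes=2):
    """Keep only findings reported by at least min_votes independent
    review passes, deduplicated by (file, line)."""
    counts = Counter()
    by_key = {}
    for findings in passes:
        for f in findings:
            key = (f["file"], f["line"])
            counts[key] += 1
            by_key[key] = f
    return [by_key[k] for k, n in counts.items() if n >= min_votes]
```

A finding flagged by one pass out of five is probably noise; one flagged by three is probably real. That's the whole trick.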
What's missing (honestly):
- No GitHub PR integration yet (CLI only: you run it, read the report). This is the #1 gap vs CodeRabbit; it's on the roadmap.
- No incremental/diff-only review: it reviews whole files. Fine for local LLMs (free compute), wasteful for cloud APIs.
- Local LLMs still hallucinate fixes sometimes. The test gate catches most of them, but you should review the diff before merging.
Stack: Python, Click CLI, any OpenAI-compatible backend. Works with vLLM, Ollama, LM Studio, DeepSeek, OpenRouter, and Claude CLI. You can mix and match, e.g. local Qwen running on an M3 Ultra for the reviewer/coder plus Claude as the judge.
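Because every backend speaks the same OpenAI-compatible protocol, mixing them is just per-role routing of a base URL and model name. A hypothetical sketch (the config keys, endpoint URLs, and model names here are illustrative, not triplecheck's real config format):

```python
# Route each pipeline role to a different OpenAI-compatible backend.
BACKENDS = {
    "local-qwen": {"base_url": "http://localhost:8000/v1",      # vLLM on M3 Ultra
                   "model": "Qwen3-Coder"},
    "claude":     {"base_url": "https://example-gateway/v1",    # placeholder URL
                   "model": "claude-sonnet"},
}

ROLES = {"reviewer": "local-qwen", "coder": "local-qwen", "judge": "claude"}

def backend_for(role):
    """Return (base_url, model) for the backend assigned to a role."""
    cfg = BACKENDS[ROLES[role]]
    return cfg["base_url"], cfg["model"]
```

Swapping the judge to a stronger cloud model while keeping the high-volume reviewer/coder traffic local is then a one-line config change.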
Would love feedback, especially from anyone running local models for dev tools. What review capabilities would make you actually use this in your workflow?