Over the past few months we’ve completely rebuilt our detection engine, and I wanted to share a few things we did to get more out of LLMs.
Context: cubic specializes in code reviews for teams with complex codebases, like Better Auth, Cal.com, and PostHog. Our users have high standards. It’s important that reviews have real depth and actually understand the codebase.
In the past, we sometimes struggled to produce reviews with deep insight into complex changes. It didn't feel like we were leaving comments that truly understood both the codebase and the intent behind the PR. Pushing reasoning to the max could get there, but a single review would often take 15+ minutes, which many users disliked.
We've spent the last few months rebuilding our AI review engine, redoing how the reviewer works piece by piece.
Like the Ship of Theseus, cubic ended up so different (and better) that we're releasing it as cubic 2.0.
I should say up front: I'm biased because I work on this, and part of why I'm posting is to raise awareness. But the main reason is that the work feels broadly useful if you're building anything LLM-based that needs both quality and speed.
*Why this is a "2.0"*
We were optimizing for two things:
1. Higher-signal reviews (comments people actually act on)
2. Lower latency
Quality: three months ago, about 20% of the comments cubic left were addressed by a developer. (We measure this by having an LLM look at the commits that follow a cubic comment and judge whether the change implemented what cubic flagged.) Today that number is 60%+, and for some teams it's over 90%.
Speed: median time to review a PR roughly halved, and P90 dropped to about a third of what it was.
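
To make the "addressed rate" metric concrete, here's a minimal sketch of how such a measurement can be computed. This is illustrative Python, not our actual pipeline: the judge callable and the data shapes are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class ReviewComment:
    body: str                  # what the review comment flagged
    later_commits: list[str]   # diffs pushed to the PR after the comment

def addressed_rate(
    comments: Iterable[ReviewComment],
    judge: Callable[[str, list[str]], bool],  # e.g. an LLM call answering: "do these commits implement what was flagged?"
) -> float:
    """Share of review comments that the judge considers addressed by later commits."""
    comments = list(comments)
    if not comments:
        return 0.0
    hits = sum(1 for c in comments if c.later_commits and judge(c.body, c.later_commits))
    return hits / len(comments)
```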
*What we changed (the parts that mattered)*
1. Pre-mapping the codebase ("AI wiki")
A big inefficiency in LLM code review (and code writing) is that every PR forces the model to rebuild a mental map of the repo from scratch. In large repos, just figuring out "where am I" can consume a lot of context and tokens before you even get to reasoning about the diff.
We built an "AI wiki" that pre-maps the important parts of a codebase and reuses that as context for reviews.
As a side effect, the wiki is also useful to humans (and to AIs via MCP), especially for onboarding or for non-technical people trying to understand a system. Here's an example for Firecrawl: https://www.cubic.dev/wikis/firecrawl/firecrawl
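
Roughly, the idea is: build the map once (or periodically), persist it, and feed the relevant slices into each review instead of re-exploring the repo per PR. A simplified sketch; the file names, cache location, and structure are illustrative rather than our actual implementation.

```python
import json
from pathlib import Path

WIKI_PATH = Path(".cubic-wiki.json")   # hypothetical cache location

def build_wiki(repo_root: Path, suffixes=(".ts", ".tsx", ".py")) -> dict:
    """One-time / periodic pass over the repo. In practice each entry would be an
    LLM-written summary of the module; here a short preview stands in for it."""
    wiki = {}
    for src in repo_root.rglob("*"):
        if not src.is_file() or src.suffix not in suffixes:
            continue
        rel = str(src.relative_to(repo_root))
        wiki[rel] = {"summary": src.read_text(errors="ignore")[:300]}  # stand-in for a real summary
    WIKI_PATH.write_text(json.dumps(wiki))
    return wiki

def review_context(changed_files: list[str]) -> str:
    """Reuse the pre-built map instead of rediscovering the repo on every PR."""
    wiki = json.loads(WIKI_PATH.read_text())
    relevant = {path: wiki[path] for path in changed_files if path in wiki}
    return "Pre-computed codebase map:\n" + json.dumps(relevant, indent=2)
```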
2. External context tools, plus getting tool usage under control
We added tools to fetch external documentation when needed. The hard part wasn't adding the tool; it was getting the model to use it correctly. That took a lot of prompt iteration and guardrails, and it ended up mattering more than we expected.
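
To give a flavor of what "guardrails" means here: the tool description constrains when the model should reach for it, and a check outside the prompt rejects calls that slip through anyway. The tool name, schema, and limits below are illustrative (a generic function-calling-style spec), not our real configuration.

```python
# Illustrative function-calling-style tool spec; names and wording are assumptions.
FETCH_DOCS_TOOL = {
    "name": "fetch_external_docs",
    "description": (
        "Fetch documentation for a third-party package. Use ONLY when the diff calls "
        "an external API whose behavior you cannot infer from the codebase map. "
        "Never use it for first-party code."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "package": {"type": "string"},
            "symbol": {"type": "string", "description": "Function or class being called"},
        },
        "required": ["package"],
    },
}

MAX_DOC_FETCHES_PER_REVIEW = 3  # assumed guardrail value

def allow_tool_call(package: str, first_party_packages: set[str], calls_so_far: int) -> bool:
    """Guardrail enforced outside the prompt: refuse calls the model shouldn't make."""
    if calls_so_far >= MAX_DOC_FETCHES_PER_REVIEW:
        return False
    if package in first_party_packages:
        return False  # the codebase map already covers first-party code
    return True
```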
3. Learning loop, with more weight on senior reviewers
We leaned harder into learning from user interactions and feedback. One change that helped a lot recently was identifying the senior reviewers in an org and weighting their feedback more heavily. In practice, that made the system converge faster toward what "good" looks like for that team.
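
A simplified version of that weighting: per-rule acceptance scores decide which kinds of comments keep getting surfaced, and senior reviewers' reactions count more. The weight, field names, and the notion of a "rule" here are illustrative, not how our learning loop is actually structured.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Feedback:
    rule: str          # e.g. "missing-await", "unvalidated-input"
    reviewer: str
    accepted: bool     # did the reviewer act on / keep the comment?

def rule_scores(feedback: list[Feedback], senior_reviewers: set[str],
                senior_weight: float = 3.0) -> dict[str, float]:
    """Weighted acceptance score per rule; senior reviewers count more.
    Scores near 1.0 -> keep surfacing this kind of comment; near 0.0 -> suppress it."""
    totals: dict[str, float] = defaultdict(float)
    accepted: dict[str, float] = defaultdict(float)
    for f in feedback:
        w = senior_weight if f.reviewer in senior_reviewers else 1.0
        totals[f.rule] += w
        if f.accepted:
            accepted[f.rule] += w
    return {rule: accepted[rule] / totals[rule] for rule in totals}
```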
4. Sandbox snapshotting for large repos
On larger repos, we were wasting minutes on setup work (clone time, environment prep), and we were doing it for every PR. We added a snapshotting approach that cut a lot of that overhead.
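
The shape of it: do the expensive clone and environment prep once per repo, snapshot the result, and restore that snapshot for each PR. The sketch below uses a plain tarball as the snapshot mechanism, which is a stand-in for whatever the sandbox provider actually offers; paths and structure are hypothetical.

```python
import subprocess
import tarfile
from pathlib import Path

SNAPSHOT_DIR = Path("/var/cache/review-snapshots")  # hypothetical location

def snapshot_path(repo: str) -> Path:
    return SNAPSHOT_DIR / (repo.replace("/", "__") + ".tar")

def prepare_or_restore(repo: str, workdir: Path) -> None:
    """Restore a prepared checkout from a snapshot if one exists; otherwise do the
    expensive clone + setup once and snapshot the result for future PRs."""
    snap = snapshot_path(repo)
    workdir.mkdir(parents=True, exist_ok=True)
    if snap.exists():
        with tarfile.open(snap) as tar:
            tar.extractall(workdir)  # fast path: skip clone and environment prep
        subprocess.run(["git", "-C", str(workdir / "repo"), "fetch", "--depth", "1", "origin"], check=True)
        return
    subprocess.run(["git", "clone", f"https://github.com/{repo}.git", str(workdir / "repo")], check=True)
    # ...environment prep (dependency install, build caches) would go here...
    SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)
    with tarfile.open(snap, "w") as tar:
        tar.add(workdir / "repo", arcname="repo")
```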
Anyway, thanks for reading. Happy to answer questions about any of the above. I’d also love feedback from people who’ve tried AI code review tools:
* What made you keep them, or turn them off?
* What metrics would you trust to measure "review quality"?
* Where do current tools fail in ways that are genuinely harmful?
cubic is here (and free for public repos): https://cubic.dev/home