Show HN: cubic 2.0 – improving our AI code reviewer (3x more accurate, 2x faster)

https://www.cubic.dev/blog/cubic-2.0
4•pomarie•1h ago
Hey HN, I'm Paul, the founder of cubic.dev, an AI code reviewer for complex codebases.

Over the past few months we’ve completely rebuilt our detection engine, and I wanted to share a few things we did to get more out of LLMs.

Context: cubic specializes in code reviews for teams with complex codebases, like Better Auth, Cal.com, and PostHog. Our users have high standards. It’s important that reviews have real depth and actually understand the codebase.

In the past, we've sometimes struggled with producing reviews that had deep insight into complex changes. It didn't feel like we were leaving comments that truly understood both the codebase and the intent behind the PR. If we pushed reasoning to the max, it could get there, but it would take ages, often 15+ minutes for a review, which many people disliked.

We've spent the last few months rebuilding our AI review engine from scratch, and we've completely redone how the reviewer works, bit by bit.

Like the Ship of Theseus, cubic ended up so different (and better) that we're releasing it as cubic 2.0.

I should say up front: I’m biased because I work on this, and part of the point is awareness. But the main reason I’m posting is that the work feels broadly useful if you’re building anything LLM-based where you need both quality and speed.

*Why this is a "2.0"*

We were optimizing for two things:

1. Higher-signal reviews (comments people actually act on)
2. Lower latency

Quality: 3 months ago, about 20% of comments that cubic left would be addressed by a developer. We measure this by having an LLM look at commits after a cubic comment and judge whether the change implemented what cubic flagged. Today that number is 60%+. For some teams it’s over 90%.
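
To make the metric concrete, here is a minimal sketch of how an "addressed rate" like this could be computed once each comment has a verdict. It is not cubic's actual pipeline: the `JudgedComment` type, its fields, and the sample data are hypothetical, and the LLM judging step is assumed to have already happened.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JudgedComment:
    comment_id: str
    addressed: bool  # hypothetical field: verdict from the LLM judge on later commits


def addressed_rate(comments: List[JudgedComment]) -> float:
    """Fraction of review comments judged as addressed by a later commit."""
    if not comments:
        return 0.0
    return sum(c.addressed for c in comments) / len(comments)


# Illustrative data only: 2 of 10 addressed before, 6 of 10 after.
before = [JudgedComment(f"c{i}", i < 2) for i in range(10)]
after = [JudgedComment(f"c{i}", i < 6) for i in range(10)]

improvement = addressed_rate(after) / addressed_rate(before)
print(f"before={addressed_rate(before):.0%} after={addressed_rate(after):.0%}")
```

Going from roughly 20% to 60% is where a "3x" figure would come from under this definition.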

Speed: median time to review a PR was roughly halved, and P90 dropped to about a third of what it was.
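
For readers less familiar with these two numbers, a small sketch of how median and P90 review time can be computed. The durations below are made-up illustrative values, not cubic's data, and nearest-rank is only one of several common percentile definitions.

```python
import math
import statistics
from typing import List


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical review durations in minutes.
durations = [2, 3, 3, 4, 4, 5, 6, 8, 12, 20]

median = statistics.median(durations)
p90 = percentile(durations, 90)
print(f"median={median} min, P90={p90} min")
```

The median describes a typical review, while P90 captures the slow tail that people tend to notice most.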

*What we changed (the parts that mattered)*

1. Pre-mapping the codebase ("AI wiki")

A big inefficiency in LLM code review (and code writing) is that every PR forces the model to rebuild a mental map of the repo from scratch. In large repos, just figuring out "where am I" can consume a lot of context and tokens before you even get to reasoning about the diff.

We built an "AI wiki" that pre-maps the important parts of a codebase and reuses that as context for reviews.

As a side effect, the wiki is also useful to humans (and to AIs through an MCP), especially for onboarding or for non-technical people trying to understand a system. Example (Firecrawl): https://www.cubic.dev/wikis/firecrawl/firecrawl
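
The general idea of pre-mapping can be sketched as a cache: build the repo overview once per base commit, then reuse it for every review against that commit. This is an illustration under assumptions, not how cubic's wiki is implemented; `RepoMapCache`, `review_prompt`, and the `build_map` callback are hypothetical names.

```python
from typing import Callable, Dict


class RepoMapCache:
    """Caches an expensive repo overview per commit so reviews can reuse it."""

    def __init__(self, build_map: Callable[[str], str]) -> None:
        self._build_map = build_map  # hypothetical: summarizes the repo at a commit
        self._cache: Dict[str, str] = {}
        self.builds = 0  # how many times the expensive mapping step actually ran

    def get(self, commit_sha: str) -> str:
        if commit_sha not in self._cache:
            self._cache[commit_sha] = self._build_map(commit_sha)
            self.builds += 1
        return self._cache[commit_sha]


def review_prompt(cache: RepoMapCache, base_sha: str, diff: str) -> str:
    """Assemble review context from the cached overview plus the PR diff."""
    return f"Codebase overview:\n{cache.get(base_sha)}\n\nDiff to review:\n{diff}"


cache = RepoMapCache(lambda sha: f"overview of repo at {sha}")
first = review_prompt(cache, "abc123", "diff for PR 1")
second = review_prompt(cache, "abc123", "diff for PR 2")
print(cache.builds)  # the overview was built once and shared by both reviews
```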

2. External context tools, plus getting tool usage under control

We added tools to fetch external documentation when needed. The hard part was not adding the tool, it was getting the model to use it correctly. This took a lot of prompt iteration and guardrails, and it ended up being more important than we expected.

3. Learning loop, with more weight on senior reviewers

We leaned harder into learning from user interactions and feedback. One change that helped a lot recently was identifying the senior reviewers in an org and weighting their feedback more heavily. In practice, that made the system converge faster toward what "good" looks like for that team.
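
One simple way to picture this weighting is a weighted acceptance score. The sketch below is an assumption for illustration only: the function name, the 3x weight, and the reviewer names are all made up, and the post does not say how cubic identifies senior reviewers or what weights it uses.

```python
from typing import Iterable, Set, Tuple


def weighted_acceptance(
    feedback: Iterable[Tuple[str, bool]],
    senior_reviewers: Set[str],
    senior_weight: float = 3.0,  # hypothetical weight
) -> float:
    """Share of positive feedback on a kind of comment, counting seniors more heavily."""
    total = 0.0
    score = 0.0
    for reviewer, accepted in feedback:
        weight = senior_weight if reviewer in senior_reviewers else 1.0
        total += weight
        if accepted:
            score += weight
    return score / total if total else 0.0


# Two reviewers liked a comment style; the senior reviewer rejected it.
feedback = [("alice", False), ("bob", True), ("carol", True)]
unweighted = weighted_acceptance(feedback, senior_reviewers=set())
weighted = weighted_acceptance(feedback, senior_reviewers={"alice"})
print(f"unweighted={unweighted:.2f} weighted={weighted:.2f}")
```

With the senior reviewer weighted more heavily, the same feedback goes from majority approval to majority rejection, which is the kind of shift that would make a system adapt faster to that reviewer's standards.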

4. Sandbox snapshotting for large repos

On larger repos, we were wasting minutes on setup work (clone time, environment prep), and we were doing it for every PR. We added a snapshotting approach that cut a lot of that overhead.
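
A rough sketch of the snapshotting idea: pay the setup cost once per environment and reuse the result for later PRs. The post does not describe how cubic keys or stores its snapshots, so everything here is an assumption: `SnapshotStore` is a hypothetical in-memory stand-in, and keying on the repo name plus lockfile contents is just one plausible choice.

```python
import hashlib
from typing import Callable, Dict


class SnapshotStore:
    """In-memory stand-in for a store of prepared sandbox snapshots."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, str] = {}
        self.setups = 0  # how many times the slow setup path ran

    @staticmethod
    def key(repo: str, lockfile: bytes) -> str:
        # Assumption: repo name plus dependency lockfile identify an environment.
        return hashlib.sha256(repo.encode() + b"\0" + lockfile).hexdigest()

    def get_or_create(self, repo: str, lockfile: bytes, setup: Callable[[], str]) -> str:
        k = self.key(repo, lockfile)
        if k not in self._snapshots:
            self._snapshots[k] = setup()  # clone + environment prep would happen here
            self.setups += 1
        return self._snapshots[k]


def slow_setup() -> str:
    return "snapshot-image"  # placeholder for a real snapshot handle


store = SnapshotStore()
store.get_or_create("acme/app", b"dep==1.0", slow_setup)
store.get_or_create("acme/app", b"dep==1.0", slow_setup)  # reused, no setup
store.get_or_create("acme/app", b"dep==2.0", slow_setup)  # dependencies changed
print(store.setups)
```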

Anyway, thanks for reading. Happy to answer questions about any of the above. I’d also love feedback from people who’ve tried AI code review tools:

* What made you keep them, or turn them off?
* What metrics would you trust to measure "review quality"?
* Where do current tools fail in ways that are genuinely harmful?

cubic is here (and free for public repos): https://cubic.dev/home

Comments

DenisDolya•1h ago
I really liked it - it hit the mark. The current balance works very well, and it genuinely surprised me. It provides more technical explanations than just basic checks; for example, compared to using a regular GPT or Claude. I may not be an experienced developer, but I can confidently say that your Cubic.dev is really powerful.
pomarie•1h ago
Thanks Denis!
