I'm Nilesh. My brother Abhishek and I built ProdE. Carnegie Mellon and IIT Delhi.
We benchmarked four AI code documentation tools: ProdE, DeepWiki, Claude Code, and Google Code Wiki. ProdE scored highest on usefulness for coding agents. 15% ahead of DeepWiki, 38% ahead of Google, 40% ahead of Claude Code.
I know this might feel like self-praise, but we couldn't find an existing benchmark to use, so we created one ourselves and open-sourced it.
The biggest gap is coverage. Coding agents can only answer questions about parts of the codebase that are documented. If your docs cover routing but skip middleware, every middleware question becomes a hallucination. ProdE documents 114-140 files per project. Claude Code covers 13-17. So agents using Claude Code's docs are blind to roughly 90% of the codebase.
Zero hallucinations across all 9 evaluations. Every file path, function reference, and claim we checked pointed to real code. So it's not just that we cover more; what we cover is also accurate.
DeepWiki did really well on visuals -- 5x more diagrams per project than us, the best visual docs by far. Claude Code had the strongest writing quality of the four.
Honestly, if I saw this post I'd also assume the vendor rigged it. So here's everything we did to make it not that. Claude Opus judged all four tools using a published rubric. Claude Code's output was renamed to doc_x/ so the judge couldn't tell it was Claude Code. ProdE launched after Claude's training cutoff, so the judge had no prior knowledge of our tool. We don't use Claude anywhere in our pipeline. We ran 9 evaluation passes across 3 open-source repos (FastAPI, Pydantic, Mermaid), all pinned to exact commits to account for non-deterministic outputs.
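For anyone skimming the repo, the blinding step is conceptually simple. Here's a minimal sketch (hypothetical names, not our actual harness code) of how tool outputs can be anonymized before they reach the judge, so scores are only de-blinded afterward:

```python
import random

def blind_outputs(tool_dirs):
    # Assign each tool's docs directory an anonymous label like
    # doc_a, doc_b, ... in random order, so the judge prompt
    # never sees a tool name.
    labels = [f"doc_{chr(ord('a') + i)}" for i in range(len(tool_dirs))]
    random.shuffle(labels)
    return dict(zip(labels, tool_dirs))

# Example: the mapping is kept aside and used only to
# de-blind the scores after judging is done.
mapping = blind_outputs(
    ["prode_docs", "deepwiki_docs", "claude_docs", "google_docs"]
)
```

The actual pipeline in the repo does more (pinned commits, repeated passes), but the core idea is just this rename-and-shuffle before judging.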
We scored usefulness for coding agents and readability for humans as separate things, because what makes docs good for agents is different from what makes them good for humans. Agents need lots of references to specific files and functions. Humans need clear writing and good diagrams. The tool with the best writing scored lowest on usefulness for agents. Of course, usefulness for humans is better judged by humans.
Blog (full analysis): https://prode.ai/blogs/we-benchmarked-ai-code-documentation-...

Repo (everything, run it yourself): https://github.com/abhishek-curiousboxai/code-documentation-...

You can fork it and re-run. Everything is MIT licensed.