Over the past year I built and analyzed a dataset of 23K+ vulnerabilities extracted from smart contract audit reports published between 2023 and 2025. Sources include private auditors, audit firms, and competitive platforms such as Code4rena and Sherlock.
The dataset was cleaned before analysis: 99% of Informational-severity findings and ~40% of Low-severity were removed, as they consistently lacked sufficient detail to be informative.
The goal was to quantify report quality — not just flag vulnerabilities, but measure how well each one is documented. This became the foundation for a RAG-based audit assistant I've been building, where data quality has an outsized effect on output quality.
Scoring methodology:
Each finding was scored on three primary dimensions — description depth, remediation quality, and presence of a PoC. PoC carried the highest weight, as it is the most reliable signal of a useful report. Solidity snippets and severity level contributed additional points. Raw scores (0–15) were log-normalized to 0–1 to prevent score concentration at the top.
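As a concrete sketch of the normalization step (the post doesn't give the exact formula, so the `log1p` transform below is an assumption):

```python
import numpy as np

def normalize_score(raw, max_raw=15):
    """Map a raw 0-15 quality score onto the 0-1 range.

    A log transform is one way to keep scores from concentrating
    at one end of the scale; log1p handles a raw score of 0.
    The exact transform used in the analysis is an assumption here.
    """
    return np.log1p(raw) / np.log1p(max_raw)

print(normalize_score(0))   # 0.0
print(normalize_score(15))  # 1.0
print(normalize_score(7))   # 0.75 — the log lifts mid-range raw scores
```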
Key findings:
— Total findings analyzed: 23,625
— Mean score: 0.32 | Median: 0.27
— Distribution is multimodal with three distinct quality tiers (~0.05, ~0.25, ~0.60)
— ~25% of findings score above 0.51 — these form the high-quality tier (the "golden" subset of the data)
— All three normality tests applied reject normality — the distribution is significantly non-Gaussian
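The normality checks are easy to reproduce on any score column. Here's a hedged sketch on synthetic data shaped like the three tiers above (the post doesn't name the specific tests, so Shapiro–Wilk, D'Agostino, and Anderson–Darling are my choice of three common ones):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic stand-in for the score column: a three-component
# mixture near the reported tiers (~0.05, ~0.25, ~0.60).
scores = np.concatenate([
    rng.normal(0.05, 0.03, 8000),
    rng.normal(0.25, 0.06, 10000),
    rng.normal(0.60, 0.10, 5600),
])

# Three common normality tests; all should reject for multimodal data.
shapiro_p = stats.shapiro(rng.choice(scores, 5000, replace=False)).pvalue
dagostino_p = stats.normaltest(scores).pvalue
ad = stats.anderson(scores, dist="norm")

print(shapiro_p < 0.05)                        # True: rejects normality
print(dagostino_p < 0.05)                      # True: rejects normality
print(ad.statistic > ad.critical_values[-1])   # True: rejects at the 1% level
```

Note that `scipy.stats.anderson` returns a statistic plus critical values rather than a p-value, so its rejection check looks different from the other two.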
Most counterintuitive result: Critical-severity bugs score lower on average (0.33) than High-severity ones (0.53). Critical findings tend to be reported as brief alerts without PoC — the severity speaks for itself, so the write-up gets less attention. High findings, by contrast, typically include more thorough documentation. This is a problem: the bugs most likely to cause catastrophic losses are often the least well-documented.
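For anyone reproducing this, the severity comparison boils down to a group-by mean over the findings table. A toy sketch (the `severity` and `score` column names are assumptions, and the values below merely echo the reported averages):

```python
import pandas as pd

# Tiny illustrative table; values chosen so the group means match
# the reported 0.33 (Critical) vs 0.53 (High).
findings = pd.DataFrame({
    "severity": ["Critical", "Critical", "High", "High", "High"],
    "score":    [0.30, 0.36, 0.55, 0.50, 0.54],
})

mean_by_severity = findings.groupby("severity")["score"].mean()
print(round(mean_by_severity["Critical"], 2))  # 0.33
print(round(mean_by_severity["High"], 2))      # 0.53
```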
What this means in practice:
The three-peak distribution reflects real behavioral patterns in how auditors write reports. The first cluster (scores ~0.05) represents minimal one-liner findings with no context. The second (~0.25) covers standard reports with a description but no PoC. The third (~0.60) is the minority that includes everything: a clear description, remediation steps, and working exploit code. Only this last group is genuinely useful for both AI training and human review.
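One way to verify that the three clusters are real peaks rather than histogram-binning artifacts is to look for local maxima of a kernel density estimate. A sketch on synthetic tier-shaped data (the actual peak-detection method used in the analysis isn't stated):

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
# Synthetic scores shaped like the three described tiers.
scores = np.concatenate([
    rng.normal(0.05, 0.03, 8000),   # one-liners, no context
    rng.normal(0.25, 0.06, 10000),  # description, no PoC
    rng.normal(0.60, 0.10, 5600),   # full write-up with PoC
])

# Evaluate a KDE on a grid and pick out prominent local maxima.
grid = np.linspace(-0.1, 1.0, 500)
density = gaussian_kde(scores)(grid)
peak_idx, _ = find_peaks(density, prominence=0.1)
print(np.round(grid[peak_idx], 2))  # peaks near 0.05, 0.25, 0.60
```

The `prominence` threshold filters out minor wiggles in the estimated density so only the genuine modes survive.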
For the broader ecosystem, the takeaway is uncomfortable: the current standard of audit reporting leaves most findings underexplained. A well-documented bug with a PoC can be understood, reproduced, and fixed in hours. A vague one-liner can stay misunderstood for weeks — or get silently ignored in the next audit cycle.
If you want to see the full distribution charts and statistics for yourself, I put together an interactive notebook with all the visualizations:
https://colab.research.google.com/drive/1Wp4yyEmXYjHATak7Bmy...
Open to questions on methodology or dataset composition.