I did a few manual analyses, but found it non-trivial to compare across models due to differences in token caching and tool-use efficiency, so I wanted a tool for repeatable evaluations.
The goal was an OSS tool to gather data that helps answer questions like:
“Would Sonnet have solved most of the issues we gave Opus?” “How much would that have actually saved?” “What about OSS models like Kimi K2.5 or GLM-1?” “The vibes are off, did model performance just regress from last month?”
Right now the project is a bit medium-rare, but it works end-to-end. I’ve run it successfully against itself, and I’m waiting for my token limits to reset so I can add support for more languages and do a broader run. I’m already seeing a few cases where I could’ve used 5.4-mini instead of 5.4 for some parts of the implementation.
I’d love any feedback, criticism, and ideas. I’m especially interested in whether this is something you might pay for as a managed service, or whether you would contribute your private test cases to a shared-commons hold-out set to hold AI providers a bit more accountable.
https://repogauge.org hi@repogauge.org https://github.com/s1liconcow/repogauge
Thanks! David