Developer work is unusually public. Beyond git diffs, you can read PR comments and Linear threads, and get a sense of both the complexity of the work and how people collaborate.
I tried a little adversarial experiment (see the sketch after this list):

- Take recent commits and have an LLM infer the "spec" (simulating a Linear ticket, since I didn't build that step)
- Ask Claude Code to implement the same thing
- Use another LLM to compare the two solutions blindly
- If the LLM version is worse than the human version, keep giving it hints until it matches or exceeds the human contribution
- The more elaborate the hints needed, the higher the complexity score
- Evaluating comments is even simpler. I didn't try an adversarial approach there, but there's no reason it wouldn't work.
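Concretely, the loop might look something like this. This is a minimal sketch, not the library's actual code: `ask` and `claude_code_implement` are hypothetical stand-ins for whatever LLM and Claude Code plumbing you use, and hint *count* stands in for hint elaborateness.

```python
def ask(prompt: str) -> str:
    """Stub: send `prompt` to your LLM of choice and return its reply."""
    raise NotImplementedError

def claude_code_implement(spec: str, hints: list[str]) -> str:
    """Stub: have Claude Code implement `spec`, given any hints so far."""
    raise NotImplementedError

def complexity_score(commit_diff: str, max_hints: int = 5) -> int:
    # 1. Infer the "spec" from the human's commit (stand-in for the ticket).
    spec = ask(f"Write the ticket this change likely implements:\n{commit_diff}")

    hints: list[str] = []
    for n_hints in range(max_hints + 1):
        # 2. Ask Claude Code to implement the same spec.
        llm_diff = claude_code_implement(spec, hints)

        # 3. Blind comparison: the judge sees neither author
        #    (in practice you'd also randomize which side is which).
        verdict = ask(
            "You are judging two solutions to the same spec, blind to authorship.\n"
            "Answer 'A' if A is at least as good as B, otherwise 'B'.\n"
            f"Spec:\n{spec}\n\nA:\n{llm_diff}\n\nB:\n{commit_diff}"
        )

        # 4. If the LLM's attempt matches or exceeds the human's, the number
        #    of hints it needed is a crude proxy for complexity.
        if verdict.strip().upper().startswith("A"):
            return n_hints

        # 5. Otherwise, extract another hint from the human solution and retry.
        hints.append(ask(
            "Give one hint (without revealing the full solution) that would "
            f"help improve A toward B.\nA:\n{llm_diff}\n\nB:\n{commit_diff}"
        ))

    return max_hints + 1  # never caught up: maximally complex at this scale
```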
This turned into a small library I hacked together. You can score devs on repos for fun.
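Usage would look roughly like the snippet below. The module and function names are illustrative only, not the library's actual API:

```python
# Hypothetical usage; the real module and function names may differ.
from dev_scorer import score_repo  # illustrative import, not the real name

scores = score_repo("path/to/local/clone")  # {author: complexity score}
for author, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{author}: {score:.2f}")
```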
I wonder if managers use numbers simply because they can't hold all the context of a person's contributions, and so lose out on nuance. What if LLMs could hold all of the context of your work and give a fairer evaluation? Could we move away from PMs deciding the “what” and engineers deciding the “how”, to engineers deciding both?
PRs welcome!