Is this a post about AI archeology?
they are definitely useful but they miss the things that are hard to encode in tests, like spec/intent alignment, scope creep, adherence to codebase patterns, team preferences (risk tolerance, etc)
and those factors are really important. which means that test-evals should be relied upon more as weak/directional priors than as definitive measures of real-world usefulness
love2read•1h ago
refulgentis•1h ago
yorwba•57m ago
In any case, the blinding didn't stop Reviewer #2 from calling out obvious AI slop. (Figure 5)
collabs•41m ago
If you look at the comment it says what the code following the comment does. It doesn't matter whether it is a human or a machine that wrote it. It is useless. It is actually worse than useless because if someone needs to change the code, now they need to change two things. So in that sense, you just made twice the work for anyone who touches the code after you and for what benefit?
zozbot234•29m ago
spartanatreyu•23m ago
That "appears" is doing a lot of heavy lifting.
The code working isn't what's being selected for.
The code looking convincing IS what is being selected for.
That distinction is massive.