Also asked on Telegram, but Hacker News may have additional input:
Only this morning I started wondering about what I now realise is a basic question I've never seen asked: what's the longest/largest task a human can do with n% accuracy?
For big tasks, we break them down, so we often *don't* do one huge single task. No one person actually makes an entire biro, or even an entire pencil; a human can write something like DOOM, but not usually by themselves, and especially not bug-free, as even Carmack got help testing from the rest of id.
Is it perhaps possible to work this out from the same data used in the METR model itself? Were there tasks which several humans attempted, but only half of those humans succeeded at?
ben_w•1h ago