Humans are tool users. If you make a statistical table to consult for some medical issue, you're using a tool.
At any rate, I'm curious about some of the readings this post brings up. I'm also vaguely remembering that humans can show an odd behavior where requiring justification or reasoning for decisions sometimes produces more predictable decisions, but at the cost of not fully exploring the space of viable decisions.
Or the decision may even be in a domain which, for whatever reason, is poorly represented by a statistical model, something where data points are hard to get.
The most recent example I can think of is "Frank". In 2021, JPMorgan Chase acquired Frank, a startup founded by Charlie Javice, for $175 million. Frank claimed to simplify the FAFSA process for students. Javice asserted the platform had over 4 million users, but in reality, it had fewer than 300,000. To support her claim, she allegedly hired a data science professor to generate synthetic data, creating fake user profiles. JPMorgan later discovered the discrepancy when a marketing campaign revealed a high rate of undeliverable emails. In March 2025, Javice was convicted of defrauding JPMorgan.
IMO a data expert could have recognized the fake user profiles: they'd have seen how messy real data is, would know the demographics of the would-be users of a service like Frank (wealthy, time-stressed families), and would know the tell-tale signs of fake data (clusters of data that follow obvious "first principles").
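For instance, here's a toy sketch of the kind of uniformity check a data person might run. Every number is invented (this is not Frank's actual data); the point is just that real records tend to have outliers and long tails, while profiles generated "from first principles" come out suspiciously clean:

```python
import random

random.seed(0)

# Hypothetical: real student ages are messy, with outliers like adult
# learners and parents filling out forms; synthetic profiles drawn from a
# neat assumed range are suspiciously uniform.
real_ages = [random.choice([17, 17, 18, 18, 18, 19, 22, 25, 31, 45])
             for _ in range(1000)]
fake_ages = [random.randint(17, 22) for _ in range(1000)]  # too-clean block

def messiness(xs):
    # crude proxies for "looks like real data": distinct values, tail spread
    return len(set(xs)), max(xs) - min(xs)

print(messiness(real_ages))  # wide spread, outliers present
print(messiness(fake_ages))  # narrow, clean range -- a tell-tale sign
```

Any single check like this is easy to fool, of course; the expert's advantage is knowing a dozen of them and which ones matter for this domain.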
I don't think it's a trivial problem though. It's notoriously easy to twist stats to sell any narrative. And Goodhart's Law all but guarantees that any meaningful metric will get hacked.
By the way, note that this applies to LLMs too. One of the biggest pons asinorums that people get hung up on is the idea that "it just imitates the data, therefore, it can never be better than the average datapoint (or at least, best datapoint); how could it possibly be better?"
Well, we know from a long history that this is not that hard: humans make random errors all the time, and even a linear model with a few parameters or a little flowchart can outperform them. So it shouldn't be surprising or a mystery if some much more complicated AI system could too.
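A minimal simulation of that claim, in the spirit of Dawes-style unit-weight models (the weights and noise level here are made up for illustration): the expert knows roughly the right cue weights but applies them inconsistently, while a crude equal-weights formula never has an off day.

```python
import random

random.seed(1)

N = 5000
human_sse, model_sse = 0.0, 0.0
for _ in range(N):
    x1, x2, x3 = (random.gauss(0, 1) for _ in range(3))
    truth = 0.5 * x1 + 0.3 * x2 + 0.2 * x3   # assumed true cue weights
    human = truth + random.gauss(0, 0.8)      # right weights, random error
    model = (x1 + x2 + x3) / 3                # unit-weight "little flowchart"
    human_sse += (human - truth) ** 2
    model_sse += (model - truth) ** 2

print(model_sse / N < human_sse / N)  # the crude model has lower error
```

The model's weights are wrong, but its error is bounded by that wrongness; the human's error is dominated by inconsistency, which here is much larger.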
Hmm - the phrasing that perhaps holds more water is that LLMs just imitate the data, which means that novel ideas / code tends to be smashed against the force of averaging when fed into an LLM. E.g. NotebookLM summaries/podcasts are good infotainment but they tend to flatten unconventional paragraphs into platitudes or common wisdom. Obviously this is very subjective and hard to benchmark.
delichon•6h ago
This seems to be a near restatement of the bitter lesson. It's not just that large enough statistical models outperform algorithms built from human expertise, they also outperform human expertise directly.
gopalv•6h ago
When measured statistically.
This is not a takedown of that statement, but the reason we have trouble with this idea is that it works in the lab and not always in real life.
To set up a clean experiment, you have to define what success looks like before you conduct the experiment - the output variable must be defined up front.
Once you know ahead of time what to measure to determine success, statistical models tend to be less erratic than a group of humans at hitting that target.
Variance is bad in an experiment, but that jitter is needed in an ever-changing world, even if most variants are worse off.
For example, if you can predict someone's earning potential from their birth zipcode, it is not wrong and often more right than otherwise.
And then if you base student loans and business loan interest rates on the basis of birth zipcodes, the original prediction does become more right.
In the experimental version that's a win, but in real life it's a terrible loss to society.
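A toy feedback-loop sketch of that point (every number here is invented): once lenders act on the zipcode prediction, the prediction helps cause the outcome it predicted.

```python
# Hypothetical predicted earnings by birth zipcode
predicted = {"A": 60_000, "B": 40_000}

def rate_from_prediction(pred):
    # lenders charge more where predicted earnings are lower
    return 0.05 + 0.10 * (60_000 - pred) / 60_000

def realized_earnings(zipcode, rate):
    # assumed mechanism: costlier credit suppresses earnings
    # (less access to education and business capital)
    return predicted[zipcode] * (1 - 2 * (rate - 0.05))

for z in predicted:
    r = rate_from_prediction(predicted[z])
    print(z, round(r, 3), round(realized_earnings(z, r)))
# zipcode B is charged more, ends up earning below even its prediction,
# and the gap between A and B widens -- the model now looks "more right"
```

The model's accuracy metric improves precisely because the model was used, which is the Goodhart-flavored failure mode: the experiment can't see it, because the experiment holds the world fixed.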
bobsomers•3h ago
> When measured statistically.
THANK YOU. It's mildly infuriating how often people forget that one of the things most human experts are good at is knowing when they are looking at something that is likely in distribution vs. out of distribution (and thus, updating their priors).