We Stopped Using the Mathematics That Works

https://gfrm.in/posts/why-decision-theory-lost/index.html

69•slygent•5h ago

Comments

nacozarina•3h ago

a voice of reason cries out in the howling maelstrom

jeffrallen•2h ago

Tldr: the author is annoyed at the Bitter Lesson.

Join the crowd dude. It's still true, no matter how inconvenient it is.

andai•1h ago

This means money beats math?

NateEag•1h ago

It means trying to figure out how to build an intelligence always loses to mindlessly brute-forcing problems with more compute:

https://en.wikipedia.org/wiki/Bitter_lesson

rcxdude•57m ago

It's not mindless brute-forcing, the details of the architecture, data, and training strategy still matter a lot (if you gave a modern datacenter to an AI researcher from the 60s they wouldn't get an LLM very quickly). The bitter lesson is that you should focus on adjusting your techniques so that they can take advantage of processing power to learn more about your problem themselves, instead of trying to hand-craft half the solution yourself to 'help' the part that's learning.

notlenin•55m ago

unless you don't have unlimited compute, at which point you need other ideas

https://arielche.net/bitter-lesson

bpt3•32m ago

Then train your model elsewhere and size it as appropriate for the runtime environment.

If that really isn't an option, then yes ML/AI isn't for you in this case.

andai•26m ago

I found this article a little weak, but there is an interesting parallel.

The 10,000 hours thing is encouraging because the amount of effort you put in is far more important than your natural ability.

... Until you get to the point where everyone is already working as hard as humanly possible, at which point natural ability becomes the sorting function again.

danaris•22m ago

Well, it means that thus far trying to build an intelligence has lost out to brute forcing it with more compute.

There is nothing particular that suggests this is infinitely scalable.

10xDev•18m ago

They have researchers working for insane salaries just so they don't go to another frontier lab to share their ideas. If you think it is just "mindless bruteforce" you don't understand anything. The idea is that the most effective methods are ones that scale but those ideas are also then limited by the compute available.

chbint•51m ago

I suspect his diagnosis is pretty accurate, though. The bitter lesson came up when deep learning was already mainstream. The text discusses how that happened, and it can be the case that convenience beats accuracy. Accuracy is an epistemic value, but current AI is largely driven by market values. If accuracy manages to get along, great, but other than that, market-laden convenience reigns. Commercially, it is often more convenient to even change the world in order to make it easier for our models (consider how we're willing to create special places without pedestrians or human-driven vehicles for autonomous vehicles as a "solution" for their shortcomings).

throwaway132448•2h ago

I found the article confusing. Its premise seems to be that alternative methods to deep learning “work”, and only faded out due to other factors, yet keeps referencing scenarios in which they demonstrably failed to “work”. Such as:

> In 2012, Alex Krizhevsky submitted a deep convolutional neural network to the ImageNet Large Scale Visual Recognition Challenge. It won by 9.8 percentage points over the nearest competitor.

Maybe there’s another definition of “works” that’s implicit and I’m not getting, but I’m struggling to picture a definition relevant to the history-of-deep-learning narrative they are trying to explain.

LoganDark•1h ago

I think what they're saying is the methods used today are faster but have a lower ceiling, and that that's why they quickly took over but can only go so far.

jerf•55m ago

That would be a hypothesis, not a fact.

I'm not closed to it. You can check my comment history for frequent references to next-generation AIs that aren't architected like LLMs. But they're going to have to produce an AI of some sort that is better than the current ones, not hypothesize that it may be possible. We've got about 50 years of hypothesis about how wonderful such techniques may be and, by the new standards of 2026, precious few demonstrations of it.

Quoting from the article:

"Within five years, deep learning had consumed machine learning almost entirely. Not because the methods it displaced had stopped working, but because the money, the talent, and the prestige had moved elsewhere."

That one jumped right out at me because there's a sleight of hand there. A more correct quote would be "Not because the methods it displaced had stopped working as well as they ever have, ..." Without that phrase, the implication that other techniques were doing just as well as our transformer-based LLMs is slipped in there, but it's manifestly false when brought up to conscious examination. Of course they haven't, unless they're in the form of some probably-beyond-top-secret AI in some government lab somewhere. Decades have been poured into them and they have not produced high-quality AIs.

Anyone who wants to produce that next-gen leap had probably better have some clear eyes about what the competition is.

LoganDark•44m ago

> That would be a hypothesis, not a fact.

I agree.

deckar01•1h ago

It seems to be an indirect attempt to promote their GitHub project. They had Claude make them an “agent” using Bayesian modeling and Thompson sampling and now they are convinced they have heralded a new era of AI.

canjobear•24m ago

It reads to me like Claude wrote the article too.

PaulHoule•1h ago

I think the worst thing about the golden age of symbolic AI was that there was never a systematic approach to reasoning about uncertainty.

The MYCIN system [1] was rather good at medical diagnostics and, like other systems of the time, had an ad-hoc procedure to deal with uncertainty, which is essential in medical diagnosis.

The problem is that it is not enough to say "predicate A has an 80% chance of being true"; if you have predicates A and B, you have to consider the probability of all four of (A B, (not A) B, A (not B), (not A) (not B)), and if there are N predicates you have to consider joint probabilities over 2^N possible situations, and that's a lot.

For any particular situation the values are correlated and you don't really need to consider all those contingencies but a general-purpose reasoning system with logic has to be able to handle the worst case. It seems that deep learning systems take shortcuts that work much of the time but may well hit the wall on how accurate they can be because of that.

[1] https://en.wikipedia.org/wiki/Mycin
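To make the 2^N count in this comment concrete, here is a minimal Python sketch. The function names and the 0.8 / 0.5 marginals are illustrative assumptions, and building the joint table from marginals only works if you assume the predicates are independent, which is exactly the assumption a general-purpose reasoner cannot rely on.

```python
from itertools import product


def joint_table_size(n: int) -> int:
    """Number of truth assignments over n binary predicates."""
    return 2 ** n


def independent_joint(marginals):
    """Build the full joint table from per-predicate probabilities.

    Assumes independence between predicates (an illustrative assumption);
    correlated predicates would need every entry specified separately.
    """
    table = {}
    for assignment in product([True, False], repeat=len(marginals)):
        p = 1.0
        for value, m in zip(assignment, marginals):
            p *= m if value else 1.0 - m
        table[assignment] = p
    return table


# Two predicates, A (80%) and B (50%): four joint entries.
joint = independent_joint([0.8, 0.5])
for assignment, p in joint.items():
    print(assignment, round(p, 3))

# The table doubles with every added predicate.
for n in (2, 10, 20, 30):
    print(n, joint_table_size(n))
```

Under independence, N numbers are enough; without it, the table needs 2^N - 1 free parameters, which is the blow-up the comment describes.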

zozbot234•40m ago

Symbolic AI à la MYCIN and other expert systems didn't do anything that a modern database query engine can't do with far greater performance. The bottleneck is coming up with the set of rules that the system is to follow.

ontouchstart•1h ago

We are at the age of alchemy, wait for the age of chemistry and physics. New mathematical foundations are yet to be found.

furyofantares•1h ago

LLM-garbage article, ironically.

andai•1h ago

What makes you say that? Which LLM does it sound like to you?

xg15•1h ago

The paragraph "The ImageNet Moment" stuck out to me. It's so stuffed with the current AI-isms that I have a hard time seeing this as chance.

canjobear•20m ago

> Not because the methods it displaced had stopped working, but because the money, the talent, and the prestige had moved elsewhere. The researchers who understood decision theory, Bayesian inference, and operations research didn’t lose their arguments. They lost their audience.

naasking•4m ago

So, what's the problem with it?

kingstnap•1h ago

Just because you can analyse it doesn't mean that it is better. Deep learning theory is unbelievably garbage compared to the empirical results.

In particular, please show me a worked example of a decision tree meta learning. Because it's trivial to show this for DNNs.

andai•1h ago

> I’ve spent the last few months building agents that maintain actual beliefs and update them from evidence — first a Bayesian learner that teaches itself which foods are safe, then an evolutionary system that discovers its own cognitive architecture. Looking at what the industry calls “agents” has been clarifying.

> What would it take for an AI system to genuinely deserve the word “agent”?

> At minimum, an agent has beliefs — not hunches, not vibes, but quantifiable representations of what it thinks is true and how certain it is. An agent has goals — not a prompt that says “be helpful,” but an objective function it’s trying to maximise. And an agent decides — not by asking a language model what to do next, but by evaluating its options against its goals in light of its beliefs.

> By this standard, the systems we’re calling “AI agents” are none of these things.
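The quoted definition (beliefs, goals, and a decision that weighs options against goals in light of beliefs) is essentially expected-utility maximisation. A minimal Python sketch of that idea follows; the states, actions, and utility numbers are invented for illustration, loosely echoing the "which foods are safe" example, and are not taken from the author's code.

```python
def expected_utility(action, beliefs, utility):
    """Average utility of an action, weighted by belief in each state."""
    return sum(p * utility[(action, state)] for state, p in beliefs.items())


def decide(actions, beliefs, utility):
    """Pick the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, beliefs, utility))


# Hypothetical numbers: eating safe food is mildly good,
# eating poisonous food is very bad, skipping is neutral.
actions = ["eat", "skip"]
utility = {
    ("eat", "safe"): 10.0,
    ("eat", "poisonous"): -100.0,
    ("skip", "safe"): 0.0,
    ("skip", "poisonous"): 0.0,
}

uncertain = {"safe": 0.7, "poisonous": 0.3}
confident = {"safe": 0.95, "poisonous": 0.05}

print(decide(actions, uncertain, utility))
print(decide(actions, confident, utility))
```

The same action set yields different choices as the belief changes, which is the sense in which the decision depends on both the objective and the stated uncertainty.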

vessenes•59m ago

Heading down the links of this blog ends up at https://github.com/gfrmin/credence, which claims to be an agentic harness that keeps track of usefulness of tools separately and beats LangChain at a benchmark.

LangChain… Now that’s a name I haven’t heard in a long, long time..

Anyway, that’s a cool idea. But also his blog posts include phrases like “That’s not intelligence, it’s just <x> with vibes.” Urg. Slop of the worst sort.

But, like I said, I like the idea of keeping a running tally of what tool uses are useful in which circumstances, and consulting the oracle for recommended uses. I feel slightly icky digging into the code though; there’s a type of (usually brilliant) engineer that assumes when they see success that it’s a) wrong, and b) because everybody’s stupid, and sadly, some of that tone comes through the Claude Sonnet 4.0 writing used to put this blog together.
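For readers unfamiliar with the "running tally" idea, one common way to do it is Beta-Bernoulli Thompson sampling, the technique mentioned elsewhere in this thread. The sketch below is a generic illustration, not the credence project's implementation; the class name, tool names, and success rates are all hypothetical.

```python
import random


class ToolTally:
    """Beta-Bernoulli Thompson sampling over a set of tools (illustrative)."""

    def __init__(self, tools, seed=0):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per tool, stored as [alpha, beta].
        self.counts = {tool: [1, 1] for tool in tools}

    def choose(self):
        """Sample a plausible success rate per tool and pick the best sample."""
        samples = {
            tool: self.rng.betavariate(a, b)
            for tool, (a, b) in self.counts.items()
        }
        return max(samples, key=samples.get)

    def update(self, tool, useful):
        """Record whether the tool call turned out to be useful."""
        self.counts[tool][0 if useful else 1] += 1

    def mean(self, tool):
        """Posterior mean usefulness of a tool."""
        a, b = self.counts[tool]
        return a / (a + b)


# Simulated environment with made-up true usefulness rates.
true_rates = {"search": 0.8, "calculator": 0.3}
tally = ToolTally(list(true_rates), seed=42)
env = random.Random(7)
for _ in range(500):
    tool = tally.choose()
    tally.update(tool, env.random() < true_rates[tool])

for tool in true_rates:
    print(tool, tally.counts[tool], round(tally.mean(tool), 3))
```

Sampling from the posterior rather than always taking the highest mean keeps some exploration going, so a tool that was unlucky early on still gets occasional retries.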

naasking•4m ago

> But also his blog posts include phrases like “That’s not intelligence, it’s just <x> with vibes.” Urg. Slop of the worst sort.

You know people actually write like that. The LLMs learned it from somewhere.

bArray•52m ago

> A Bayesian decision-theoretic agent needs explicit utility functions, cost models, prior distributions, and a formal description of the action space. Every assumption must be stated. Every trade-off must be quantified. This is intellectually honest and practically gruelling. Getting the utility function wrong doesn’t just give you a bad answer; it gives you a confidently optimal answer to the wrong question.

I was talking somebody through Bayesian updates the other day. The problem is that if you mess up any part of it, in any way, then the result can be completely garbage. Meanwhile, if you throw some neural network at the problem, it can much better handle noise.
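As a small worked example of how sensitive a Bayesian update can be to one mis-specified input, here is a sketch in Python. The test characteristics (90% true-positive rate, 10% false-positive rate) and the two priors are assumptions chosen purely for illustration.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule for a binary hypothesis H after observing the evidence."""
    numerator = prior * p_evidence_given_h
    evidence = numerator + (1.0 - prior) * p_evidence_given_not_h
    return numerator / evidence


# Same evidence, same likelihoods, two different priors.
rare = posterior(prior=0.01, p_evidence_given_h=0.9, p_evidence_given_not_h=0.1)
coin_flip = posterior(prior=0.5, p_evidence_given_h=0.9, p_evidence_given_not_h=0.1)

print(round(rare, 4))       # posterior when H is rare a priori
print(round(coin_flip, 4))  # posterior when H is 50/50 a priori
```

With the rare prior the posterior stays below 10%, while the 50/50 prior gives 90%: the arithmetic is equally "correct" in both cases, so a wrong prior produces a confident answer with no internal warning.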

> Deep learning’s convenience advantage is the same phenomenon at larger scale. Why specify a prior when you can train on a million examples? Why model uncertainty when you can just make the network bigger? The answers to these questions are good answers, but they require you to care about things the market doesn’t always reward.

The answer seems simple to me - sometimes getting an answer is not enough, and you need to understand how an answer was reached. In the age of hallucinations, one can appreciate approaches where hallucinations are impossible.

psychoslave•21m ago

>This is the VHS-versus-Betamax dynamic, or TCP/IP versus the OSI model, or QWERTY versus every ergonomic alternative proposed since 1936.

QWERTY has many variants, and every single geopolitical institution has its own odious anti-ergonomic layout, it seems. So this case is somehow different to my mind. As a French native, I use Bépo.

pron•18m ago

> This is the VHS-versus-Betamax dynamic, or TCP/IP versus the OSI model, or QWERTY versus every ergonomic alternative proposed since 1936. The technically superior solution loses to the solution that’s easier to deploy, easier to hire for, and good enough for the use cases that pay the bills.

Without commenting on the merit of the claims, the problem with this statement is that in many cases there is no universal "technical superiority", only tradeoffs. E.g. Betamax was technically superior in picture quality while VHS was technically superior in recording time, and more people preferred the latter technical superiority. When people say that the technically superior approach lost in favour of convenience, what really happened is that their own personal technical preferences were in the minority. More people preferred an alternative that wasn't just "good enough" but technically better, only on a different axis.

Even if we suppose the author is right that his preferred approach yields better outputs, he acknowledges that constructing good inputs is harder. That's not technical superiority; it's a different tradeoff.
