There Will Be a Scientific Theory of Deep Learning

https://arxiv.org/abs/2604.21691

58•jamie-simon•3h ago

Comments

adzm•1h ago

I'm only partially through this paper, but it's written in a very engaging and thoughtful manner.

There is so much to digest here but it's fascinating seeing it all put together!

4b11b4•56m ago

wow.. this would be cool. Instead of just.. guessing "shapes"

NitpickLawyer•28m ago

tbf, we've learned (ha!) more from smashing teeny tiny particles and "looking" at what comes out than from say 40 years of string theory. Sometimes doing stuff works, and the theory (hopefully) follows.

RyanShook•31m ago

Here's where I'm missing understanding: for decades the idea of neural networks had existed with minimal attention. Then in 2017 "Attention Is All You Need" gets released, and since then there has been an exponential explosion in deep learning. I understand that deep learning is accelerated by GPUs, but the concept of a transformer could have been used on much slower hardware much earlier.

BigTTYGothGF•28m ago

The modern neural net revival got kicked off long before 2017.

noosphr•16m ago

AlexNet in 2012 was only 5 years earlier.

embedding-shape•26m ago

> I understand that deep learning is accelerated by GPUs but the concept of a transformer could have been used on much slower hardware much earlier

But they don't give the same results at those smaller scales. People imagined it, but no one could have put it into practice because the hardware wasn't there yet. Simplified, LLMs are basically Transformers with the additional idea of "and a shitton of data to learn from", and to make training feasible with that amount of data, you do need some capable hardware.

teekert•22m ago

If you are in the radiology field it started “exploding” much earlier, with CNNs.

whateverboat•20m ago

The same thing happened with matrices. We had matrices for 400 years, but the field of linear algebra, and especially numerical linear algebra, exploded only with the advent of computers.

In the olden days, the correct way to solve a linear system of equations was to use the theory of minors. With the advent of computers, you suddenly had a huge theory of Gaussian elimination, Krylov subspaces, and whatnot.
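To make the contrast concrete, here is a minimal sketch that solves the same small system both ways. It assumes "theory of minors" means Cramer's rule with determinants computed by cofactor expansion; the function names are made up for illustration. The cofactor route does factorial work in the matrix size, while elimination does cubic work, which is roughly why the latter is the one that scales on a computer.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor (minor) expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor: drop row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def solve_cramer(a, b):
    """Cramer's rule: x_j = det(A with column j replaced by b) / det(A)."""
    d = det(a)
    if d == 0:
        raise ValueError("singular matrix")
    solution = []
    for j in range(len(a)):
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_j), d))
    return solution


def solve_gauss(a, b):
    """Gaussian elimination with back substitution, in exact arithmetic."""
    n = len(a)
    # Augmented matrix [A | b] with rational entries.
    m = [[Fraction(x) for x in row] + [Fraction(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
B = [8, -11, -3]
assert solve_cramer(A, B) == solve_gauss(A, B) == [2, 3, -1]
```

Both give the same answer on a 3x3 system; the difference only shows up as the system grows.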

wslh•20m ago

Don't underestimate the massive data you need to make those networks tick. Training was also impracticable with the slow training algorithms of the time, whether they ran on GPUs or CPUs.

pash•13m ago

The inflection point was 2012, when AlexNet [0], a deep convolutional neural net, achieved a step-change improvement in the ImageNet classification competition.

After seeing AlexNet’s results, all of the major ML imaging labs switched to deep CNNs, and other approaches almost completely disappeared from SOTA imaging competitions. Over the next few years, deep neural networks took over in other ML domains as well.

The conventional wisdom is that it was the combination of (1) exponentially more compute than in earlier eras with (2) exponentially larger, high-quality datasets (e.g., the curated and hand-labeled ImageNet set) that finally allowed deep neural networks to shine.

0. https://en.wikipedia.org/wiki/AlexNet

cgearhart•12m ago

A much earlier major win for deep learning was AlexNet for image recognition in 2012. It dominated the competition and within a couple of years it was effectively the only way to do image tasks. I think it was Jeremy Howard who wrote a paper around 2017 wondering when we’d get a transfer learning approach that worked as well for NLP as convnets did for images. The attention paper that year didn’t immediately dominate. The hardware wasn’t good enough and there wasn’t a consensus that scale would solve everything. It took like five more years before GPT-3 took off and started this current wave.

I also think you might be discounting exactly how much compute is used to train these monsters. A single 1 GHz processor would take about 100,000,000 years to train something in this class. Even with on the order of 25k GPUs, training GPT-3-size models takes a couple of months. Given the anemic RAM on GPUs a decade ago (I think we had K80 GPUs with 12GB vs 100’s of GBs on H100/H200 today), it was actually completely impossible to train a large transformer model prior to the early 2020s.

I’m even reminded how much gamers complained in the late 2010s about GPU prices skyrocketing because of ML use.
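For anyone who wants to sanity-check the single-processor estimate above, here is a rough back-of-envelope sketch. It assumes the roughly 3.14e23 FLOPs of total training compute reported in the GPT-3 paper, and (this part is purely an assumption) a 1 GHz core that sustains one floating-point operation per cycle. The helper name is made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Total training compute reported for GPT-3 175B (about 3.14e23 FLOPs).
GPT3_TRAINING_FLOPS = 3.14e23


def training_years(total_flops: float, flops_per_second: float) -> float:
    """Wall-clock years needed at a fixed sustained throughput."""
    return total_flops / flops_per_second / SECONDS_PER_YEAR


# Assumption: a 1 GHz core sustaining 1 floating-point operation per cycle.
single_core_years = training_years(GPT3_TRAINING_FLOPS, 1e9)
print(f"single 1 GHz core: {single_core_years:.2e} years")
```

Under those assumptions it comes out on the order of ten million years. If the core only sustains a tenth of a FLOP per cycle, which is plausible once memory stalls are counted, you land near the 100,000,000 figure, so the quoted number is in the right ballpark either way.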

amelius•28m ago

"A New Kind of Science" ...

UltraSane•8m ago

I think we need the equivalent of general relativity for latent spaces.
