"The Bitter Lesson" is wrong. Well sort of

https://assaf-pinhasi.medium.com/the-bitter-lesson-is-wrong-sort-of-a3d021864924
27•GavCo•4h ago

Comments

rhaps0dy•2h ago
Sutton was talking about progress in AI overall, whereas Pinhasi (OP) is talking about building one model for production right now. Of course adding some hand-coded knowledge is essential for the latter, but it has not provided much long-term progress. (Even CNNs and group-convolutional NNs, which seek to encode invariants to increase efficiency while still doing almost only learning, seem to be on the way out)
aabhay•2h ago
The main problem with the “Bitter Lesson” is that there’s something even bitter-er behind it: the “Harsh Reality” that while we may scale models on compute and data, simply pouring tons of data in without any sort of curation yields essentially garbage models.

The “Harsh Reality” is that while you may only need data, the current best models and the companies behind them spend enormously on gathering high-quality labeled data with extensive oversight and curation. This curation is of course being partially automated as well, but ultimately there are billions or even tens of billions of dollars flowing into gathering, reviewing, and processing subjectively high-quality data.

Interestingly, at the time that essay was published, the harsh reality was not so harsh. For example, in things like face detection, (actual) next-word prediction, and other purely self-supervised models (not instruction-tuned or “Chat”-style ones), data was truly all you needed. You didn’t need “good” faces. As long as it was indeed a face, the data itself was enough. Now, it’s not. In order to make these machines useful and not just function approximators, we need extremely large dataset curation industries.

If you learned the bitter lesson, you better accept the harsh reality, too.

bobbiechen•2h ago
So true. I recently wrote about how Merlin achieved magical bird identification not through better algorithms, but through better expertise in creating great datasets: https://digitalseams.com/blog/what-birdsong-and-backends-can...

I think "harsh reality" is one way to look at it, but you can also take an optimistic perspective: you really can achieve great, magical experiences by putting in (what could be considered) unreasonable effort.

mhuffman•1h ago
Thanks for the intro to Merlin! I just went outside my house and used it on 5 different types of birds, and it identified 100% of them. Relevant (possibly out of date) xkcd comic:

[0] https://xkcd.com/1425/

Xymist•7m ago
Relevant - and old enough that those five years have been successfully granted!
pphysch•2h ago
Another name for gathering and curating high-quality datasets is "science". One would hope "AI pioneer" USA would embrace this harsh reality and invest massively in basic science education and infrastructure. But we are seeing the opposite, and basically no awareness of this "harsh reality" amid the AI hype...
vineyardmike•1h ago
While I agree with you, it’s worth noting that current LLM training uses a significant percentage of all available written data. The transition from GPT-2-era models to now (GPT-3+) took us from novel models that could kinda imitate speech to models that can converse, write code, and use tools. It was only after the readily available data was exhausted that further gains came from curation and large amounts of synthetic data.
aabhay•1h ago
Transfer learning isn’t about “exhausting” all available un-curated data; it’s simply that the systems are large enough to support it. There’s not that much of a reason to train on all available data. And it’s not all of it anyway: there’s still very significant filtration happening. For example, they don’t train on petabytes of log files; that would just be terribly uninteresting data.
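
To make the filtration point concrete, here is a toy sketch of heuristic pre-filtering; the regexes and sample lines are invented for illustration, nothing like any lab's actual pipeline:

    import re

    def looks_like_log_line(line: str) -> bool:
        """Heuristic: timestamped or level-tagged lines are likely machine logs."""
        return bool(re.match(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}", line)) \
            or bool(re.search(r"\b(DEBUG|INFO|WARN|ERROR)\b", line))

    def filter_corpus(lines):
        """Keep only lines that don't look like log output."""
        return [l for l in lines if not looks_like_log_line(l)]

    docs = [
        "2024-01-02 13:37:00 INFO request served in 12ms",
        "The bitter lesson argues that general methods win.",
    ]
    print(filter_corpus(docs))  # only the prose line survives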
Calavar•50m ago
> The transition from GPT-2 era models to now (GPT-3+) saw the transition from novel models that can kinda imitate speech to models that can converse, write code, and use tools.

Which is fundamentally about data. OpenAI invested an absurd amount of money to get the human annotations that drive RLHF.

RLHF itself is a very vanilla reinforcement learning algo + some branding/marketing.
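
For reference, the usual RLHF objective (in the standard InstructGPT-style formulation) really is plain policy optimization: maximize a learned reward while a KL penalty keeps the policy near the pretrained reference model:

    \max_{\pi}\; \mathbb{E}_{x \sim D,\; y \sim \pi(\cdot|x)}\left[ r_\phi(x, y) \right] \;-\; \beta\, D_{\mathrm{KL}}\!\left( \pi(\cdot|x) \,\|\, \pi_{\mathrm{ref}}(\cdot|x) \right)

Everything hard lives in r_\phi, the reward model fit to those expensive human annotations.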

v9v•23m ago
I think your comment has some threads in common with Rodney Brooks' response: https://rodneybrooks.com/a-better-lesson/
macawfish•2h ago
In my opinion the useful part of "the bitter lesson" has nothing to do with throwing more compute and more data at stuff. It has to do with actually using ML instead of trying to manually and cleverly tweak things, and with effectively leveraging the data you have as part of that (again, using more ML) rather than trying to manually label everything.
rdw•2h ago
The bitter lesson is becoming misunderstood as the world moves on. Unstated yet core to it is that AI researchers were historically attempting to build an understanding of human intelligence. They intended to assemble a human brain piece by piece and thus be able to explain (and fix) our own biological ones, much as can be done with physical simulations of knee joints. Of course, you can also use that knowledge to create useful thinking machines, because you understand it well enough to control it, much like how we have many robotic joints.

So, the bitter lesson is based on a disappointment that you're building intelligence without understanding why it works.

DoctorOetker•37m ago
Right, like discovering Huygens' principle, or interference, or sum-over-all-paths integrals in physics.

Just because a whole lot of physical phenomena can be explained by a couple of foundational principles does not mean that understanding those core patterns automatically endows one with an understanding of how and why materials refract light, and of a plethora of other specific effects... effects worth understanding individually, even if still explained in terms of those foundational concepts.

Knowing a complicated set of axioms or postulates enables one to derive theorems from them, but those implied theorem proofs are nonetheless non-trivial and have a value of their own (even though they can be expressed and expanded into a DAG of applications of that "bitterly minimal" axiomatization).

Once enough patterns are correctly modeled by machines, and given enough time to analyze them, people will eventually discover better explanations of how and why things work (beyond the merely abstract knowledge that latent parameters were fitted against a loss function).

In some sense deeper understanding has already come for the simpler models like word2vec, where many papers have analyzed and explained relations between word vectors. This too lagged behind the creation and utilization of word vector embeddings.

It is not inconceivable that someday someone observes an analogy between, say, QKV tensors and the triples resulting from graph linearization: think subject, object, predicate. (Even though I hate those triples: try modeling a ternary relation like 2+5=7 with SOP triples; they're really only meant to capture "sky - is - blue" associations. A better type of triple would be player-role-act triples; one can then model ternary relations, but one needs to reify the relation.)
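
A sketch of that reification move, with made-up node and role names purely for illustration:

    # Subject-predicate-object triples capture binary facts:
    sky_fact = ("sky", "is", "blue")

    # A ternary relation like 2 + 5 = 7 doesn't fit a single such triple.
    # Reify the addition as a node ("add1"), then attach each player by role:
    reified = [
        ("add1", "role:addend", "2"),
        ("add1", "role:addend", "5"),
        ("add1", "role:sum",    "7"),
    ]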

Similarly, without mathematical training, humans display awareness of the concepts of sets, membership, existence, ... without a formal system. The chatbots display this awareness too. It's all vague naive set theory. But how are DNNs modeling set theory? That's a paper for someday.

godelski•1h ago
I'm not sure whether the Bitter Lesson is wrong; I think we'd need clarification from Sutton (does someone have this?).

But I do know "Scale is All You Need" is wrong. And VERY wrong.

Scaling has done a lot. Without a doubt it is very useful. But this is a drastic oversimplification of all the work that has happened over the last 10-20 years. ConvNeXt and "ResNet Strikes Back" didn't take off, for reasons, despite being very impressive. There have been a lot of algorithmic changes, a lot of changes to training procedures, a lot of changes to how we collect data[0], and more.

We have to be very honest: you can't just buy your way to AGI. There's still innovation that needs to be done. This is great for anyone still looking to get into the space; the game isn't close to being over. I'd argue that this is great for investors too, as there are a lot of techniques looking to prove themselves at scale. Your unicorns are going to be over here. A dark horse isn't a horse that just looks like every other horse. That might be a "safer" bet, but it's like betting on amateur jockeys and horses that just train similarly to professional ones: they have to do a lot of catching up, even if the results are fairly certain. At that point you're not investing in the tech, you're investing in the person or the market strategy.

[0] Okay, I'll buy this one as scale if we really want to argue that these changes are about scaling data effectively but we also look at smaller datasets differently because of these lessons.

roadside_picnic•1h ago
"The Bitter Lesson" certainly seems correct when applied to whatever the limit of the current state of the art is, but in practice solving day-to-day ML problems, outside of FAANG-style companies and cutting edge research, data is always much more constrained.

I have, multiple times in my career, solved a problem using simple, intelligible models that have empirically outperformed neural models ultimately because there was not enough data for the neural approach to learn anything. As a community we tend to obsess over architecture and then infrastructure, but data is often the real limiting factor.

When I was early in my career I would always try to apply very general, data-hungry models to all my problems.. with very mixed success. As I became more skilled I became a staunch advocate of only using simple models you could understand, with much more successful results (which is what led to this revised opinion). But at this point in my career, I increasingly see that one's approach to modeling should be more information-theoretic: try to find the model whose channel capacity best matches your information rate.
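
In practice that matching often reduces to honestly comparing models of different capacity on the data you actually have. A minimal sketch (synthetic placeholder data, arbitrary model choices):

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))  # deliberately small-data regime
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=60)

    for model in (Ridge(alpha=1.0),
                  MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)):
        score = cross_val_score(model, X, y, cv=5).mean()  # mean R^2 across folds
        print(type(model).__name__, round(float(score), 3))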

As a Bayesian, I also think there's a very reasonable explanation for why "The Bitter Lesson" rings true over and over again. In E.T. Jaynes' writing he often talks about Bayes' theorem in terms of P(D|H) (i.e. the probability of the Data given the Hypothesis, or vice versa), but, especially in the earlier chapters, he purposefully adds an X to that equation: P(D|H,X), where X is a stand-in for all of our prior information about the world. Typically we think of prior information as literal data, but Jaynes points out that our entire world of understanding is also part of our prior context.
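
Spelled out, Jaynes' notation is just Bayes' theorem with the background information X carried through every term:

    P(H \mid D, X) = \frac{P(D \mid H, X)\, P(H \mid X)}{P(D \mid X)}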

In this view, models that "leverage human understanding" (i.e. are fully intelligible) are essentially throwing out information at the limit. But to my earlier point, if the data falls quite short of that limit, then those intelligible models are adding information in data-constrained scenarios. I think the challenge in practical application is figuring out the threshold beyond which you need to adopt a more general approach.

Currently I'm very much in love with Gaussian Processes, which, for constrained data environments, offer a powerful combination of both of these methods. You can give the model prior hints about what things should look like via the relative structure of the kernel and its priors (e.g. there should be some roughly annual seasonal component, and one roughly weekly seasonal component) but otherwise let the data decide.
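
A minimal sketch of that kind of structured kernel using scikit-learn; the periods, bounds, and data are illustrative only:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

    # Structural prior: a roughly annual cycle, a roughly weekly cycle,
    # a slow trend, and observation noise. The periods are hints; the
    # bounds let the data refine them.
    kernel = (
        ExpSineSquared(periodicity=365.0, periodicity_bounds=(330.0, 400.0))
        + ExpSineSquared(periodicity=7.0, periodicity_bounds=(6.0, 8.0))
        + RBF(length_scale=90.0)
        + WhiteKernel(noise_level=1.0)
    )

    X = np.arange(200.0).reshape(-1, 1)  # day index
    rng = np.random.default_rng(0)
    y = np.sin(2 * np.pi * X.ravel() / 7) + 0.1 * rng.normal(size=200)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
    print(gp.kernel_)  # fitted hyperparameters: the data got to decide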

littlestymaar•9m ago
The Leela Chess Zero vs Stockfish case also offers an interesting perspective on the bitter lesson.

Here's my (maybe a bit loose) recollection of what happened:

Step 1. Stockfish was the typical human-knowledge AI, with tons of actual chess knowledge injected in the process of building an efficient chess engine.

Step 2. Then came Leela Chess Zero, with its AlphaZero-inspired training: a chess engine trained purely with RL, with no prior chess knowledge added. And it beat Stockfish. This was a “bitter lesson” moment.

Step 3. The Stockfish devs added a neural network (NNUE) for evaluation, in addition to their existing heuristics. And Stockfish easily took back its crown.

Yes, throwing more compute at a problem is an effective way to solve it, but if all you have is compute, you'll pretty certainly lose to somebody who has both compute and knowledge.
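
A toy sketch of the step-3 hybrid (hypothetical helpers, nothing like Stockfish's real code): handcrafted knowledge decides when the cheap heuristic verdict is trustworthy and when to pay for the learned evaluation.

    LARGE_IMBALANCE = 300  # centipawns; illustrative threshold

    def classical_eval(position: dict) -> int:
        # Stand-in for a handcrafted evaluation (material count only here).
        return position["material"]

    def nn_eval(position: dict) -> int:
        # Stand-in for a learned evaluation network.
        return position["material"] + position.get("nn_adjustment", 0)

    def hybrid_eval(position: dict) -> int:
        """Toy hybrid: heuristics decide when to trust the network."""
        fast = classical_eval(position)
        if abs(fast) > LARGE_IMBALANCE:  # clearly won/lost: cheap eval suffices
            return fast
        return nn_eval(position)  # subtle position: use the learned eval

    print(hybrid_eval({"material": 20, "nn_adjustment": 35}))  # -> 55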

Triple Scripts

https://triplescripts.org
1•akkartik•2m ago•0 comments

FFmpeg devs boast of another 100x leap thanks to handwritten assembly code

https://www.tomshardware.com/software/the-biggest-speedup-ive-seen-so-far-ffmpeg-devs-boast-of-another-100x-leap-thanks-to-handwritten-assembly-code
4•harambae•8m ago•0 comments

Show HN: Browse the Web with Superpowers

https://usesuperpowers.app/
2•harshdoesdev•11m ago•0 comments

Machine Bullshit: Characterizing the Emergent Disregard for Truth in LLMs

https://arxiv.org/abs/2507.07484
2•delichon•11m ago•1 comments

Dear Sam Altman

2•upwardbound2•15m ago•0 comments

Following news on social media boosts knowledge, belief accuracy and trust

https://www.nature.com/articles/s41562-025-02205-6
1•PaulHoule•16m ago•0 comments

Cybermania.ws Blocked by Cloudflare

https://cybermania.ws
1•bojanga•16m ago•1 comments

Tough news for our UK users

https://blog.janitorai.com/posts/3/
1•airhangerf15•16m ago•0 comments

ToolShell Mass Exploitation (CVE-2025-53770)

https://research.eye.security/sharepoint-under-siege/
1•thejj100100•17m ago•0 comments

Food contact articles as source of micro- and nanoplastics

https://www.nature.com/articles/s41538-025-00470-3
1•atombender•17m ago•0 comments

Golang's Weird Little Iterators

https://mcyoung.xyz/2024/12/16/rangefuncs/#fnref:tooling
1•fanf2•18m ago•0 comments

Show HN: Duende: Web UX for guiding Gemini as it improves your source code

https://github.com/alefore/duende
1•afc•18m ago•0 comments

Show HN: I built Realer Estate to find apartment renters the best deals in NYC

https://realerestate.org
1•realerestate•18m ago•0 comments

Political Fundraising Email Database

https://thescoop.org/political-fundraising-emails/
1•m-hodges•21m ago•0 comments

EU commissioner shocked by dangers of some goods sold by Shein and Temu

https://www.theguardian.com/business/2025/jul/20/eu-commissioner-shocked-dangerous-goods-sold-shein-temu
2•Michelangelo11•24m ago•0 comments

Nvidia Brings Reasoning Models to Consumers Ranging from 1.5B to 32B Parameters

https://www.techpowerup.com/339089/nvidia-brings-reasoning-models-to-consumers-ranging-from-1-5b-to-32b-parameters
1•hank808•26m ago•1 comments

Prexist – Instantly check if your startup idea already exists!

https://prexist.pages.dev/
2•e33or-assasin•28m ago•1 comments

We Will Not Accidentally Create AGI

https://loukidelis.com/2025/07/06/against-easy-agi.html
3•fromwilliam•30m ago•1 comments

Call Me a Jerk: Persuading AI to Comply with Objectionable Requests

https://gail.wharton.upenn.edu/research-and-insights/call-me-a-jerk-persuading-ai/
2•CharlesW•32m ago•0 comments

Stack frame layout on x86-64

https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64
1•90s_dev•32m ago•0 comments

Mute-by-default is why your video calls suck

https://caseyavila.com/blog/unmute/
1•caseyavila•35m ago•0 comments

HyperTime: A Continuous, Location-Precise Alternative to Time Zones

https://hyper-time.replit.app/
2•sosore•36m ago•1 comments

He Had Dangerous Delusions. ChatGPT Admitted It Made Them Worse

https://www.wsj.com/tech/ai/chatgpt-chatbot-psychology-manic-episodes-57452d14
9•johntfella•36m ago•2 comments

Show HN: I built an AI tool that generates product photos and videos

https://getaicraft.com
1•SaaSified•37m ago•0 comments

LLM-in-a-Box: A Templated, Self-Hostable Framework for Generative AI

https://github.com/complexity-science-hub/llm-in-a-box-template
2•geoHeil•37m ago•2 comments

D-Day veteran "Papa Jake" Larson who became TikTok star dies aged 102

https://news.sky.com/story/d-day-veteran-papa-jake-larson-who-became-tiktok-star-dies-aged-102-13399339
3•austinallegro•39m ago•0 comments

Staying cool without refrigerants: Next-generation Peltier cooling

https://news.samsung.com/global/interview-staying-cool-without-refrigerants-how-samsung-is-pioneering-next-generation-peltier-cooling
15•simonebrunozzi•41m ago•14 comments

Avoiding Management

http://funcall.blogspot.com/2025/06/avoiding-management.html
1•kaeruct•42m ago•0 comments

Welcoming the Next Generation of Programmers

https://lucumr.pocoo.org/2025/7/20/the-next-generation/
1•yomismoaqui•44m ago•0 comments

The Guardian view on social networks: the friendships that can change your life

https://www.theguardian.com/commentisfree/2025/mar/24/the-guardian-view-on-social-networks-the-friendships-that-can-change-your-life
1•wslh•46m ago•0 comments