Scaffolding to Superhuman: How Curriculum Learning Solved 2048 and Tetris

https://kywch.github.io/blog/2025/12/curriculum-learning-2048-tetris/
57•a1k0n•1h ago

Comments

omneity•1h ago
Related: I’ve heard about curriculum learning for LLMs quite often, but I couldn’t find a library to order training data by an arbitrary measure like difficulty, so I made one[0].

What you get is an iterator over the dataset that samples based on how far along you are in training.

0: https://github.com/omarkamali/curriculus
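
To make the idea concrete, here is a minimal sketch of a progress-aware sampler. It is not the API of the linked library: the name `curriculum_iter`, the `start_frac` parameter, and the "growing pool of easiest items" schedule are all assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def curriculum_iter(
    items: Sequence[T],
    difficulty: Callable[[T], float],
    total_steps: int,
    start_frac: float = 0.2,
    seed: int = 0,
) -> Iterator[T]:
    """Yield one item per training step.

    Items are sorted by difficulty. At step 0 only the easiest
    `start_frac` of the data is eligible; the eligible pool grows
    linearly until the whole dataset is available at the last step.
    (Hypothetical schedule, for illustration only.)
    """
    rng = random.Random(seed)
    ordered = sorted(items, key=difficulty)
    n = len(ordered)
    if n == 0:
        return
    for step in range(total_steps):
        progress = step / max(1, total_steps - 1)      # 0.0 .. 1.0
        frac = start_frac + (1.0 - start_frac) * progress
        cutoff = max(1, min(n, math.ceil(frac * n)))   # size of eligible pool
        yield ordered[rng.randrange(cutoff)]
```

Sampling uniformly from a growing prefix keeps easy examples in the mix late in training, which is one common choice; a sliding window that drops the easiest items would be an equally reasonable variant.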

hiddencost•1h ago
Those are not hard tasks ...
bob1029•51m ago
> To learn, agents must experience high-value states, which are hard (or impossible) for untrained agents to reach. The endgame-only envs were the final piece to crack 65k. The endgame requires tens of thousands of correct moves where a single mistake ends the game, but to practice, agents must first get there.

This seems really similar to the motivations around masked language modeling. By providing increasingly-masked targets over time, a smooth difficulty curve can be established. Randomly masking X% of the tokens/bytes is trivial to implement. MLM can take a small corpus and turn it into an astronomically large one.

larrydag•48m ago
Perhaps I'm missing something. Why not start the learning at a later state?
bob1029•38m ago
That's effectively what you get in either case. With MLM, on the first learning iteration you might only mask exactly one token per sequence. This is equivalent to starting learning at a later state. The direction of the curriculum flows toward more and more of these being masked over time, which is equivalent to starting from earlier and earlier states. Eventually, you mask 100% of the sequence and you are starting from zero.
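
The schedule described in this comment (one masked token at first, the whole sequence at the end) can be sketched as follows. This is an illustrative toy, not code from any MLM library: `num_masked`, `mask_tokens`, the `[MASK]` string, and the linear ramp are all assumptions.

```python
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"  # placeholder mask symbol (assumption)


def num_masked(step: int, total_steps: int, seq_len: int) -> int:
    """Linear ramp: 1 masked token at step 0, all tokens at the last step."""
    if seq_len == 0:
        return 0
    if total_steps <= 1:
        return seq_len
    progress = step / (total_steps - 1)
    return max(1, min(seq_len, round(1 + progress * (seq_len - 1))))


def mask_tokens(
    tokens: Sequence[str], step: int, total_steps: int, rng: random.Random
) -> Tuple[List[str], List[int]]:
    """Return (masked copy of tokens, sorted list of masked positions)."""
    k = num_masked(step, total_steps, len(tokens))
    positions = set(rng.sample(range(len(tokens)), k))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    return masked, sorted(positions)


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the agent must first reach the endgame".split()
    for step in (0, 5, 9):
        print(step, mask_tokens(sentence, step, total_steps=10, rng=rng)[0])
```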
LatencyKills•36m ago
If the goal is to achieve end-to-end learning, that would be cheating.

If you sat down to solve a problem you’ve never seen before, you wouldn’t even know what a valid “later state” looks like.

pedrozieg•37m ago
What I like about this writeup is that it quietly demolishes the idea that you need DeepMind-scale resources to get “superhuman” RL. The headline result is less about 2048 and Tetris and more about treating the data pipeline as the main product: careful observation design, reward shaping, and then a curriculum that drops the agent straight into high-value endgame states so it actually sees them in the first place. Once your env runs at millions of steps per second on a single 4090, the bottleneck is human iteration on those choices, not FLOPs.
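
A rough sketch of what "dropping the agent into endgame states" could look like for 2048 is below. It is not the write-up's implementation: the board layout produced by `endgame_board`, the `endgame_prob` mixing parameter, and every function name here are hypothetical.

```python
import random
from typing import List

SIZE = 4
Board = List[List[int]]


def fresh_board(rng: random.Random) -> Board:
    """Standard start: an empty 4x4 board with two random tiles (2 or 4)."""
    board = [[0] * SIZE for _ in range(SIZE)]
    cells = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    for r, c in rng.sample(cells, 2):
        board[r][c] = 2 if rng.random() < 0.9 else 4
    return board


def endgame_board(top_exp: int = 15, chain_len: int = 10) -> Board:
    """Hypothetical scaffold: a descending chain of large tiles laid out
    in snake order, starting from 2**top_exp in the top-left corner."""
    chain_len = min(chain_len, top_exp, SIZE * SIZE)
    board = [[0] * SIZE for _ in range(SIZE)]
    for i in range(chain_len):
        r, c = divmod(i, SIZE)
        if r % 2 == 1:              # reverse direction on odd rows
            c = SIZE - 1 - c
        board[r][c] = 2 ** (top_exp - i)
    return board


def reset(rng: random.Random, endgame_prob: float) -> Board:
    """Start an episode from a scaffolded endgame with probability
    `endgame_prob`, otherwise from a normal fresh board."""
    if rng.random() < endgame_prob:
        return endgame_board()
    return fresh_board(rng)
```

Mixing scaffolded and normal starts, rather than using endgame starts only, is a design choice made here so the agent still practices the opening; the comment above does not specify a ratio.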

The happy Tetris bug is also a neat example of how “bad” inputs can act like curriculum or data augmentation. Corrupted observations forced the policy to be robust to chaos early, which then paid off when the game actually got hard. That feels very similar to tricks in other domains where we deliberately randomize or mask parts of the input. It makes me wonder how many surprisingly strong RL systems in the wild are really powered by accidental curricula that nobody has fully noticed or formalized yet.
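
The deliberate version of this trick (randomizing parts of the input early, then phasing it out) might look like the sketch below. It does not reproduce the bug from the write-up; the linear decay, the `start_prob` default, and the flat-list observation format are assumptions.

```python
import random
from typing import List, Sequence


def corruption_prob(step: int, anneal_steps: int, start_prob: float = 0.3) -> float:
    """Per-cell corruption probability, decaying linearly from
    `start_prob` at step 0 to 0.0 at `anneal_steps` and beyond."""
    if anneal_steps <= 0:
        return 0.0
    remaining = max(0.0, 1.0 - step / anneal_steps)
    return start_prob * remaining


def corrupt_observation(
    obs: Sequence[int],
    step: int,
    anneal_steps: int,
    rng: random.Random,
    num_values: int = 2,
    start_prob: float = 0.3,
) -> List[int]:
    """Return a copy of a flat observation in which each cell is replaced
    by a random value in [0, num_values) with the current probability."""
    p = corruption_prob(step, anneal_steps, start_prob)
    return [rng.randrange(num_values) if rng.random() < p else cell for cell in obs]
```

Early in training the policy sees noisy boards; once `step` reaches `anneal_steps` the observation passes through unchanged.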

someoneontenet•17m ago
Curriculum learning helped me out a lot in this project too https://www.robw.fyi/2025/12/28/solve-hi-q-with-alphazero-an...
drubs•15m ago
Star the puffer https://github.com/PufferAI/PufferLib
jsuarez5341•14m ago
All open source - don't forget to feed the puffer a star! https://github.com/pufferai/pufferlib

Navigating AI: Critical Thinking in the Age of LLMs

https://mcuoneclipse.com/2025/12/31/navigating-ai-critical-thinking-in-the-age-of-llms/
1•chrsw•20s ago•0 comments

Astronomers combine JWST and Chandra X-ray Telescope to image colliding galaxies

https://www.scientificamerican.com/article/nasa-telescopes-capture-colliding-spiral-galaxies-in-s...
1•ck2•31s ago•0 comments

New York's incoming mayor bans Raspberry Pi at inauguration

https://www.theregister.com/2025/12/31/zohran_mamdani_raspberry_pi_ban/
3•Tomte•6m ago•0 comments

Are you a deep-tech founder working on an interesting project?

1•udit_50•6m ago•0 comments

Fire19 – A New DNS

https://jaguarint.vercel.app/
1•telui•7m ago•1 comments

Web Browsers have stopped blocking pop-ups

https://www.smokingonabike.com/2025/12/31/web-browsers-have-stopped-blocking-pop-ups/
2•coldpie•9m ago•0 comments

Here we go again: Retiring coal plant forced to stay open by Trump Admin

https://arstechnica.com/science/2025/12/trump-admin-orders-another-coal-plant-to-stay-open/
1•duxup•9m ago•0 comments

Stewart Cheifet, creator of The Computer Chronicles, dead at 87

https://obits.goldsteinsfuneral.com/stewart-cheifet
2•spankibalt•9m ago•1 comments

Poland calls for EU action against AI-generated TikTok videos calling for "Polexit"

https://notesfrompoland.com/2025/12/31/poland-calls-for-eu-action-against-ai-generated-tiktok-vid...
2•consumer451•10m ago•0 comments

2026

1•xalu•11m ago•0 comments

Nelm (Helm alternative)

https://github.com/werf/nelm
1•gtirloni•12m ago•0 comments

Authors Guild Raises Concerns About Kindle's New "Ask This Book" AI Feature

https://authorsguild.org/news/statement-on-amazon-kindle-ask-this-book-ai-feature/
1•ilamont•13m ago•0 comments

Moonshot AI, a Chinese AI startup behind Kimi, closed a $500M Series C

https://twitter.com/poezhao0605/status/2006286951222038562
2•Alifatisk•15m ago•1 comments

Running out of places to move the goalposts to

https://nickdrozd.github.io/2025/12/31/goalposts.html
1•nickdrozd•16m ago•0 comments

Formally speaking, "Transpiler" is a useless word

https://people.csail.mit.edu/rachit/post/transpiler-formal/
1•birdculture•16m ago•0 comments

Agents Done Right: A Framework Vision for 2026

https://blog.bryanl.dev/posts/agent-framework-vision/
1•camedee•19m ago•1 comments

Climate Solutions: Why I'm More Optimistic for 2026

https://www.gravityloss.com/2025/12/climate-solutions-why-im-more-optimistic-for-2026/
1•Gravityloss•19m ago•0 comments

What Do Consumers Want in Smart Glasses?

https://spectrum.ieee.org/two-visions-for-smart-glasses
1•ohjeez•21m ago•0 comments

Head Up, Feet Moving: Complexity isn't a spectator sport

https://medium.com/topology-insight/head-up-feet-moving-b56e60867190
1•asplake•23m ago•0 comments

Tesla Q4 2025 Delivery Consensus

https://ir.tesla.com/press-release/delivery-consensus-fourth-quarter-2025
2•yibg•23m ago•0 comments

Historical Popularity Index

https://pantheon.world/explore/rankings?show=people&years=-3501,2025
1•gygodard•23m ago•0 comments

Second-Hand Bookshops in Britain: 2025 Report

http://wormwoodiana.blogspot.com/2025/12/second-hand-bookshops-in-britain-2025.html
1•fogus•23m ago•0 comments

Looking Back at Python Pescara 2025

https://www.paulox.net/2025/12/31/looking-back-at-python-pescara-2025/
1•pauloxnet•27m ago•0 comments

WORST REGARDS: A collective fuck-you letter from humanity to 2025

https://worstregards.com/
3•tom8opot8o•27m ago•1 comments

Show HN: Overlay – Invisible AI Assistant

https://overlayai.app
1•ckkampy•28m ago•1 comments

Re-attach detached Terminal Tabs

https://askubuntu.com/questions/1242144/re-attach-detached-terminal-tabs
1•sipofwater•28m ago•0 comments

Thoughts on AI

https://vega.rd.no/writing/ai
2•vegardlarsen•32m ago•1 comments

A foundation for building tools on the AT Protocol using Unison

https://notes.kaushikc.org/3m6kc5nudgc2x?auth_completed=true
1•PaulHoule•36m ago•0 comments

The Economics of Duke University

https://dontaylor13.substack.com/p/duke-university
12•paulpauper•39m ago•1 comments

When A.I. Took My Job, I Bought a Chain Saw

https://www.nytimes.com/2025/12/28/opinion/artificial-intelligence-jobs.html
1•colesantiago•40m ago•0 comments