Scaffolding to Superhuman: How Curriculum Learning Solved 2048 and Tetris

https://kywch.github.io/blog/2025/12/curriculum-learning-2048-tetris/
66•a1k0n•2h ago

Comments

omneity•2h ago
Related: I've heard about curriculum learning for LLMs quite often, but I couldn’t find a library to order training data by an arbitrary measure like difficulty, so I made one[0].

What you get is an iterator over the dataset that samples based on how far you are in the training.

0: https://github.com/omarkamali/curriculus
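
A minimal sketch of that kind of progress-aware iterator, in plain Python. This is not the curriculus API: `curriculum_iter`, `start_fraction`, and the linear widening schedule are all assumptions made up for illustration. The idea is that the pool of eligible examples starts with the easiest slice and widens to the full dataset as training progresses.

```python
import random
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def curriculum_iter(
    items: Sequence[T],
    difficulty: Callable[[T], float],
    total_steps: int,
    start_fraction: float = 0.2,  # hypothetical knob: share of data eligible at step 0
    seed: int = 0,
) -> Iterator[T]:
    """Yield one item per training step, widening the pool from easy to hard."""
    if not items:
        raise ValueError("items must be non-empty")
    rng = random.Random(seed)
    ordered = sorted(items, key=difficulty)  # easiest first
    for step in range(total_steps):
        # Training progress: 0.0 on the first step, 1.0 on the last.
        progress = step / max(1, total_steps - 1)
        # Assumed schedule: eligible fraction grows linearly to 100%.
        fraction = start_fraction + (1.0 - start_fraction) * progress
        pool_size = max(1, min(len(ordered), round(fraction * len(ordered))))
        # Sample uniformly from the easiest `pool_size` items.
        yield ordered[rng.randrange(pool_size)]
```

For example, `curriculum_iter(examples, difficulty=len, total_steps=10_000)` would treat longer examples as harder. Any other scoring function can be swapped in.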

hiddencost•1h ago
Those are not hard tasks ...
bob1029•1h ago
> To learn, agents must experience high-value states, which are hard (or impossible) for untrained agents to reach. The endgame-only envs were the final piece to crack 65k. The endgame requires tens of thousands of correct moves where a single mistake ends the game, but to practice, agents must first get there.

This seems really similar to the motivations around masked language modeling. By providing increasingly-masked targets over time, a smooth difficulty curve can be established. Randomly masking X% of the tokens/bytes is trivial to implement. MLM can take a small corpus and turn it into an astronomically large one.

larrydag•1h ago
Perhaps I'm missing something. Why not start the learning at a later state?
bob1029•1h ago
That's effectively what you get in either case. With MLM, on the first learning iteration you might only mask exactly one token per sequence. This is equivalent to starting learning at a later state. The direction of the curriculum flows toward more and more of these being masked over time, which is equivalent to starting from earlier and earlier states. Eventually, you mask 100% of the sequence and you are starting from zero.
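
A rough sketch of that masking schedule, assuming a linear ramp from a single masked token to the whole sequence. The names `mask_rate` and `mask_tokens` and the `[MASK]` placeholder are illustrative, not taken from any particular library.

```python
import random

MASK = "[MASK]"  # placeholder token, assumed for illustration


def mask_rate(step: int, total_steps: int,
              min_rate: float = 0.0, max_rate: float = 1.0) -> float:
    """Assumed linear schedule: fraction of tokens to mask at a given step."""
    progress = step / max(1, total_steps - 1)
    return min_rate + (max_rate - min_rate) * progress


def mask_tokens(tokens, rate, rng):
    """Mask round(rate * len(tokens)) random positions, but at least one."""
    if not tokens:
        return [], []
    n_mask = min(len(tokens), max(1, round(rate * len(tokens))))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    # The masked positions are the prediction targets for this step.
    return masked, sorted(positions)
```

Early steps leave almost all of the sequence visible (the "later state"). The final step masks everything, which corresponds to starting from zero.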
LatencyKills•1h ago
If the goal is to achieve end-to-end learning, that would be cheating.

If you sat down to solve a problem you’ve never seen before, you wouldn’t even know what a valid “later state” looks like.

algo_trader•24m ago
This is less about masked modelling and more about reverse-curriculum.

e.g. the DeepCubeA paper from 2019 (!), which solves the Rubik's Cube.

Start with the solved state and teach the network successively harder states. This is so "obvious" and "unhelpful in real domains" that perhaps they haven't heard of this paper.
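
A toy sketch of a reverse curriculum, under loud assumptions: the "puzzle" is a row of 8 tiles where a move swaps two neighbours, standing in for a real cube, and the depth schedule is linear. None of this is DeepCubeA's code. It only shows the shape of the idea: generate training starts by scrambling the solved state, and scramble more as training goes on.

```python
import random

SOLVED = tuple(range(8))  # toy goal state, not a real Rubik's Cube


def apply_move(state, i):
    """Swap adjacent positions i and i + 1 (a stand-in for a puzzle move)."""
    s = list(state)
    s[i], s[i + 1] = s[i + 1], s[i]
    return tuple(s)


def scrambled_start(depth, rng):
    """Walk `depth` random moves away from the solved state."""
    state = SOLVED
    for _ in range(depth):
        state = apply_move(state, rng.randrange(len(SOLVED) - 1))
    return state


def depth_for(step, total_steps, max_depth=30):
    """Assumed linear schedule: scramble depth grows from 1 to max_depth."""
    progress = step / max(1, total_steps - 1)
    return max(1, round(progress * max_depth))
```

Each episode would begin from `scrambled_start(depth_for(step, total_steps), rng)`, so early episodes sit one move from the goal and later ones start progressively farther away.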

pedrozieg•1h ago
What I like about this writeup is that it quietly demolishes the idea that you need DeepMind-scale resources to get “superhuman” RL. The headline result is less about 2048 and Tetris and more about treating the data pipeline as the main product: careful observation design, reward shaping, and then a curriculum that drops the agent straight into high-value endgame states so that it sees them at all. Once your env runs at millions of steps per second on a single 4090, the bottleneck is human iteration on those choices, not FLOPs.

The happy Tetris bug is also a neat example of how “bad” inputs can act like curriculum or data augmentation. Corrupted observations forced the policy to be robust to chaos early, which then paid off when the game actually got hard. That feels very similar to tricks in other domains where we deliberately randomize or mask parts of the input. It makes me wonder how many surprisingly strong RL systems in the wild are really powered by accidental curricula that nobody has fully noticed or formalized yet.

someoneontenet•48m ago
Curriculum learning helped me out a lot in this project too https://www.robw.fyi/2025/12/28/solve-hi-q-with-alphazero-an...
drubs•45m ago
Star the puffer https://github.com/PufferAI/PufferLib
kgwxd•6m ago
Great, add "curriculum" to the list of words that will spark my interest in human learning, only for it to be about garbage AI. I want HN with a hard rule against AI posts.
