Hi HN,

I built TetrisBench, a benchmark that tests LLMs on real-time code generation and reasoning through Tetris.

*How it works:*

Each model starts with an initial optimization function for evaluating Tetris moves.

As the game progresses, the model sees the current board state and updates its algorithm—adapting its strategy based on how the game is evolving.

The model continuously refines its optimizer:

- Board getting too high? Prioritize clearing lines.
- Hole forming? Adjust penalties.
- Safe stack? Build for a Tetris.

The model generates updated code, executes it to score all placements, and picks the best move.
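For a rough sense of what "score all placements and pick the best" can look like, here is a minimal Python sketch. It is illustrative only and not the code the models actually generate: the feature set (completed lines, aggregate height, holes, bumpiness), the weight values, and every function name are assumptions, and generating the candidate boards for each placement is left out.

```python
from typing import Dict, List

Board = List[List[int]]  # rows listed top to bottom; 1 = filled cell, 0 = empty


def column_heights(board: Board) -> List[int]:
    """Height of each column, measured from the floor to its topmost block."""
    rows = len(board)
    heights = []
    for c in range(len(board[0])):
        height = 0
        for r in range(rows):
            if board[r][c]:
                height = rows - r
                break
        heights.append(height)
    return heights


def count_holes(board: Board) -> int:
    """Empty cells that have at least one filled cell above them in the same column."""
    holes = 0
    for c in range(len(board[0])):
        seen_block = False
        for r in range(len(board)):
            if board[r][c]:
                seen_block = True
            elif seen_block:
                holes += 1
    return holes


def complete_lines(board: Board) -> int:
    """Number of fully filled rows (lines that would clear)."""
    return sum(1 for row in board if all(row))


def evaluate(board: Board, weights: Dict[str, float]) -> float:
    """Score a resulting board: reward line clears, penalize height, holes, bumpiness."""
    heights = column_heights(board)
    bumpiness = sum(abs(a - b) for a, b in zip(heights, heights[1:]))
    return (
        weights["lines"] * complete_lines(board)
        - weights["height"] * sum(heights)
        - weights["holes"] * count_holes(board)
        - weights["bumpiness"] * bumpiness
    )


def pick_best(candidates: List[Board], weights: Dict[str, float]) -> Board:
    """Return the candidate board (one per legal placement) with the highest score."""
    return max(candidates, key=lambda b: evaluate(b, weights))


# Hypothetical weights; "adapting the strategy" would amount to changing these
# (or the features themselves) as the board evolves.
WEIGHTS = {"lines": 3.0, "height": 0.5, "holes": 2.0, "bumpiness": 0.2}
```

In this framing, "Board getting too high? Prioritize clearing lines" would correspond to raising the `lines` and `height` weights, and "Hole forming? Adjust penalties" to raising `holes`.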

*Current standings:*

| Model | Win Rate |
|-------|----------|
| Opus 4.5 | 68% |
| GPT-5.2 | 63% |
| Grok 4.1 | 22% |

(181 games so far, running more)

*Try it yourself:*

You can also play against any model directly. See if you can beat Opus at Tetris: only 1 human has so far.

*All trajectories are logged.* Every game saves board states, the code each model generated, and placement decisions. Happy to share the dataset.
