Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus

https://tetrisbench.com/tetrisbench/
37•ykhli•2h ago

Comments

akomtu•1h ago
It would be more interesting to make it build a chess engine and compare it against Stockfish. The chess engine should be a standalone no-dependencies C/C++ program that fits in NNN lines of code.
gpm•20m ago
Comparing against Stockfish isn't fair. That's comparing against enormous amounts of compute spent experimenting with strategies, training neural nets, etc.

It will lose so badly there will be no point in the comparison.

Besides, you could compare models (and harnesses) directly against each other.

OGEnthusiast•1h ago
Gemini 3 Flash is at a very nice point along the price-performance curve. It is a good workhorse model, supplemented with Opus 4.5 / Gemini 3 Pro for more complex tasks.
bubblesorting•1h ago
Very cool! I am a good Tetris player (in the top 10% of players) and wanted to give brick yeeting against an LLM a spin.

Some feedback:

- Knowing the scoring system is helpful when going 1v1 high score

- Use a different randomization system; I kept getting starved for pieces like I. True random is fine. Throwing a copy of every piece into a bag and then drawing them one by one (7-bag) is better. Nearly random with some lookbehind to prevent getting a string of ZSZS (TGM randomizer) is solid, too.

- Piece rotation feels left-biased, and keeps making me mis-drop, like the T pieces shift to the left if you spin 4 times. Check out https://tetris.wiki/images/thumb/3/3d/SRS-pieces.png/300px-S... or https://tetris.wiki/images/b/b5/Tgm_basic_ars_description.pn... for examples of how other games are doing it.

- Both clockwise and counter-clockwise rotation are important for human players; we can only hit so many keys per second

- Re-mappable keys are also appreciated

Nice work, I'm going to keep watching.
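
For readers unfamiliar with the 7-bag randomizer mentioned in the feedback above, here is a minimal Python sketch. It is not TetrisBench's code; the names `PIECES` and `seven_bag` are made up for illustration, and the thread does not say which randomizer the site actually uses.

```python
import random
from collections.abc import Iterator

PIECES = "IJLOSTZ"  # the seven standard tetrominoes


def seven_bag(rng: random.Random) -> Iterator[str]:
    """Yield pieces forever, one shuffled bag of all seven at a time."""
    while True:
        bag = list(PIECES)   # one copy of every piece goes into the bag
        rng.shuffle(bag)     # random order within the bag
        yield from bag       # draw them one by one, then refill


if __name__ == "__main__":
    gen = seven_bag(random.Random(0))
    # Print two bags' worth of pieces; each run of seven contains every piece once.
    print("".join(next(gen) for _ in range(14)))
```

Because every bag holds each piece exactly once, the longest possible wait for an I piece is 12 other pieces (I drawn first in one bag and last in the next), which is what prevents the starvation described above.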

qsort•45m ago
The worst thing is that the delayed auto shift is slightly off and it messes my finesse. (I used to play competitive tetris as well, but between getting older -> worse reflexes and vision problems I can't really play anymore. Weirdly, finesse muscle memory is still working.)

I don't think the goal is to make a PvP simulator, it would be too easy to cheese or do weird strategies. It's mostly for LLMs to play.

bubblesorting•31m ago
Hello fellow Tetris nerd with a -sort username :)

On the topic of reflexes decaying (I'm getting there, in my late 30s): Have you played Stackflow? It's a number go up roguelite disguised as an arcade brick stacking game, but the gravity is low enough that it is effectively turn based. More about 'deck' building, less about chaining PCs and C-Spins.

vunderba•26m ago
I actually grew up playing the Spectrum HoloByte version of Tetris for PC, which only lets you rotate in one direction. As a result, I ended up playing NES Tetris for years as a kid before realizing it lets you rotate clockwise / counterclockwise!

https://en.wikipedia.org/wiki/Tetris_(Spectrum_HoloByte)

arendtio•1h ago
There are some concepts clashing here.

I mean, if you let the LLM build a Tetris bot, it would be 1000x better than what the LLMs are doing. So yes, it is fun to win against an AI, but to be fair, against such processing power you should not be able to win. It is only possible because LLMs are not built for such tasks.

i_cannot_hack•11m ago
Fun fact: Humans were not built for playing Tetris either!
burkaman•1h ago
It's actually 80% against Opus, 66% average against the 5 models it's tested with.
esafak•42m ago
I imagine this is because Tetris is visual and the Gemini models are strong visually.
bogtog•26m ago
I figure OP would try and give the models pure text forms of the game?

.....
l....
l....
l.ttt
l..t.

vunderba•22m ago
Interesting but frustratingly vague on details. How exactly are the models playing? Is it using some kind of PGN equivalent in Tetris that represents an ongoing game, passing an ASCII representation, encoding as a JSON structure, or just directly sending screenshots of the game to the various LLMs?
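
To make two of the options raised above concrete, here is a small Python sketch that round-trips bogtog's text board and also encodes it as JSON. The thread does not say which format TetrisBench uses, so this is purely illustrative: the function names and the JSON field names (`rows`, `next`) are assumptions, not the project's API.

```python
import json

# The 5x5 board from the comment above: "." is empty, letters are piece cells.
SKETCH = ".....\nl....\nl....\nl.ttt\nl..t."


def text_to_board(text: str) -> list[list[str]]:
    """Parse rows of characters into a grid; each character is one cell."""
    return [list(line) for line in text.splitlines() if line]


def board_to_text(board: list[list[str]]) -> str:
    """Render the grid as one line of characters per row."""
    return "\n".join("".join(row) for row in board)


def board_to_json(board: list[list[str]], next_piece: str) -> str:
    """Encode the same grid as JSON; the field names are hypothetical."""
    return json.dumps({"rows": ["".join(row) for row in board], "next": next_piece})


if __name__ == "__main__":
    board = text_to_board(SKETCH)
    print(board_to_text(board))
    print(board_to_json(board, next_piece="o"))
```

Either form fits in a plain-text prompt; the JSON variant just leaves room for extra state such as the next piece.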
