frontpage.

Made with ♥ by @iamnishanth

Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus

https://tetrisbench.com/tetrisbench/
46•ykhli•3h ago•21 comments

Show HN: Ourguide – OS wide task guidance system that shows you where to click

https://ourguide.ai
12•eshaangulati•3h ago•4 comments

Show HN: SF Microclimates

https://github.com/solo-founders/sf-microclimates
13•weisser•20h ago•21 comments

Show HN: Only 1 LLM can fly a drone

https://github.com/kxzk/snapbench
120•beigebrucewayne•11h ago•75 comments

Show HN: Hybrid Markdown Editing

https://tiagosimoes.github.io/codemirror-markdown-hybrid/
2•eropatori•2h ago•0 comments

Show HN: Managed Postgres with native ClickHouse integration

29•saisrirampur•4d ago•7 comments

Show HN: An interactive map of US lighthouses and navigational aids

https://www.lighthouses.app/
95•idd2•1d ago•20 comments

Show HN: TUI for managing XDG default applications

https://github.com/mitjafelicijan/xdgctl
133•mitjafelicijan•1d ago•44 comments

Show HN: Netfence – Like Envoy for eBPF Filters

https://github.com/danthegoodman1/netfence
55•dangoodmanUT•1d ago•7 comments

Show HN: A small programming language where everything is pass-by-value

https://github.com/Jcparkyn/herd
79•jcparkyn•22h ago•54 comments

Show HN: I got tired of checking 5 dashboards, so I built a simpler one

https://anypanel.io/
4•dasfelix•5h ago•0 comments

Show HN: Fence – Sandbox CLI commands with network/filesystem restrictions

https://github.com/Use-Tusk/fence
73•jy-tan•6d ago•23 comments

Show HN: Bonsplit – Tabs and splits for native macOS apps

https://bonsplit.alasdairmonk.com
241•sgottit•1d ago•33 comments

Show HN: Delegation/Mixins C# Source Generators Library

https://www.nuget.org/packages/NameHillSoftware.TypeAdoption
2•whoisthemachine•8h ago•0 comments

Show HN: NukeCast – If it happened today, where would the fallout go

https://nukecast.com/
17•todd_tracerlab•18h ago•6 comments

Show HN: WhyThere – Compare cities side-by-side to decide where to move

https://whythere.life
12•daversa•18h ago•19 comments

Show HN: LLMNet – The Offline Internet, Search the web without the web

https://github.com/skorotkiewicz/llmnet
29•modinfo•1d ago•6 comments

Show HN: Zero – Serverless ECMWF weather visualization (WebGPU)

https://zero.hypatia.earth/
3•noiv•9h ago•1 comment

Show HN: AutoShorts – Local, GPU-accelerated AI video pipeline for creators

https://github.com/divyaprakash0426/autoshorts
70•divyaprakash•1d ago•34 comments

Show HN: C From Scratch – Learn safety-critical C with prove-first methodology

https://github.com/SpeyTech/c-from-scratch
65•william1872•1d ago•10 comments

Show HN: FaceTime-style calls with an AI Companion (Live2D and long-term memory)

https://thebeni.ai/
30•summerlee9611•22h ago•14 comments

Show HN: Alprina – Intent matching for co-founders and investors

https://www.alprina.com
2•Othrya•10h ago•1 comment

Show HN: Coi – A language that compiles to WASM, beats React/Vue

221•io_eric•6d ago•69 comments

Show HN: isometric.nyc – giant isometric pixel art map of NYC

https://cannoneyed.com/isometric-nyc/
1315•cannoneyed•4d ago•240 comments

Show HN: CertRadar – Find every certificate ever issued for your domain

https://certradar.net/
20•ops_mechanic•1d ago•8 comments

Show HN: Sightline – Shodan-style search for real-world infra using OSM Data

https://github.com/ni5arga/sightline
22•ni5arga•1d ago•1 comment

Show HN: Open-source Figma design to code

https://github.com/vibeflowing-inc/vibe_figma
50•alepeak•2d ago•8 comments

Show HN: StormWatch – Weather emergency dashboard with prep checklists

https://jeisey.github.io/stormwatch/
43•lotusxblack•2d ago•11 comments

Show HN: Text-to-video model from scratch (2 brothers, 2 years, 2B params)

https://huggingface.co/collections/Linum-AI/linum-v2-2b-text-to-video
156•schopra909•4d ago•24 comments

Show HN: Nhx – Node.js Hybrid eXecutor (a uvx inspired tool)

https://www.npmjs.com/package/nhx
5•kolodny•19h ago•0 comments

Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus

https://tetrisbench.com/tetrisbench/
46•ykhli•3h ago

Comments

akomtu•2h ago
It would be more interesting to make it build a chess engine and compare it against Stockfish. The chess engine should be a standalone no-dependencies C/C++ program that fits in NNN lines of code.
gpm•1h ago
Comparing against Stockfish isn't fair. That's comparing against enormous amounts of compute spent experimenting with strategies, training neural nets, etc.

It will lose so badly there will be no point in the comparison.

Besides, you could compare models (and harnesses) directly against each other.

vunderba•29m ago
My back-of-the-envelope guess would be that 99% of LLMs given the task to build a chess engine would probably just end up implementing a flavor of negamax and calling it a day.

https://en.wikipedia.org/wiki/Negamax
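For reference, negamax is minimax written as a single function: it uses the identity max(a, b) = -min(-a, -b), so each side simply maximizes the negation of the opponent's score. A minimal sketch with alpha-beta pruning follows. A chess move generator would not fit here, so it uses a toy game as an assumed stand-in (Nim where each player takes 1, 2 or 3 stones and whoever takes the last stone wins).

```python
def negamax(stones: int, alpha: int = -1, beta: int = 1) -> int:
    """Value for the player to move: +1 = forced win, -1 = forced loss.

    Toy game (an assumption standing in for chess): players alternately take
    1, 2 or 3 stones, and whoever takes the last stone wins.
    """
    if stones == 0:
        return -1  # the opponent just took the last stone
    best = -1
    for take in (1, 2, 3):
        if take > stones:
            break
        # Our value for a move is the negation of the opponent's best value.
        score = -negamax(stones - take, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff: the parent will not choose this line
    return best


print([negamax(n) for n in range(1, 9)])
```

Positions with a multiple of 4 stones come out as losses for the player to move. A real engine would replace the stone counting with a move generator and return a static evaluation at a depth limit.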

OGEnthusiast•2h ago
Gemini 3 Flash is at a very nice point along the price-performance curve. It's a good workhorse model, supplemented with Opus 4.5 / Gemini 3 Pro for more complex tasks.
bubblesorting•2h ago
Very cool! I am a good Tetris player (in the top 10% of players) and wanted to give brick yeeting against an LLM a spin.

Some feedback:

- Knowing the scoring system is helpful when going 1v1 high score

- Use a different randomization system; I kept getting starved for pieces like I. True random is fine, throwing a copy of every piece into a bag and then drawing them one by one (7-bag) is better, and nearly random with some lookbehind to prevent a string like ZSZS (the TGM randomizer) is solid, too.

- Piece rotation feels left-biased and keeps making me mis-drop; for example, the T piece shifts to the left if you spin it 4 times. Check out https://tetris.wiki/images/thumb/3/3d/SRS-pieces.png/300px-S... or https://tetris.wiki/images/b/b5/Tgm_basic_ars_description.pn... for examples of how other games do it.

- Clockwise and counter-clockwise rotation is important for human players; we can only hit so many keys per second

- re-mappable keys are also appreciated

Nice work, I'm going to keep watching.
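The two randomizers mentioned in that feedback can be sketched as follows. The 7-bag version follows the description directly; the history version is only in the spirit of TGM, and its history length, initial history and reroll count are assumptions rather than any particular game's exact values.

```python
import random
from collections import deque
from itertools import islice
from typing import Iterator

PIECES = "IJLOSTZ"


def seven_bag(rng: random.Random) -> Iterator[str]:
    """Put one of each piece in a bag, shuffle, deal them all out, repeat."""
    while True:
        bag = list(PIECES)
        rng.shuffle(bag)
        yield from bag


def history_randomizer(rng: random.Random, rolls: int = 4) -> Iterator[str]:
    """Roll uniformly, but re-roll (up to `rolls` times) if the piece was seen recently.

    Assumed parameters: a 4-piece history seeded with S and Z, so early
    pieces are less likely to be S or Z.
    """
    if rolls < 1:
        raise ValueError("rolls must be >= 1")
    history = deque("ZSSZ", maxlen=4)  # the oldest entry drops off automatically
    while True:
        for _ in range(rolls):
            piece = rng.choice(PIECES)
            if piece not in history:
                break
        # If every roll hit the history, the last roll is kept anyway.
        history.append(piece)
        yield piece


rng = random.Random(0)
bag_pieces = list(islice(seven_bag(rng), 14))
tgm_pieces = list(islice(history_randomizer(rng), 14))
```

The bag guarantees every piece once per 7 draws, so the longest possible wait between two I pieces is 12 other pieces; the history approach only makes repeats and droughts less likely.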

qsort•1h ago
The worst thing is that the delayed auto shift is slightly off and it messes up my finesse. (I used to play competitive tetris as well, but between getting older -> worse reflexes and vision problems I can't really play anymore. Weirdly, finesse muscle memory is still working.)

I don't think the goal is to make a PvP simulator, it would be too easy to cheese or do weird strategies. It's mostly for LLMs to play.
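For readers unfamiliar with the term: DAS is the delay before a held direction starts auto-repeating, and ARR (auto-repeat rate) is the interval between the repeats after that. A minimal frame-counting sketch; the default 16-frame delay and 6-frame repeat are assumed, roughly NES-like values, not TetrisBench's settings.

```python
def shifts_by_frame(frame: int, das: int = 16, arr: int = 6) -> int:
    """Cells moved by the end of `frame`, counting the key press as frame 0.

    One shift happens on the press itself. Auto-repeat begins once the key
    has been held for `das` frames and then fires every `arr` frames.
    """
    if frame < 0 or das < 1 or arr < 1:
        raise ValueError("frame must be >= 0; das and arr must be >= 1")
    moved = 1  # the initial shift on press
    if frame >= das:
        moved += 1 + (frame - das) // arr
    return moved


print([shifts_by_frame(f) for f in (0, 15, 16, 21, 22)])
```

With these defaults a key released by frame 15 moves the piece one cell, while holding through frame 16 triggers a second shift, so even a one-frame difference in the delay changes where a finesse input lands.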

bubblesorting•1h ago
Hello fellow Tetris nerd with a -sort username :)

On the topic of reflexes decaying (I'm getting there, in my late 30s): Have you played Stackflow? It's a number go up roguelite disguised as an arcade brick stacking game, but the gravity is low enough that it is effectively turn based. More about 'deck' building, less about chaining PCs and C-Spins.

vunderba•1h ago
I actually grew up playing the Spectrum HoloByte version of Tetris for PC, which only lets you rotate in one direction. As a result, I ended up playing NES Tetris for years as a kid before realizing it lets you rotate clockwise / counterclockwise!

https://en.wikipedia.org/wiki/Tetris_(Spectrum_HoloByte)
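Rotating in both directions, and avoiding the left drift bubblesorting described, both come down to turning the piece about a fixed pivot. A minimal sketch for pieces that live in a 3x3 box (T, J, L, S, Z), ignoring wall kicks; the coordinates and pivot are assumptions, and in SRS the I and O pieces rotate about a point between cells, which this sketch does not cover.

```python
Cell = tuple[int, int]  # (x, y) with y increasing downward, as on screen
Piece = frozenset[Cell]

# Assumed spawn orientation: T pointing up inside a 3x3 box.
T_UP: Piece = frozenset({(1, 0), (0, 1), (1, 1), (2, 1)})
PIVOT: Cell = (1, 1)  # center of the 3x3 box


def rotate_cw(cells: Piece, pivot: Cell = PIVOT) -> Piece:
    """Clockwise on screen: offset (dx, dy) from the pivot becomes (-dy, dx)."""
    px, py = pivot
    return frozenset((px - (y - py), py + (x - px)) for x, y in cells)


def rotate_ccw(cells: Piece, pivot: Cell = PIVOT) -> Piece:
    """Counter-clockwise: offset (dx, dy) becomes (dy, -dx), the inverse of rotate_cw."""
    px, py = pivot
    return frozenset((px + (y - py), py - (x - px)) for x, y in cells)


piece = T_UP
for _ in range(4):
    piece = rotate_cw(piece)
print(piece == T_UP)
```

Because the pivot stays fixed, four clockwise turns compose to the identity and the piece ends exactly where it started.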

arendtio•2h ago
There are some concepts clashing here.

I mean, if you let the LLM build a Tetris bot, it would be 1000x better than what the LLMs are doing. So yes, it is fun to win against an AI, but to be fair, against such processing power you should not be able to win. It is only possible because LLMs are not built for such tasks.

i_cannot_hack•56m ago
Fun fact: Humans were not built for playing Tetris either!
westurner•20m ago
Task: play tetris

Task: write and optimize a tetris bot

Task: write and safely online optimize a tetris bot with consideration for cost to converge

openai/baselines (7 years ago) was leading on RL, and then came AlphaZero and self-attention Transformer networks.

LLMs are trained with RL, but they aren't general-purpose game-theoretic RL agents?

burkaman•2h ago
It's actually 80% against Opus, 66% average against the 5 models it's tested with.
esafak•1h ago
I imagine this is because Tetris is visual and the Gemini models are strong visually.
bogtog•1h ago
I figure OP would try and give the models pure text forms of the game?

.....
l....
l....
l.ttt
l..t.
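A text board like that is straightforward to turn into features a model or a bot could reason over. A minimal sketch, assuming "." marks an empty cell and any other character a filled one:

```python
BOARD = "\n".join([".....", "l....", "l....", "l.ttt", "l..t."])


def parse_board(text: str) -> list[list[bool]]:
    """Rows top to bottom; True where a cell is filled."""
    return [[ch != "." for ch in row] for row in text.splitlines()]


def column_heights(grid: list[list[bool]]) -> list[int]:
    """Height of each column, from the floor up to its topmost filled cell."""
    rows = len(grid)
    return [
        rows - next((r for r in range(rows) if grid[r][c]), rows)
        for c in range(len(grid[0]))
    ]


def count_holes(grid: list[list[bool]]) -> int:
    """Empty cells with a filled cell somewhere above them in the same column."""
    holes = 0
    for c in range(len(grid[0])):
        covered = False
        for r in range(len(grid)):
            if grid[r][c]:
                covered = True
            elif covered:
                holes += 1
    return holes


grid = parse_board(BOARD)
print(column_heights(grid), count_holes(grid))
```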

vunderba•1h ago
Interesting but frustratingly vague on details. How exactly are the models playing? Is it using some kind of Tetris equivalent of PGN to represent an ongoing game, passing an ASCII representation, encoding the state as a JSON structure, or just directly sending screenshots of the game to the various LLMs?
storystarling•42m ago
It has to be turn-based. Even with Flash's speed, the inference latency would kill you in a real-time loop. They're likely pausing the game state after every tick to wait for the API response before resuming.
ykhli•18m ago
Answered this in a comment above! It's not turn-based or visual-layout-based, since LLMs are not trained that way. The representation is a JSON structure, but the LLMs plug in algorithms and keep optimizing them as the game state evolves.
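The exact JSON schema isn't given in the thread, so the following is only a hypothetical illustration of what such a game-state message could look like; every field name is an assumption.

```python
import json


def encode_state(rows: list[str], current: str, queue: list[str]) -> str:
    """Serialize a board snapshot. Hypothetical schema, not TetrisBench's format."""
    return json.dumps(
        {
            "width": len(rows[0]),
            "height": len(rows),
            "board": rows,  # top row first, "." = empty cell
            "current": current,  # piece to place now
            "next": queue,  # preview queue, soonest first
        }
    )


state = json.loads(encode_state([".....", "l....", "l.ttt"], "T", ["I", "O"]))
print(state["width"], state["height"], state["next"])
```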
tiahura•44m ago
I'd like to see a nethackbench.
ykhli•35m ago
Thanks for all the questions! More details on how this works:

- Each model starts with an initial optimization function for evaluating Tetris moves.

- As the game progresses, the model sees the current board state and updates its algorithm, adapting its strategy based on how the game is evolving.

- The model continuously refines its optimizer. It decides when it needs to re-evaluate and when it should implement the next optimization function.

- The model generates updated code, executes it to score all placements, and picks the best move.

- The reason I reframed this as a coding problem is that Tetris is, by nature, an optimization game. At first I did try asking LLMs where to place each piece at every turn, but models are just terrible at visual reasoning. What LLMs are great at, though, is coding.
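The score-all-placements step can be sketched as follows. This is a hand-written stand-in for the kind of optimizer a model might generate, not TetrisBench's code: it only tries hard drops (no tucks or spins), and the feature weights are illustrative assumptions.

```python
Cell = tuple[int, int]  # (x, y); y = 0 is the top row
Board = list[list[int]]  # HEIGHT rows of WIDTH cells, 1 = filled
WIDTH, HEIGHT = 10, 20

I_PIECE: frozenset[Cell] = frozenset({(0, 0), (1, 0), (2, 0), (3, 0)})

# Illustrative weights (assumptions): penalize height, holes, bumpiness; reward lines.
W_HEIGHT, W_LINES, W_HOLES, W_BUMP = -0.51, 0.76, -0.36, -0.18


def normalize(cells):
    """Shift cells so the smallest x and y are both 0."""
    mx = min(x for x, _ in cells)
    my = min(y for _, y in cells)
    return frozenset((x - mx, y - my) for x, y in cells)


def rotations(cells):
    """Distinct orientations, each turned 90 degrees clockwise from the last."""
    out, cur = [], normalize(cells)
    for _ in range(4):
        if cur not in out:
            out.append(cur)
        cur = normalize(frozenset((-y, x) for x, y in cur))
    return out


def drop(board: Board, cells, x0: int):
    """Hard-drop `cells` at column offset x0; return the new board, or None if blocked."""
    def fits(dy: int) -> bool:
        return all(
            0 <= x + x0 < WIDTH and y + dy < HEIGHT and not board[y + dy][x + x0]
            for x, y in cells
        )

    if not fits(0):
        return None
    dy = 0
    while fits(dy + 1):
        dy += 1
    new = [row[:] for row in board]
    for x, y in cells:
        new[y + dy][x + x0] = 1
    return new


def clear_lines(board: Board):
    """Remove full rows and pad the top with empty ones; return (board, lines cleared)."""
    kept = [row for row in board if not all(row)]
    cleared = HEIGHT - len(kept)
    return [[0] * WIDTH for _ in range(cleared)] + kept, cleared


def evaluate(board: Board, lines: int) -> float:
    """Weighted sum of aggregate height, lines cleared, holes and bumpiness."""
    heights = []
    for x in range(WIDTH):
        top = next((y for y in range(HEIGHT) if board[y][x]), HEIGHT)
        heights.append(HEIGHT - top)
    # A hole is an empty cell below the top filled cell of its column.
    holes = sum(
        1 for x in range(WIDTH) for y in range(HEIGHT - heights[x], HEIGHT) if not board[y][x]
    )
    bumpiness = sum(abs(a - b) for a, b in zip(heights, heights[1:]))
    return W_HEIGHT * sum(heights) + W_LINES * lines + W_HOLES * holes + W_BUMP * bumpiness


def best_move(board: Board, piece):
    """Score every (orientation, column) hard drop; return (score, orientation, column)."""
    best = None
    for rot, cells in enumerate(rotations(piece)):
        width = max(x for x, _ in cells) + 1
        for x0 in range(WIDTH - width + 1):
            placed = drop(board, cells, x0)
            if placed is None:
                continue
            after, lines = clear_lines(placed)
            score = evaluate(after, lines)
            if best is None or score > best[0]:
                best = (score, rot, x0)
    return best


# Usage: bottom row filled except the rightmost column.
board = [[0] * WIDTH for _ in range(HEIGHT)]
board[-1] = [1] * (WIDTH - 1) + [0]
move = best_move(board, I_PIECE)  # expected: orientation 1 (vertical I) at column 9
```

Regenerating this kind of function as the board evolves, rather than asking for one placement per turn, is the reframing described in the list above; a stronger bot would also search over the preview queue and allow soft-drop tucks.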

p0w3n3d•13m ago
Guys, I don't know how to tell you this, but... Tetris can be solved without LLMs...