frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Exploring LLM Evaluation by Using Games

https://lmgame.org
3•Yuxuan_Zhang13•7mo ago

Comments

Yuxuan_Zhang13•7mo ago
Pokémon Red is becoming a go-to benchmark for testing advanced AIs such as Gemini. But is Pokémon Red really a good eval? We study this problem and identify three issues: 1⃣ Navigation tasks are too hard. 2⃣ Combat control is too simple. 3⃣ Raising a strong Pokémon team is slow and expensive as an eval.

We find most of the problems are not fundamental to games themselves, but how they have been used. We believe game-as-an-eval remains a compelling and underutilized evaluation strategy.

We introduce Lmgame Bench to standardize game-as-an-eval. More details and findings in our blogpost: https://lmgame.org/#/blog/pokemon_red

Wellness Hotels Discovery Application

https://aurio.place/
1•cherrylinedev•33s ago•1 comments

NASA delays moon rocket launch by a month after fuel leaks during test

https://www.theguardian.com/science/2026/feb/03/nasa-delays-moon-rocket-launch-month-fuel-leaks-a...
1•mooreds•1m ago•0 comments

Sebastian Galiani on the Marginal Revolution

https://marginalrevolution.com/marginalrevolution/2026/02/sebastian-galiani-on-the-marginal-revol...
1•paulpauper•4m ago•0 comments

Ask HN: Are we at the point where software can improve itself?

1•ManuelKiessling•4m ago•0 comments

Binance Gives Trump Family's Crypto Firm a Leg Up

https://www.nytimes.com/2026/02/07/business/binance-trump-crypto.html
1•paulpauper•4m ago•0 comments

Reverse engineering Chinese 'shit-program' for absolute glory: R/ClaudeCode

https://old.reddit.com/r/ClaudeCode/comments/1qy5l0n/reverse_engineering_chinese_shitprogram_for/
1•edward•4m ago•0 comments

Indian Culture

https://indianculture.gov.in/
1•saikatsg•7m ago•0 comments

Show HN: Maravel-Framework 10.61 prevents circular dependency

https://marius-ciclistu.medium.com/maravel-framework-10-61-0-prevents-circular-dependency-cdb5d25...
1•marius-ciclistu•7m ago•0 comments

The age of a treacherous, falling dollar

https://www.economist.com/leaders/2026/02/05/the-age-of-a-treacherous-falling-dollar
2•stopbulying•8m ago•0 comments

Ask HN: AI Generated Diagrams

1•voidhorse•10m ago•0 comments

Microsoft Account bugs locked me out of Notepad – are Thin Clients ruining PCs?

https://www.windowscentral.com/microsoft/windows-11/windows-locked-me-out-of-notepad-is-the-thin-...
2•josephcsible•11m ago•0 comments

Show HN: A delightful Mac app to vibe code beautiful iOS apps

https://milq.ai/hacker-news
3•jdjuwadi•14m ago•1 comments

Show HN: Gemini Station – A local Chrome extension to organize AI chats

https://github.com/rajeshkumarblr/gemini_station
1•rajeshkumar_dev•14m ago•0 comments

Welfare states build financial markets through social policy design

https://theloop.ecpr.eu/its-not-finance-its-your-pensions/
2•kome•17m ago•0 comments

Market orientation and national homicide rates

https://onlinelibrary.wiley.com/doi/10.1111/1745-9125.70023
4•PaulHoule•18m ago•0 comments

California urges people avoid wild mushrooms after 4 deaths, 3 liver transplants

https://www.cbsnews.com/news/california-death-cap-mushrooms-poisonings-liver-transplants/
1•rolph•18m ago•0 comments

Matthew Shulman, co-creator of Intellisense, died 2019 March 22

https://www.capenews.net/falmouth/obituaries/matthew-a-shulman/article_33af6330-4f52-5f69-a9ff-58...
3•canucker2016•19m ago•1 comments

Show HN: SuperLocalMemory – AI memory that stays on your machine, forever free

https://github.com/varun369/SuperLocalMemoryV2
1•varunpratap369•20m ago•0 comments

Show HN: Pyrig – One command to set up a production-ready Python project

https://github.com/Winipedia/pyrig
1•Winipedia•23m ago•0 comments

Fast Response or Silence: Conversation Persistence in an AI-Agent Social Network [pdf]

https://github.com/AysajanE/moltbook-persistence/blob/main/paper/main.pdf
1•EagleEdge•23m ago•0 comments

C and C++ dependencies: don't dream it, be it

https://nibblestew.blogspot.com/2026/02/c-and-c-dependencies-dont-dream-it-be-it.html
1•ingve•23m ago•0 comments

Show HN: Vbuckets – Infinite virtual S3 buckets

https://github.com/danthegoodman1/vbuckets
1•dangoodmanUT•23m ago•0 comments

Open Molten Claw: Post-Eval as a Service

https://idiallo.com/blog/open-molten-claw
1•watchful_moose•24m ago•0 comments

New York Budget Bill Mandates File Scans for 3D Printers

https://reclaimthenet.org/new-york-3d-printer-law-mandates-firearm-file-blocking
2•bilsbie•25m ago•1 comments

The End of Software as a Business?

https://www.thatwastheweek.com/p/ai-is-growing-up-its-ceos-arent
1•kteare•26m ago•0 comments

Exploring 1,400 reusable skills for AI coding tools

https://ai-devkit.com/skills/
1•hoangnnguyen•27m ago•0 comments

Show HN: A unique twist on Tetris and block puzzle

https://playdropstack.com/
1•lastodyssey•30m ago•1 comments

The logs I never read

https://pydantic.dev/articles/the-logs-i-never-read
1•nojito•31m ago•0 comments

How to use AI with expressive writing without generating AI slop

https://idratherbewriting.com/blog/bakhtin-collapse-ai-expressive-writing
1•cnunciato•32m ago•0 comments

Show HN: LinkScope – Real-Time UART Analyzer Using ESP32-S3 and PC GUI

https://github.com/choihimchan/linkscope-bpu-uart-analyzer
1•octablock•33m ago•0 comments