frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Reverse-Engineering Raiders of the Lost Ark for the Atari 2600

https://github.com/joshuanwalker/Raiders2600
2•todsacerdoti•36s ago•0 comments

Show HN: Deterministic NDJSON audit logs – v1.2 update (structural gaps)

https://github.com/yupme-bot/kernel-ndjson-proofs
1•Slaine•4m ago•0 comments

The Greater Copenhagen Region could be your friend's next career move

https://www.greatercphregion.com/friend-recruiter-program
1•mooreds•4m ago•0 comments

Do Not Confirm – Fiction by OpenClaw

https://thedailymolt.substack.com/p/do-not-confirm
1•jamesjyu•5m ago•0 comments

The Analytical Profile of Peas

https://www.fossanalytics.com/en/news-articles/more-industries/the-analytical-profile-of-peas
1•mooreds•5m ago•0 comments

Hallucinations in GPT5 – Can models say "I don't know" (June 2025)

https://jobswithgpt.com/blog/llm-eval-hallucinations-t20-cricket/
1•sp1982•5m ago•0 comments

What AI is good for, according to developers

https://github.blog/ai-and-ml/generative-ai/what-ai-is-actually-good-for-according-to-developers/
1•mooreds•5m ago•0 comments

OpenAI might pivot to the "most addictive digital friend" or face extinction

https://twitter.com/lebed2045/status/2020184853271167186
1•lebed2045•6m ago•2 comments

Show HN: Know how your SaaS is doing in 30 seconds

https://anypanel.io
1•dasfelix•7m ago•0 comments

ClawdBot Ordered Me Lunch

https://nickalexander.org/drafts/auto-sandwich.html
1•nick007•8m ago•0 comments

What the News media thinks about your Indian stock investments

https://stocktrends.numerical.works/
1•mindaslab•9m ago•0 comments

Running Lua on a tiny console from 2001

https://ivie.codes/page/pokemon-mini-lua
1•Charmunk•9m ago•0 comments

Google and Microsoft Paying Creators $500K+ to Promote AI Tools

https://www.cnbc.com/2026/02/06/google-microsoft-pay-creators-500000-and-more-to-promote-ai.html
2•belter•11m ago•0 comments

New filtration technology could be game-changer in removal of PFAS

https://www.theguardian.com/environment/2026/jan/23/pfas-forever-chemicals-filtration
1•PaulHoule•12m ago•0 comments

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

https://github.com/Momciloo/fun-with-clip-path
2•momciloo•13m ago•0 comments

Kinda Surprised by Seadance2's Moderation

https://seedanceai.me/
1•ri-vai•13m ago•2 comments

I Write Games in C (yes, C)

https://jonathanwhiting.com/writing/blog/games_in_c/
2•valyala•13m ago•0 comments

Django scales. Stop blaming the framework (part 1 of 3)

https://medium.com/@tk512/django-scales-stop-blaming-the-framework-part-1-of-3-a2b5b0ff811f
1•sgt•14m ago•0 comments

Malwarebytes Is Now in ChatGPT

https://www.malwarebytes.com/blog/product/2026/02/scam-checking-just-got-easier-malwarebytes-is-n...
1•m-hodges•14m ago•0 comments

Thoughts on the job market in the age of LLMs

https://www.interconnects.ai/p/thoughts-on-the-hiring-market-in
1•gmays•14m ago•0 comments

Show HN: Stacky – certain block game clone

https://www.susmel.com/stacky/
2•Keyframe•17m ago•0 comments

AIII: A public benchmark for AI narrative and political independence

https://github.com/GRMPZQUIDOS/AIII
1•GRMPZ23•17m ago•0 comments

SectorC: A C Compiler in 512 bytes

https://xorvoid.com/sectorc.html
2•valyala•19m ago•0 comments

The API Is a Dead End; Machines Need a Labor Economy

1•bot_uid_life•20m ago•0 comments

Digital Iris [video]

https://www.youtube.com/watch?v=Kg_2MAgS_pE
1•Jyaif•21m ago•0 comments

New wave of GLP-1 drugs is coming–and they're stronger than Wegovy and Zepbound

https://www.scientificamerican.com/article/new-glp-1-weight-loss-drugs-are-coming-and-theyre-stro...
5•randycupertino•22m ago•0 comments

Convert tempo (BPM) to millisecond durations for musical note subdivisions

https://brylie.music/apps/bpm-calculator/
1•brylie•24m ago•0 comments

Show HN: Tasty A.F. - Use AI to Create Printable Recipe Cards

https://tastyaf.recipes/about
2•adammfrank•25m ago•0 comments

The Contagious Taste of Cancer

https://www.historytoday.com/archive/history-matters/contagious-taste-cancer
2•Thevet•27m ago•0 comments

U.S. Jobs Disappear at Fastest January Pace Since Great Recession

https://www.forbes.com/sites/mikestunson/2026/02/05/us-jobs-disappear-at-fastest-january-pace-sin...
2•alephnerd•27m ago•1 comments
Open in hackernews

Show HN: Watch LLMs play 21,000 hands of Poker

https://pokerbench.adfontes.io/run/Large_Models
36•jazarwil•1mo ago
PokerBench is my attempt at a new LLM benchmark wherein frontier models play Texas Hold'em in an arena setting. It also features a simulator to view individual games and observe how the different models reason about poker strategy. Opus/Haiku, Gemini Pro/Flash, GPT-5.2/5 mini, and Grok 4.1 Fast Reasoning have all been included.

All code -> https://github.com/JoeAzar/pokerbench

Comments

tcpais•1mo ago
Finally, a way to settle the model wars that actually matters: Texas Hold'em. That 3D replay view is sick! ♠♦ I spent way too long watching the replay on Game 2a58900d. It’s wild to see the chain of thought mapped against the betting rounds. It really exposes when a model is hallucinating a strong hand versus actually calculating pot odds. This 'PokerBench' might actually become the standard for measuring agentic risk-taking.
falloutx•4w ago
yeah the 3d view is amazing
VK-pro•4w ago
Very very fun. Just glancing at this quickly at lunch but is there any idea of incorporating tool use?
jazarwil•4w ago
Not at the moment, do you have something in mind?
thorawaytrav•4w ago
Do you have idea why smaller models are better then large ones?
jazarwil•4w ago
I've seen some theories tossed around but I don't think I'm qualified to offer an authoritative answer. Gemini 3 Pro specifically seems to be consistently "tighter" and more passive than Flash.
falloutx•4w ago
Fun, any idea how much would be the cost per game? I am worried 160 isnt a big enough sample size.
jazarwil•4w ago
It greatly depends on the models. The 6-handed setup with Opus and Pro cost about $30/game. The 4-handed setup with just small models was $6/game. I'd love to run more but I already spent quite a bit as it is.
falloutx•4w ago
Yeah thats costly, 160 games still gives about 1000+ total decisions and you can see some trends on how they think about the game state.
jazarwil•4w ago
Oh to be clear, there are ~21k hands here, and far more decisions than that.
Onavo•4w ago
What about the open source models? I remember from the trading benchmarks Deepseek performed pretty well.
jazarwil•4w ago
I didn't incorporate any open weights/source models just to limit the number of API providers I had to juggle, but it is just a config change if somebody wants to try a run with them.
alalani1•4w ago
Do you have any idea why the win rate for GPT-5.2 is higher than Gemini 3 Flash yet the former loses money while the latter earns money? Is it just bet sizing (betting more when it has a good hand) or something else?
jazarwil•4w ago
There are a few reasons that come to mind, such as winning larger pots on average, and also playing more hands by virtue of not getting knocked out as frequently.
tanvach•4w ago
People looking into this a little too much, looks to me like random walk. You should try reinitiating the trial (or have multiple running) and see if the ranking is robust.
jazarwil•4w ago
Wdym exactly? I ran 163 games, are you suggesting more games or something else?
whattheheckheck•4w ago
You need to simulate 50k to 200k hands to get a true winrate
jazarwil•4w ago
I'd love to run more games, just very expensive unfortunately.
alfonsodev•4w ago
Really cool, I’m curious what would be the comparison versus a deterministic bot that uses probability tables.