frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Was going to share my work

1•hiddenarchitect•1m ago•0 comments

Pitchfork: A devilishly good process manager for developers

https://pitchfork.jdx.dev/
1•ahamez•1m ago•0 comments

You Are Here

https://brooker.co.za/blog/2026/02/07/you-are-here.html
1•mltvc•5m ago•0 comments

Why social apps need to become proactive, not reactive

https://www.heyflare.app/blog/from-reactive-to-proactive-how-ai-agents-will-reshape-social-apps
1•JoanMDuarte•6m ago•1 comments

How patient are AI scrapers, anyway? – Random Thoughts

https://lars.ingebrigtsen.no/2026/02/07/how-patient-are-ai-scrapers-anyway/
1•samtrack2019•7m ago•0 comments

Vouch: A contributor trust management system

https://github.com/mitchellh/vouch
1•SchwKatze•7m ago•0 comments

I built a terminal monitoring app and custom firmware for a clock with Claude

https://duggan.ie/posts/i-built-a-terminal-monitoring-app-and-custom-firmware-for-a-desktop-clock...
1•duggan•8m ago•0 comments

Tiny C Compiler

https://bellard.org/tcc/
1•guerrilla•9m ago•0 comments

Y Combinator Founder Organizes 'March for Billionaires'

https://mlq.ai/news/ai-startup-founder-organizes-march-for-billionaires-protest-against-californi...
1•hidden80•10m ago•1 comments

Ask HN: Need feedback on the idea I'm working on

1•Yogender78•10m ago•0 comments

OpenClaw Addresses Security Risks

https://thebiggish.com/news/openclaw-s-security-flaws-expose-enterprise-risk-22-of-deployments-un...
1•vedantnair•10m ago•0 comments

Apple finalizes Gemini / Siri deal

https://www.engadget.com/ai/apple-reportedly-plans-to-reveal-its-gemini-powered-siri-in-february-...
1•vedantnair•11m ago•0 comments

Italy Railways Sabotaged

https://www.bbc.co.uk/news/articles/czr4rx04xjpo
3•vedantnair•11m ago•0 comments

Emacs-tramp-RPC: high-performance TRAMP back end using MsgPack-RPC

https://github.com/ArthurHeymans/emacs-tramp-rpc
1•fanf2•13m ago•0 comments

Nintendo Wii Themed Portfolio

https://akiraux.vercel.app/
1•s4074433•17m ago•1 comments

"There must be something like the opposite of suicide "

https://post.substack.com/p/there-must-be-something-like-the
1•rbanffy•19m ago•0 comments

Ask HN: Why doesn't Netflix add a “Theater Mode” that recreates the worst parts?

2•amichail•20m ago•0 comments

Show HN: Engineering Perception with Combinatorial Memetics

1•alan_sass•26m ago•2 comments

Show HN: Steam Daily – A Wordle-like daily puzzle game for Steam fans

https://steamdaily.xyz
1•itshellboy•28m ago•0 comments

The Anthropic Hive Mind

https://steve-yegge.medium.com/the-anthropic-hive-mind-d01f768f3d7b
1•spenvo•28m ago•0 comments

Just Started Using AmpCode

https://intelligenttools.co/blog/ampcode-multi-agent-production
1•BojanTomic•30m ago•0 comments

LLM as an Engineer vs. a Founder?

1•dm03514•30m ago•0 comments

Crosstalk inside cells helps pathogens evade drugs, study finds

https://phys.org/news/2026-01-crosstalk-cells-pathogens-evade-drugs.html
2•PaulHoule•31m ago•0 comments

Show HN: Design system generator (mood to CSS in <1 second)

https://huesly.app
1•egeuysall•32m ago•1 comments

Show HN: 26/02/26 – 5 songs in a day

https://playingwith.variousbits.net/saturday
1•dmje•32m ago•0 comments

Toroidal Logit Bias – Reduce LLM hallucinations 40% with no fine-tuning

https://github.com/Paraxiom/topological-coherence
1•slye514•35m ago•1 comments

Top AI models fail at >96% of tasks

https://www.zdnet.com/article/ai-failed-test-on-remote-freelance-jobs/
5•codexon•35m ago•2 comments

The Science of the Perfect Second (2023)

https://harpers.org/archive/2023/04/the-science-of-the-perfect-second/
1•NaOH•36m ago•0 comments

Bob Beck (OpenBSD) on why vi should stay vi (2006)

https://marc.info/?l=openbsd-misc&m=115820462402673&w=2
2•birdculture•39m ago•0 comments

Show HN: a glimpse into the future of eye tracking for multi-agent use

https://github.com/dchrty/glimpsh
1•dochrty•40m ago•0 comments
Open in hackernews

Show HN: Watch LLMs play 21,000 hands of Poker

https://pokerbench.adfontes.io/run/Large_Models
36•jazarwil•1mo ago
PokerBench is my attempt at a new LLM benchmark wherein frontier models play Texas Hold'em in an arena setting. It also features a simulator to view individual games and observe how the different models reason about poker strategy. Opus/Haiku, Gemini Pro/Flash, GPT-5.2/5 mini, and Grok 4.1 Fast Reasoning have all been included.

All code -> https://github.com/JoeAzar/pokerbench

Comments

tcpais•1mo ago
Finally, a way to settle the model wars that actually matters: Texas Hold'em. That 3D replay view is sick! ♠♦ I spent way too long watching the replay on Game 2a58900d. It’s wild to see the chain of thought mapped against the betting rounds. It really exposes when a model is hallucinating a strong hand versus actually calculating pot odds. This 'PokerBench' might actually become the standard for measuring agentic risk-taking.
falloutx•1mo ago
yeah the 3d view is amazing
VK-pro•1mo ago
Very very fun. Just glancing at this quickly at lunch but is there any idea of incorporating tool use?
jazarwil•1mo ago
Not at the moment, do you have something in mind?
thorawaytrav•1mo ago
Do you have idea why smaller models are better then large ones?
jazarwil•1mo ago
I've seen some theories tossed around but I don't think I'm qualified to offer an authoritative answer. Gemini 3 Pro specifically seems to be consistently "tighter" and more passive than Flash.
falloutx•1mo ago
Fun, any idea how much would be the cost per game? I am worried 160 isnt a big enough sample size.
jazarwil•1mo ago
It greatly depends on the models. The 6-handed setup with Opus and Pro cost about $30/game. The 4-handed setup with just small models was $6/game. I'd love to run more but I already spent quite a bit as it is.
falloutx•1mo ago
Yeah thats costly, 160 games still gives about 1000+ total decisions and you can see some trends on how they think about the game state.
jazarwil•1mo ago
Oh to be clear, there are ~21k hands here, and far more decisions than that.
Onavo•1mo ago
What about the open source models? I remember from the trading benchmarks Deepseek performed pretty well.
jazarwil•1mo ago
I didn't incorporate any open weights/source models just to limit the number of API providers I had to juggle, but it is just a config change if somebody wants to try a run with them.
alalani1•1mo ago
Do you have any idea why the win rate for GPT-5.2 is higher than Gemini 3 Flash yet the former loses money while the latter earns money? Is it just bet sizing (betting more when it has a good hand) or something else?
jazarwil•1mo ago
There are a few reasons that come to mind, such as winning larger pots on average, and also playing more hands by virtue of not getting knocked out as frequently.
tanvach•1mo ago
People looking into this a little too much, looks to me like random walk. You should try reinitiating the trial (or have multiple running) and see if the ranking is robust.
jazarwil•1mo ago
Wdym exactly? I ran 163 games, are you suggesting more games or something else?
whattheheckheck•4w ago
You need to simulate 50k to 200k hands to get a true winrate
jazarwil•4w ago
I'd love to run more games, just very expensive unfortunately.
alfonsodev•4w ago
Really cool, I’m curious what would be the comparison versus a deterministic bot that uses probability tables.