Show HN: Executable Markdown files with Unix pipes

59•jedwhite•10h ago•48 comments

Show HN: macOS menu bar app to track Claude usage in real time

https://github.com/richhickson/claudecodeusage
138•RichHickson•18h ago•46 comments

Show HN: A geofence-based social network app 6 years in development

https://www.localvideoapp.com
65•Adrian-ChatLocl•16h ago•41 comments

Show HN: Commit-based code review instead of PR-based

https://commitguard.ai
8•moshetanzer•7h ago•0 comments

Show HN: DeepDream for Video with Temporal Consistency

https://github.com/jeremicna/deepdream-video-pytorch
61•fruitbarrel•23h ago•24 comments

Show HN: Ever wanted to look at yourself in Braille?

https://github.com/NishantJoshi00/dith
3•cat-whisperer•5h ago•0 comments

Show HN: A Wall Street Terminal for Everyone

https://marketterminal.com/chart
6•adamfontan•5h ago•4 comments

Show HN: I visualized the entire history of Citi Bike in the browser

https://bikemap.nyc/
109•freemanjiang•1d ago•31 comments

Show HN: I built a tool to create AI agents that live in iMessage

https://tryflux.ai/
28•danielsdk•5d ago•12 comments

Show HN: An all-in-one image crop/split/collage tool (no uploads, no watermark)

https://imagesplitter.tools
3•harperhuang•7h ago•6 comments

Show HN: I built a "Do not disturb" Device for my home office

https://apoorv.page/blogs/over-engineered-dnd
93•quacky_batak•5d ago•49 comments

Show HN: Watch LLMs play 21,000 hands of Poker

https://pokerbench.adfontes.io/run/Large_Models
29•jazarwil•23h ago•18 comments

Show HN: Image Scaler – Privacy-focused image resizing with 60-image batches

https://image-scaler.com/
2•nmczzi•8h ago•1 comments

Show HN: SMTP Tunnel – A SOCKS5 proxy disguised as email traffic to bypass DPI

https://github.com/x011/smtp-tunnel-proxy
136•lobito25•2d ago•44 comments

Show HN: Layoffstoday – Open database tracking for 10k Companies

https://layoffstoday.io/
2•doremon0902•9h ago•2 comments

Show HN: Open database of link metadata for large-scale analysis

https://github.com/rumca-js/RSS-Link-Database-2025
15•renegat0x0•5d ago•1 comments

Show HN: Claude Code for Django

https://github.com/kjnez/claude-code-django
4•cui•10h ago•2 comments

Show HN: Tailsnitch – A security auditor for Tailscale

https://github.com/Adversis/tailsnitch
277•thesubtlety•3d ago•28 comments

Show HN: Free and local browser tool for designing gear models for 3D printing

https://gears.dmtrkovalenko.dev
52•neogoose•2d ago•13 comments

Show HN: Fzf-navigator, a terminal file system navigator

https://github.com/benward2301/fzf-navigator
2•benward2301•11h ago•0 comments

Show HN: We built a permissions layer for Notion

https://notionportals.com/
11•PEGHIN•17h ago•6 comments

Show HN: Mantic.sh – A structural code search engine for AI agents

https://github.com/marcoaapfortes/Mantic.sh
78•marcoaapfortes•2d ago•37 comments

Show HN: DoNotNotify – Log and intelligently block notifications on Android

https://donotnotify.com/
343•awaaz•3d ago•165 comments

Show HN: Legit, Open source Git-based Version control for AI agents

5•jannesblobel•12h ago•0 comments

Show HN: VaultSandbox – Test your real MailGun/SES/etc. integration

https://vaultsandbox.com/
58•vaultsandbox•2d ago•13 comments

Show HN: 48-digit prime numbers every git commit

https://textonly.github.io/git-prime/
66•keepamovin•1w ago•54 comments

Show HN: How I generate animated pixel art with AI and Python

https://sarthakmishra.com/blog/building-animated-sprite-hero
16•sarthak_drool•1d ago•2 comments

Show HN: Prism.Tools – Free and privacy-focused developer utilities

https://blgardner.github.io/prism.tools/
371•BLGardner•3d ago•101 comments

Show HN: KeelTest – AI-driven VS Code unit test generator with bug discovery

https://keelcode.dev/keeltest
28•bulba4aur•1d ago•15 comments

Show HN: Server-rendered multiplayer games with Lua (no client code)

https://cleoselene.com/
79•brunovcosta•4d ago•59 comments

Show HN: Watch LLMs play 21,000 hands of Poker

https://pokerbench.adfontes.io/run/Large_Models
29•jazarwil•23h ago
PokerBench is my attempt at a new LLM benchmark wherein frontier models play Texas Hold'em in an arena setting. It also features a simulator to view individual games and observe how the different models reason about poker strategy. Opus/Haiku, Gemini Pro/Flash, GPT-5.2/5 mini, and Grok 4.1 Fast Reasoning have all been included.

All code -> https://github.com/JoeAzar/pokerbench

Comments

tcpais•20h ago
Finally, a way to settle the model wars that actually matters: Texas Hold'em. That 3D replay view is sick! ♠♦ I spent way too long watching the replay on Game 2a58900d. It’s wild to see the chain of thought mapped against the betting rounds. It really exposes when a model is hallucinating a strong hand versus actually calculating pot odds. This 'PokerBench' might actually become the standard for measuring agentic risk-taking.
falloutx•18h ago
yeah the 3d view is amazing
VK-pro•18h ago
Very, very fun. Just glancing at this quickly at lunch, but is there any plan to incorporate tool use?
jazarwil•18h ago
Not at the moment, do you have something in mind?
thorawaytrav•18h ago
Do you have any idea why smaller models are better than large ones?
jazarwil•18h ago
I've seen some theories tossed around but I don't think I'm qualified to offer an authoritative answer. Gemini 3 Pro specifically seems to be consistently "tighter" and more passive than Flash.
falloutx•18h ago
Fun. Any idea how much the cost per game would be? I am worried 160 isn't a big enough sample size.
jazarwil•18h ago
It greatly depends on the models. The 6-handed setup with Opus and Pro cost about $30/game. The 4-handed setup with just small models was $6/game. I'd love to run more but I already spent quite a bit as it is.
falloutx•17h ago
Yeah, that's costly. 160 games still give 1000+ total decisions, and you can see some trends in how they think about the game state.
jazarwil•17h ago
Oh to be clear, there are ~21k hands here, and far more decisions than that.
Onavo•18h ago
What about the open source models? I remember from the trading benchmarks Deepseek performed pretty well.
jazarwil•18h ago
I didn't incorporate any open weights/source models just to limit the number of API providers I had to juggle, but it is just a config change if somebody wants to try a run with them.
alalani1•16h ago
Do you have any idea why the win rate for GPT-5.2 is higher than Gemini 3 Flash yet the former loses money while the latter earns money? Is it just bet sizing (betting more when it has a good hand) or something else?
jazarwil•16h ago
There are a few reasons that come to mind, such as winning larger pots on average, and also playing more hands by virtue of not getting knocked out as frequently.
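The gap between win rate and profit described above can be shown with a toy calculation. This is only a sketch: the per-hand chip results are invented for illustration and are not PokerBench data, and `summarize` is a hypothetical helper name.

```python
from typing import Sequence


def summarize(results: Sequence[int]) -> tuple[float, int]:
    """Return (fraction of hands won, net chips) for per-hand chip results."""
    if not results:
        raise ValueError("results must not be empty")
    wins = sum(1 for r in results if r > 0)
    return wins / len(results), sum(results)


# Hypothetical per-hand chip deltas (NOT taken from PokerBench).
# The first player wins many small pots but loses a couple of large ones.
frequent_small_winner = [10, 10, 10, 10, 10, 10, -100, -20, 10, 10]
# The second player folds often but wins big when it does play.
selective_big_winner = [-10, -10, -10, 150, -10, -10, -10, 60, -10, -10]

print(summarize(frequent_small_winner))
print(summarize(selective_big_winner))
```

In this made-up data the first player wins 8 of 10 hands yet ends down 40 chips, while the second wins 2 of 10 and ends up 130 chips, so pot size matters as much as how often a player wins.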
tanvach•16h ago
People are reading into this a little too much; it looks to me like a random walk. You should try rerunning the trial (or have multiple running) and see if the ranking is robust.
jazarwil•15h ago
Wdym exactly? I ran 163 games, are you suggesting more games or something else?
whattheheckheck•8h ago
You need to simulate 50k to 200k hands to get a true winrate
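A rough way to reason about the sample-size question is the normal-approximation confidence interval for a mean, whose width shrinks with the square root of the number of hands. The sketch below assumes hands are independent and identically distributed, which is a simplification for games with knockouts. The per-hand standard deviation of 10 big blinds is a placeholder assumption, not a value measured from PokerBench, and both function names are hypothetical.

```python
import math


def ci_half_width(std_per_hand: float, n_hands: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for the mean
    result per hand, treating hands as independent and identically distributed."""
    if n_hands <= 0:
        raise ValueError("n_hands must be positive")
    return z * std_per_hand / math.sqrt(n_hands)


def hands_needed(std_per_hand: float, target_half_width: float, z: float = 1.96) -> int:
    """Smallest number of hands whose interval half-width is at most the target."""
    if target_half_width <= 0:
        raise ValueError("target_half_width must be positive")
    return math.ceil((z * std_per_hand / target_half_width) ** 2)


# ASSUMPTION: 10 big blinds per hand is a placeholder standard deviation.
STD_BB = 10.0

for n in (1_000, 21_000, 100_000):
    print(n, round(ci_half_width(STD_BB, n), 3))

# Hands needed to pin the mean down to within 0.05 big blinds per hand.
print(hands_needed(STD_BB, target_half_width=0.05))
```

Under these assumptions, halving the interval width takes four times as many hands, which is why estimates of the number of hands needed for a stable win rate tend to be large.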
alfonsodev•4h ago
Really cool. I'm curious how it would compare with a deterministic bot that uses probability tables.
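One possible form such a deterministic baseline could take is a pot-odds rule: call only when the estimated chance of winning is at least the share of the final pot that the call contributes. This is a minimal sketch and not part of PokerBench. The `equity` input is assumed to come from a lookup table or simulation that is not shown, and `raise_margin` is a hypothetical tuning parameter.

```python
def pot_odds(pot: float, to_call: float) -> float:
    """Break-even equity for a call: to_call / (pot + to_call).

    `pot` is the pot size before our call, including the opponent's bet.
    """
    if pot < 0 or to_call < 0:
        raise ValueError("pot and to_call must be non-negative")
    if to_call == 0:
        return 0.0
    return to_call / (pot + to_call)


def decide(equity: float, pot: float, to_call: float, raise_margin: float = 0.2) -> str:
    """Deterministic action from estimated equity (0..1) and pot odds.

    ASSUMPTION: `equity` is supplied by a probability table or simulation.
    `raise_margin` is a hypothetical threshold above break-even for raising.
    """
    if not 0.0 <= equity <= 1.0:
        raise ValueError("equity must be between 0 and 1")
    required = pot_odds(pot, to_call)
    if equity < required:
        return "fold"
    if equity >= required + raise_margin:
        return "raise"
    # Nothing to call means the passive action is a check rather than a call.
    return "check" if to_call == 0 else "call"


# Facing a 50-chip bet into a pot that now holds 100 chips.
for eq in (0.25, 0.40, 0.60):
    print(eq, decide(eq, pot=100, to_call=50))
```

Because the rule has no randomness, it gives the same action for the same inputs, which would make it a stable reference point when comparing runs.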