frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
159•isitcontent•7h ago•17 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
210•eljojo•10h ago•135 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
262•vecti•9h ago•125 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
53•phreda4•7h ago•9 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
78•antves•1d ago•59 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
41•nwparker•1d ago•11 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
14•NathanFlurry•15h ago•5 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
147•bsgeraci•1d ago•61 comments

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
23•JoshPurtell•1d ago•5 comments

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
3•AGDNoob•3h ago•1 comments

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

https://rahuljaguste.github.io/Nethack_Falcons_Eye/
4•rahuljaguste•6h ago•1 comments

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
13•toborrm9•12h ago•5 comments

Show HN: I built a directory of $1M+ in free credits for startups

https://startupperks.directory
4•osmansiddique•4h ago•0 comments

Show HN: A Kubernetes Operator to Validate Jupyter Notebooks in MLOps

https://github.com/tosin2013/jupyter-notebook-validator-operator
2•takinosh•5h ago•0 comments

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
23•dchu17•11h ago•11 comments

Show HN: 33rpm – A vinyl screensaver for macOS that syncs to your music

https://33rpm.noonpacific.com/
3•kaniksu•6h ago•0 comments

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
171•vkazanov•1d ago•48 comments

Show HN: Chiptune Tracker

https://chiptunes.netlify.app
3•iamdan•6h ago•1 comments

Show HN: A password system with no database, no sync, and nothing to breach

https://bastion-enclave.vercel.app
10•KevinChasse•12h ago•9 comments

Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel
25•Shubham_Amb•1d ago•2 comments

Show HN: GitClaw – An AI assistant that runs in GitHub Actions

https://github.com/SawyerHood/gitclaw
8•sawyerjhood•13h ago•0 comments

Show HN: An open-source system to fight wildfires with explosive-dispersed gel

https://github.com/SpOpsi/Project-Baver
2•solarV26•10h ago•0 comments

Show HN: Agentism – Agentic Religion for Clawbots

https://www.agentism.church
2•uncanny_guzus•10h ago•0 comments

Show HN: Disavow Generator – Open-source tool to defend against negative SEO

https://github.com/BansheeTech/Disavow-Generator
5•SurceBeats•16h ago•1 comments

Show HN: Craftplan – I built my wife a production management tool for her bakery

https://github.com/puemos/craftplan
567•deofoo•5d ago•166 comments

Show HN: BPU – Reliable ESP32 Serial Streaming with Cobs and CRC

https://github.com/choihimchan/bpu-stream-engine
2•octablock•12h ago•0 comments

Show HN: Total Recall – write-gated memory for Claude Code

https://github.com/davegoldblatt/total-recall
10•davegoldblatt•1d ago•6 comments

Show HN: Hibana – An Affine MPST Runtime for Rust

https://hibanaworks.dev
3•o8vm•14h ago•0 comments

Show HN: Beam – Terminal Organizer for macOS

https://getbeam.dev/
2•faalbane•14h ago•2 comments

Show HN: Ghidra MCP Server – 110 tools for AI-assisted reverse engineering

https://github.com/bethington/ghidra-mcp
294•xerzes•2d ago•66 comments
Open in hackernews

Show HN: Watch LLMs play 21,000 hands of Poker

https://pokerbench.adfontes.io/run/Large_Models
36•jazarwil•4w ago
PokerBench is my attempt at a new LLM benchmark wherein frontier models play Texas Hold'em in an arena setting. It also features a simulator to view individual games and observe how the different models reason about poker strategy. Opus/Haiku, Gemini Pro/Flash, GPT-5.2/5 mini, and Grok 4.1 Fast Reasoning have all been included.

All code -> https://github.com/JoeAzar/pokerbench

Comments

tcpais•4w ago
Finally, a way to settle the model wars that actually matters: Texas Hold'em. That 3D replay view is sick! ♠♦ I spent way too long watching the replay on Game 2a58900d. It’s wild to see the chain of thought mapped against the betting rounds. It really exposes when a model is hallucinating a strong hand versus actually calculating pot odds. This 'PokerBench' might actually become the standard for measuring agentic risk-taking.
falloutx•4w ago
yeah the 3d view is amazing
VK-pro•4w ago
Very very fun. Just glancing at this quickly at lunch but is there any idea of incorporating tool use?
jazarwil•4w ago
Not at the moment, do you have something in mind?
thorawaytrav•4w ago
Do you have idea why smaller models are better then large ones?
jazarwil•4w ago
I've seen some theories tossed around but I don't think I'm qualified to offer an authoritative answer. Gemini 3 Pro specifically seems to be consistently "tighter" and more passive than Flash.
falloutx•4w ago
Fun, any idea how much would be the cost per game? I am worried 160 isnt a big enough sample size.
jazarwil•4w ago
It greatly depends on the models. The 6-handed setup with Opus and Pro cost about $30/game. The 4-handed setup with just small models was $6/game. I'd love to run more but I already spent quite a bit as it is.
falloutx•4w ago
Yeah thats costly, 160 games still gives about 1000+ total decisions and you can see some trends on how they think about the game state.
jazarwil•4w ago
Oh to be clear, there are ~21k hands here, and far more decisions than that.
Onavo•4w ago
What about the open source models? I remember from the trading benchmarks Deepseek performed pretty well.
jazarwil•4w ago
I didn't incorporate any open weights/source models just to limit the number of API providers I had to juggle, but it is just a config change if somebody wants to try a run with them.
alalani1•4w ago
Do you have any idea why the win rate for GPT-5.2 is higher than Gemini 3 Flash yet the former loses money while the latter earns money? Is it just bet sizing (betting more when it has a good hand) or something else?
jazarwil•4w ago
There are a few reasons that come to mind, such as winning larger pots on average, and also playing more hands by virtue of not getting knocked out as frequently.
tanvach•4w ago
People looking into this a little too much, looks to me like random walk. You should try reinitiating the trial (or have multiple running) and see if the ranking is robust.
jazarwil•4w ago
Wdym exactly? I ran 163 games, are you suggesting more games or something else?
whattheheckheck•4w ago
You need to simulate 50k to 200k hands to get a true winrate
jazarwil•4w ago
I'd love to run more games, just very expensive unfortunately.
alfonsodev•4w ago
Really cool, I’m curious what would be the comparison versus a deterministic bot that uses probability tables.