The Codex App

https://openai.com/index/introducing-the-codex-app/
226•meetpateltech•2h ago•128 comments

Ask HN: Who is hiring? (February 2026)

184•whoishiring•4h ago•218 comments

Stop incentivizing surface parking lots

https://progressandpoverty.substack.com/p/stop-incentivizing-surface-parking
54•surprisetalk•1h ago•16 comments

Todd C. Miller – Sudo maintainer for over 30 years

https://www.millert.dev/
151•wodniok•2h ago•79 comments

Hacking Moltbook

https://www.wiz.io/blog/exposed-moltbook-database-reveals-millions-of-api-keys
88•galnagli•4h ago•62 comments

Nano-vLLM: How a vLLM-style inference engine works

https://neutree.ai/blog/nano-vllm-part-1
181•yz-yu•7h ago•23 comments

4x faster network file sync with rclone (vs rsync) (2025)

https://www.jeffgeerling.com/blog/2025/4x-faster-network-file-sync-rclone-vs-rsync/
185•indigodaddy•3d ago•87 comments

Ask HN: Who wants to be hired? (February 2026)

59•whoishiring•4h ago•126 comments

Advancing AI Benchmarking with Game Arena

https://blog.google/innovation-and-ai/models-and-research/google-deepmind/kaggle-game-arena-updates/
42•salkahfi•2h ago•27 comments

Geologists may have solved mystery of Green River's 'uphill' route

https://phys.org/news/2026-01-geologists-mystery-green-river-uphill.html
109•defrost•6h ago•26 comments

The largest number representable in 64 bits

https://tromp.github.io/blog/2026/01/28/largest-number-revised
14•tromp•1h ago•8 comments

EPA Advances Farmers' Right to Repair

https://www.epa.gov/newsreleases/epa-advances-farmers-right-repair-their-own-equipment-saving-rep...
85•bilsbie•2h ago•29 comments

Being sane in insane places (1973) [pdf]

https://www.weber.edu/wsuimages/psychology/FacultySites/Horvat/OnBeingSaneInInsanePlaces.PDF
38•dbgrman•2h ago•21 comments

Linux From Scratch ends SysVinit support

https://lists.linuxfromscratch.org/sympa/arc/lfs-announce/2026-02/msg00000.html
76•cf100clunk•2h ago•82 comments

My fast zero-allocation webserver using OxCaml

https://anil.recoil.org/notes/oxcaml-httpz
116•noelwelsh•9h ago•42 comments

Show HN: Adboost – A browser extension that adds ads to every webpage

https://github.com/surprisetalk/AdBoost
54•surprisetalk•6h ago•78 comments

IsoCoaster – Theme Park Builder

https://iso-coaster.com/
59•duck•3d ago•12 comments

Show HN: PolliticalScience – Anonymous daily polls with 24-hour windows

https://polliticalscience.vote/
8•ps2026•2h ago•3 comments

Waymo seeking about $16B near $110B valuation

https://www.bloomberg.com/news/articles/2026-01-31/waymo-seeking-about-16-billion-near-110-billio...
127•JumpCrisscross•4h ago•168 comments

UK government launches fuel forecourt price API

https://www.gov.uk/guidance/access-the-latest-fuel-prices-and-forecourt-data-via-api-or-email
47•Technolithic•7h ago•65 comments

Why software stocks are getting pummelled

https://www.economist.com/business/2026/02/01/why-software-stocks-are-getting-pummelled
35•petethomas•15h ago•42 comments

Claude Code is suddenly everywhere inside Microsoft

https://www.theverge.com/tech/865689/microsoft-claude-code-anthropic-partnership-notepad
272•Anon84•8h ago•391 comments

Tomo: A statically typed, imperative language that cross-compiles to C [video]

https://www.youtube.com/watch?v=-vGE0I8RPcc
13•evakhoury•4d ago•8 comments

Treasures found on HS2 route

https://www.bbc.com/news/articles/c93v21q5xdvo
115•breve•21h ago•64 comments

Valanza – my Unix way for weight tracking and analysis

https://github.com/paolomarrone/valanza
20•lallero317•4d ago•5 comments

Serverless backend hosting without idle costs – open-source

https://github.com/aryankashyap0/shorlabs
15•abyssglass01•5d ago•0 comments

My iPhone 16 Pro Max produces garbage output when running MLX LLMs

https://journal.rafaelcosta.me/my-thousand-dollar-iphone-cant-do-math/
406•rafaelcosta•23h ago•188 comments

Hypergrowth isn’t always easy

https://tailscale.com/blog/hypergrowth-isnt-always-easy
103•usrme•2d ago•42 comments

Kernighan on Programming

102•chrisjj•4h ago•22 comments

Solving the Santa Claus concurrency puzzle with a model checker

https://wyounas.github.io/puzzles/concurrency/2026/01/10/how-to-help-santa-claus-concurrently/
14•simplegeek•3d ago•2 comments

Advancing AI Benchmarking with Game Arena

https://blog.google/innovation-and-ai/models-and-research/google-deepmind/kaggle-game-arena-updates/
42•salkahfi•2h ago

Comments

eamag•2h ago
Curious why they decided to curate poker hands instead of playing normal poker.
qsort•1h ago
Poker has very high variance; you'd need several hundred thousand hands to confidently say who's better. Also, you probably want to precompute the GTO-optimal play for benchmarking purposes.
eamag•1h ago
But now, because the hands are so strong, we don't see any folds.
johndhi•1h ago
But can't computers play several hundred thousand poker hands easily in a couple of hours?
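A back-of-envelope sketch of the variance point raised in this sub-thread. Everything numeric here is an assumption for illustration, not something stated in the thread or the linked article: win rates are measured in big blinds per 100 hands (bb/100), the per-100-hand standard deviation is taken to be 100 bb, and "confidently" is read as a 95% confidence interval that excludes zero. The function name `hands_needed` is hypothetical.

```python
import math


def hands_needed(edge_bb_per_100, sd_bb_per_100=100.0, z=1.96):
    """Hands required before the confidence interval on the win rate excludes zero.

    The standard error over n blocks of 100 hands is sd / sqrt(n), so we need
    z * sd / sqrt(n) <= edge, which gives n >= (z * sd / edge) ** 2.
    All default values are illustrative assumptions.
    """
    blocks = math.ceil((z * sd_bb_per_100 / edge_bb_per_100) ** 2)
    return blocks * 100  # convert 100-hand blocks back to hands


if __name__ == "__main__":
    # Smaller true edges need far more hands, since n grows as 1 / edge**2.
    for edge in (10.0, 5.0, 2.0):
        print(f"edge {edge:>4} bb/100 -> about {hands_needed(edge):,} hands")
```

Under these assumed numbers, a 5 bb/100 edge needs roughly 150,000 hands, and the count grows with the inverse square of the edge, so closely matched players push it well past that. Whether that volume is cheap depends on how long each model takes per decision, which this sketch does not model.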
tiahura•1h ago
How about nethack?
chaostheory•1h ago
Anecdotal data point, but recently I’ve found Gemini to perform better than ChatGPT when it came to intent analysis.
ofirpress•1h ago
This is a good way to benchmark models. We [the SWE-bench team] took the meta-version of this and implemented it as a new benchmark called CodeClash.

We have agents implement agents that play games against each other, so Claude isn't playing against GPT; instead, an agent written by Claude plays poker against an agent written by GPT. This really tough task leads to very interesting findings on AI for coding.

https://codeclash.ai/

riku_iki•1h ago
Leaderboard looks very outdated.
Instantnoodl•57m ago
Cool to see Core War! I feel it's mostly forgotten by now. My dad still plays it to this day, though, and even attends tournaments.
63stack•42m ago
>this really tough task leads to very interesting findings on AI for coding

Are you going to share those with the class or?

cv5005•1h ago
My personal threshold for AGI is when an AI can 'sit down' - it doesn't need to have robotic hands, but it needs to only use visual and audio inputs to make its moves - and complete a modern RPG or FPS single player game that it hasn't pre-trained on (it can train on older games).
bob1029•42m ago
https://arxiv.org/abs/2507.03793
10xDev•1h ago
If AI can program, why does it matter if it can play Chess using CoT when it can program a Chess Engine instead? This applies to other domains as well.
Davidzheng•1h ago
They should be allowed to! In fact, I think a better benchmark would be to invent new games and test the models' ability to allocate compute to minimax/AlphaZero-style approaches for those new games under compute constraints.
simianwords•43m ago
It's the same reason we are asked to write exams without calculators, even though the real world does have them.

How you work without calculators is a proxy for real world competency.

10xDev•37m ago
Funny, you used probably the most useless form of benchmarking used on people as an example of measuring "competency" in the real world.
simianwords•36m ago
Are you in favour of children using calculators in exams?
10xDev•33m ago
This isn't my child. It is a program. I need it to get task X done, and I couldn't care less how it is done, whether strictly through CoT or with tools. There is no such thing as cheating in real work and no reason to handicap it. Just test the limits of what it can do by whatever means possible.

Trying to solve everything with CoT alone seems futile.

simianwords•17m ago
You are not understanding. It's a proxy for how well it does other things.
doctorpangloss•20m ago
A lot of the insights of math come from knowing how to do things efficiently. That’s why the tests are timed. I don’t know, this is pretty basic pedagogy that you are choosing to grief.
simianwords•1h ago
Gemini tops all benchmarks, but when it comes to real-world usage it is genuinely unusable.
goniszewski•50m ago
It’s not that bad. I’ve been using 3 Pro for some time now and I’m quite happy with how it works. Best paired with Opus and Codex, like most models, but it’s solid as a full-stack buddy.
bennyfreshness•53m ago
Wow. I'm generally in the AI maximalist camp. But adding Werewolf feels dangerous to me. Anyone who's played knows lying, deceit, and manipulation are often key to winning. Do we really want models climbing this benchmark?
bilekas•39m ago
Good question, but who's going to stop them?

AI already has a very creative imagination for role play so this just adds extra to their arsenal.

PunchyHamster•1m ago
Confidently and charismatically lying to clueless users has been one of the foundations of AI adoption.
ZeroCool2u•27m ago
I'd really like to see them add a complex open world fully physicalized game like Star Citizen (assuming the game itself is stable) with a single primary goal like accumulating currency as a measure of general autonomy and a proxy for how the model might behave in the real world given access to a bipedal robot.
PunchyHamster•2m ago
Making models target a benchmark about being good at lying and getting away with it (Werewolf) is certainly an interesting choice.