frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
529•klaussilveira•9h ago•146 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
859•xnx•15h ago•518 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
72•matheusalmeida•1d ago•13 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
180•isitcontent•9h ago•21 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
182•dmpetrov•10h ago•79 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
294•vecti•11h ago•130 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
69•quibono•4d ago•12 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
343•aktau•16h ago•168 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
338•ostacke•15h ago•90 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
434•todsacerdoti•17h ago•226 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
237•eljojo•12h ago•147 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
13•romes•4d ago•2 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
373•lstoll•16h ago•252 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
6•videotopia•3d ago•0 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
41•kmm•4d ago•3 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
14•denuoweb•1d ago•2 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
220•i5heu•12h ago•162 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
91•SerCe•5h ago•75 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
62•phreda4•9h ago•11 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
162•limoce•3d ago•82 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
38•gfortaine•7h ago•10 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
127•vmatsiiako•14h ago•53 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
18•gmays•4h ago•2 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
261•surprisetalk•3d ago•35 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1029•cdrnsf•19h ago•428 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
55•rescrv•17h ago•18 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
83•antves•1d ago•60 comments

WebView performance significantly slower than PWA

https://issues.chromium.org/issues/40817676
18•denysonique•6h ago•2 comments

Zlob.h 100% POSIX and glibc compatible globbing lib that is faste and better

https://github.com/dmtrKovalenko/zlob
5•neogoose•2h ago•1 comments

I'm going to cure my girlfriend's brain tumor

https://andrewjrod.substack.com/p/im-going-to-cure-my-girlfriends-brain
109•ray__•6h ago•54 comments
Open in hackernews

Evals in 2025: going beyond simple benchmarks to build models people can use

https://github.com/huggingface/evaluation-guidebook/blob/main/yearly_dives/2025-evaluations-for-useful-models.md
80•jxmorris12•4mo ago

Comments

aplassard•4mo ago
I think cost should also be a direct consideration. Model performance varies wildly on benchmarks when given a budget. https://substack.com/@andrewplassard/note/p-173487568?r=2fqo...
elemeno•4mo ago
I’ve been building a tool to help with this - Safety Evals In-a-Box [https://github.com/elemeno/seibox]. It’s a work in progress and not quite ready for public release, but its a multi-model eval runner (primarily for safety oriented evals, but no reason why it can run other types as well!) and includes cost and latency in it reporting.
andy99•4mo ago
These can be useful for labs training models. I don't see them as particularly valuable for building AI systems. Real performance depends on how the system is built, much more so than the underlying LLM.

Evaluating the system you build on relevant inputs is most important. Beyond that it would be nice to see benchmarks that give guidance on how and LLM should be used as a system component, not just which is "better" at something.

axpy906•4mo ago
My thoughts were this. The moment it is public it’s probably in the training data set. The real evals are the ones that you have to make an a problem you’re trying to solve and the data you are using.
gdiamos•4mo ago
How can the community tell if models overfit to these benchmarks?
kovezd•4mo ago
By the composition of evals. Plus secondary metrics like parameter size, and token cost.

Not perfect, but useful.

dustrider•4mo ago
Move beyond benchmarks… proceed to list a bunch of benchmarks.

The problem for me is that it’s not worth running these myself, yeah I may pay attention to which model is better at tool calling. But what matters is how well it does at my use case.

6Az4Mj4D•4mo ago
I see there are lots of courses being sold for Evals in Maven. Some are as costly as USD 3500. Are they worth it? https://maven.com/parlance-labs/evals