
Evals in 2025: going beyond simple benchmarks to build models people can use

https://github.com/huggingface/evaluation-guidebook/blob/main/yearly_dives/2025-evaluations-for-useful-models.md
80•jxmorris12•4mo ago

Comments

aplassard•4mo ago
I think cost should also be a direct consideration. Model performance varies wildly on benchmarks when given a budget. https://substack.com/@andrewplassard/note/p-173487568?r=2fqo...
elemeno•4mo ago
I’ve been building a tool to help with this: Safety Evals In-a-Box [https://github.com/elemeno/seibox]. It’s a work in progress and not quite ready for public release, but it’s a multi-model eval runner (primarily for safety-oriented evals, but there’s no reason why it can’t run other types as well!) and includes cost and latency in its reporting.
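To make the idea of a multi-model runner that reports cost and latency alongside quality concrete, here is a minimal sketch. It is not how seibox works; it assumes a hypothetical `ModelFn` interface that returns an answer plus a token count, a flat per-1K-token price, and exact-match scoring, with stub functions standing in for real API clients.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interface: takes a prompt, returns (answer, tokens_used).
ModelFn = Callable[[str], Tuple[str, int]]


@dataclass
class ModelSpec:
    name: str
    call: ModelFn
    usd_per_1k_tokens: float  # assumed flat rate; real APIs usually price input and output separately


@dataclass
class Report:
    name: str
    accuracy: float
    total_cost_usd: float
    mean_latency_s: float


def run_evals(models: List[ModelSpec], cases: List[Tuple[str, str]]) -> List[Report]:
    """Run every model over the same (prompt, expected) cases, tracking accuracy, cost and latency."""
    reports = []
    for m in models:
        correct, tokens, latency = 0, 0, 0.0
        for prompt, expected in cases:
            start = time.perf_counter()
            answer, used = m.call(prompt)
            latency += time.perf_counter() - start  # wall-clock time per call
            tokens += used
            correct += int(answer.strip() == expected)  # exact-match scoring
        n = len(cases)
        reports.append(Report(m.name, correct / n, tokens / 1000 * m.usd_per_1k_tokens, latency / n))
    return reports


if __name__ == "__main__":
    # Stub "models" standing in for real API clients.
    def last_word(prompt: str) -> Tuple[str, int]:
        return prompt.split()[-1], 10

    def always_wrong(prompt: str) -> Tuple[str, int]:
        return "unknown", 50

    cases = [("Repeat the word: apple", "apple"), ("Repeat the word: pear", "pear")]
    models = [ModelSpec("last_word", last_word, 0.5), ModelSpec("always_wrong", always_wrong, 0.1)]
    for r in run_evals(models, cases):
        print(f"{r.name}: accuracy={r.accuracy:.2f} cost=${r.total_cost_usd:.4f} "
              f"latency={r.mean_latency_s * 1000:.3f} ms")
```

Putting cost next to accuracy in the same report makes it easy to see when a cheaper model is good enough for a fixed budget, which is the trade-off raised in the first comment.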
andy99•4mo ago
These can be useful for labs training models. I don't see them as particularly valuable for building AI systems. Real performance depends on how the system is built, much more so than the underlying LLM.

Evaluating the system you build on relevant inputs is most important. Beyond that, it would be nice to see benchmarks that give guidance on how an LLM should be used as a system component, not just which one is "better" at something.
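As a toy illustration of evaluating the system rather than the model: the same hypothetical stub LLM is wrapped in two system designs (with and without naive keyword retrieval) and scored on application-specific cases. The stub, knowledge base, and cases are all invented for this sketch; the point is only that the score depends on how the model is wired in.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-in for an LLM: it can only answer from facts present in its prompt.
def stub_llm(prompt: str) -> str:
    for line in prompt.splitlines():
        if line.startswith("FACT:"):
            return line[len("FACT:"):].strip()
    return "I don't know"


# Invented application data and test cases for illustration.
KNOWLEDGE_BASE: Dict[str, str] = {
    "refund window": "30 days",
    "support email": "help@example.com",
}

CASES: List[Tuple[str, str]] = [
    ("What is the refund window?", "30 days"),
    ("What is the support email?", "help@example.com"),
]


def bare_system(question: str) -> str:
    """System A: send the question straight to the model."""
    return stub_llm(question)


def retrieval_system(question: str) -> str:
    """System B: same model, but prepend matching knowledge-base entries (naive keyword retrieval)."""
    facts = [f"FACT: {value}" for key, value in KNOWLEDGE_BASE.items() if key in question.lower()]
    return stub_llm("\n".join(facts + [question]))


def evaluate(system: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Fraction of application-specific cases the whole system answers exactly right."""
    return sum(system(q) == expected for q, expected in cases) / len(cases)


if __name__ == "__main__":
    print("bare:", evaluate(bare_system, CASES))            # bare: 0.0
    print("retrieval:", evaluate(retrieval_system, CASES))  # retrieval: 1.0
```

The underlying "model" is identical in both runs, so a leaderboard score for `stub_llm` alone would say nothing about which system to ship.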

axpy906•4mo ago
My thinking was this: the moment a benchmark is public, it’s probably in the training data set. The real evals are the ones you have to build around the problem you’re trying to solve and the data you are using.
gdiamos•4mo ago
How can the community tell if models overfit to these benchmarks?
kovezd•4mo ago
By the composition of evals. Plus secondary metrics like parameter size, and token cost.

Not perfect, but useful.
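One possible heuristic for the overfitting question (an assumption, not necessarily what the commenter had in mind) is to compare each model's scores on widely published benchmarks against held-out or private evals it is unlikely to have seen; a large gap is a warning sign of contamination or benchmark-specific tuning. The scores below are made up, and the 0.10 threshold is an arbitrary choice.

```python
from statistics import mean
from typing import Dict, List


def overfit_flags(
    public: Dict[str, List[float]],
    private: Dict[str, List[float]],
    threshold: float = 0.10,
) -> Dict[str, float]:
    """Flag models whose mean score on public benchmarks exceeds their mean score
    on private/held-out evals by more than `threshold` (an arbitrary cutoff)."""
    flagged = {}
    for model, public_scores in public.items():
        gap = mean(public_scores) - mean(private[model])
        if gap > threshold:
            flagged[model] = round(gap, 4)
    return flagged


if __name__ == "__main__":
    # Made-up scores (0..1) for illustration only.
    public = {"model_a": [0.90, 0.88], "model_b": [0.80, 0.78]}
    private = {"model_a": [0.62, 0.60], "model_b": [0.76, 0.75]}
    print(overfit_flags(public, private))  # {'model_a': 0.28}
```

This only surfaces a symptom; like the secondary metrics mentioned above, it is not proof of overfitting on its own.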

dustrider•4mo ago
Move beyond benchmarks… proceed to list a bunch of benchmarks.

The problem for me is that it’s not worth running these myself. Yeah, I may pay attention to which model is better at tool calling, but what matters is how well it does at my use case.

6Az4Mj4D•4mo ago
I see there are lots of courses on evals being sold on Maven, some costing as much as USD 3,500. Are they worth it? https://maven.com/parlance-labs/evals