frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

The purpose of Continuous Integration is to fail

https://blog.nix-ci.com/post/2026-02-05_the-purpose-of-ci-is-to-fail
1•zdw•2m ago•0 comments

Apfelstrudel: Live coding music environment with AI agent chat

https://github.com/rcarmo/apfelstrudel
1•rcarmo•3m ago•0 comments

What Is Stoicism?

https://stoacentral.com/guides/what-is-stoicism
3•0xmattf•3m ago•0 comments

What happens when a neighborhood is built around a farm

https://grist.org/cities/what-happens-when-a-neighborhood-is-built-around-a-farm/
1•Brajeshwar•3m ago•0 comments

Every major galaxy is speeding away from the Milky Way, except one

https://www.livescience.com/space/cosmology/every-major-galaxy-is-speeding-away-from-the-milky-wa...
2•Brajeshwar•3m ago•0 comments

Extreme Inequality Presages the Revolt Against It

https://www.noemamag.com/extreme-inequality-presages-the-revolt-against-it/
1•Brajeshwar•4m ago•0 comments

There's no such thing as "tech" (Ten years later)

1•dtjb•5m ago•0 comments

What Really Killed Flash Player: A Six-Year Campaign of Deliberate Platform Work

https://medium.com/@aglaforge/what-really-killed-flash-player-a-six-year-campaign-of-deliberate-p...
1•jbegley•5m ago•0 comments

Ask HN: Anyone orchestrating multiple AI coding agents in parallel?

1•buildingwdavid•6m ago•0 comments

Show HN: Knowledge-Bank

https://github.com/gabrywu-public/knowledge-bank
1•gabrywu•12m ago•0 comments

Show HN: The Codeverse Hub Linux

https://github.com/TheCodeVerseHub/CodeVerseLinuxDistro
3•sinisterMage•13m ago•2 comments

Take a trip to Japan's Dododo Land, the most irritating place on Earth

https://soranews24.com/2026/02/07/take-a-trip-to-japans-dododo-land-the-most-irritating-place-on-...
2•zdw•13m ago•0 comments

British drivers over 70 to face eye tests every three years

https://www.bbc.com/news/articles/c205nxy0p31o
14•bookofjoe•13m ago•5 comments

BookTalk: A Reading Companion That Captures Your Voice

https://github.com/bramses/BookTalk
1•_bramses•14m ago•0 comments

Is AI "good" yet? – tracking HN's sentiment on AI coding

https://www.is-ai-good-yet.com/#home
1•ilyaizen•15m ago•1 comments

Show HN: Amdb – Tree-sitter based memory for AI agents (Rust)

https://github.com/BETAER-08/amdb
1•try_betaer•16m ago•0 comments

OpenClaw Partners with VirusTotal for Skill Security

https://openclaw.ai/blog/virustotal-partnership
2•anhxuan•16m ago•0 comments

Show HN: Seedance 2.0 Release

https://seedancy2.com/
2•funnycoding•17m ago•0 comments

Leisure Suit Larry's Al Lowe on model trains, funny deaths and Disney

https://spillhistorie.no/2026/02/06/interview-with-sierra-veteran-al-lowe/
1•thelok•17m ago•0 comments

Towards Self-Driving Codebases

https://cursor.com/blog/self-driving-codebases
1•edwinarbus•17m ago•0 comments

VCF West: Whirlwind Software Restoration – Guy Fedorkow [video]

https://www.youtube.com/watch?v=YLoXodz1N9A
1•stmw•18m ago•1 comments

Show HN: COGext – A minimalist, open-source system monitor for Chrome (<550KB)

https://github.com/tchoa91/cog-ext
1•tchoa91•19m ago•1 comments

FOSDEM 26 – My Hallway Track Takeaways

https://sluongng.substack.com/p/fosdem-26-my-hallway-track-takeaways
1•birdculture•19m ago•0 comments

Show HN: Env-shelf – Open-source desktop app to manage .env files

https://env-shelf.vercel.app/
1•ivanglpz•23m ago•0 comments

Show HN: Almostnode – Run Node.js, Next.js, and Express in the Browser

https://almostnode.dev/
1•PetrBrzyBrzek•23m ago•0 comments

Dell support (and hardware) is so bad, I almost sued them

https://blog.joshattic.us/posts/2026-02-07-dell-support-lawsuit
1•radeeyate•24m ago•0 comments

Project Pterodactyl: Incremental Architecture

https://www.jonmsterling.com/01K7/
1•matt_d•24m ago•0 comments

Styling: Search-Text and Other Highlight-Y Pseudo-Elements

https://css-tricks.com/how-to-style-the-new-search-text-and-other-highlight-pseudo-elements/
1•blenderob•26m ago•0 comments

Crypto firm accidentally sends $40B in Bitcoin to users

https://finance.yahoo.com/news/crypto-firm-accidentally-sends-40-055054321.html
1•CommonGuy•27m ago•0 comments

Magnetic fields can change carbon diffusion in steel

https://www.sciencedaily.com/releases/2026/01/260125083427.htm
1•fanf2•27m ago•0 comments
Open in hackernews

Paper AI Tigers

https://www.gleech.org/paper
3•mefengl•2mo ago

Comments

emschwartz•2mo ago
Very interesting! I especially appreciated the test of running models against the same benchmark from the following year and the point about the per-token discount being negated by models needing more tokens to get to the answer.

Generalization:

> Maybe Chinese models generalise to unseen tasks less well. (For instance, when tested on fresh data, 01’s Yi model fell 8pp (25%) on GSM - the biggest drop amongst all models.)

> We can get a dirty estimate of this by the “shrinkage gap”: look at how a model performs on next year’s iteration of some task, compared to this year’s. If it finished training in 2024, then it can’t have trained on the version released in 2025, so we get to see what they’re like on at least somewhat novel tasks. We’ll use two versions of the same benchmark to keep the difficulty roughly on par. Let’s try AIME:

> Almost all models get worse on this new benchmark, despite 2025 being the same difficulty as 2024 (for humans). But as I expected, Western models drop less: they lost 10% of their performance on the new data, while Chinese models dropped 21%. p = 0.09.

> Averaging across crappy models for the sake of a cultural generalisation doesn’t make sense. Luckily, rerunning the analysis with just the top models gives roughly the same result (9% gap instead of 11%).

Cost-effectiveness:

> Distinguish intelligence (max performance), intelligence per token (efficiency), and intelligence per dollar (cost-effectiveness).

> The 5x discounts I quoted are per-token, not per-success. If you had to use 6x more tokens to get the same quality, then there would be no real discount. And indeed DeepSeek and Qwen (see also anecdote here about Kimi, uncontested) are very hungry.