frontpage.

Want to be a better learner? Start by noticing how you think

https://bigthink.com/thinking/learn-with-metacognition/
1•warrenm•25s ago•0 comments

Google and Epic reach proposed settlement to open Android app store access

https://www.pocketgamer.biz/google-and-epic-reach-proposed-settlement-to-open-android-app-store-a...
1•fidotron•46s ago•0 comments

DOJ gives green light to Google's $32B Wiz deal

https://www.neowin.net/news/doj-gives-green-light-to-googles-32-billion-wiz-deal/
1•bundie•1m ago•0 comments

Shiroa: MdBook for Typst

https://github.com/Myriad-Dreamin/shiroa
1•adamnemecek•1m ago•0 comments

Bankrank.io is accepting signups for their Q1 2026 Alpha

1•calderarrow•1m ago•0 comments

Show HN: I scraped 3B Goodreads reviews to train a better recommendation model

https://book.sv
1•costco•2m ago•0 comments

Show HN: I built a content tool for LinkedIn Personal Branding

https://www.linkedfast.com/
1•AdamAkhlaq•2m ago•0 comments

Napier Deltic

https://npht.org/about-napier/products/marine-engines/deltic/
1•thunderbong•3m ago•0 comments

A Brutal Look at Balanced Parentheses, Computing Machines, and Pushdown Automata

https://raganwald.com/2019/02/14/i-love-programming-and-programmers.html
1•warrenm•3m ago•0 comments

The Running Novelist

https://www.newyorker.com/magazine/2008/06/09/haruki-murakami-the-running-novelist
1•keiferski•4m ago•0 comments

Malware Now Uses AI During Execution to Mutate and Collect Data, Google Warns

https://www.securityweek.com/malware-now-uses-ai-during-execution-to-mutate-and-collect-data-goog...
1•Bender•4m ago•0 comments

Show HN: SixSevenStudio – open-source Video Editor For Sora

https://github.com/palmier-io/sixsevenstudio
1•hchtin•4m ago•0 comments

DeepInverse Joins the PyTorch Ecosystem

https://pytorch.org/blog/deepinverse-joins-pytorch-ecosystem/
1•jeremyscanvic•5m ago•0 comments

There is an animal level of fear in investors

1•moosedman•6m ago•0 comments

James Algebra

https://iconicmath.com/algebra/james/
3•erhuve•6m ago•0 comments

Best-Of-∞: Asymptotic Performance of Test-Time Compute (Infinite Compute Budget)

https://arxiv.org/abs/2509.21091
1•SweetSoftPillow•6m ago•0 comments

AI System Achieves 600x Speed in the Search for Signals from Space

https://www.seti.org/news/revolutionary-ai-system-achieves-600x-speed-breakthrough-in-the-search-...
2•geox•7m ago•0 comments

IIHS 2025 Top Safety Pick Vehicles

https://www.iihs.org/ratings/top-safety-picks
1•throw0101d•7m ago•0 comments

The Secret Lives of China's Art Factory Workers

https://www.instapainting.com/blog/the-secret-lives-of-china-art-factory-workers
2•chrischen•8m ago•0 comments

Genetic Associations with Educational Fields

https://www.nature.com/articles/s41588-025-02391-z
2•bookofjoe•8m ago•0 comments

Pre-training under infinite compute

https://arxiv.org/abs/2509.14786
1•SweetSoftPillow•8m ago•0 comments

Building Agents for Ecommerce

https://kumo.ai/company/news/building-the-future-of-agents-for-ecommerce/
1•gk1•8m ago•0 comments

Apple Podcasts Is Adding AI-Generated Chapters for Podcasts Without Chapters

https://podcasters.apple.com/support/5545-enhance-episodes-with-chapters-links-more
2•haunter•9m ago•0 comments

When Life gives you Jenga

https://jameelur.com/blog/when-life-gives-you-jenga
1•WanderingSoul•10m ago•0 comments

Metformin (drug for blood sugar) might make you smarter

https://www.self-experiments.org/metformin-as-cognitive-enhancer/
1•chernavsky•12m ago•1 comment

What to know about the deadly UPS plane crash in Kentucky

https://www.pbs.org/newshour/nation/what-to-know-about-the-deadly-ups-plane-crash-in-kentucky
1•toomuchtodo•12m ago•1 comment

GEMPIX2 – Nano Banana 2 AI Image Generator

https://gempix2.photo
1•wantering•13m ago•0 comments

GuidePoint Is a Shady Business

https://diblante.com/user/magnetseven/post/guidepoint-is-a-shady-business
2•magnetseven•15m ago•0 comments

I Built Cool Screenshot App for Mac

https://www.screensnap.pro
1•m_0_r_g_a_n_•17m ago•0 comments

Europe's Self-Driving Cars Aren't Even at the Starting Line

https://www.bloomberg.com/news/newsletters/2025-11-05/europe-s-self-driving-cars-aren-t-even-at-t...
1•TMWNN•17m ago•2 comments

Show HN: New eval from SWE-bench team evaluates LMs based on goals, not tickets

https://codeclash.ai/
3•lieret•1h ago
Current evals test LMs on tasks: "fix this bug," "write a test."

But we code to achieve goals: maximize revenue, cut costs, win users.

Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.

Because real software dev isn’t about following instructions. It’s about achieving outcomes.

Here's how it works:

Two LMs enter a tournament. Each maintains its own codebase.

Every round:

1. Edit phase: LMs modify their codebases however they like.
2. Competition phase: Codebases battle in an arena.
3. Repeat.

The LM that wins the majority of rounds is declared winner.
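
As a rough illustration of the loop described above, here is a minimal Python sketch. It is not CodeClash's actual code: `Player`, `run_tournament`, and the `edit` and `compete` callbacks are hypothetical names, and treating drawn rounds as counting for neither side is an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Player:
    """One LM and the codebase it maintains (hypothetical structure)."""
    name: str
    codebase: dict = field(default_factory=dict)  # filename -> source text


def run_tournament(
    a: Player,
    b: Player,
    edit: Callable[[Player, int], None],
    compete: Callable[[Player, Player], Optional[Player]],
    rounds: int,
) -> Optional[str]:
    """Run edit/compete rounds; return the name of the majority winner, if any."""
    wins = Counter()
    for round_no in range(1, rounds + 1):
        # Edit phase: each LM modifies its own codebase however it likes.
        for player in (a, b):
            edit(player, round_no)
        # Competition phase: the arena returns the round winner, or None for a draw.
        winner = compete(a, b)
        if winner is not None:
            wins[winner.name] += 1
    # A player must win more than half of all rounds to be declared winner.
    for player in (a, b):
        if wins[player.name] * 2 > rounds:
            return player.name
    return None  # assumption: no strict majority means no winner
```

In this sketch, `edit` stands in for an agent run against a codebase and `compete` stands in for an arena; both would be far more involved in practice.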

Arenas can be anything: games, trading sims, cybersecurity environments. We currently have 6 arenas implemented and support 8 different programming languages.

This has been one of our biggest projects in terms of scale to date. Over the past few months, we've completed 1.5k tournaments, totalling more than 50,400 agent runs. And you can look at all of these runs right now from your browser (links below!)

You can find the rankings on our website (spoiler: Sonnet 4.5 tops the list), but almost more interesting: humans are still way ahead! In one of our arenas, even the worst solution from the human leaderboard is miles ahead of the best LM!

And we're not surprised: LMs consistently fail to properly adapt to outcomes, hallucinate about reasons for failure, and produce ever messier codebases with every round.

More information:

https://codeclash.ai/
https://arxiv.org/pdf/2511.00839
https://github.com/codeclash-ai/codeclash

Comments

jryio•4m ago
Is competition + limited resources (e.g. Core War) = selection pressures (natural or otherwise)?

Can we integrate and bring back reinforcement learning in a framework like this?