
Show HN: New eval from SWE-bench team evaluates LMs based on goals not tickets

https://codeclash.ai/
5•lieret•3mo ago
Current evals test LMs on tasks: "fix this bug," "write a test"

But we code to achieve goals: maximize revenue, cut costs, win users

Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.

Because real software dev isn’t about following instructions. It’s about achieving outcomes.

Here's how it works:

Two LMs enter a tournament. Each maintains its own codebase.

Every round:

1. Edit phase: LMs modify their codebases however they like.
2. Competition phase: Codebases battle in an arena.
3. Repeat.

The LM that wins the majority of rounds is declared winner.
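
To make the round structure concrete, here is a minimal sketch of the loop in Python. It is not CodeClash's actual code or API: `run_tournament`, the `Editor` and `Arena` types, and the toy players below are all hypothetical names made up for illustration. It also assumes that "majority" means strictly more than half of all rounds and that an arena may report a draw, neither of which is spelled out above.

```python
from collections import Counter
from typing import Callable, Dict, Optional

# Hypothetical stand-ins (assumptions, not CodeClash's real types):
# a codebase is a mapping of file name -> file contents.
Codebase = Dict[str, str]
# An editor plays the role of an LM in the edit phase:
# it receives its own codebase and the round number and returns a new codebase.
Editor = Callable[[Codebase, int], Codebase]
# An arena receives all codebases and returns the round winner's name,
# or None if the round is a draw.
Arena = Callable[[Dict[str, Codebase]], Optional[str]]


def run_tournament(editors: Dict[str, Editor], arena: Arena, rounds: int) -> Optional[str]:
    """Run edit/compete rounds and return the majority winner, or None."""
    codebases: Dict[str, Codebase] = {name: {} for name in editors}
    wins: Counter = Counter()
    for round_number in range(1, rounds + 1):
        # Edit phase: each player modifies only its own codebase.
        for name, edit in editors.items():
            codebases[name] = edit(codebases[name], round_number)
        # Competition phase: the codebases battle in the arena.
        round_winner = arena(codebases)
        if round_winner is not None:
            wins[round_winner] += 1
    # Assumption: a winner needs strictly more than half of all rounds.
    for name, count in wins.items():
        if count > rounds / 2:
            return name
    return None


# Toy players: each "codebase" holds one file whose content is a strength score.
def steady(codebase: Codebase, round_number: int) -> Codebase:
    return {**codebase, "bot.txt": "3"}


def improving(codebase: Codebase, round_number: int) -> Codebase:
    return {**codebase, "bot.txt": str(round_number)}


def toy_arena(codebases: Dict[str, Codebase]) -> Optional[str]:
    # The higher score wins the round; equal top scores count as a draw.
    scores = {name: int(cb["bot.txt"]) for name, cb in codebases.items()}
    best = max(scores.values())
    leaders = [name for name, score in scores.items() if score == best]
    return leaders[0] if len(leaders) == 1 else None


players = {"steady": steady, "improving": improving}
print(run_tournament(players, toy_arena, rounds=7))  # prints "improving"
```

In this toy run, the player that keeps adapting loses the early rounds but takes the tournament over seven rounds. With only five rounds, neither player reaches a majority and the sketch returns `None`.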

Arenas can be anything, like games, trading sims, or cybersec envs. We currently have 6 arenas implemented and support 8 different programming languages.

This has been one of our biggest projects in terms of scale to date. Over the past few months, we've completed 1.5k tournaments, totalling more than 50,400 agent runs. And you can look at all of these runs right now from your browser (links below!)

You can find the rankings on our website (spoiler: Sonnet 4.5 tops the list), but what's almost more interesting: humans are still way ahead! In one of our arenas, even the worst solution from the human leaderboard is miles ahead of the best LM!

And we're not surprised: LMs consistently fail to properly adapt to outcomes, hallucinate about reasons for failure, and produce ever messier codebases with every round.

More information:

https://codeclash.ai/
https://arxiv.org/pdf/2511.00839
https://github.com/codeclash-ai/codeclash

Comments

jryio•3mo ago
Is competition + limited resources (e.g. Core War) = selection pressure (natural or otherwise)?

Can we integrate and bring back reinforcement learning in a framework like this?