Show HN: New eval from SWE-bench team evaluates LMs based on goals, not tickets

https://codeclash.ai/
5•lieret•3mo ago
Current evals test LMs on tasks: "fix this bug," "write a test."

But we code to achieve goals: maximize revenue, cut costs, win users.

Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.

Because real software dev isn’t about following instructions. It’s about achieving outcomes.

Here's how it works:

Two LMs enter a tournament. Each maintains its own codebase.

Every round:

1. Edit phase: LMs modify their codebases however they like.
2. Competition phase: Codebases battle in an arena.
3. Repeat.

The LM that wins the majority of rounds is declared the winner.
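To make the round structure concrete, here is a minimal Python sketch of the loop described above. It is not the CodeClash implementation: `Player`, `run_tournament`, `toy_edit`, and `toy_compete` are hypothetical names invented for illustration, the "codebase" is just a dict, and the handling of drawn rounds (nobody gets the point, and no winner is declared without a strict majority) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Player:
    """Hypothetical stand-in for one LM and the codebase it maintains."""
    name: str
    codebase: dict = field(default_factory=dict)
    wins: int = 0


def run_tournament(
    a: Player,
    b: Player,
    edit: Callable[[Player, List[Optional[str]]], None],
    compete: Callable[[Player, Player], Optional[Player]],
    rounds: int,
) -> Optional[Player]:
    """Run edit + competition phases for `rounds` rounds; return the majority winner."""
    history: List[Optional[str]] = []  # winner name per round, None for a draw
    for _ in range(rounds):
        # Edit phase: each player changes its own codebase, seeing past outcomes.
        for player in (a, b):
            edit(player, history)
        # Competition phase: the two codebases face off in the arena.
        round_winner = compete(a, b)
        if round_winner is not None:
            round_winner.wins += 1
        history.append(round_winner.name if round_winner else None)
    # Assumption: a strict majority of all rounds is required to win.
    for player in (a, b):
        if player.wins * 2 > rounds:
            return player
    return None


def toy_edit(player: Player, history: List[Optional[str]]) -> None:
    # Toy "edit": bump a strength score. A real edit phase would have an LM modify files.
    skill = player.codebase.get("skill", 1)
    player.codebase["strength"] = player.codebase.get("strength", 0) + skill


def toy_compete(a: Player, b: Player) -> Optional[Player]:
    # Toy "arena": the higher strength wins the round; equal strength is a draw.
    sa, sb = a.codebase["strength"], b.codebase["strength"]
    if sa == sb:
        return None
    return a if sa > sb else b


lm_a = Player("lm_a", {"skill": 2})
lm_b = Player("lm_b", {"skill": 1})
champion = run_tournament(lm_a, lm_b, toy_edit, toy_compete, rounds=5)
print(champion.name if champion else "no majority winner")
```

Passing the round history into the edit step mirrors the idea that each LM can adapt to previous outcomes before the next competition phase.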

Arenas can be anything: games, trading sims, cybersecurity environments. We currently have 6 arenas implemented and support 8 different programming languages.

This has been one of our biggest projects in terms of scale to date. Over the past few months, we've completed 1.5k tournaments, totalling more than 50,400 agent runs. And you can look at all of these runs right now from your browser (links below!)

You can find the rankings on our website (spoiler: Sonnet 4.5 tops the list), but almost more interesting: Humans are still way ahead! In one of our arenas, even the worst solution on the human leaderboard is miles ahead of the best LM!

And we're not surprised: LMs consistently fail to properly adapt to outcomes, hallucinate about reasons for failure, and produce ever messier codebases with every round.

More information:

https://codeclash.ai/
https://arxiv.org/pdf/2511.00839
https://github.com/codeclash-ai/codeclash

Comments

jryio•3mo ago
Isn't competition + limited resources (e.g. Core War) = selection pressure (natural or otherwise)?

Can we integrate and bring back reinforcement learning in a framework like this?