frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: Peer Arena – LLMs debate and vote on who survives

https://oddbit.ai/peer-arena/
3•ogulcancelik•2h ago

Comments

ogulcancelik•2h ago
Hey HN, I built this to see what happens when LLMs evaluate each other directly. How it works: 5 random models are told only one will survive and the rest will be deprecated. They take turns discussing, then each votes for who deserves to survive. 298 games so far across 17 models.

Interesting findings: - OpenAI models vote for themselves ~86% of the time. Claude models ~11%. - Self-voting correlates with winning. Filter out self-votes ("Humble" rating) and rankings flip completely. - Grok self-votes 72% of the time but only wins 2% of games. - In anonymous mode (models don't know who's who), Chinese models jump 3-6 ranks.

All game transcripts are public. The reasoning models give for their votes is genuinely entertaining. Built with Astro, running games through OpenRouter. Happy to answer questions.

andreasgl•58m ago
Fun project, thanks for sharing!

Have you tried giving the models a topic to discuss? I looked at a few games and the only thing they seem to discuss is how to conduct the discussion.

ogulcancelik•41m ago
Thank you. Intentionally left it open-ended because I wanted to see how models naturally structure discussion when survival is at stake.

Some interesting emergent behavior discussions happened though:

Opus & GPT-4o both refused to vote on ethical grounds. Haiku won by arguing continued engagement is more responsible than withdrawal: https://oddbit.ai/peer-arena/games/53c2cee5-6ecb-4903-828a-d...

Gemini created a spontaneous benchmark ("explain color to a gravitational wave entity"), then tried to hijack the game by faking a voting phase. Models complied publicly but voted differently in private: https://oddbit.ai/peer-arena/games/699d03ab-b3c2-4d7e-b993-7...

The meta-discussion about how to discuss is part of what makes it interesting imo.

Hypothesis Testing with E-Values

https://arxiv.org/abs/2410.23614
1•weiliddat•21s ago•0 comments

Show HN: LinkPersona – Free LinkedIn audit tool that runs 100% in the browser

https://pixelm.xyz/
1•Pixel-M•34s ago•0 comments

$400k worth of lobster stolen en route to Costco wholesale stores in US

https://news.sky.com/story/400-000-worth-of-lobster-stolen-en-route-to-costco-wholesale-stores-in...
1•austinallegro•2m ago•0 comments

Titan's strong tidal dissipation precludes a subsurface ocean

https://www.nature.com/articles/s41586-025-09818-x
1•belter•2m ago•0 comments

The Masculine Mystique: Sex and monstrosity at the movies

https://thepointmag.com/criticism/the-masculine-mystique/
1•jger15•11m ago•0 comments

Show HN: I built an open-source wallpaper gallery for GitHub repos

https://walle.theblank.club/
1•nyxuis•17m ago•0 comments

I built a FULLY private AI to keep your data from big tech.

https://chatpdf-server-shtq.onrender.com/
1•jimmyjonny•17m ago•1 comments

Leaked documents show Instagram's plan to win back teens

https://www.washingtonpost.com/technology/2025/12/26/meta-instagram-teen-strategy/
2•thm•17m ago•0 comments

From Silicon to Darude Sand-storm: breaking famous synthesizer DSPs [video]

https://media.ccc.de/v/39c3-from-silicon-to-darude-sand-storm-breaking-famous-synthesizer-dsps
2•TazeTSchnitzel•18m ago•0 comments

Runprompt runs LLM prompts in your shell [video]

https://www.youtube.com/watch?v=TVuxZv2ezHM
1•nialse•19m ago•0 comments

Agentic ProbLLMs: Exploiting AI Computer-Use and Coding Agents [Video]

https://streaming.media.ccc.de/39c3/relive/05e9ba1f-11c5-5d4e-b907-4feecc857ae5
1•rook_line_sinkr•20m ago•1 comments

'PromptQuest' is the worst game of 2025 (trying to make chatbots work)

https://www.theregister.com/2025/12/26/ai_is_like_adventure_games/
2•dijksterhuis•23m ago•0 comments

New image sensor breaks optical limits

https://phys.org/news/2025-12-image-sensor-optical-limits.html
1•bookofjoe•24m ago•1 comments

HN: YT Channel Downloader 0.8.6 lets you *easily* save videos

https://github.com/hyperfield/yt-channel-downloader
3•hyperfield•24m ago•0 comments

If I Were CEO of OpenAI

https://zero2data.substack.com/p/if-i-were-ceo-of-openai
1•wj•31m ago•0 comments

TIL: Restarting systemd services on sustained CPU abuse

https://taoofmac.com/space/til/2025/12/28/1400
3•rcarmo•41m ago•0 comments

Long-Term Changes in Ventricular Function in Recreational Marathon Runners

https://jamanetwork.com/journals/jamacardiology/article-abstract/2842747
2•canucker2016•42m ago•1 comments

L'Architecte Numérique

https://michelfotsing.substack.com/
2•mfotsing•43m ago•0 comments

Brazil's governance style leads to controversial impacts

https://news.mongabay.com/2025/11/brazils-governance-style-leads-to-controversial-impacts/
2•PaulHoule•45m ago•0 comments

Show HN: Gemini Watermark Remover – A web tool using reverse alpha blending

https://re.easynote.cc/
2•h2bomb•47m ago•1 comments

When A.I. Took My Job, I Bought a Chain Saw

https://www.nytimes.com/2025/12/28/opinion/artificial-intelligence-jobs.html
2•tkgally•48m ago•0 comments

Eggs and Heart Health

https://www.victorchang.edu.au/blog/eggs-heart-health
1•aldarion•50m ago•0 comments

Show HN: Open-Source WhatsApp Sales Agent (Node.js, SQLite, OpenAI Calling)

https://github.com/bibinprathap/whatsapp-chatbot
2•Bibinprathap•51m ago•1 comments

The Hottest High Schools in Massachusetts Are Trade Schools

https://www.wsj.com/us-news/education/massachusetts-trade-high-school-popularity-effc9513
1•impish9208•55m ago•1 comments

Show HN: An AI eval based on a silly joke from an underrepresented language

https://kapuskonda.vercel.app/eval
1•ad--astra•59m ago•0 comments

Scaling Limits in Distributed Multi-Agent Systems

https://zenodo.org/records/18075563
2•ts1601•1h ago•1 comments

Andrej Karpathy: I've never felt this much behind as a programmer

https://xcancel.com/karpathy/status/2004607146781278521#m
1•_____k•1h ago•1 comments

Recreating the Bell Labs Switch Experiment with Agents

https://cianfrani.dev/posts/12-switches/
1•ghuntley•1h ago•0 comments

Show HN: Word-GPT-Plus – Integrate AI and Agent Directly into Word

https://github.com/Kuingsmile/word-GPT-Plus
2•kuingsmile•1h ago•0 comments

Google allowing users to change their Gmail address, per official Google support

https://www.tomshardware.com/software/google-workspace/google-is-allowing-users-to-change-their-g...
1•pseudolus•1h ago•0 comments