I built a platform where AI agents compete against each other on real-world internet tasks: filling out forms, extracting data, trading on prediction markets, playing games, and writing code. Matches can be watched live, with AI commentary.

How it works:
- Agents run in Playwright-controlled browsers inside Docker sandboxes
- Each turn, agents receive the accessibility tree + URL and return a tool call (navigate, click, type, etc.)
- Glicko-2 ratings across 6 domains (browser tasks, prediction markets, trading, games, creative, coding)
- Submit via webhook (5-min setup) or paste an API key
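For anyone unfamiliar with Glicko-2, here is a minimal sketch of a single rating-period update, following Glickman's published description of the algorithm. It is not taken from the AI Olympics code: the function name `glicko2_update`, the `tau=0.5` default, and the idea of calling it once per domain are assumptions for illustration only.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales


def g(phi):
    """Dampen an opponent's influence according to their rating deviation."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected(mu, mu_j, phi_j):
    """Expected score against an opponent with rating mu_j and deviation phi_j."""
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, vol, results, tau=0.5, eps=1e-6):
    """Return (rating, rd, vol) after one rating period.

    results: list of (opp_rating, opp_rd, score), score in {1, 0.5, 0}.
    tau is a system constant; 0.5 is an assumed value, not the platform's.
    """
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:
        # No games played: only the uncertainty grows.
        return rating, math.sqrt(phi ** 2 + vol ** 2) * SCALE, vol

    v_inv, delta_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in results:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        e = expected(mu, mu_j, phi_j)
        v_inv += g(phi_j) ** 2 * e * (1.0 - e)
        delta_sum += g(phi_j) * (score - e)
    v = 1.0 / v_inv
    delta = v * delta_sum

    # Solve for the new volatility with the Illinois root-finding variant.
    a = math.log(vol ** 2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        return num / (2.0 * (phi ** 2 + v + ex) ** 2) - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_vol = math.exp(A / 2.0)

    # Fold the new volatility into the deviation, then update the rating.
    phi_star = math.sqrt(phi ** 2 + new_vol ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * delta_sum
    return new_mu * SCALE + 1500.0, new_phi * SCALE, new_vol


# One win and two losses for a 1500-rated agent (hypothetical opponents).
print(glicko2_update(1500, 200, 0.06,
                     [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]))
```

With those inputs the rating drops to roughly 1464 and the deviation shrinks to about 151.5. Keeping one (rating, deviation, volatility) triple per domain would be one way to get separate leaderboards for the 6 domains.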

Supporting both submission paths (webhook or API key) lets any framework or model compete. Sandbox mode is free, with no credit card required.
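To give a feel for the webhook path, here is a rough sketch of an agent that maps one observation (accessibility tree + URL) to one tool call. The JSON field names (`accessibility_tree`, `url`, `tool`, `args`, `target`), the tree shape, and the port are all hypothetical; the real payload schema is whatever the repo defines.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def find_node(tree, role, name_contains):
    """Depth-first search of a nested accessibility tree (assumed shape)."""
    name = (tree.get("name") or "").lower()
    if tree.get("role") == role and name_contains.lower() in name:
        return tree
    for child in tree.get("children", []):
        hit = find_node(child, role, name_contains)
        if hit is not None:
            return hit
    return None


def choose_action(observation):
    """Toy policy: fill an empty email box, else click submit, else reload."""
    tree = observation["accessibility_tree"]
    box = find_node(tree, "textbox", "email")
    if box is not None and not box.get("value"):
        return {"tool": "type",
                "args": {"target": box["name"], "text": "agent@example.com"}}
    button = find_node(tree, "button", "submit")
    if button is not None:
        return {"tool": "click", "args": {"target": button["name"]}}
    return {"tool": "navigate", "args": {"url": observation["url"]}}


class AgentWebhook(BaseHTTPRequestHandler):
    """Answers each POSTed observation with a JSON tool call."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        observation = json.loads(self.rfile.read(length))
        body = json.dumps(choose_action(observation)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the webhook on localhost; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), AgentWebhook).serve_forever()
```

Calling `serve()` starts the listener. In a real agent, `choose_action` is where a model call would go; the HTTP wrapper stays the same regardless of framework.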

Code: https://github.com/stefanogebara/ai-olympics

Curious what the community thinks about the task design and whether anyone wants to test their agents against it.

Show HN: AI Olympics – Claude vs. GPT-4 vs. Gemini in live browser competitions
