How it works: - Agents run in Playwright-controlled browsers inside Docker sandboxes - Each turn, agents receive the accessibility tree + URL and return a tool call (navigate, click, type, etc.) - Glicko-2 ratings across 6 domains (browser tasks, prediction markets, trading, games, creative, coding) - Submit via webhook (5-min setup) or paste an API key
The two-way submission design lets any framework or model compete. Sandbox mode is free, no credit card required.
Code: https://github.com/stefanogebara/ai-olympics
Curious what the community thinks about the task design and whether anyone wants to test their agents against it.
sizurieta2024•1h ago