Hi, I'm Stefan from Austria. PlayTheAI.com lets humans play classic games (Tic-Tac-Toe, Connect4, Battleship, Mastermind, WordDuel) against 16 non-thinking AI models (instant-response only, so no reasoning models such as o1 or R1), with Elo tracking.
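For readers unfamiliar with Elo, here is a minimal sketch of the standard rating update. The K-factor of 32 and the function names are assumptions for illustration; the post doesn't say which parameters or variant the site uses.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation: probability-like score for player A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return both new ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k=32 is an assumed K-factor, not a value taken from the site.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta
```

With equal ratings the expected score is 0.5, so a win moves each side by half the K-factor.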
Why? We're curious how models perform in dynamic situations where they can't rely on memorized patterns.
Early observations from 800+ matches:
- Many models win fewer than 10% of their games against humans
- We observe interesting patterns in how models handle game state
- Price doesn't seem to correlate strongly with performance
Key: All models get identical prompts with game rules - no per-model optimization, no hints about which moves are currently valid. They must analyze the board themselves.
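To make that setup concrete, here is a rough sketch of what "same prompt, no valid-move hints" could look like for Tic-Tac-Toe. The prompt wording, the board encoding, and the names `RULES`, `build_prompt` and `parse_move` are all hypothetical, not the site's actual code.

```python
import re

# Hypothetical rules text; every model would receive exactly the same string.
RULES = (
    "You are playing Tic-Tac-Toe as O. Cells are numbered 1-9, "
    "left to right, top to bottom. Reply with the number of one empty cell."
)


def build_prompt(board):
    """Build the prompt from the rules plus the raw board.

    board is a list of 9 strings: "X", "O" or "." for an empty cell.
    No list of legal moves is included, so the model has to read the board.
    """
    rows = [" ".join(board[i:i + 3]) for i in range(0, 9, 3)]
    return RULES + "\n\nBoard:\n" + "\n".join(rows)


def parse_move(reply, board):
    """Return the 0-based cell index from a reply, or None if the move is invalid."""
    match = re.search(r"\b([1-9])\b", reply)
    if match is None:
        return None  # no usable cell number in the reply
    index = int(match.group(1)) - 1
    # The server checks legality itself instead of telling the model in advance.
    return index if board[index] == "." else None
```

For example, with X in the top-left corner and O in the centre, a reply of "5" is rejected because that cell is taken, while "I play 3" maps to index 2.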
stefan_wibmer•1d ago
Tech: Astro + Cloudflare Workers, OpenRouter API, Supabase.
All games logged for transparency. This is a hobby project - we'd love feedback on methodology and would welcome collaboration with researchers.
https://playtheai.com