I built Beauty Arena to solve a data problem I've always found annoying: absolute rating scales (1-10) are terrible for subjective data. They suffer from massive inflation and inconsistent user baselines (one person's 7 is another person's 5).
I wanted to test whether pairwise comparison (1v1) could produce a cleaner, strictly relative dataset.
Instead of asking "How beautiful is this person?", the system asks a simpler question: "Who do you choose?" Under the hood, it uses a rating system inspired by competitive games (Elo/Glicko). As users vote, a global ranking emerges from each person's wins and losses against others rather than from accumulated points.
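For reference, a standard Elo update after a single vote looks roughly like this. It is a minimal sketch, not the exact code behind the site: the K-factor of 32, the 400-point scale, and the function names are illustrative assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one 1v1 vote.

    k is an assumed K-factor; larger values make ratings move faster.
    """
    # The winner gains more when the win was unexpected (low expected score).
    delta = k * (1.0 - expected_score(winner, loser))
    # Zero-sum: whatever the winner gains, the loser gives up.
    return winner + delta, loser - delta
```

With two equal ratings, `update_elo(1500, 1500)` moves the winner up 16 points and the loser down 16. Beating a much higher-rated opponent moves both ratings further, which is how the opponent's strength enters the ranking.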
I'm curious about the "wisdom of the crowd" limits here. Does a pairwise sort actually converge on a clear consensus, or does it cycle indefinitely due to intransitive preferences (A > B, B > C, but C > A)?
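One way to check this on the raw votes is to build the majority preference for each pair and look for three-way cycles. The sketch below assumes votes are stored as `(winner, loser)` tuples; that format and the function names are hypothetical, not taken from the site.

```python
from collections import Counter
from itertools import combinations


def majority_edges(votes):
    """Return the set of (preferred, other) pairs decided by majority vote."""
    wins = Counter(votes)
    items = sorted({x for pair in votes for x in pair})
    edges = set()
    for a, b in combinations(items, 2):
        ab, ba = wins[(a, b)], wins[(b, a)]
        if ab > ba:
            edges.add((a, b))
        elif ba > ab:
            edges.add((b, a))
        # Ties and unseen pairs produce no edge.
    return edges


def intransitive_triples(votes):
    """List triples whose majority preferences form a cycle (A > B > C > A)."""
    edges = majority_edges(votes)
    items = sorted({x for edge in edges for x in edge})
    cycles = []
    for a, b, c in combinations(items, 3):
        clockwise = {(a, b), (b, c), (c, a)} <= edges
        counter_clockwise = {(a, c), (c, b), (b, a)} <= edges
        if clockwise or counter_clockwise:
            cycles.append((a, b, c))
    return cycles
```

For the votes `[("A", "B"), ("B", "C"), ("C", "A")]` this reports the triple `("A", "B", "C")` as a cycle. Note that a scalar rating like Elo always produces a total order, so cycles of this kind would not appear in the ranking itself; they would only show up in a check like this or as ratings that keep fluctuating. The brute-force triple scan is cubic in the number of items, so it is only practical for small sets or samples.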
I'd love feedback on the ranking methodology and the overall UI.