I'm a software engineer working on C++/Python/robotics by day, dabbling in web apps by night. I built The Frontier (https://the-frontier.app) because the LLM market is moving so fast it's hard to tell if you're overpaying for performance.
Pricing is easy to find, but it's hard to tell if you're missing a similarly priced or even cheaper model with better performance. So I built a visualization that maps LM Arena’s Elo scores against OpenRouter’s pricing.
The main thing it does is calculate the Pareto frontier. It highlights the optimal models at each price point, so you can easily spot when a model is technically a "bad deal" compared to its peers.
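The frontier computation itself is simple: a model is on the frontier if no other model is both cheaper and higher-rated. Here's a minimal sketch of the idea (the function name and the (name, price, elo) tuple shape are my illustration, not the site's actual code):

```python
def pareto_frontier(models):
    """Return names of non-dominated models: nothing else is both
    cheaper and higher-Elo.  models: list of (name, price, elo)."""
    frontier = []
    best_elo = float("-inf")
    # Sort by price ascending (ties: higher Elo first); a model is on the
    # frontier iff it beats the best Elo seen so far at lower prices.
    for name, price, elo in sorted(models, key=lambda m: (m[1], -m[2])):
        if elo > best_elo:
            frontier.append(name)
            best_elo = elo
    return frontier
```

A single sorted pass like this runs in O(n log n), which is plenty for a few hundred models.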
The hard part: the real headache wasn't the UI, it was the messy data. LM Arena names models one way (e.g. "qwen3-coder-480b-a35b-instruct"), OpenRouter another ("qwen/qwen3-coder"), and you have to deal with a mess of variants like "thinking", "instruct", "fast", or "v1.0" vs "v1". I ended up building an automated scoring system to match these models so the chart stays clean without manual mapping.
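To give a flavor of the matching, here's a toy version of the kind of scoring that handles the examples above. This is a simplified sketch, not the site's actual matcher; the token list and thresholds are illustrative:

```python
import re

# Variant suffixes that shouldn't affect identity matching (illustrative set).
VARIANT_TOKENS = {"instruct", "thinking", "fast", "chat", "preview"}

def tokens(name):
    """Normalize a model name into comparable tokens."""
    name = name.split("/")[-1].lower()              # drop "vendor/" prefix
    name = re.sub(r"\bv(\d+)\.0\b", r"v\1", name)   # "v1.0" -> "v1"
    parts = re.split(r"[-_.\s]+", name)
    return [t for t in parts if t and t not in VARIANT_TOKENS]

def match_score(a, b):
    """Jaccard overlap of normalized tokens, in [0, 1]."""
    ta, tb = set(tokens(a)), set(tokens(b))
    return len(ta & tb) / len(ta | tb)
```

With a score like this you can auto-accept matches above some threshold and flag the rest for review, which is roughly how you keep the chart clean without hand-maintaining a mapping table.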
I'm pretty happy with the result; I find myself surfing the frontier (literally), going up and down it to find the best model for my use case and budget.
The Tech:
- React + Vite
- ECharts for the visualization
- A daily sync to keep the chart up-to-date with new releases
I also just added latency and throughput metrics, because sometimes speed is just as important as intelligence.
I'd love to hear what you think, especially if you spot any weird model matches (unfortunately, they still happen) or have ideas for what to add next! I have a few ideas, like combining latency and throughput into one metric, or even intelligence, latency, and throughput together. I'll call it Wisdom :)
URL: https://the-frontier.app/
Thanks!