My analysis of 439 models proves: You're overpaying for your LLMs

https://whatllm.vercel.app/

7•demian101•6mo ago

Comments

demian101•6mo ago

While everyone's geeking out over Grok 4's insane physics sims and Kimi K2's 1T-parameter open-source bombshell (crushing coding benchmarks for pennies), the real AI drama is in the pricing shadows. After my LLM Selector post blew up here, I kept getting DMs asking "but which provider should I actually use?" So I dove deep into 439 models across 63 providers.

What did I find? Some interesting insights:

1. Huge markup on identical models

Take DeepSeek R1 0528 (quality 68 on the Artificial Analysis benchmark, beating many flagships):

Completely free on Google Vertex and CentML (decent speeds too, 121 tok/s and 87 tok/s).

But jumps to $0.91 on Deepinfra, $4.25 on Fireworks Fast, and a whopping $5.50 on SambaNova, for the exact same model (ofc with speed differences).

Arbitrage alert: Why pay infinite markup when free tiers deliver the goods for experimentation or bulk runs?
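To put a number on that spread, here is a minimal Python sketch using only the prices quoted above. The `price_spread` helper and the `Spread` type are hypothetical names for illustration, not part of whatllm, and the prices are assumed to share one unit across providers.

```python
from typing import Dict, NamedTuple, Optional


class Spread(NamedTuple):
    cheapest: str
    priciest: str
    # Priciest paid price divided by cheapest paid price;
    # None when fewer than two paid offers exist.
    paid_ratio: Optional[float]


def price_spread(prices: Dict[str, float]) -> Spread:
    """Summarize how far apart providers price the same model."""
    cheapest = min(prices, key=prices.get)
    priciest = max(prices, key=prices.get)
    # A free tier makes a plain ratio undefined (division by zero),
    # so the ratio is computed over paid offers only.
    paid = [p for p in prices.values() if p > 0]
    ratio = max(paid) / min(paid) if len(paid) >= 2 else None
    return Spread(cheapest, priciest, ratio)


# DeepSeek R1 0528 prices as quoted in this post.
r1_prices = {
    "Google Vertex": 0.00,
    "CentML": 0.00,
    "Deepinfra": 0.91,
    "Fireworks Fast": 4.25,
    "SambaNova": 5.50,
}

spread = price_spread(r1_prices)
print(spread.cheapest, spread.priciest, round(spread.paid_ratio, 2))
```

Even leaving the free tiers out, the paid endpoints differ by roughly 6x (5.50 / 0.91) for the same model.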

2. Latency goldmines hiding in plain sight

Sub-millisecond responses aren't just for premium setups:

Nebius Base crushes it with DeepSeek R1 at 0.61ms latency for $1.00/1M (103 tok/s) and Qwen3 235B at 0.56ms for $0.30/1M (50 tok/s).

Groq takes it further with models like Qwen3 32B at 0.14ms for $0.36/1M (627 tok/s).

Arbitrage alert: These blow away slower "enterprise" options costing 10x more, ideal for real-time apps.

3. Speed demons with massive throughput gaps

Hardware optimization creates wild performance swings:

Cerebras with Qwen3 32B at 2,496 tok/s for $0.50/1M and Llama 4 Scout at 2,808 tok/s for $0.70/1M.

Compare to the same models elsewhere: Often stuck at 40-80 tok/s for similar or higher prices.

Arbitrage alert: 50x+ throughput boosts on the same model?
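As a rough illustration of what that gap means per request, the sketch below converts throughput into streaming time. The 1,000-token response length is an assumed workload, and `generation_seconds` is a hypothetical helper that ignores latency and queueing entirely.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `output_tokens` at a steady throughput, ignoring latency."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


OUTPUT_TOKENS = 1_000  # assumed response length, not a figure from the post

# Throughput figures as quoted in this post.
cerebras = generation_seconds(OUTPUT_TOKENS, 2_496)    # Qwen3 32B on Cerebras
elsewhere_fast = generation_seconds(OUTPUT_TOKENS, 80)  # upper end of 40-80 tok/s
elsewhere_slow = generation_seconds(OUTPUT_TOKENS, 40)  # lower end of 40-80 tok/s

print(f"Cerebras: {cerebras:.2f}s")
print(f"Elsewhere: {elsewhere_fast:.1f}s to {elsewhere_slow:.1f}s")
print(f"Speedup: {elsewhere_fast / cerebras:.0f}x to {elsewhere_slow / cerebras:.0f}x")
```

On these numbers the Qwen3 32B speedup lands between about 31x and 62x, so the 50x figure holds against the slower end of the range.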

4. Quality overpays that defy logic

High quality doesn't mean high price anymore:

Qwen3 235B (quality 62) at $0.10/1M on Fireworks (79 tok/s) outperforms Claude 4 Opus (quality 58), which costs $30/1M everywhere (19-65 tok/s).

Grok 3 mini (quality 67) at $0.35/1M on xAI (210 tok/s), edging out pricier closed source rivals.

Arbitrage alert: 300x cheaper for better quality? Open-source gems like these make "premium" models look like rip-offs lol
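The 300x figure is just the price ratio ($30 / $0.10). A small sketch of the arithmetic follows; dividing price by quality score is my own rough value metric for illustration, not something Artificial Analysis defines, and the `Offer` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    model: str
    quality: int             # quality score as quoted in this post
    usd_per_million: float   # price per 1M tokens as quoted in this post

    @property
    def usd_per_quality_point(self) -> float:
        # Assumed metric: dollars per 1M tokens for each point of quality score.
        return self.usd_per_million / self.quality


qwen = Offer("Qwen3 235B (Fireworks)", quality=62, usd_per_million=0.10)
opus = Offer("Claude 4 Opus", quality=58, usd_per_million=30.00)

price_ratio = opus.usd_per_million / qwen.usd_per_million
value_ratio = opus.usd_per_quality_point / qwen.usd_per_quality_point

print(f"price ratio: {price_ratio:.0f}x")
print(f"price-per-quality ratio: {value_ratio:.0f}x")
```

Adjusting for the four-point quality difference pushes the gap slightly higher, to about 321x.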

5. Provider flips on big-name models

Even giants like OpenAI show huge variances:

GPT-4.1 mini ($0.70/1M): Azure blasts 217 tok/s vs OpenAI's 73 tok/s.

o3 ($3.50/1M): OpenAI hits 199 tok/s vs Azure's slower 99 tok/s (with double the latency).

Arbitrage alert: Same price, but 3x throughput or half the latency? Picking the right endpoint saves thousands on production workloads.

We're in the Wild West of pricing amid all the hype. Big names coast on reputation, but smaller providers like Nebius and Cerebras optimize like mad.

Open-source crushes closed-source on value: top 20 price-perf plays are ALL open.

What should you do?

Stop assuming expensive = better

Hunt latency and speed arbitrages (they're everywhere)

Test specialised providers for throughput wins

Grab sub-$0.50 open-source beasts (like Qwen3 or Grok mini)

Exploit these gaps now before "normalization" hits
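The checklist above boils down to "filter on quality and price, then rank". Here is a minimal sketch of that idea using only endpoints whose quality, price, and speed all appear in this post. The `Endpoint` type, the `shortlist` function, and the thresholds are hypothetical choices for illustration, and I'm assuming a model keeps the same quality score on every provider.

```python
from typing import List, NamedTuple


class Endpoint(NamedTuple):
    model: str
    provider: str
    quality: int
    usd_per_million: float
    tokens_per_second: float


# Figures as quoted in this post.
ENDPOINTS = [
    Endpoint("DeepSeek R1 0528", "Google Vertex", 68, 0.00, 121),
    Endpoint("DeepSeek R1 0528", "CentML", 68, 0.00, 87),
    Endpoint("Qwen3 235B", "Fireworks", 62, 0.10, 79),
    Endpoint("Qwen3 235B", "Nebius Base", 62, 0.30, 50),
    Endpoint("Grok 3 mini", "xAI", 67, 0.35, 210),
    # 65 tok/s is the upper end of the quoted 19-65 tok/s range.
    Endpoint("Claude 4 Opus", "(any)", 58, 30.00, 65),
]


def shortlist(
    endpoints: List[Endpoint],
    min_quality: int,
    max_usd_per_million: float,
) -> List[Endpoint]:
    """Keep endpoints that meet both thresholds; cheapest first, faster wins ties."""
    kept = [
        e for e in endpoints
        if e.quality >= min_quality and e.usd_per_million <= max_usd_per_million
    ]
    return sorted(kept, key=lambda e: (e.usd_per_million, -e.tokens_per_second))


picks = shortlist(ENDPOINTS, min_quality=60, max_usd_per_million=0.50)
for e in picks:
    print(f"{e.model} on {e.provider}: ${e.usd_per_million:.2f}/1M, {e.tokens_per_second} tok/s")
```

Sorting by price first is just one choice; for a real-time app you would probably rank by speed or latency instead and treat price as the filter.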

I centralised all the data from Artificial Analysis on whatllm.com, and the insights are the real gold.

Found crazier arbitrages? Spill in comments!

Which hype are you actually buying, and why?

This rabbit hole hit harder than any benchmark!

Happy to geek out more!
