We’ve been benchmarking a few models on our API platform and got some interesting performance numbers:
- MiniMax M2.5 → 0.118 s TTFT, 103 tokens/s
- GLM 5.1 → 120 tokens/s throughput
- Kimi K2.5 → 0.643 s TTFT, 69 tokens/s
- All models → ~99.9% request success rate
The latency difference is especially noticeable: a ~0.1 s TTFT feels almost instant in interactive apps.
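For anyone wanting to reproduce numbers like these, here's a minimal sketch of how TTFT and decode throughput can be measured from any streaming response. The `fake_stream` generator is a hypothetical stand-in for a real SDK's token iterator; in practice you'd pass the streamed chunks from your provider's client instead.

```python
import time

def measure_stream(token_iter):
    """Consume a stream of tokens; return (ttft_seconds, decode_tokens_per_sec).

    TTFT = time from request start until the first token arrives.
    Decode throughput = tokens after the first, divided by the time
    spent generating them (i.e. TTFT is excluded).
    """
    start = time.perf_counter()
    first_at = None
    count = 0
    for _ in token_iter:
        now = time.perf_counter()
        if first_at is None:
            first_at = now
        count += 1
    end = time.perf_counter()

    if first_at is None:          # empty stream: no tokens arrived
        return float("nan"), 0.0
    ttft = first_at - start
    decode_time = end - first_at
    tps = (count - 1) / decode_time if decode_time > 0 and count > 1 else 0.0
    return ttft, tps

def fake_stream(n_tokens=50, first_delay=0.05, per_token_delay=0.002):
    """Hypothetical stand-in for an API's streaming iterator."""
    time.sleep(first_delay)       # simulated queueing + prefill
    for _ in range(n_tokens):
        yield "tok"
        time.sleep(per_token_delay)

ttft, tps = measure_stream(fake_stream())
print(f"TTFT: {ttft:.3f}s, throughput: {tps:.0f} tokens/s")
```

Averaging over many requests (and separating warm vs. cold runs) matters a lot here, since a single sample can be skewed by queueing on the provider's side.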
Let me know how you're evaluating LLM APIs. Are you optimizing more for latency, throughput, or cost?