For instance, I'm generally inclined to believe Kimi K2.5's benchmarks, because I've found their models tend to be extremely good qualitatively and actually feel well-rounded and intelligent rather than brittle and bench-maxed.
I'm inclined to give GLM 5 some benefit of the doubt: while I think their past benchmarks have overstated their models' capabilities, I've also found their models relatively competent, and they've doubled the size of their models, introduced a new architecture, and raised the number of active parameters, which makes me think there's a real possibility they can actually meet the benchmarks they're claiming.
Meanwhile, I've never found MiniMax remotely competent. It's always been extremely brittle: it tends to screw up edits, misformat even simple JavaScript code, fall into error loops, and quickly succumb to context rot. And it's also simply too small, in my opinion, to deliver the kind of performance they're claiming.
Huge, if not groundbreaking, assuming the benchmark stats are true.
Artificial Analysis puts MiniMax 2.1 at 33 on its Coding Index, far behind frontier models, and that feels about right to me. [1]
mythz•1h ago
Not self-hosting yet, but I prefer using Chinese OSS models for AI workflows because of the potential to self-host in the future if needed. I'm also using it to power my openclaw assistant since IMO it has the best balance of speed, quality, and cost:
> It costs just $1 to run the model continuously for an hour at 100 tokens/sec. At 50 tokens/sec, the cost drops to $0.30.
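To put that quoted claim in perspective, here's a rough back-of-the-envelope conversion to an implied price per million output tokens (my own sketch; the assumption of uninterrupted generation for a full 3,600-second hour and the function name are mine, not MiniMax's stated methodology):

    # Convert the quoted hourly cost into an implied $/M output tokens.
    # Assumes uninterrupted generation at the stated speed for a full hour.
    def implied_price_per_million(cost_per_hour_usd, tokens_per_sec):
        tokens_per_hour = tokens_per_sec * 3600
        return cost_per_hour_usd / tokens_per_hour * 1_000_000

    print(implied_price_per_million(1.00, 100))  # ~$2.78 per million tokens
    print(implied_price_per_million(0.30, 50))   # ~$1.67 per million tokens

The two data points imply different per-token rates, so I read the quote as describing speed-dependent pricing rather than a single flat rate, though that's just my reading.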
user2722•1h ago
I'll have to look for it in OpenRouter.
amunozo•1h ago
algo_trader•10m ago
It's good to have these models to keep the frontier labs honest! Can I ask if you use the API or a monthly plan? Does the monthly plan throttle/reset?
edit: I agree that MM2.1 is the most economical, and K2.5 is generally the strongest.