fabioperez•1h ago
Moonshot AI → Kimi K2.5 (coordinates 100 sub-agents in parallel)
z.ai → GLM-5 (lowest hallucination rate on Artificial Analysis, runs on Huawei chips)
MiniMax → M2.5 (80.2% on SWE-bench, claims ~1/10th cost of Claude Opus per task)
ByteDance → Seedance 2.0 (4K video) + Seed 2.0 (powers Doubao, 155M weekly users)
Kuaishou → Kling 3.0 (native 4K 60fps video)
Alibaba → Qwen 3.5 (397B/17B MoE, claims to beat GPT-5.2 on 80% of benchmarks)
Four of five text models are open-weight under MIT or Apache 2.0. All use MoE architectures. All under $1/M input tokens. For comparison: Claude Opus is $5 and GPT-5.2 is $1.75.
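For scale, a quick back-of-the-envelope on those prices (a minimal sketch; the $1/M figure is an upper bound for the open-weight models, so the real multiples may be larger):

```python
# Per-million-input-token prices quoted above; illustrative, not official.
PRICES = {
    "Chinese open-weight (upper bound)": 1.00,
    "GPT-5.2": 1.75,
    "Claude Opus": 5.00,
}

baseline = PRICES["Chinese open-weight (upper bound)"]
for model, price in PRICES.items():
    # Ratio vs. the $1/M ceiling: Opus comes out 5x, GPT-5.2 1.75x.
    print(f"{model}: ${price:.2f}/M input ({price / baseline:.2f}x baseline)")
```

Even at the $1 ceiling, the gap is 1.75x–5x before any benchmark discount is applied.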
The other thing worth paying attention to: every lab is building for agents now, not chatbots. Kimi K2.5 runs 100 sub-agents in parallel. Qwen 3.5 controls apps from screenshots. ByteDance calls Seed 2.0 their "agent era" model.
Most of these scores are vendor-reported, so take them with a grain of salt. But even discounting the benchmarks by 10-15%, the pricing difference is hard to explain away.
So what actually justifies paying 5-10x more for Western models? Reliability? Safety? And honestly, how much do you trust vendor-reported benchmarks here?
Curious whether anyone has run the Chinese models head-to-head against Opus 4.6 or GPT-5.2 to see how well they hold up.