MiniMax M2.5 released: 80.2% in SWE-bench Verified

https://www.minimax.io/news/minimax-m25

83•denysvitali•2h ago

Comments

mythz•1h ago

Really looked forward to this release as MiniMax M2.1 is currently my most used model thanks to it being fast, cheap and excellent at tool calling. Whilst I still use Antigravity + Claude for development, I reach for MiniMax first in my AI workflows, GLM for code tasks and Kimi K2.5 when deep English analysis is needed.

Not self-hosting yet, but I prefer using Chinese OSS models for AI workflows because of the potential to self-host in future if needed. Also using it to power my openclaw assistant since IMO it has the best balance of speed, quality and cost:

> It costs just $1 to run the model continuously for an hour at 100 tokens/sec. At 50 tokens/sec, the cost drops to $0.30.

user2722•1h ago

!!!!!! Incredibly cheap!!!!!

I'll have to look for it in OpenRouter.

amunozo•1h ago

For the moment is free in Opencode, if you want ot try it.

algo_trader•10m ago

> MiniMax first in my AI workflows, GLM for code tasks and Kimi K2.5

Its good to have these models to keep the frontier labs honest! Can i ask if you use the API or a monthly plan? Do the monthly plan throttle/reset ?

edit: i agree that MM2.1 most economic, and K2.5 generally the strongest

jhack•1h ago

And it's available on their coding plans, even the cheapest one.

turnsout•1h ago

With the GLM news yesterday and now this, I'd love to try out one of these models, but I'm pretty tied to my Claude Code workflow. I see there's a workaround for GLM, but how are people utilizing MiniMax, especially for coding?

claythearc•1h ago

anything with an open ai compatible endpoint can have claude code router put in front of it afaik https://github.com/musistudio/claude-code-router

amunozo•1h ago

I use Opencode, when the model is free for the moment. I have not used Claude Code so I cannot compare.

hasperdi•1h ago

you can use Claude Code with these models. You just need to pass the right env vars. Have a look at the client setup guide on z.ai

turnsout•57m ago

Interesting—thanks!

3adawi•1h ago

Wish my company allowed more of these LLMs through Github Copilot, stuck with OpenAi, Anthropic and Google LLMs where they burn my credit one week into the month

denysvitali•1h ago

Btw, the model is free on OpenCode for now

logicprog•59m ago

Hm. The benchmarks look too good to be true and a lot of the things they say about the way they train this model sound interesting, but it's hard to say how actually novel they are. Generally, I sort of calibrate how much salt I take benchmarks with based on the objective properties of the model and my past experiences with models from the same lab.

For instance,

I'm inclined to generally believe Kimi K2.5's benchmarks, because I've found that their models tend to be extremely good qualitatively and feel actually well-rounded and intelligent instead of brittle and bench-maxed.

I'm inclined to give GLM 5 some benefit of the doubt, because while I think their past benchmarks have overstated their models' capabilities, I've also found their models relatively competent, and they 2X'd the size of their models, as well as introduced a new architecture and raised the number of active parameters, which makes me feel like there is a possibility they could actually meet the benchmarks they are claiming.

Meanwhile, I've never found MiniMax remotely competent. It's always been extremely brittle, tended to screw up edits and misformat even simple JavaScript code, get into error loops, and quickly get context rot. And it's also simply just too small, in my opinion, to see the kind of performance they are claiming.

OsrsNeedsf2P•40m ago

> M2.5-Lightning [...] costs $0.3 per million input tokens and $2.4 per million output tokens. M2.5 [...] costs half that. Both model versions support caching. Based on output price, the cost of M2.5 is one-tenth to one-twentieth that of Opus, Gemini 3 Pro, and GPT-5.

Huge - if not groundbreaking - if the benchmark stats are true.

sinuhe69•32m ago

I hope better and cheaper models will be widely available because competition is good for the business. However, I'm more cautious about benchmark claims. MiniMax 2.1 is decent, but one can really not call it smart. The more critical issue is that MiniMax 2 and 2.1 have the strong tendency to reward hacking, often write nonsensical test report while the tests actually failed. And sometimes it changed the existing code base to make its new code "pass", when it actually should fix its own code instead.

Artificial Analysis put MiniMax 2.1 Coding index on 33, far behind frontier models and I feel it's about right. [1]

[1] https://artificialanalysis.ai/models/minimax-m2-1

osti•22m ago

That's what I found with some of these LLM models as well. For example I still like to test those models with algorithm problems, and sometimes when they can't actually solve the problem, they will start to hardcode the test cases into the algorithm itself.. Even DeepSeek was doing this at some point, and some of the most recent ones still do this.

edoceo•2m ago

Sounds exactly what a junior-dev would do without proper guidance. Could better direction in the prompts help? I find I frequently have to tell it where to put what fixes. IME they make a lot of spaghetti (LLMs and juniors)