Doesn't look too promising. Is there any reason to consider Mistral other than that it's not from the US?
https://huggingface.co/mistralai/Mistral-Medium-3.5-128B
They more or less claim this exceeds Claude Sonnet 3.5 on most things, but is worse than Sonnet 3.6, and exceeds all other open models.
Oh and they have a cloud service that will code your apps "in the cloud". But, yeah, at this point, so does my cat.
And, yes, unsloth is on it: https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF (but 4bit quant is 75G)
There is no way it exceeds “all other” open models - but it does exceed all of Mistral’s past models.
You can see it getting blown past by GLM 5.1 and Kimi in this.
Still excited to give it a try
A few months ago China was being criticized left and right for somehow not being able to compete, and once DeepSeek showed up all the hatred shifted to how China was actually competing but exploiting unfair competitive advantages.
Funny how that works.
Also, aren't the likes of OpenAI burning through over $2 of investment for each $1 of revenue?
>Also, aren't the likes of OpenAI burning through over $2 of investment for each $1 of revenue?
Yes, innovation costs money.
Edit: In response to below, EUV machines use tech licensed from the US, so yes, the US worked on them.
I think you should check your notes. The likes of Kimi K2 Thinking rank as high as the second-best general-purpose model currently in existence. It seems they compete just fine.
If you believe "distilling" is all it takes to put together a model at the top of any synthetic benchmark then I wonder what you would have to say about all US models that greatly underperform in comparison and still manage to be used extensively in professional settings.
But your argument is an emotional one and not a rational one, isn't it?
China are cheating by using data obtained without permission to train their models in an evil commie way!
They should have done what the US did instead and trained models on data obtained without permission in a fair and freedum way!
> Where are the Chinese models that are blowing US ones out of the water?
Kimi2 blows every US model out of the water in any comparison that includes both costs and performance.
There are none. Mistral Small 4 is pareto-competitive in its pricing bracket at $0.15/$0.60, at worst it's second to Gemma 4 26B A4B. The above countries have never had a model that is even close to being so.
This particular Mistral Medium looks to be uncompetitive at that pricing. I'm surprised it's so expensive given its size. Wonder if we'll see other providers offer it for cheaper.
but that doesn't mean Mistral has never produced anything useful.
Yes, it might be a problem that the UK allows companies like this to be bought up by foreign countries.
China and the rest of the world have sane leadership.
Pre-agent, there wasn't always an obvious difference between models. Various models had their charms. Nowadays, I don't want to entertain anything less than the frontier models. The difference in capability is enormous and choosing anything less has a real cost in terms of productivity.
I've been a big fan of the smaller labs like Mistral and especially Cohere but it's been a while since I've been excited by a release by either company.
That said, I'm using Mistral's Voxtral Realtime daily – it's great.
It's just apples to oranges.
There is no clear across-the-board winner among Gemini, ChatGPT, and Claude on non-agentic tasks, i.e. the simple chatbot interface.
But Claude Code is substantially better than Codex which itself is notably better than Gemini-cli.
In this vein, it should not be surprising that Claude Code is way better than non-frontier models for agentic coding... It's substantially better than other frontier models at specialized agentic tasks.
From my perspective, Claude Code is decidedly not better than Codex. They’re slightly different and work better together. I would have no issues dropping CC entirely and using codex 100%.
If you’re working off of “defaults”, in other words no custom prompting, Claude Code does perform a lot better out of the box. I think this matters, but if you’re a professional software developer, I’d make the case that you should be owning your tools and moving beyond the baked in prompts.
This is a very naive and misguided opinion. In most tasks, including complex coding tasks, you can hardly tell the difference between a frontier model and something like GPT4.1. You need to really focus on areas such as context window, tool calling and specific aspects of reasoning steps to start noticing differences. To make matters worse, frontier models are taking a brute-force approach to results, which ends up making them far more expensive to run, both in terms of what shows up on your invoice and how much longer you have to wait to get any semblance of output.
And I won't even go into the topic of local models.
This is like saying "the current models and the old models are the same if you ignore every important advance they've made"
The different results on some benchmarks vibes as if this is truly an independently trained model, not just exfiltrated frontier logs, which I think is also really important - having different weight architectures inside a particular model seems like a benefit on its own when viewed from a global systems architecture perspective.
One thing in particular I was disappointed in was its bad explanations when asking about French grammar. It made multiple mistakes and the other models got it right, even Qwen 3.6 27b!
Anyway, I'm hoping they catch up some more.
The only benefit of leading is mindshare. OpenAI is doubling down on that, by investing in communication companies. That's their pathetic attempt at a "moat".
That is what has happened until now though
That said, when I stop spending money on Gemini Ultra, I will give Mistral Vibe another 1-month test.
I like the entire business model and vibe of Mistral so much more than OpenAI/Anthropic/Google but I also have stuff to get done. I am curious if Mistral Vibe for $15/month is a stable business model (i.e., can they make a profit).
Their model listing API returns this:
  {
    "id": "mistral-medium-2508",
    "object": "model",
    "created": 1777479384,
    "owned_by": "mistralai",
    "capabilities": {
      "completion_chat": true,
      "function_calling": true,
      "reasoning": false,
      "completion_fim": false,
      "fine_tuning": true,
      "vision": true,
      "ocr": false,
      "classification": false,
      "moderation": false,
      "audio": false,
      "audio_transcription": false,
      "audio_transcription_realtime": false,
      "audio_speech": false
    },
    "name": "mistral-medium-2508",
    "description": "Update on Mistral Medium 3 with improved capabilities.",
    "max_context_length": 131072,
    "aliases": [
      "mistral-medium-latest",
      "mistral-medium",
      "mistral-vibe-cli-with-tools"
    ],
    "deprecation": null,
    "deprecation_replacement_model": null,
    "default_model_temperature": 0.3,
    "type": "base"
  },
So that has the alias "mistral-medium-latest", but the official ID is "mistral-medium-2508" which suggests it's the model they released in August 2025.
But... that 1777479384 timestamp decodes to Wednesday, April 29, 2026 at 04:16:24 PM UTC
So is that the new Mistral Medium?
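For anyone who wants to check the timestamp decoding above, here is a minimal sketch. It assumes GNU date (the `-d @SECONDS` form; on macOS/BSD the equivalent is `date -u -r SECONDS`), and the `decode_created` helper name is made up for illustration.

```shell
#!/bin/sh
# Decode a Unix timestamp (seconds since the epoch) as a UTC date.
# Assumes GNU date; decode_created is a hypothetical helper name.
decode_created() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# The "created" value from the model listing quoted above.
decode_created 1777479384   # prints 2026-04-29 16:16:24 UTC
```

That matches the April 29, 2026 at 04:16:24 PM UTC reading quoted above.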
  curl https://api.mistral.ai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(llm keys get mistral)" \
    -d '{
      "model": "mistral-medium-3.5",
      "messages": [
        {"role": "user", "content": "Generate an SVG of a pelican riding a bicycle"}
      ]
    }'
Which did work: https://gist.github.com/simonw/f3158919b18d2c47863b0a5dc257a... - it's pretty disappointing.
Weird that it doesn't show up in the model list:
  curl https://api.mistral.ai/v1/models \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(llm keys get mistral)" | jq

GLM 5.1 is an excellent model, but even at Q4 you're looking at ~400GB. Kimi K2.5 is really good too, and at Q4 quantization you're looking at almost 600GB.
This model? You can run it at Q4 with 70GB of VRAM. This is approaching consumer-level territory (you can get a Mac Studio with 128GB of RAM for ~3500 USD).
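As a rough sanity check on the Q4 figure, here is a back-of-the-envelope sketch of the weight-only footprint: parameters times bits per weight, divided by 8. The 128B parameter count is taken from the model name; the 4.5 bits per weight is my assumption for a 4-bit quant with some per-block overhead, not a number from Mistral or unsloth, and `estimate_gb` is a made-up helper name. It ignores KV cache, activations, and runtime overhead, so real usage will be higher.

```shell
#!/bin/sh
# Weight-only size estimate in GB (decimal):
#   params (in billions) * average bits per weight / 8
# estimate_gb is a hypothetical helper; KV cache and runtime overhead are NOT included.
estimate_gb() {
  # $1 = parameter count in billions, $2 = average bits per weight (assumed)
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.0f\n", p * b / 8 }'
}

estimate_gb 128 4.5   # assumed ~4.5 bits/weight for a 4-bit quant -> prints 72
estimate_gb 128 16    # unquantized 16-bit weights                 -> prints 256
```

The 72GB estimate lands in the same ballpark as the 70-75GB figures mentioned in the thread.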
For the Claude-pilled people, I don't know if you only run Opus but when I was on the Pro plan Sonnet was already extremely capable. This beats the latest Sonnet while running locally, without anyone charging you extra for having HERMES.md in your repo, or locking you out of your account on a whim.
Mistral has never been competitive at the frontier, but maybe that is not what we need from them. Having Pareto models that get you 80% of the frontier at 20% of the cost/size sounds really good to me.
spwa4•1h ago
Funny detail: Google AI (the one they use in search) can't spell évidemment correctly.
manishsharan•45m ago
I have been using DeepSeek and GLM models with OpenCode and Codex and Claude side by side.
I have not found the Chinese models lacking. I enjoy coding and like to maintain full control of my codebase, and I care deeply about the GoF patterns. So I am very stringent about what I want the LLM to code and how to code it.
So from my perspective, they are all about the same.
amunozo•40m ago