Mistral Medium 3.5

https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5

160•meetpateltech•1h ago

Comments

amunozo•1h ago

I want to believe it's gonna be good, but after trying GPT-5.5 even the most advanced Chinese models seem depressing.

ako•1h ago

Then you’ll be happy to learn it’s not Chinese

dotancohen•1h ago

GP is stating that the second best in the field, the Chinese, is so far behind the best in the field, GPT 5.5, that it is not even worth testing anything else.

amunozo•1h ago

Thanks for the translation, I did not express it very clearly. Anything that I try is so much worse.

Ritewut•1h ago

Is GPT 5.5 the best in the field? I think Opus is still better despite Anthropic's recent stumbling.

r0b05•1h ago

This is a French model sir

spwa4•1h ago

Évidemment

Funny detail: Google AI (the one they use in search) can't spell évidemment correctly.

baq•1h ago

What's French for 'goblin'...?

lava_pidgeon•1h ago

Honestly, it depends on the context in which this performance matters. Mistral is quite cheap.

manishsharan•1h ago

I am not following this obsession with SOTA and benchmark rankings.

I have been using DeepSeek and GLM models with OpenCode and Codex and Claude side by side.

I have not found the Chinese models lacking. I enjoy coding and like to maintain full control of my codebase, and I care deeply about the GoF patterns. So I am very stringent in terms of what I want the LLM to code and how to code it.

So from my perspective, they are all about the same.

amunozo•1h ago

That I agree with, but for more complex autonomous changes the differences are considerable. However, it seems that most models will reach a saturation point at which they are useful for almost everything, and the differences will show up only in increasingly niche and specialized tasks.

InputName•1h ago

Looks at first graph. It's SWE-Bench Verified. A benchmark OpenAI stopped using two months ago due to contamination.

Doesn't look too promising. Is there any reason to consider Mistral other than it's not US?

tpurves•1h ago

If it's not US and it's within a few percent of SOTA that might be good enough for a lot of people (eg Europeans)

NitpickLawyer•1h ago

Gemma has been better for us at EU languages than Mistral (for comparably sized models) :/ so ... dunno. What Mistral does well, and where others are lagging behind, is deploying on-prem with their engineers and know-how, offering tuned models for your tasks and fine-tuning on your own data. (I expect Google to start offering this next.)

amunozo•1h ago

Price and speed.

spwa4•1h ago

TLDR: Mistral Medium 3.5, text-only, 128B dense model, 256k context window, modified MIT license. Model is ~140G ...

https://huggingface.co/mistralai/Mistral-Medium-3.5-128B

They more or less claim this exceeds Claude Sonnet 3.5 on most things, but is worse than Sonnet 3.6, and exceeds all other open models.

Oh and they have a cloud service that will code your apps "in the cloud". But, yeah, at this point, so does my cat.

And, yes, unsloth is on it: https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF (but 4bit quant is 75G)

Marciplan•1h ago

You mean Sonnet 4.5 and 4.6, right?

spwa4•5m ago

right

wolttam•59m ago

Sonnet 4.5 and 4.6*

There is no way it exceeds “all other” open models - but it does exceed all of Mistral’s past models.

You can see it getting blown past by GLM 5.1 and Kimi in this.

Still excited to give it a try

pama•55m ago

Unfortunately they only compare to old “all other open models”. There are probably over 10 other open models better than it by now.

mtct88•1h ago

It's okay, nothing exceptional, but any news from non-US and non-Chinese models is still good news.

pb7•1h ago

This is the bar for Europe, huh?

amunozo•1h ago

This is the bar for anybody that's not the frontier labs.

locknitpicker•52m ago

> This is the bar for Europe, huh?

A few months ago China was being criticized left and right for somehow not being able to compete, and once DeepSeek showed up all the hatred shifted to how China was actually competing but exploiting unfair competitive advantages.

Funny how that works.

Also, aren't the likes of OpenAI burning through over $2 of investment for each $1 of revenue?

pb7•50m ago

China is not competing, it is distilling US models. Where are the Chinese models that are blowing US ones out of the water? There aren't any. The US continues to innovate, China replicate, Europe regulate. As is tradition.

>Also, aren't the likes of OpenAI burning through over $2 of investment for each $1 of revenue?

Yes, innovation costs money.

Edit: In response to below, EUV machines use tech licensed from the US, so yes, the US worked on them.

sagacity•44m ago

Ah yes, like those EUV machines America and China have worked on.

locknitpicker•39m ago

> China is not competing, it is distilling US models.

I think you should check your notes. The likes of Kimi K2 Thinking show up as high as the second-best general-purpose model currently in existence. It seems they compete just fine.

If you believe "distilling" is all it takes to put together a model at the top of any synthetic benchmark then I wonder what you would have to say about all US models that greatly underperform in comparison and still manage to be used extensively in professional settings.

But your argument is an emotional one and not rational, isn't it?

tirpen•38m ago

> China is not competing, it is distilling US models

China are cheating by using data obtained without permission to train their models in an evil commie way!

They should have done what the US did instead and trained models on data obtained without permission in a fair and freedum way!

> Where are the Chinese models that are blowing US ones out of the water?

Kimi2 blows every US model out of the water in any comparison that includes both costs and performance.

nickthegreek•37m ago

Two businesses working to get money from the same customers in the same field is competition. Kellogg's is competing with store-brand cereal. People are choosing to use these Chinese AI APIs because they are good enough for some workflows and cheaper. If they didn't exist, the money would go to the frontier labs. There is no world where this would not be defined as competition.

deaux•44m ago

Where are the competitive models from Singapore, Japan, Taiwan, Korea, Russia, Canada, India, the UK? From anywhere that isn't China or the US?

There are none. Mistral Small 4 is Pareto-competitive in its pricing bracket at $0.15/$0.60; at worst it's second to Gemma 4 26B A4B. The above countries have never had a model that comes even close.

This particular Mistral Medium looks to be uncompetitive at that pricing. I'm surprised it's so expensive given its size. Wonder if we'll see other providers offer it for cheaper.

But that doesn't mean Mistral has never produced anything useful.

argsnd•31m ago

DeepMind, which is headquartered in London, probably had a significant role in the development of the Gemini and Gemma models.

Yes, it might be a problem that the UK allows companies like this to be bought up by foreign countries.

class4behavior•25m ago

Although the Manus decision might change things for AI, Singapore-washing is quite rampant among Chinese companies, so I wouldn't call this place of origin an alternative market.

johndough•14m ago

> Korea

EXAONE from LG AI Research https://huggingface.co/LGAI-EXAONE

They had one of the best small models a few months ago and they released a new model just last week.

There's also HyperCLOVA X (haven't tested it, but maybe it is also good) https://huggingface.co/naver-hyperclovax

The UAE (not part of the list above) also has a few noteworthy models: https://huggingface.co/tiiuae

wg0•35m ago

I don't mind Chinese but US under Trump is a fascist state based on ethnic and theological grounds pretty much or soon would be if electorate doesn't decide otherwise.

China and rest of the world has sane leadership that aren't mentally retarted.

wyre•1h ago

I'm rooting for Mistral. It seems they are making a big bet that smaller models will win over larger ones and I can see it happening. I was running some simple chat and tool-calling benchmarks for small models and Mistral Small 4 performed well for its price ($.15/$.60). Seeing this today got me excited: the benchmarks seem solid compared to models much larger, but it's priced higher than Haiku, 5.4 mini, and all the Chinese models it's comparing itself to. It's not even winning those benches either, just being competitive with them, which is great, since those models are 5x+ the size, but they are also 1/2 the price. Hard to be excited about that.

postalcoder•1h ago

This Mistral release really reminds you of the gap between the frontier labs and everyone else.

Pre-agent, there wasn't always an obvious difference between models. Various models had their charms. Nowadays, I don't want to entertain anything less than the frontier models. The difference in capability is enormous and choosing anything less has a real cost in terms of productivity.

I've been a big fan of the smaller labs like Mistral and especially Cohere but it's been a while since I've been excited by a release by either company.

That said, I'm using Mistral's Voxtral realtime daily – it's great.

onlyrealcuzzo•55m ago

> Pre-agent, there wasn't always an obvious difference between models. Various models had their charms. Nowadays, I don't want to entertain anything less than the frontier models. The difference in capability is enormous and choosing anything less has a real cost in terms of productivity.

It's just apples to oranges.

There is no clear across-the-board winner on non-agentic tasks (the simple chatbot interface) between Gemini, ChatGPT, and Claude.

But Claude Code is substantially better than Codex which itself is notably better than Gemini-cli.

In this vein, it should not be surprising that Claude Code is way better than non-frontier models for agentic coding... It's substantially better than other frontier models at specialized agentic tasks.

nothinkjustai•33m ago

CC is not better than Codex, nor is it better than OpenCode, Crush, Pi etc…

postalcoder•31m ago

I think there's a fair amount of evidence that the heavy harnesses actually drag down performance compared to bare harnesses.

philipbjorge•25m ago

I’ve been comparing Claude Code and Codex extensively side by side over the past couple of weeks with my favorite prompting framework superpowers…

From my perspective, Claude Code is decidedly not better than Codex. They’re slightly different and work better together. I would have no issues dropping CC entirely and using codex 100%.

If you’re working off of “defaults”, in other words no custom prompting, Claude Code does perform a lot better out of the box. I think this matters, but if you’re a professional software developer, I’d make the case that you should be owning your tools and moving beyond the baked in prompts.

locknitpicker•54m ago

> Pre-agent, there wasn't always an obvious difference between models. Various models had their charms. Nowadays, I don't want to entertain anything less than the frontier models.

This is a very naive and misguided opinion. In most tasks, including complex coding tasks, you can hardly tell the difference between a frontier model and something like GPT4.1. You need to really focus on areas such as context window, tool calling and specific aspects of reasoning steps to start noticing differences. To make matters worse, frontier models are taking a brute-force approach to results, which ends up making them far more expensive to run, both in terms of what shows up on your invoice and how much longer you have to wait to get any semblance of output.

And I won't even go into the topic of local models.

postalcoder•43m ago

> You need to really focus on areas such as context window, tool calling and specific aspects of reasoning steps to start noticing differences.

This is like saying "the current models and the old models are the same if you ignore every important advance they've made"

deaux•53m ago

Can't agree at all. Productivity gap just 1 year ago was much larger for frontier model vs non-frontier. Let alone 2 years ago.

postalcoder•32m ago

When I was thinking pre-agentic, I was actually thinking more pre-"coding seen as the main use case for these models".

vessenes•1h ago

As always, rooting for these guys — model and national diversity is great. This looks like a solid foundation to build on; hopefully the 3.6/3.7 will dial in more gains. It looks like maybe from the computer use benchmarks that their vision pipeline could use improvement, but that’s just speculation.

The different results on some benchmarks give the vibe that this is truly an independently trained model, not just exfiltrated frontier logs, which I think is also really important - having different weight architectures inside a particular model seems like a benefit on its own when viewed from a global systems architecture perspective.

Tepix•1h ago

I use Mistral Le Chat quite a bit.

One thing in particular I was disappointed in was its bad explanations when asking about French grammar. It made multiple mistakes and the other models got it right, even Qwen 3.6 27b!

Anyway, I'm hoping they catch up some more.

kubb•1h ago

There's a good chance that they'll catch up. The "AI race" is a race to the bottom, with the leaders blowing huge wads of cash on capabilities that get replicated months later by the competition at a fraction of the cost.

The only benefit of leading is mindshare. OpenAI is doubling down on that, by investing in communication companies. That's their pathetic attempt at a "moat".

pb7•52m ago

They catch up by distilling frontier models. They will eventually figure out how to prevent that from happening. No one has any interest in investing tens of billions if the product can be copied and sold for less.

amarcheschi•36m ago

>No one has any interest in investing tens of billions if the product can be copied and sold for less.

That is what has happened until now though

mark_l_watson•57m ago

I like the idea of Mistral, but the last time I evaluated Mistral Vibe it was really nice for $15/month but not as effective as Gemini Plus with AntiGravity and gemini-cli. I am currently running Gemini Ultra on a 3 month 'special deal' and AntiGravity with Opus 4.7 tokens is pretty much fantastic.

That said, when I stop spending money on Gemini Ultra, I will give Mistral Vibe another 1-month test.

I like the entire business model and vibe of Mistral so much more than OpenAI/Anthropic/Google but I also have stuff to get done. I am curious if Mistral Vibe for $15/month is a stable business model (i.e., can they make a profit).

amunozo•26m ago

I'm testing it right now and it seems very buggy and unstable, just like before.

simonw•42m ago

I can't figure out if this is available in the official Mistral API or not.

Their model listing API returns this:

  {
    "id": "mistral-medium-2508",
    "object": "model",
    "created": 1777479384,
    "owned_by": "mistralai",
    "capabilities": {
      "completion_chat": true,
      "function_calling": true,
      "reasoning": false,
      "completion_fim": false,
      "fine_tuning": true,
      "vision": true,
      "ocr": false,
      "classification": false,
      "moderation": false,
      "audio": false,
      "audio_transcription": false,
      "audio_transcription_realtime": false,
      "audio_speech": false
    },
    "name": "mistral-medium-2508",
    "description": "Update on Mistral Medium 3 with improved capabilities.",
    "max_context_length": 131072,
    "aliases": [
      "mistral-medium-latest",
      "mistral-medium",
      "mistral-vibe-cli-with-tools"
    ],
    "deprecation": null,
    "deprecation_replacement_model": null,
    "default_model_temperature": 0.3,
    "type": "base"
  },

So that has the alias "mistral-medium-latest", but the official ID is "mistral-medium-2508" which suggests it's the model they released in August 2025.

But... that 1777479384 timestamp decodes to Wednesday, April 29, 2026 at 04:16:24 PM UTC
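For anyone who wants to reproduce that decoding, here is a minimal Python sketch. It assumes the `created` field is seconds since the Unix epoch, which is how the value is read above; the variable names are just for illustration.

```python
from datetime import datetime, timezone

created = 1777479384  # "created" value from the model listing quoted above

# Interpret the integer as seconds since the Unix epoch, in UTC.
decoded = datetime.fromtimestamp(created, tz=timezone.utc)

print(decoded.isoformat())  # 2026-04-29T16:16:24+00:00
```

16:16:24 UTC is the same moment as the 04:16:24 PM UTC quoted above.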

So is that the new Mistral Medium?

simonw•33m ago

Some poking around in the source code for https://github.com/mistralai/mistral-vibe got me to this:

  curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(llm keys get mistral)" \
  -d '{
    "model": "mistral-medium-3.5",
    "messages": [
      {"role": "user", "content": "Generate an SVG of a pelican riding a bicycle"}
    ]
  }'

Which did work: https://gist.github.com/simonw/f3158919b18d2c47863b0a5dc257a... - it's pretty disappointing.

Weird that it doesn't show up in the model list:

  curl https://api.mistral.ai/v1/models \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(llm keys get mistral)" | jq
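A rough way to check a listing for a given name once the JSON has been parsed, sketched in Python. This assumes each entry looks like the one quoted earlier in the thread (an `id` plus an `aliases` array); `find_model` is a hypothetical helper, not part of any Mistral client, and the wrapper around the entries in the real response is not shown here.

```python
def find_model(models, name):
    """Return the first entry whose id or aliases match `name`, else None."""
    for entry in models:
        if name == entry.get("id") or name in entry.get("aliases", []):
            return entry
    return None


# Trimmed copy of the entry quoted earlier in the thread.
listing = [
    {
        "id": "mistral-medium-2508",
        "aliases": [
            "mistral-medium-latest",
            "mistral-medium",
            "mistral-vibe-cli-with-tools",
        ],
    }
]

print(find_model(listing, "mistral-medium-latest")["id"])  # mistral-medium-2508
print(find_model(listing, "mistral-medium-3.5"))           # None
```

Matching on aliases as well as `id` matters here, because `mistral-medium-latest` only appears as an alias in the quoted entry.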

Mashimo•7m ago

I also did some SVG tests, it's really bad.

https://chat.mistral.ai/chat/897fbe7d-b1ae-4109-9b29-f3ccc4f...

minimaxir•34m ago

It's funny that 128B is now considered Medium. I remember back in the day when 355M parameters was considered medium with GPT-2.

speedgoose•23m ago

And GPT-2 1.5B was considered too dangerous to release.

They were perhaps right.

Giorgi•32m ago

Oh they are still a thing?! Completely forgot about Mistral. I am assuming they are still burning through investor money.

sev_verso•25m ago

What's better than Voxtral for locally processed voice input? More competition is always better.

simjnd•31m ago

I'm not sure what people are on in the comments. It doesn't beat the other models, but it sure competes despite its size.

GLM 5.1 is an excellent model, but even at Q4 you're looking at ~400GB. Kimi K2.5 is really good too, and at Q4 quantization you're looking at almost ~600GB.

This model? You can run it at Q4 with 70GB of VRAM. This is approaching consumer level territory (you can get a Mac Studio with 128GB of RAM for ~3500 USD).
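The arithmetic behind that figure, as a back-of-the-envelope Python sketch. It counts weights only (no KV cache or activations), and the 4.5 bits per weight is an assumed effective rate to illustrate that 4-bit formats also store scales next to the weights; it is not the measured rate of any particular quant.

```python
def weight_footprint_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params_billion * bits_per_weight / 8


print(weight_footprint_gb(128, 4))    # 64.0
print(weight_footprint_gb(128, 4.5))  # 72.0  (assumed effective rate)
print(weight_footprint_gb(128, 8))    # 128.0
```

So a 128B dense model lands somewhere in the 64 to 72 GB range at roughly 4 bits, which is in line with the ~70GB and 75G numbers mentioned in this thread.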

For the Claude-pilled people, I don't know if you only run Opus but when I was on the Pro plan Sonnet was already extremely capable. This beats the latest Sonnet while running locally, without anyone charging you extra for having HERMES.md in your repo, or locking you out of your account on a whim.

Mistral has never been competitive at the frontier, but maybe that is not what we need from them. Having Pareto models that get you 80% of the frontier at 20% of the cost/size sounds really good to me.

redrove•19m ago

It’s a 128B dense model. Good luck getting more than 3 t/s out of a Mac. It doesn’t matter if it fits or not.
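The reasoning can be sketched as a simple bound: a dense model has to read every weight for each generated token, so single-stream decode speed is capped by memory bandwidth divided by the size of the weights. The Python below is only that bound; the 70 GB comes from the parent comment, and the bandwidth values are illustrative placeholders, not the spec of any particular machine.

```python
def decode_tokens_per_second_bound(weights_gb, bandwidth_gb_per_s):
    """Upper bound on single-stream decode speed for a dense model."""
    return bandwidth_gb_per_s / weights_gb


for bandwidth in (400, 800):  # GB/s, illustrative values only
    bound = decode_tokens_per_second_bound(70, bandwidth)
    print(f"{bandwidth} GB/s -> at most {bound:.1f} tokens/s")
# 400 GB/s -> at most 5.7 tokens/s
# 800 GB/s -> at most 11.4 tokens/s
```

Real throughput sits below this ceiling, and prompt processing is compute-bound on top of that, so single-digit tokens per second is a plausible expectation.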

gregsadetsky•11m ago

I didn't know about HERMES.md ... (??) - found information here for others who are curious https://github.com/anthropics/claude-code/issues/53262

Mashimo•12m ago

Compared to all other hosted LLMs that I have tested, Mistral seems to be the only one with rather strict CSP headers. When you ask them to create a website with some JavaScript library it will not preview, even though Le Chat offers canvas mode.

Sometimes when a new release comes around from any provider I just want to test it a bit on the web, without paying or using an agent harness.

Why are they like this ;_;

Edit: Christ on a bike it's bad at drawing SVGs https://chat.mistral.ai/chat/23214adb-5530-4af9-bb47-90f5219...

7777777phil•8m ago

$1.50 / $7.50 per Mtok for 77.6% on SWE-Bench Verified is the most aggressive open-weight pricing I've seen anywhere near frontier capability.

seb_lz•5m ago

I'm using mistral-medium-2508 for some text transformation operations. It's giving me better results than mistral-large for my use cases. Looking forward to testing this new model, although I'm not sure if it's really meant to replace the previous medium model since it's a lot more expensive and presented more as a coding / agentic model (mistral-medium-2508 was priced $0.4/$2 per 1M tokens, mistral-medium-3.5 is $1.5/$7.5).
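To put a number on "a lot more expensive", here is a small Python sketch using the prices quoted above. The workload (2M input tokens, 500k output tokens) is made up for illustration, and `cost_usd` is a hypothetical helper.

```python
PRICES = {  # USD per 1M tokens as (input, output), from the comment above
    "mistral-medium-2508": (0.40, 2.00),
    "mistral-medium-3.5": (1.50, 7.50),
}


def cost_usd(model, input_tokens, output_tokens):
    """Cost of one workload at the listed per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 2M input tokens, 500k output tokens.
old = cost_usd("mistral-medium-2508", 2_000_000, 500_000)
new = cost_usd("mistral-medium-3.5", 2_000_000, 500_000)
print(f"{old:.2f} {new:.2f} {new / old:.2f}x")  # 1.80 6.75 3.75x
```

Both the input and the output price went up by the same factor (1.5/0.4 and 7.5/2 are both 3.75), so the ratio comes out at 3.75x regardless of the input/output mix.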
