Mercury 2: The fastest reasoning LLM, powered by diffusion

https://www.inceptionlabs.ai/blog/introducing-mercury-2

16•fittingopposite•1h ago

Comments

dvt•29m ago

What excites me most about these new four-figure tokens-per-second models is that you can essentially do multi-shot prompting (plus nudging) without the user even feeling it, potentially fixing some of the weird hallucinatory/non-deterministic behavior we sometimes end up with.
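A minimal sketch of the retry-and-nudge idea above, assuming the model is exposed as a plain `generate(prompt) -> str` callable and that some `is_valid` check exists for the answer. Both callables, the function name, and the nudge wording are hypothetical and not taken from any Mercury or Inception API.

```python
from typing import Callable, Optional


def sample_until_valid(
    generate: Callable[[str], str],   # hypothetical model call: prompt in, text out
    is_valid: Callable[[str], bool],  # hypothetical validator for an answer
    prompt: str,
    max_attempts: int = 5,
    nudge: str = "\nThe previous answer was rejected; try again.",
) -> Optional[str]:
    """Call the model up to max_attempts times, nudging after each rejection."""
    current_prompt = prompt
    for _ in range(max_attempts):
        answer = generate(current_prompt)
        if is_valid(answer):
            return answer
        # Append a nudge so the next attempt sees that the last one failed.
        current_prompt = current_prompt + nudge
    # Every attempt was rejected.
    return None
```

At four-figure tokens per second, a handful of such attempts may still fit inside a latency budget the user would not notice, which is the point being made above.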

tl2do•23m ago

Genuine question: what kinds of workloads benefit most from this speed? In my coding use, I still hit limitations even with stronger models, so I'm interested in where a much faster model changes the outcome rather than just reducing latency.

irthomasthomas•18m ago

Multi-model arbitration, synthesis, parallel reasoning, etc. Judging large models with small models is quite effective.

layoric•10m ago

I think it would help with exploring multiple solution spaces in parallel. With the right user in the loop and a harness wrapping tools like compilers, static analysis, and tests, I can see it iterating very quickly on multiple solutions. An example might be "I need to optimize this SQL query", pointed at a locally running Postgres. Multiple changes could be tested and combined, with the EXPLAIN plan used to validate performance and a test used to check for correct results. Then only valid solutions would be presented to the developer for review. I don't personally care about the model's 'opinion' or recommendations; using them for architectural choices is, IMO, a flawed use as a coding tool.

It doesn't change the fact that the most important thing is verification/validation of their output, either from tools or from a developer reviewing and making decisions. But even if you don't want that approach, diffusion models just seem to be a lot more efficient. I'm interested to see whether they are simply a better match for common developer tasks, assisting with validation/verification systems rather than just writing (likely wrong) code faster.
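A rough sketch of the SQL harness idea above: run each model-proposed rewrite, discard any that fail or return different rows from the original query, and rank the survivors by wall-clock time. To keep it self-contained, this uses Python's built-in `sqlite3` rather than Postgres and simple timing rather than an EXPLAIN plan; the function name and the candidate list are illustrative assumptions.

```python
import sqlite3
import time


def valid_candidates(conn, reference_sql, candidate_sqls):
    """Return (sql, seconds) for candidates whose rows match the reference, fastest first."""
    # Row order is ignored: results are compared as sorted lists.
    expected = sorted(conn.execute(reference_sql).fetchall())
    results = []
    for sql in candidate_sqls:
        try:
            start = time.perf_counter()
            rows = conn.execute(sql).fetchall()
            elapsed = time.perf_counter() - start
        except sqlite3.Error:
            continue  # Candidate does not even run; discard it.
        if sorted(rows) == expected:
            results.append((sql, elapsed))
    return sorted(results, key=lambda pair: pair[1])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, v INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
    candidates = [
        "SELECT id FROM t WHERE v >= 20",  # equivalent on this data
        "SELECT id FROM t WHERE v > 20",   # wrong rows, filtered out
        "SELECT nope FROM missing",        # fails to run, filtered out
    ]
    for sql, seconds in valid_candidates(conn, "SELECT id FROM t WHERE v > 10", candidates):
        print(sql, seconds)
```

Matching rows on one test dataset does not prove two queries are equivalent, so in this sketch the developer review step described above still does the final check.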

cjbarber•10m ago

It could be interesting to do the metric of intelligence per second.

i.e. intelligence per token, and then tokens per second

My current feel is that if Sonnet 4.6 were 5x faster than Opus 4.6, I'd be primarily using Sonnet 4.6. But that wasn't true for me with prior model generations; in those generations the Sonnet-class models didn't feel good enough compared to the Opus-class models. And it might shift again when I'm doing things that feel more intelligence-bottlenecked.

But fast responses have an advantage of their own: they give you faster iteration. Kind of like how I used to like OpenAI Deep Research, but then switched to o3-thinking with web search enabled after that came out, because it was 80% of the thoroughness in 20% of the time, which tended to be better overall.
