Mercury: Ultra-Fast Language Models Based on Diffusion

222•PaulHoule•4h ago

Comments

mynti•4h ago

is there a kind of nanogpt for diffusion language models? i would love to understand them better

nvtop•3h ago

This video has a live coding part which implements a masked diffusion generation process: https://www.youtube.com/watch?v=oot4O9wMohw

chc4•3h ago

Using the free playground link, and it is in fact extremely fast. The "diffusion mode" toggle is also pretty neat as a visualization, although I'm not sure how accurate it is - it renders as line noise and then refines, while in reality presumably those are tokens from an imprecise vector in some state space that then become more precise until it's only a definite word, right?

PaulHoule•3h ago

It's insane how fast that thing is!

maelito•3h ago

Link : https://chat.inceptionlabs.ai/

icyfox•44m ago

Some text diffusion models use a continuous latent space, but historically they haven't done that well. Most of the ones we're seeing now are trained to predict actual token output that's fed forward into the next timestep. The diffusion property comes from their ability to modify previous timesteps to converge on the final output.

I have an explanation about one of these recent architectures that seems similar to what Mercury is doing under the hood here: https://pierce.dev/notes/how-text-diffusion-works/
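
For anyone who wants the masked-diffusion idea in runnable form, here is a toy sketch. It is not Mercury's algorithm and not the code from the linked video or note. The "model" is a stand-in that already knows the answer and invents a random confidence score, and the names `MASK` and `masked_diffusion_decode` are made up for illustration. It only shows the decoding loop: start fully masked, then commit the most confident positions each step, in any order. It leaves out the revising of already-committed tokens mentioned above.

```python
import random

MASK = "_"  # hypothetical mask symbol, chosen only for this demo


def masked_diffusion_decode(target, steps=4, seed=0):
    """Toy masked-diffusion decoding loop.

    Assumption: a real model would predict tokens from context. Here the
    proposal for each masked slot is read from `target`, and the confidence
    is a seeded random number, so only the loop structure is realistic.
    """
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    history = [list(tokens)]
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        # Spread the remaining masked slots evenly over the remaining steps.
        remaining_steps = steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division
        # Stand-in for the model: (proposed token, confidence) per masked slot.
        proposals = {i: (target[i], rng.random()) for i in masked}
        # Commit the k most confident proposals; the rest stay masked.
        most_confident = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in most_confident:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history


final, history = masked_diffusion_decode("the cat sat on the mat".split())
for snapshot in history:
    print(" ".join(snapshot))
```

Each printed line has fewer masks than the one before, and positions fill in out of order rather than left to right, which is the main contrast with autoregressive decoding.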

luckystarr•3h ago

I'm kind of impressed by the speed of it. I told it to write a MQTT topic pattern matcher based on a Trie and it spat out something reasonable on first try. It had a few compilation issues though, but fair enough.
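
For reference, a minimal sketch of what such a matcher can look like. This is not the model's output, which isn't shown in the thread. It assumes the standard MQTT wildcard rules: `+` matches exactly one level and `#` matches everything that remains, including the parent level itself. It skips the special handling of topics starting with `$` and does no validation of filters. The class name `TopicTrie` is invented for the example.

```python
class TopicTrie:
    """Trie of MQTT topic filters, keyed by topic level."""

    def __init__(self):
        self.children = {}    # level string -> TopicTrie
        self.filters = set()  # filters that end exactly at this node

    def insert(self, topic_filter):
        node = self
        for level in topic_filter.split("/"):
            node = node.children.setdefault(level, TopicTrie())
        node.filters.add(topic_filter)

    def match(self, topic):
        """Return every stored filter that matches the concrete topic."""
        found = set()
        self._walk(topic.split("/"), 0, found)
        return found

    def _walk(self, levels, i, found):
        # A '#' child matches all remaining levels, including none at all.
        multi = self.children.get("#")
        if multi is not None:
            found |= multi.filters
        if i == len(levels):
            found |= self.filters
            return
        # Follow the literal level and the single-level wildcard '+'.
        for key in (levels[i], "+"):
            child = self.children.get(key)
            if child is not None:
                child._walk(levels, i + 1, found)


trie = TopicTrie()
for f in ["sensors/+/temp", "sensors/#", "sensors/kitchen/temp", "#"]:
    trie.insert(f)
print(sorted(trie.match("sensors/kitchen/temp")))
```

Storing the filters in the trie and walking the concrete topic through it means a lookup touches only the branches that can still match, rather than testing every subscription in turn.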

earthnail•3h ago

Tried it on some coding questions and it hallucinated a lot, but the appearance (i.e. if you’re not a domain expert) of the output is impressive.

TechDebtDevin•3h ago

Oddly fast, almost instantaneous.

mike_hearn•3h ago

A good chance to bring up something I've been flagging to colleagues for a while now: with LLM agents we are very quickly going to become even more CPU bottlenecked on testing performance than today, and every team I know of today was bottlenecked on CI speed even before LLMs. There's no point having an agent that can write code 100x faster than a human if every change takes an hour to test.

Maybe I've just got unlucky in the past, but in most projects I worked on a lot of developer time was wasted on waiting for PRs to go green. Many runs end up bottlenecked on I/O or availability of workers, and so changes can sit in queues for hours, or they flake out and everything has to start again.

As they get better, coding agents are going to be assigned simple tickets that they turn into green PRs, with the model reacting to test failures and fixing them as they go. This will make the CI bottleneck even worse.

It feels like there's a lot of low hanging fruit in most projects' testing setups, but for some reason I've seen nearly no progress here for years. It feels like we kinda collectively got used to the idea that CI services are slow and expensive, then stopped trying to improve things. If anything CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching), and move them from on-prem dedicated hardware to expensive cloud VMs with slow IO, which haven't got much faster over time.

Mercury is crazy fast and in a few quick tests I did, created good and correct code. How will we make test execution keep up with it?

TechDebtDevin•3h ago

LLM making a quick edit, <100 lines... Sure. Asking an LLM to rubber-duck your code, sure. But integrating an LLM into your CI is going to end up costing you 100s of hours productivity on any large project. That or spend half the time you should be spending learning to write your own code, dialing down context sizing and prompt accuracy.

I really really don't understand the hubris around llm tooling, and don't see it catching on outside of personal projects and small web apps. These things don't handle complex systems well at all, you would have to put a gun in my mouth to let one of these things work on an important repo of mine without any supervision... And if I'm supervising the LLM I might as well do it myself, because I'm going to end up redoing 50% of its work anyways..

mike_hearn•3h ago

I've used Claude with a large, mature codebase and it did fine. Not for every possible task, but for many.

Probably, Mercury isn't as good at coding as Claude is. But even if it's not, there's lots of small tasks that LLMs can do without needing senior engineer level skills. Adding test coverage, fixing low priority bugs, adding nice animations to the UI etc. Stuff that maybe isn't critical so if a PR turns up and it's DOA you just close it, but which otherwise works.

Note that many projects already use this approach with bots like Renovate. Such bots also consume a ton of CI time, but it's generally worth it.

flir•2h ago

Don't want to put words in the parent commenter's mouth, but I think the key word is "unsupervised". Claude doesn't know what it doesn't know, and will keep going round the loop until the tests go green, or until the heat death of the universe.

mike_hearn•2h ago

Yes, but you can just impose timeouts to solve that. If it's unsupervised the only cost is computation.

airstrike•2h ago

IMHO LLMs are notoriously bad at test coverage. They usually hard code a value to have the test pass, since they lack the reasoning required to understand why the test exists or the concept of assertion, really

wrs•1h ago

I don’t know, Claude is very good at writing that utterly useless kind of unit test where every dependency is mocked out and the test is just the inverted dual of the original code. 100% coverage, nothing tested.

DSingularity•2h ago

He is simply observing that if PR numbers and launch rates increase dramatically CI cost will become untenable.

kraftman•2h ago

I keep seeing this argument over and over again, and I have to wonder, at what point do you accept that maybe LLM's are useful? Like how many people need to say that they find it makes them more productive before you'll shift your perspective?

candiddevmike•2h ago

People say they are more productive using visual basic, but that will never shift my perspective on it.

Code is a liability. Code you didn't write is a ticking time bomb.

psychoslave•2h ago

That's a tool, and it depends what you need to do. If it fits someone's need and makes them more productive, or even simply lets them enjoy the activity more, good.

Just because two people are fixing something to the wall doesn't mean the same tool will hold fine. Gum, pushpin, nail, screw, bolts?

The parent thread did mention they use LLMs successfully in small side projects.

dragonwriter•2h ago

> I keep seeing this argument over and over again, and I have to wonder, at what point do you accept that maybe LLM's are useful?

The post you are responding to literally acknowledges that LLMs are useful in certain roles in coding in the first sentence.

> Like how many people need to say that they find it makes them more productive before you'll shift your perspective?

Argumentum ad populum is not a good way of establishing fact claims beyond the fact of a belief being popular.

ninetyninenine•5m ago

They say it’s only effective for personal projects but there’s literally evidence of LLMs being used for what he says can’t be used. Actual physical evidence.

It’s self delusion. And also the pace of AI is so fast he may not be aware of how fast LLMs are integrating into our coding environments. Like 1 year ago what he said could be somewhat true but right now what he said is clearly not true at all.

blitzar•1h ago

Do the opposite - integrate your CI into your LLM.

Make it run tests after it changes your code and either confirm it didn't break anything or go back and try again.
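
A rough sketch of that loop, under some assumptions: `propose_edit` is a hypothetical callback standing in for whatever asks the model for a change and applies it (no real agent API is implied), and `command` is whatever runs your test suite. The per-run timeout and the round cap are there so an unsupervised loop can't spin forever.

```python
import subprocess
import sys


def run_checks(command, timeout_s=300):
    """Run a test command and return (passed, combined output)."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    return proc.returncode == 0, proc.stdout + proc.stderr


def edit_until_green(propose_edit, command, max_rounds=3):
    """Alternate between an edit step and a test run until the tests pass.

    `propose_edit(feedback)` is a placeholder: it receives the previous
    test output (empty on the first round) and is expected to change the code.
    Returns the round number that went green, or None if none did.
    """
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        propose_edit(feedback)
        passed, feedback = run_checks(command)
        if passed:
            return round_no
    return None


# Demo with a trivially passing "test suite" and a do-nothing edit step.
print(edit_until_green(lambda feedback: None, [sys.executable, "-c", "pass"]))
```

Feeding the failing output back into the next edit is the whole point. Without it the model is just guessing again.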

piva00•2h ago

I haven't worked in places using off-the-shelf/SaaS CI in more than a decade so I feel my experience has been quite the opposite from yours.

We always worked hard to make the CI/CD pipeline as fast as possible. I personally worked on that kind of project at 2 different employers as an SRE: a smaller 300-person shop where I was responsible for all their infra needs (CI/CD, live deployments, later a migration to k8s once it became somewhat stable, at least enough for the workloads we ran, though it was still in its beta days), then a different employer some 5k+ strong, where I worked on improving the CI/CD setup. It used Jenkins as a backend, but we developed a completely different shim on top for developer experience while also working on a bespoke worker scheduler/runner.

I haven't experienced a CI/CD setup that takes longer than 10 minutes to run in many, many years. I was quite surprised reading your comment, and I feel spoiled that I haven't felt this pain for more than a decade; I didn't really expect it was still an issue.

mike_hearn•2h ago

I think the prevalence of teams having a "CI guy" who often is developing custom glue, is a sign that CI is still not really working as well as it should given the age of the tech.

I've done a lot of work on systems software over the years so there's often tests that are very I/O or computation heavy, lots of cryptography, or compilation, things like that. But probably there are places doing just ordinary CRUD web app development where there's Playwright tests or similar that are quite slow.

A lot of the problems are cultural. CI times are a commons, so it can end in tragedy. If everyone is responsible for CI times then nobody is. Eventually management gets sick of pouring money into it and devs learn to juggle stacks of PRs on top of each other. Sometimes you get a lot of pushback on attempts to optimize CI because some devs will really scream about any optimization that might potentially go wrong (e.g. depending on your build system cache), even if caching nothing causes an explosion in CI costs. Not their money, after all.

kccqzy•2h ago

> Maybe I've just got unlucky in the past, but in most projects I worked on a lot of developer time was wasted on waiting for PRs to go green.

I don't understand this. Developer time is so much more expensive than machine time. Do companies not just double their CI workers after hearing people complain? It's just a throw-more-resources problem. When I was at Google, it was somewhat common for me to debug non-deterministic bugs such as a missing synchronization or fence causing flakiness; and it was common to just launch 10000 copies of the same test on 10000 machines to find perhaps a single digit number of failures. My current employer has a clunkier implementation of the same thing (no UI), but there's also a single command to launch 1000 test workers to run all tests from your own checkout. The goal is to finish testing a 1M loc codebase in no more than five minutes so that you get quick feedback on your changes.

> make builds fully hermetic (so no inter-run caching)

These are orthogonal. You want maximum deterministic CI steps so that you make builds fully hermetic and cache every single thing.
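
A scaled-down sketch of the "launch N copies of the same test" trick for flushing out flakiness. Local threads stand in for the fleet of machines, and `flaky_test` is a made-up, deterministic stand-in that fails on a few specific run numbers so the demo is repeatable; a real race would fail at random.

```python
from concurrent.futures import ThreadPoolExecutor


def count_failures(test_fn, runs=1000, workers=8):
    """Run test_fn(run_index) `runs` times in parallel and count AssertionErrors."""
    def one_run(i):
        try:
            test_fn(i)
            return False
        except AssertionError:
            return True

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(one_run, range(runs)))


def flaky_test(i):
    # Hypothetical flaky test: fails on a small, fixed fraction of runs.
    assert i % 250 != 0, f"run {i} hit the rare failure"


print(count_failures(flaky_test, runs=1000))
```

The failure count only tells you the bug exists and roughly how often it fires. You still need the logs from the failing runs to find the missing synchronization.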

mark_undoio•2h ago

> I don't understand this. Developer time is so much more expensive than machine time. Do companies not just double their CI workers after hearing people complain? It's just a throw-more-resources problem.

I'd personally agree. But this sounds like the kind of thing that, at many companies, could be a real challenge.

Ultimately, you can measure dollars spent on CI workers. It's much harder and less direct to quantify the cost of not having them (until, for instance, people start taking shortcuts with testing and a regression escapes to production).

That kind of asymmetry tends, unless somebody has a strong overriding vision of where the value really comes from, to result in penny pinching on the wrong things.

mike_hearn•2h ago

It's more than that. You can measure salaries too, measurement isn't the issue.

The problem is that if you let people spend the company's money without any checks or balances they'll just blow through unlimited amounts of it. That's why companies always have lots of procedures and policies around expense reporting. There's no upper limit to how much money developers will spend on cloud hardware given the chance, as the example above of casually running a test 10,000 times in parallel demonstrates nicely.

CI doesn't require you to fill out an expense report every time you run a PR thank goodness, but there still has to be a way to limit financial liability. Usually companies do start out by doubling cluster sizes a few times, but each time it buys a few months and then the complaints return. After a few rounds of this managers realize that demand is unlimited and start pushing back on always increasing the budget. Devs get annoyed and spend an afternoon on optimizations, suddenly times are good again.

The meme on HN is that developer time is always more expensive than machine time, but I've been on both sides of this and seen how the budgets work out. It's often not true, especially if you use clouds like Azure which are overloaded and expensive, or have plenty of junior devs, and/or teams outside the US where salaries are lower. There's often a lot of low hanging fruit in test times so it can make sense to optimize. Even so, huge waste is still the order of the day.

mike_hearn•2h ago

I was also at Google for years. Places like that are not even close to representative. They can afford to just-throw-more-resources, they get bulk discounts on hardware and they pay top dollar for engineers.

In more common scenarios that represent 95% of the software industry CI budgets are fixed, clusters are sized to be busy most of the time, and you cannot simply launch 10,000 copies of the same test on 10,000 machines. And even despite that these CI clusters can easily burn through the equivalent of several SWE salaries.

> These are orthogonal. You want maximum deterministic CI steps so that you make builds fully hermetic and cache every single thing.

Again, that's how companies like Google do it. In normal companies, build caching isn't always perfectly reliable, and if CI runs suffer flakes due to caching then eventually some engineer is gonna get mad and convince someone else to turn the caching off. Blaze goes to extreme lengths to ensure this doesn't happen, and Google spends extreme sums of money on helping it do that (e.g. porting third party libraries to use Blaze instead of their own build system).

In companies without money printing machines, they sacrifice caching to get determinism and everything ends up slow.

PaulHoule•2h ago

Most of my experience writing concurrent/parallel code in (mainly) Java has been rewriting half-baked stuff that would need a lot of testing with straightforward reliable and reasonably performant code that uses sound and easy-to-use primitives such as Executors (watch out for teardown though), database transactions, atomic database operations, etc. Drink the Kool Aid and mess around with synchronized or actors or Streams or something and you're looking at a world of hurt.

I've written a limited number of systems that needed tests that probe for race conditions by doing something like having 3000 threads run a random workload for 40 seconds. I'm proud of that "SuperHammer" test on a certain level but boy did I hate having to run it with every build.

mystified5016•2h ago

IME it's less of a "throw more resources" problem and more of a "stop using resources in literally the worst way possible"

CI caching is, apparently, extremely difficult. Why spend a couple of hours learning about your CI caches when you can just download and build the same pinned static library a billion times? The server you're downloading from is (of course) someone else's problem and you don't care about wasting their resources either. The power you're burning by running CI for three hours instead of one is also someone else's problem. Compute time? Someone else's problem. Cloud costs? You bet it's someone else's problem.

Sure, some things you don't want to cache. I always do a 100% clean build when cutting a release or merging to master. But for intermediate commits on a feature branch? Literally no reason not to cache builds the exact same way you do on your local machine.
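
To make the pinned-library case concrete, here is a minimal sketch of content-addressed build caching. It assumes the build output depends only on the library name, pinned version, source hash, and toolchain, which is the part real build systems work hard to guarantee. The function names, the dict used as a cache, and the sample values are all illustrative; a real setup would store artifacts on disk or in a remote cache.

```python
import hashlib


def cache_key(name, version, source_sha256, toolchain):
    """Derive a cache key from everything the build is assumed to depend on."""
    payload = "\n".join([name, version, source_sha256, toolchain]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_or_reuse(cache, key, build_fn):
    """Return the cached artifact for `key`, building it only on a miss."""
    if key not in cache:
        cache[key] = build_fn()
    return cache[key]


cache = {}
builds = []


def slow_build():
    builds.append(1)  # count how often we actually build
    return b"fake static library bytes"


# Hypothetical library name, version, source hash, and toolchain.
key = cache_key("libexample", "1.2.3", "ab" * 32, "gcc-13")
build_or_reuse(cache, key, slow_build)
build_or_reuse(cache, key, slow_build)
print(len(builds))  # the second call reuses the first build
```

If anything in the key changes (a version bump, a new compiler), the key changes and you get a fresh build, which is why this is safe for feature-branch commits in a way that "just reuse yesterday's directory" is not.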

ronbenton•2h ago

>Do companies not just double their CI workers after hearing people complain?

They do not.

I don't know if it's a matter of justifying management levels, but these discussions are often drawn out and belabored in my experience. By the time you get approval, or even worse, rejected, for asking for more compute (or whatever the ask is), you've spent way more money on the human resource time than you would ever spend on the requested resources.

kccqzy•2h ago

I have never once been refused by a manager or director when I am explicitly asking for cost approval. The only kind of long and drawn out discussions are unproductive technical decision making. Example: the ask of "let's spend an extra $50,000 worth of compute on CI" is quickly approved but "let's locate the newly approved CI resource to a different data center so that we have CI in multiple DCs" solicits debates that can last weeks.

mysteria•1h ago

This is exactly my experience with asking for more compute at work. We have to prepare loads of written justification, come up with alternatives or optimizations (which we already know won't work), etc. and in the end we choose the slow compute and reduced productivity over the bureaucracy.

And when we manage to make a proper request it ends up being rejected anyways as many other teams are asking for the same thing and "the company has limited resources". Duh.

IshKebab•2h ago

Developer time is more expensive than machine time, but at most companies it isn't 10000x more expensive. Google is likely an exception because it pays extremely well and has access to very cheap machines.

Even then, there are other factors:

* You might need commercial licenses. It may be very cheap to run open source code 10000x, but guess how much 10000 Questa licenses cost.

* Moore's law is dead; Amdahl's law very much isn't. Not everything is embarrassingly parallel.

* Some people care about the environment. I worked at a company that spent 200 CPU hours on every single PR (even to fix typos; I failed to convince them they were insane for not using Bazel or similar). That's a not insignificant amount of CO2.
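
To put a number on the Amdahl's law point: if a fraction p of the work parallelizes perfectly across n workers, the speedup is 1 / ((1 - p) + p / n). The 95% figure below is an arbitrary assumption for illustration, not a measurement of any real test suite.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


for n in (10, 100, 10_000):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

With 5% of the run stuck in serial work, the speedup can never exceed 20x, so 10,000 machines buy barely more than 100 do.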

underdeserver•55m ago

That's solvable with modern cloud offerings - Provision spot instances for a few minutes and shut them down afterwards. Let the cloud provider deal with demand balancing.

I think the real issue is that developers waiting for PRs to go green are taking a coffee break between tasks, not sitting idly getting annoyed. If that's the case you're cutting into rest time and won't get much value out of optimizing this.

physicsguy•49m ago

Not really, in most small companies/departments, £100k a month is considered a painful cloud bill and adding more EC2 instances to provide cloud runners can add 10% to that easily.

wat10000•49m ago

Many companies are strangely reluctant to spend money on hardware for developers. They might refuse to spend $1,000 on a better laptop to be used for the next three years by an employee, whose time costs them that much money in a single afternoon.

mathiaspoint•2h ago

Good God I hate CI. Just let me run the build automation myself dammit! If you're worried about reproducibility make it reproducible and hash the artifacts, make people include the hash in the PR comment if you want to enforce it.

The amount of time people waste futzing around in eg Groovy is INSANE and I'm honestly inclined to reject job offers from companies that have any serious CI code at this point.
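
The hashing half of that proposal is small. A sketch, assuming the build is already reproducible (that is the hard part and is not shown) and that SHA-256 is an acceptable digest; the function name and the example path are made up.

```python
import hashlib


def artifact_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a build artifact, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (hypothetical path): paste the result into the PR comment, and let a
# reviewer or bot rebuild and compare.
# print(artifact_digest("dist/app.tar.gz"))
```

Two people who build the same commit and get the same digest have verified each other's build without a central CI server, which only works if timestamps, file ordering, and the like are pinned down.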

droopyEyelids•2h ago

In most companies the CI/Dev Tools team is a career dead end. There is no possibility to show a business impact, it's just a money pit that leadership can't/won't understand (and if they do start to understand it, then it becomes _their_ money pit, which is a career dead end for them) So no one who has their head on straight wants to spend time improving it.

And you can't even really say it's a short sighted attitude. It definitely is from a developer's perspective, and maybe it is for the company if dev time is what decides the success of the business overall.

yieldcrv•1h ago

then kill the CI/CD

these redundant processes are for human interoperability

blitzar•1h ago

Yet now that I have added an LLM workflow to my coding, the value of my old and mostly useless workflows has 10x'd.

Git checkpoints, code linting and my naive suite of unit and integration tests are now crucial to my LLM not wasting too much time generating total garbage.

vjerancrnjak•1h ago

It’s because people don’t know how to write tests. All of the “don’t do N select queries in a for loop” comments made in PRs are completely ignored in tests.

Each test can output many db queries. And then you create multiple cases.

People don’t even know how to write code that just deals with N things at a time.

I am confident that tests run slowly because the code that is tested completely sucks and is not written for batch mode.

Ignoring batch mode, tests are most of the time written in a way where test cases are run sequentially. Yet attempts to run them concurrently result in flaky tests, because the way you write them and the way you design interfaces does not allow concurrent execution at all.

Another comment, code done by the best AI model still sucks. Anything simple, like a music player with a library of 10000 songs is something it can’t do. First attempt will be horrible. No understanding of concurrent metadata parsing, lists showing 10000 songs at once in UI being slow etc.

So AI is just another excuse for people writing horrible code and horrible tests. If it’s so smart, try to speed up your CI with it.
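
To illustrate the "N select queries in a for loop" complaint, here is a small sketch with an in-memory SQLite database. The `songs` table and its contents are invented for the demo. Both functions return the same mapping; the first issues one query per id, the second issues one query total.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO songs VALUES (?, ?)",
        [(i, f"song-{i}") for i in range(1, 101)],
    )
    return conn


def titles_one_by_one(conn, ids):
    """One SELECT per id: the pattern reviewers flag in production code."""
    result = {}
    for song_id in ids:
        row = conn.execute("SELECT title FROM songs WHERE id = ?", (song_id,)).fetchone()
        result[song_id] = row[0]
    return result


def titles_batched(conn, ids):
    """A single SELECT for all ids, using one placeholder per id."""
    ids = list(ids)
    if not ids:
        return {}
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, title FROM songs WHERE id IN ({placeholders})", ids
    )
    return dict(rows)


conn = make_db()
print(titles_batched(conn, [3, 1, 42]) == titles_one_by_one(conn, [3, 1, 42]))
```

Against an in-memory database the difference is small, but over a network each extra query pays a round trip, and that adds up across thousands of test cases. For very large id lists the batched version would need chunking, since SQLite caps the number of bound parameters per statement.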

rapind•54m ago

> This will make the CI bottleneck even worse.

I agree. I think there are potentially multiple solutions to this since there are multiple bottlenecks. The most obvious is probably network overhead when talking to a database. Another might be storage overhead if storage is being used.

Frankly another one is language. I suspect type-safe, compiled, functional languages are going to see some big advantages here over dynamic interpreted languages. I think this is the sweet spot that grants you a ton of performance over dynamic languages, gives you more confidence in the model's changes, and requires less testing.

Faster turn-around, even when you're leaning heavily on AI, is a competitive advantage IMO.

rafaelmn•36m ago

> If anything CI got a lot slower over time as people tried to make builds fully hermetic (so no inter-run caching), and move them from on-prem dedicated hardware to expensive cloud VMs with slow IO, which haven't got much faster over time.

I am guesstimating (based on previous experience self-hosting the runner for MacOS builds) that the project I am working on could get like 2-5x pipeline performance at 1/2 cost just by using self-hosted runners on bare metal rented machines like Hetzner. Maybe I am naive, and I am not the person that would be responsible for it - but having a few bare metal machines you can use in the off hours to run regression tests, for less than you are paying the existing CI runner just for build, that speed up everything massively seems like a pure win for relatively low effort. Like sure everyone already has stuff on their plate and would rather pay an external service to do it - but TBH once you have this kind of compute handy you will find uses anyway and just do things efficiently. And knowing how to deal with bare metal/utilize this kind of compute sounds like a generally useful skill - but I rarely encounter people enthusiastic about making this kind of move. It's usually - hey let's move to this other service that has slightly cheaper instances and a proprietary caching layer so that we can get locked into their CI crap.

It's not like these services have 0 downtime/bug free/do not require integration effort - I just don't see why going bare metal is always such a taboo topic even for simple stuff like builds.

TheDudeMan•26m ago

This is because coders didn't spend enough time making their tests efficient. Maybe LLM coding agents can help with that.

grogenaut•10m ago

Before cars people spent little on petroleum products or motor oil or gasoline or mechanics. Now they do. That's how systems work. You wanna go faster well you need better roads, traffic lights, on ramps, etc. you're still going faster.

Use AI to solve the IP bottlenecks or build more features that earn more revenue that buy more ci boxes. Same as if you added 10 devs, which you are with AI, so why wouldn't some of the dev support costs go up?

Are you not in a place where you can make an efficiency argument to get more ci or optimize? What's a ci box cost?

fastball•3h ago

ICYMI, DeepMind also has a Gemini model that is diffusion-based[1]. I've tested it a bit and while (like with this model) the speed is indeed impressive, the quality of responses was much worse than other Gemini models in my testing.

[1] https://deepmind.google/models/gemini-diffusion/

thelastbender12•3h ago

The speed here is super impressive! I am curious - are there any qualitative ways in which modeling text using diffusion differs from that using autoregressive models? The kind of problems it works better on, creativity, and similar.

orbital-decay•2h ago

One works in the coarse-to-fine direction, another works start-to-end. Which means different directionality biases, at least. Difference in speed, generalization, etc. is less clear and needs to be proven in practice, as fundamentally they are closer than it seems. Diffusion models have some well-studied shortcuts to trade speed for quality, but nothing stops you from implementing the same for the other type.

JimDabell•3h ago

Pricing:

US$0.000001 per output token ($1/M tokens)

US$0.00000025 per input token ($0.25/M tokens)

https://platform.inceptionlabs.ai/docs#models
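
A quick back-of-the-envelope helper using the two rates quoted above. The token counts in the example are made up, and the function name is just for illustration.

```python
from decimal import Decimal

# Rates as quoted in this comment, in US dollars per million tokens.
OUTPUT_USD_PER_MILLION = Decimal("1.00")
INPUT_USD_PER_MILLION = Decimal("0.25")


def request_cost_usd(input_tokens, output_tokens):
    """Cost of one request in US dollars at the quoted per-token rates."""
    total = INPUT_USD_PER_MILLION * input_tokens + OUTPUT_USD_PER_MILLION * output_tokens
    return total / Decimal(1_000_000)


# Hypothetical request: 2,000 prompt tokens in, 500 tokens out.
print(request_cost_usd(2_000, 500))
```

At these rates that example request comes to a tenth of a cent, split evenly between input and output.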

asaddhamani•2h ago

The pricing is a little on the higher side. Working on a performance-sensitive application, I tried Mercury and Groq (Llama 3.1 8b, Llama 4 Scout) and the performance was neck-and-neck but the pricing was way better for Groq.

But I'll be following diffusion models closely, and I hope we get some good open source ones soon. Excited about their potential.

empiko•3h ago

I strongly believe that this will be a really important technique in the near future. The cost saving this might create is mouth watering.

NitpickLawyer•2h ago

> I strongly believe that this will be a really important technique in the near future.

I share the same belief, but regardless of cost. What excites me is the ability to "go both ways", edit previous tokens after others have been generated, using other signals as "guided generation", and so on. Next token prediction works for "stories", but diffusion matches better with "coding flows" (i.e. going back and forth, add something, come back, import something, edit something, and so on).

It would also be very interesting to see how applying this at different "abstraction layers" would work. Say you have one layer working on ctags, one working on files, and one working on "functions". And they all "talk" to each other, passing context and "re-diffusing" their respective layers after each change. No idea where the data for this would come, maybe from IDEs?

baalimago•3h ago

I, for one, am willing to trade accuracy for speed. I'd rather have 10 iterations of poor replies which forces me to ask the right question than 1 reply which takes 10 times as long and _maybe_ is good, since it tries to reason about my poor question.

PaulHoule•2h ago

Personally I like asking coding agents a question and getting an answer back immediately. Systems like Junie that go off and research a bunch of irrelevant things, then ask permission, then do a lot more irrelevant research, ask more permission and such, and then 15 minutes later give you a mountain of broken code are a waste of time if you ask me. (Even if you give permission in advance)

pmxi•2h ago

This is cool. I think faster models can unlock entirely new usage paradigms, like how faster search enables incremental search.

amelius•2h ago

Damn, that is fast. But it is faster than I can read, so hopefully they can use that speed and turn it into better quality of the output. Because otherwise, I honestly don't see the advantage, in practical terms, over existing LLMs. It's like having a TV with a 200Hz refresh rate, where 100Hz is just fine.

pmxi•2h ago

There are plenty of LLM use cases where the output isn’t meant to be read by a human at all. e.g:

parsing unstructured text into structured formats like JSON

translating between natural or programming languages

serving as a reasoning step in agentic systems

So even if it’s “too fast to read,” that speed can still be useful

amelius•1h ago

Sure, but I was talking about the chat interface, sorry if that was not clear.

Legend2440•1h ago

This lets you do more (potentially a lot more) reasoning steps and tool calls before answering.

irthomasthomas•2h ago

I've used mercury quite a bit in my commit message generator. I noticed it would always produce the exact same response if you ran it multiple times, and increasing temperature didn't affect it. To get some variability I added a $(uuidgen) to the prompt. Then I could run it again for a new response if I didn't like the first.
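
The same trick outside the shell, as a small sketch. It only builds the prompt; `with_nonce` is a made-up helper and no API call is shown. Whether a changed prompt actually changes the output depends on the service, though that is what was observed here.

```python
import uuid


def with_nonce(prompt):
    """Append a random UUID so otherwise identical prompts differ."""
    return f"{prompt}\n\n[nonce: {uuid.uuid4()}]"


print(with_nonce("Write a commit message for the staged diff."))
```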

everlier•1h ago

Something like https://github.com/av/klmbr could also work

seydor•2h ago

I wonder if diffusion llms solve the hallucination problem more effectively. In the same way that image models learned to create less absurd images, dllms can perhaps learn to create sensical responses more predictably

awaymazdacx5•2h ago

Having token embeddings with diffusion models, for 16x16 transformer encoding. Image is tokenized before transformers compile it. If decomposed virtualization modulates according to a diffusion model.

storus•2h ago

Can Mercury use tools? I haven't seen it described anywhere. How about streaming with tools?

nashashmi•2h ago

I guess this makes specific language patterns cheaper and more artistic language patterns more expensive. This could be a good way to limit pirated and masqueraded materials submitted by students.

true_blue•2h ago

I tried the playground and got a strange response. I asked for a regex pattern, and the model gave itself a little game-plan, then it wrote the pattern and started to write tests for it. But it never stopped writing tests. It continued to write tests of increasing size until I guess it reached a context limit and the answer was canceled. Also, for each test it wrote, it added a comment about if the test should pass or fail, but after about the 30th test, it started giving the wrong answer for those too, saying that a test should fail when actually it should pass if the pattern is correct. And after about the 120th test, the tests started to not even make sense anymore. They were just nonsense characters until the answer got cut off.

The pattern it made was also wrong, but I think the first issue is more interesting.

fiatjaf•1h ago

This is too funny to be true.

skybrian•1h ago

Company blog post: https://www.inceptionlabs.ai/introducing-mercury-our-general...

News coverage from February: https://techcrunch.com/2025/02/26/inception-emerges-from-ste...

mtillman•1h ago

Ton of performance upside in most GPU adjacent code right now.

However, is this what arXiv is for? It seems more like marketing their links than research. Please correct me if I'm wrong/naive on this topic.

eden-u4•1h ago

No open model/weights?

gdiamos•46m ago

I think the LLM dev community is underestimating these models. E.g. there is no LLM inference framework that supports them today.

Yes the diffusion foundation models have higher cross entropy. But diffusion LLMs can also be post trained and aligned, which cuts the gap.

IMO, investing in post training and data is easier than forcing GPU vendors to invest in DRAM to handle large batch sizes and forcing users to figure out how to batch their requests by 100-1000x. It is also purely in the hands of LLM providers.
