Launch HN: Morph (YC S23) – Apply AI code edits at 4,500 tokens/sec

217•bhaktatejas922•7mo ago

Hey HN, I’m Tejas at Morph. We’ve built a blazing-fast model for applying AI-generated code edits directly into your files at 4,500+ tokens/sec. No more slow full-file rewrites or brittle search-and-replace hacks.

Here's a demo video: https://www.youtube.com/watch?v=LdT8epGHJPk.

Why? AI spits out code that can’t reliably be inserted into existing code. Full-file rewrites and brittle search-and-replace hacks are too slow, expensive, or error-prone.

Morph's approach:

- Your agent outputs edits “lazily”, referencing unmodified lines in the existing file (ex: // ...existing code...)

- Morph instantly applies these edits to a file using our Fast Apply model + speculative decoding against the original file, making AI patches fast, reliable, and production-ready.
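To make the two steps above concrete, here is a minimal sketch of what a "lazy" edit might look like and how it could be packaged for an apply model. The file contents and the `build_apply_prompt` helper are hypothetical, and the `<code>`/`<update>` layout is an assumption borrowed from the template shared in the comments, not a confirmed spec.

```python
# Hypothetical file contents, used only for illustration.
ORIGINAL = """\
function add(a, b) {
  return a + b;
}

function sub(a, b) {
  return a - b;
}
"""

# A "lazy" edit: only the changed function is spelled out; the marker
# comment stands in for everything the agent left untouched.
LAZY_EDIT = """\
// ...existing code...

function sub(a, b) {
  if (typeof a !== "number") throw new TypeError("a must be a number");
  return a - b;
}
"""


def build_apply_prompt(original: str, lazy_edit: str) -> str:
    """Wrap the file and the lazy edit in <code>/<update> tags.

    The tag layout is an assumption taken from the template shared in the
    comments, not from official documentation.
    """
    return f"<code>{original}</code><update>{lazy_edit}</update>"


if __name__ == "__main__":
    print(build_apply_prompt(ORIGINAL, LAZY_EDIT))
```

The point of the sketch is the shape of the input: the agent writes far fewer tokens than a full rewrite, and the apply step is responsible for producing the merged file.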

This approach was pioneered by Cursor last year, but their models aren’t available as APIs—so we built Morph for developers everywhere (with a large free tier!)

Live demo (no signup): https://morphllm.com/dashboard and docs: https://docs.morphllm.com/quickstart

We have 2 Fast Apply models: morph-v3-fast - 4500+ tok/sec, and morph-v3-large - 2500+ tok/sec. These models power Fast Apply at create.xyz, databutton, continue.dev, and more!

We also provide retrieval models for embedding + reranking. Next Up: Inline Edit Model (Cmd-K): Extremely fast inline edits - keep dev flow state; and Morph Tab API: Our Next Edit Prediction model guesses your next code edit + action with sub-500ms latency. It's currently in private beta, but you can request early access here: https://morphllm.com/tab

Hot takes:

1) Raw inference speed matters more than incremental accuracy gains for dev UX—agree or disagree?

2) Full-file rewrites by frontier models are legacy—Fast Apply edits win on speed, cost, reliability.

3) As benchmarks on narrow tasks saturate to 99%+, complexity is shifting from single frontier models to specialized inference-optimized models. As frontier models move upmarket, they'll leave simple tasks behind, and they'll be used to do tasks only frontier models can do

We’d love to hear your ideas and experiences with coding agents!

Comments

handfuloflight•7mo ago

Is there any way to bring this into Claude Code?

bhaktatejas922•7mo ago

There might be a way using their new hooks commands, but out of the box, not yet. Email us if you want to make it happen!

https://docs.anthropic.com/en/docs/claude-code/hooks

booli•7mo ago

If this proves the way forward, it will be in Claude Code soon enough natively

koakuma-chan•7mo ago

There is already https://www.relace.ai/, albeit not as blazing fast at a mere 4300 tok/s

bhaktatejas922•7mo ago

Perhaps. Boris from the Claude Code team shares a bit about their view here https://www.youtube.com/watch?v=Yf_1w00qIKc

My read is that despite Claude moving upmarket in what it can do, they are keen on clinging to all the (token heavy) tasks they're leaving behind

halfjoking•7mo ago

Make an MCP server, and turn off the Write|Edit|MultiEdit tools?

Actually - that's what this company should do. It should be an MCP server so anyone could plug it into any agent with a url and an API key.

bhaktatejas922•7mo ago

great idea! we'll have one up soon :)

SatvikBeri•7mo ago

Can you set up a mailing list or something where we can keep up with updates? I'm interested in trying this as soon as it works with Claude Code.

Edit: I'd be particularly interested if there's a way to run a sort of comparison mode for a while, so I can get a sense of how much accuracy I'm losing, if any. Even at the cost of initial performance.

bhaktatejas922•6mo ago

mcp is out! https://morphllm.com/mcp

amelius•7mo ago

Can't you ask these LLMs to simply output a patch file?

https://man7.org/linux/man-pages/man1/patch.1.html

bhaktatejas922•7mo ago

you can - but they don't work reliably in practice. Common issues include search match failures, missing commas in replaced items (the model doesn't have the surrounding context while replacing), and a few other error cases. These issues are much worse for scattered edits across a file from real-world queries (ex: make this page look nicer). Patches tend to work fine for single-line or extremely focused edits though - Cursor uses s&r/patches for single-line edits:

https://github.com/x1xhlol/system-prompts-and-models-of-ai-t...
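A small sketch of the two failure modes mentioned above, using a strict exact-match replace. The `search_and_replace` and `try_edit` helpers and the JSON snippet are hypothetical, written only to illustrate the point; they are not how any particular agent implements edits.

```python
import json


def search_and_replace(source: str, search: str, replace: str) -> str:
    """Apply one exact-match edit; refuse if the search text is missing or ambiguous."""
    count = source.count(search)
    if count != 1:
        raise ValueError(f"expected exactly 1 match, found {count}")
    return source.replace(search, replace)


# Hypothetical file the model is editing.
SOURCE = '{\n  "retries": 3,\n  "timeout": 30\n}\n'


def try_edit(search: str, replace: str) -> str:
    """Return a short status string describing how the edit went."""
    try:
        result = search_and_replace(SOURCE, search, replace)
    except ValueError as exc:
        return f"match failed: {exc}"
    try:
        json.loads(result)
    except json.JSONDecodeError:
        return "applied, but result no longer parses"
    return "ok"


if __name__ == "__main__":
    # 1) The model reproduces the line with slightly different spacing.
    print(try_edit('  "timeout":30\n', '  "timeout": 60\n'))
    # 2) The match succeeds, but the replacement forgets the comma the
    #    surrounding context requires.
    print(try_edit('  "timeout": 30\n', '  "timeout": 30\n  "backoff": 2\n'))
    # 3) The same edit with the comma included.
    print(try_edit('  "timeout": 30\n', '  "timeout": 30,\n  "backoff": 2\n'))
```

The first call fails to match at all, the second applies cleanly but leaves a broken file, and only the third produces valid output.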

treyd•7mo ago

I wonder if it'd be feasible to have a much smaller model that could go in and correct these meshing issues that require simpler reasoning?

bhaktatejas922•7mo ago

hm maybe, but correction/issue detection is a much harder task for models. If you pipe the errors back in it could work, but personally I still see Fast Apply as the better approach

deepdarkforest•7mo ago

> 1) Raw inference speed matters more than incremental accuracy gains for dev UX—agree or disagree?

I know you are trying to generate some controversy/visibility, but I think if we are being transparent here, you know this is wrong. People prefer using larger (or reasoning) models, with a much bigger difference in tok/sec, just for quality in coding; it comes first. Even if I have a big edit to apply, like 5k tokens, 200-300ms of difference in edit time is nothing. Edit speed is definitely not a bottleneck for dev UX, quality is. A dev who wants to save 200ms on every code change over quality is someone who, well, I cannot relate to. If I'm using 1-2 agents in parallel, most of the time the edits are already applied while I'm reviewing code from the other agents. But again, maybe that's just me.

Speaking of quality, how do you measure it? Do you have any benchmarks? How big is the difference in error rate between the fast and large model?

bigyabai•7mo ago

The marketing language seems to suggest they're insecure over quality and want to promote quantity. But I'm in the same boat as you - I would happily take 10 tok/sec of a correct answer instead of wasting an hour curating 4500 tok/sec throwaway answers. Benchmark performance matters 100x more than your latency.

If these "hot takes" extend into Morph's own development philosophy, then I can be glad to not be a user.

bhaktatejas922•7mo ago

There's no amount of error rate that's acceptable to us - edits should always be correct. We've just found anecdotally that saving users time is also very important for churn, retention, and keeping developers in flow state, right after accuracy.

bigyabai•7mo ago

Then why are you using a custom model instead of an industry-leading option?

I don't mean to be rude, but I can't imagine you're selling a product on-par with Claude 3.7. Some level of performance tradeoff has to be acceptable if you prioritize latency this hard.

bhaktatejas922•7mo ago

We're not - our model doesn't actually think up the code changes. Claude-4 or Gemini still writes the code, we're just the engine that merges it into the original file.

Our whole thesis is that Claude and Gemini are extremely good at reasoning/coding - so you should let them do that, and pass it to Morph Fast Apply to merge changes in.

johnfn•7mo ago

Anyone can get 10 tok/sec - just tell the model to output the entire file with changes, rather than just the delta.

Whatever LLM you're using will have a baseline error rate a lot higher than 2%, so you're going to be reviewing all the code it outputs regardless.

bhaktatejas922•7mo ago

yeah even claude is well over 11% error rates with search and replace

IanCal•7mo ago

This is a code editing model. 10 tokens per second editing may as well not exist for any interactive use case.

bhaktatejas922•7mo ago

I think it depends - the actual thing to measure is to keep a developer in flow state. Many errors, as well as latency, break this. To be brief: yes, accuracy comes first.

Quality is measured 2 main ways:

1) End-to-end: User query -> to task resolution. These are aider style benchmarks answering the question of actual task completion

2) Apply Quality: Syntax correctness, character diff, etc..

The difference in error rate between large and fast is around 2%. If you're doing code edits that are extremely complex or in obscure languages, large is the better option. There's also an auto option to route to the model we think is best for a task

deepdarkforest•7mo ago

Glad to hear quality comes first! Then I assume you have some public benchmarks like the ones you mention that are reproducible? I could only find this graph https://docs.morphllm.com/guides/apply but there is no mention of what it refers to, what data it used etc.

candiddevmike•7mo ago

I don't believe anyone can be in some kind of "flow state" while waiting on LLM responses. I think it's funny that we complained for years about C and others being slow to compile and now folks are fine waiting seconds++ every time they want to change something.

bhaktatejas922•7mo ago

how so? Is your view that flow state isn't a thing at all, or just not with LLMs?

candiddevmike•7mo ago

Flow state is 100% a thing, it's just impossible with LLMs (at least, for me). I can't be blocked waiting on things during a flow state or my mind starts wandering to other places.

ada1981•7mo ago

I've had the opposite experience.

bhaktatejas922•7mo ago

same

bhaktatejas922•7mo ago

Fast Apply definitely helps with keeping flow state and is a large part of Cursor's success

Personally I work on multiple repos at a time to solve for this

0x457•7mo ago

I do it like simultaneous exhibition in chess:

- Multiple repos or independent changes in monorepo

- First round of changes idgaf about anything beyond public interface and unit tests

   - I review public interface and make changes if needed

   - I review unit tests it wrote to see that at least from the outside it looks alright.

- here I either:

   - make more unit tests (features, edge cases and make it write code for it)

   - polish what it generated

bhaktatejas922•7mo ago

sounds like flow state to me

0x457•7mo ago

oh it for sure is. But I use Amazon Q almost exclusively. One thing that gets me out of this state: when I have to do the math on "should I just do it myself" vs "keep refining prompt/context until this thing finally gets it right".

bhaktatejas922•7mo ago

so frustrating how slow edits are in Q dev

0x457•7mo ago

Sometimes it splits edits to a single file into way too many fs_write(s) and often gets stuck, unable to apply edits. It's also so conservative with using your machine's resources: it kept trying to run the test suite with a single worker. Like come on, I paid for 32 cores, I will be using 32 cores.

klank•7mo ago

Time really is a flat circle. My software career started with me archaically flipping characters in a file I vaguely understood with long pauses waiting on magic compilers to give me my actual output.

Now it's dying in the same place. Thankfully I got to spend the brunt of my career working through the fun, intermediate years.

bhaktatejas922•7mo ago

I've never had so much fun coding in my life - you should definitely give it a try again!

klank•7mo ago

Thanks, I appreciate the good vibes.

However, it's kind of a trope for me at this point that people assume a negative opinion of using generative AI in the development process is due to a lack of experience using it.

handfuloflight•7mo ago

Well you could articulate what issues you have with it. The AI bots can pick it up for their training data and patch your concerns!

klank•7mo ago

> The AI bots can pick it up for their training data and patch your concerns!

This is borderline mystical AI speak to me. I know what you mean, and no, it doesn't work like that. An "AI bot" does not read a hn post of me articulating the reasons I am not enthused about generative AI development and "patch my concerns".

handfuloflight•7mo ago

Next time, I'll wrap it up in <sarcasm></sarcasm>.

Truly ironic the AI readily detected what I said as sarcastic. Without context. https://claude.ai/share/7d14287d-c066-4927-8942-8eb8dd8d7e7f

klank•7mo ago

Ah, thanks for the explanation. I actually was confused a bit. For what it's worth, I had a second paragraph mentioning Poe's law that I deleted because I was concerned you would take it as a personal attack.

I should have left it in, knowing you were sarcastic I think you'd have appreciated me being confused about whether you were being satirical or not.

handfuloflight•7mo ago

That would have been a perfect opportunity for me to finally internalize Poe's law.

klank•7mo ago

Haha. You got me. I couldn't tell. I really couldn't tell.

bhaktatejas922•7mo ago

the claude link is hilarious, hahaha

simonw•7mo ago

Have you tried any of the ludicrously fast LLM demos yet?

https://inference.cerebras.ai/ and https://groq.com/ and https://deepmind.google/models/gemini-diffusion/ (waitlisted) are all 10 to 100x faster than regular models, which really does have a meaningful impact on how I interact with them because I don't have to disengage for 15+ seconds while I wait for a response.

I have video demos of a few of those: https://simonwillison.net/2024/Oct/25/llm-cerebras/ and https://simonwillison.net/2024/Oct/31/cerebras-coder/ and https://simonwillison.net/2025/May/21/gemini-diffusion/

sitkack•7mo ago

Flow state has been redefined now that we are all using Claude Code. If I can stay focused on tests, reviewing code, etc while CC is doing its thing, we are good. The kloc/s doesn't matter as much.

dingnuts•7mo ago

if LLMs are ever able to write the kind of code I write for work, I'm going to move to management. spending 100% of my time reviewing AI slop and writing tests is the opposite of what I want. I want to define behavior quickly and have AI do the boring parts; you're letting the computer do the fun bit and spending your entire life doing the shit part, and paying for the privilege.

fuck. THAT.

ipaddr•7mo ago

We have it backwards. Claude should be reviewing work and writing tests.

bpt3•7mo ago

No one sane would trust an LLM with that task, which is how we know it's not ready for production use yet.

bpt3•7mo ago

I might put this on a plaque.

I realize this sounds harsh, but I assume anyone who is pushing for developers to basically take on all the shit work of a tech lead stuck managing a bunch of incompetent developers is not an actual developer, and is either an incompetent one who hopes LLMs will cover for them or someone looking to reduce their dependency on developers.

Fortunately for me, I think we'll be well into the Matrix before my job can be done adequately by AI so I have the luxury of using it as a tool here and there where it makes sense rather than spending most of my time trying to avoid the damage a firehose of hallucinations will do to my codebase.

furyofantares•7mo ago

This is gonna sound like some chad hype shit, but I've tried just working 2 different projects simultaneously and have had some incredible extended flow sessions. It felt like the old days of multitabling poker.

I had tried doing it with different features in different worktrees in the same codebase but found flow much harder there.

Lately I am also just spending a lot more time reworking code manually to keep the code in good shape. Still getting a ton of value out of the LLM doing a lot of work, but not exactly spending lots of time just waiting for it because I am dropping back down to manual mode frequently.

throwaway2037•7mo ago

> we complained for years about C and others being slow to compile

For C? I don't remember that, unless headers are poorly managed. C++? Definitely yes.

bhaktatejas922•7mo ago

What do we not complain about if we're being honest?

Aurornis•7mo ago

> the actual thing to measure is to keep a developer in flow state.

Personally, I find flow state hard to achieve when I constantly have to switch modes to debugging LLM output or an edit error that I missed.

When the majority of time is spent waiting for the main LLM to think, I will always wait a few extra seconds for a better edit than risk having to spend multiple cycles playing find-the-bug because something didn't get applied correctly somewhere.

bhaktatejas922•7mo ago

Like most things it's a tradeoff. Developer tolerance for errors is extremely low - but the error rate for Fast Apply is even lower

ashwindharne•7mo ago

I do find that having inference happen ~50% faster is much more valuable to my workflow than a single digit accuracy increase. If I'm going to have to check that the changes are correct anyways, getting more iterations in faster feels much better than incremental accuracy.

There's definitely a tipping point though. If the accuracy gains are so high that I can check its work less carefully or less often, the benefits of inference speed are effectively nil.

bhaktatejas922•7mo ago

exactly. The point is that none of the users even realize a model is doing the apply - it should be so accurate and fast that it feels like it's not there

walthamstow•7mo ago

Agreed. Sonnet 4 is supposedly better than Sonnet 3.5, but in Cursor 3.5 is much faster so that's what I use

godot•7mo ago

I've been using Cursor pretty extensively in the past few months and I use it to code pretty hard problems sometimes, and a while ago when the options were between claude 3.5 sonnet vs gemini 2.5 pro, there was such a significant difference in quality that claude 3.5 often straight up failed -- the code it wrote wouldn't work, even after retrying over and over again, and gemini 2.5 pro often was able to solve it correctly. In a particular project I even had to almost exclusively use gemini 2.5 pro to continue to make any progress despite having to wait out the thinking process every time (gemini was generally slower to begin with, and then the thinking process often took 30-90 seconds).

Mo3•7mo ago

Yup, same. My Google API costs were way too high. Sonnet and Opus 4 are much better now so they take care of most of my "easier" tasks. Gemini 2.5 Pro is still somehow better for larger scopes so I have it do all the pre-planning and larger tasks

smrtinsert•7mo ago

Slow is smooth and smooth is fast.

bhaktatejas922•7mo ago

and speculative edits is faster

Cort3z•7mo ago

As far as I understand, this is not +-300ms. It is 300ms vs. 10 sec or something. That is a huge difference. I personally find the wait for these larger models a limiting factor. It’s also probably a waste of resources for a fairly simple task like this (compared to the general function approximation of LLMs).

But I honestly feel like the task of smartly applying edits falls somewhat within traditional coding tasks. What about it is so difficult it could not be done with a smart diffing algorithm?

deepdarkforest•7mo ago

you misunderstood. it's 300ms just for the apply model, the model that takes your coding model's output (eg sonnet) and figures out where the code should be changed in the file. Cursor has its own, and claude uses a different technique with strings as well. So it's 10sec vs 10sec + 300ms using your analogy

Cort3z•7mo ago

Their selling point is to be a more open version of what Cursor has, so the alternative is to use a full LLM. So it is 10s + 10s vs 10s + 300ms
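A back-of-the-envelope version of that comparison, as a rough sketch. The 4,500 tok/sec figure is the one quoted in the post and the 5k-token edit size is the one mentioned upthread; the 100 tok/sec rate for a general model rewriting the whole file is purely an assumption for illustration, not a measurement.

```python
def seconds_to_emit(tokens: int, tokens_per_sec: float) -> float:
    """Time to stream `tokens` output tokens at a steady rate."""
    return tokens / tokens_per_sec


FILE_TOKENS = 5_000        # edit size mentioned upthread
FAST_APPLY_TPS = 4_500     # morph-v3-fast figure quoted in the post
FULL_REWRITE_TPS = 100     # assumption: placeholder rate for a large general model

if __name__ == "__main__":
    fast = seconds_to_emit(FILE_TOKENS, FAST_APPLY_TPS)
    slow = seconds_to_emit(FILE_TOKENS, FULL_REWRITE_TPS)
    print(f"apply model:  {fast:.2f}s")
    print(f"full rewrite: {slow:.2f}s")
```

Whatever the real rates are, the merge step scales with the size of the file rather than the size of the change, which is why the per-token speed of that step dominates the comparison.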

bhaktatejas922•7mo ago

yep!

bravesoul2•7mo ago

For someone not heavy in this space: I use GH Copilot at work. I might switch to Cursor. I am not into the details of the tools; I just care whether it helps me or not. For us it might be worth having an easier-to-understand value proposition.

It may take a bit of explaining and that's OK. But the big question is: as someone doing my enterprise microservice who isn't heavy into AI, why do I switch to you?

bhaktatejas922•7mo ago

it's a bit unclear why a model works best here. in short - smart diffing is edge case hell and you'll never capture all of them

k__•7mo ago

I have to admit that using slow models is unbearable once I've used a fast one.

I don't know if the quality and speed are linearly related, though.

AirMax98•7mo ago

Seriously agree — try using something like Sonnet 3.7 and then switching to Gemini 2.5 Pro. The code that both output is fine enough — especially given that I mostly use LLMs as a fancy autocomplete. Generally a better prompt is going to get me closer to what I want than a more robust model. The speed hit with Gemini 2.5 Pro is just too substantial for me to use it as a daily driver.

I imagine the speed difference might not matter so much if you are performing seismic updates across a codebase though.

bhaktatejas922•7mo ago

yeah speed and flow state are for sure linked. People love to say Cursor is a Claude wrapper but they miss the reality that Cursor is a fast apply wrapper.

Intensely sticky user experience

paulddraper•7mo ago

I do not use Opus for coding, I much prefer Sonnet.

Many tasks work better with iteration/supervision and Sonnet makes that feasible.

bhaktatejas922•7mo ago

yeah same. I feel like Opus tends to lean slightly more sycophantic on technical topics

helsinki•7mo ago

Interesting. I use Opus exclusively (like $1000/day in tokens) via Claude Code. Do you really think Sonnet is better for programming? I’m not sure I agree, though I’d love to save $900/day by taking you up on it.

stoken•7mo ago

Genuine: how? I assume you're using something like cc-usage to get that value. $1k/day is tons. Would genuinely love to know how you're managing to keep the inference burning through that much a day, as I'd love to do the same, but even with 2-4 simultaneous sessions running fairly continuously, mostly on Opus for 10-12 hours a day, I get maybe $500/day. What's your workflow rig/setup look like to get you to that $1k velocity?

helsinki•7mo ago

I use Vertex and work at a hedge fund. I just spam Claude Code Opus all day long. There’s not much to it, other than I sit in a chair for 12-16 hours and spam poor (actually, rich) Claude. I don’t use the cc-usage tool - I just look at my GCP bill :(

stoken•7mo ago

I mean, that'll do it :claps:

paulddraper•7mo ago

I think the difference without accounting for performance is noticeable but small.

And after accounting for performance, it is in favor of Sonnet.

bhaktatejas922•7mo ago

I'm not sure that it's "better". I still use Opus and it's better at coding but needs more steering to be less of a "Yes you're EXACTLY right" every time I suggest a new solution path. Purely anecdotal though

Darmani•7mo ago

Sounds like review time is the bottleneck for you.

I'm currently working on something that makes people much faster at reviewing the output of coding agents. If you have some time, I'm very interested in interviewing you about your workflows. Just reply here, or find my contact information in my profile.

-- Jimmy Koppel, Ph. D.

asam-0•7mo ago

Fully agree.

The very 1st thing you do after you get a proposed set of changes from an AI model is to review them carefully before applying them. Most of the time it duplicates code because it skipped specific tokens or context that was out of its window and the user didn't include it in their prompt.

Batch applying any changes is just a way to create even harder code to debug, and accumulating such bulk code injections will definitely break your code much earlier than you think.

B, Sam

bhaktatejas922•7mo ago

if you've used cursor, you've probably felt how seamless fast apply can feel - fast apply is accurate and fast to the point where most don't even realize it's a model

animuchan•7mo ago

Absolutely, when stuff runs fast it's better UX compared to when the stuff runs slow.

I think what parent comments suggest is, the speed of applying a diff is not the major bottleneck in LLM-assisted coding, and improvements in other aspects are much more desirable (e.g. correctness, or even speed of thinking models themselves).

In a world where diff application is a real pain point, it's likely one of the last pain points in the field.

rs186•7mo ago

Sounds interesting, but I imagine all the big players (Cursor, Windsurf, and maybe even OpenAI/Anthropic) will achieve something similar very quickly in their tools first-party, which will decimate the company. And I don't get the API part of this -- at the end of the day people use those IDEs, and I don't see developers/companies wanting to send their code to yet another endpoint.

bhaktatejas922•7mo ago

Perhaps - Cursor does this in house. I see the coding agent space being large as we shift into a market of on-demand software.

Sending code externally is meh especially for companies with tight security rules. We do self-hosting for them in their infra

Qerbz•7mo ago

Heard some insane rumors of the efficacy increase of this in action even though I don't know how you do it

bhaktatejas922•7mo ago

the rumors are true! learn how we do it by joining the team :)

bigyabai•7mo ago

That's an interesting first comment to post from a 5-month old account.

zackangelo•7mo ago

For anyone more curious about how this works, Fireworks wrote a blog post about it last year (I think):

https://fireworks.ai/blog/cursor

bhaktatejas922•7mo ago

yep - great post!

simonw•7mo ago

This uses an OpenAI-compatible endpoint, so I got this working with my https://llm.datasette.io/ CLI tool.

First I added their models to my ~/Library/Application Support/io.datasette.llm/extra-openai-models.yaml file:

  - model_id: morph-auto
    model_name: auto
    api_base: https://api.morphllm.com/v1
    api_key_name: morph

Then I added the API key like this:

  llm keys set morph
  # Paste in API key from https://morphllm.com/api-keys

Then I saved an LLM template with their prompting pattern:

  llm -m morph-auto '<code>$code</code><update>$update</update>' --save morph

Now I can run operations like this:

  llm -t morph -p code "$(cat orig.txt)" -p update "$(cat update.txt)"

The -t option is the template I named when I ran --save. The -p name value options then set the content for the template $code and $update variables.

Example transcript here: https://gist.github.com/simonw/de67818603d448a3fee788ace2976...

One thing that worries me: since it's using XML-style tags <code> and <update>, if my own source code contains those tags I expect it may get confused.

bhaktatejas922•7mo ago

Wow that was fast - this is awesome. It shouldn't be a problem unless your code has both <code> and <update> internally. 1 or the other should be fine
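Taking that reply at face value, a caller could pre-check a file before wrapping it in tags. This is only a sketch: `risky_for_tag_wrapping` is a hypothetical helper, and the rule it encodes (trouble only when both tags appear) is the one stated in this comment, not something verified against the model.

```python
def risky_for_tag_wrapping(source: str) -> bool:
    """True when the source contains both a <code> tag and an <update> tag."""
    return "<code>" in source and "<update>" in source


if __name__ == "__main__":
    html_doc = "<p>Use <code>git status</code> to check.</p>"
    template = "<code>$code</code><update>$update</update>"
    print(risky_for_tag_wrapping(html_doc))   # only <code> appears
    print(risky_for_tag_wrapping(template))   # both tags appear
```

Files that trip the check could be routed to a different edit path or flagged for review instead of being sent as-is.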

nailer•7mo ago

[flagged]

seanw265•7mo ago

Last time I looked into Morph, I noticed you weren’t yet on OpenRouter. I see that’s changed, but it looks like only an older model is listed. Any plans to be more active there?

Also, are there any benchmarks comparing your fast apply models to others like Relace or even Llama via Cerebras? I’m particularly interested in output accuracy.

bhaktatejas922•7mo ago

the v2 model listed currently points to morph-v3-large. We're working with them to get v3-large and v3-fast listed

bhaktatejas922•7mo ago

the power of hacker news! New models are listed there now

bijection•7mo ago

How does this compare to relace, which I believe is also a YC company? They seem to have very similar functionality [0]

[0] https://www.relace.ai/

Kamshak•7mo ago

Good question, they also list the same customers (create.xyz, continue.dev)

fazkan•6mo ago

I think both may be using "customers" very loosely :)

Workaccount2•7mo ago

Just for clarification here because I am a bit confused,

Morph is a tool for integrating the output of other LLMs and not an LLM itself? It doesn't generate 4500 tok/sec, it can edit 4500 tok/sec?

bhaktatejas922•7mo ago

Correct, but Morph is an LLM as well. In practice it's basically Big LLM using small LLM as a tool call

Workaccount2•7mo ago

I see. How is this not going to get run over immediately by big players? Google's diffusion model is already in the wings, and it's both wicked fast and ~flash-lite intelligent.

bhaktatejas922•7mo ago

you could make that argument about any startup really. To me it's the same reason they don't build the foundational model for legal, for sales, etc. - everything comes at a cost. Allocating researcher time to this is attention not spent on the general frontier model - losing 1-2% there is the difference of billions of dollars for them

nailer•7mo ago

Google's a great tech organization but they generally don't create dominant tech products like they used to back in the Maps / Mail days (this is nearly two decades ago).

Google wrote AKYNIA. OpenAI wrote ChatGPT.

bhaktatejas922•7mo ago

factual

conartist6•7mo ago

Conventional tools are also a threat. Nothing about this problem is AI-specific.

thegeomaster•7mo ago

It does require a level of contextual awareness, fuzziness and robustness against crazy inputs that in my mind would be very hard to achieve using classical approaches.

conartist6•7mo ago

Context awareness, yep. Response to fuzzy inputs... idunno, why does it need that?

The thing I think is really silly is that it tries to make incremental writes to a flat file really fast, which is an impossible goal. As the file gets bigger your writes will just get slower and slower and slower at a rate that increases linearly with the size of the file.

thegeomaster•7mo ago

LLMs will generate unpredictable, very humanlike code edits in this form. They might use comments like "same function as above", "rest of the function with similar changes", "function ABC is no longer needed", "function ABC same as above", etc. Your code edit model must resolve all of these, or flag an error in case of too much ambiguity. I would think a classical algorithm would have a lot of trouble differentiating between, for example, "function replaced with a comment because the LLM wanted to remove it" and "function replaced with a comment because the LLM wanted to keep it the same".

The other half of your comment is true, but we typically have a ceiling on the reasonable size of a code file. For decades we've had the conventional wisdom to refactor files that are beyond some threshold LoC (be it 200, 500 or whatever it is.) If that is sufficiently fast, you can parallelize such operations and provide a maximum edit time regardless of the change size.
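The ambiguity described above can be seen with a very small rule-based check. This sketch assumes a single hard-coded marker pattern (the `...existing code...` convention from the post); `is_keep_marker` and the regex are hypothetical, and the sample comments are the ones listed in this comment.

```python
import re

# Assumed marker shape: a line comment containing "...existing code...".
KEEP_MARKER = re.compile(r"^\s*(//|#)\s*\.\.\.\s*existing code\s*\.\.\.\s*$")


def is_keep_marker(line: str) -> bool:
    """True if the line matches the one marker pattern this rule knows about."""
    return bool(KEEP_MARKER.match(line))


SAMPLES = [
    "// ...existing code...",
    "// rest of the function with similar changes",
    "// function ABC is no longer needed",
    "// function ABC same as above",
]

if __name__ == "__main__":
    for line in SAMPLES:
        print(f"{is_keep_marker(line)!s:5}  {line}")
```

Only the first sample is recognized. The other three would be treated as literal replacement text, even though two of them mean "keep" and one means "delete", which is the distinction a fixed rule has trouble making.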

conartist6•7mo ago

I agree that sorting something that messy can't easily be done with a heuristic expert system, but I'm registering my concern that the whole approach is predicated on first making a huge mess with one LLM then cleaning up the mess with another.

The classical approach is more like "change the definition of the problem until you don't need to make a big mess in the first place"

thegeomaster•7mo ago

This might be a controversial take, but this approach is just plain old engineering applied to LLMs.

Instead of making an LLM perform both an accurate code edit AND follow a strict output schema, you split that into two problems: accurate code edit with a lax output schema, then application of that lax output schema to the original file. You can then use different models for the two tasks, reducing your probability of failure.

conartist6•7mo ago

Yeah, it's just that I'm building things to last the next 50 years. Of course you can build a skyscraper on a foundation of wet noodles and it would still be (impressive) engineering, but the structure would not stand the test of time.

So yeah, it's a clever approach, and useful even, but there's no way I can see that such a noodly hack could become the bedrock for all other infrastructure

bhaktatejas922•7mo ago

precisely. we work well on files up to 2k lines. It's hard to wrap your head around at first, but code merge has 100s of edge cases to deal with. It's the perfect application for a model

thegeomaster•7mo ago

You can run more intelligent traditional LLMs at higher speeds than the Google diffusion model. Even then, it runs nowhere near 4500tok/s, and such small models generally suck in terms of accuracy compared to a specialized, fine tuned one.

furyofantares•7mo ago

Big LLM and small LLM, very Starbucks sizing vibes here.

icy•7mo ago

That would be Grande LLM and Tall LLM.

eabeezxjc•7mo ago

why not ruby?

because ruby needs no correcting. It works.

nico•7mo ago

Would be awesome to have a browser extension that could create a bridge between ChatGPT and VSCode, applying Morph in between (or Claude instead of ChatGPT). Essentially use the web interface, instead of the APIs for agentic coding

bhaktatejas922•7mo ago

I think an MCP would do the job. We're shipping one out as we speak

sidgarimella•7mo ago

+1 hyped for an mcp that I might be able to plug zed into

bhaktatejas922•6mo ago

mcp is out! https://morphllm.com/mcp

elzbardico•7mo ago

1) Raw inference speed matters more than incremental accuracy gains for dev UX—agree or disagree?

Yeah, I love reviewing and debugging thousands of lines of buggy and dirty AI generated code. Who cannot love it?

bhaktatejas922•7mo ago

key word incremental - for fast apply to be useful it should be so fast and accurate that most people don't realize there's a model there at all

callamdelaney•7mo ago

Yeah sounds like exactly what we need

laborcontract•7mo ago

Really like this. I've been trying microsoft's copilot and it's so clunky, particularly when applying edits. One would assume they have the resources to train the model..

Request: please provide a system prompt in the docs to help the llm generate the diff format that performs best w/ your models. LLMs frequently change the way they present diffs on upgrades and I don't want to be guessing which format is best.

EDIT: Please clarify your privacy policy. If my interpretation is correct, paying users will have their data retained and trained on? Is there any way to pay to use the service (w/o picking up the phone) and not have my data trained on?

  4.1 Use of Service Data

  Depending on your subscription tier:

  Free Tier: We may use your submitted code data to train our models, improve our Services, and develop new features.
  Engineer Tier: We may use your submitted code data to train our models, improve our Services, and develop new features, subject to the confidentiality provisions in your service agreement.
  Enterprise Tier: We do not use your submitted code data for any purpose other than processing your immediate request. Your code data is never used for model training or service improvement.

[0] https://morphllm.com/privacy

bhaktatejas922•7mo ago

done! Yeah we have ZDR options as well, just email us to enable it info@morphllm.com

Morph via OpenRouter is always zero data retention

laborcontract•7mo ago

Good to know. Thanks a lot!

fastball•7mo ago

This whole "don't train on my data" thing is so silly. Do you know how these models were created? By training them on code.

Very selfish / tragedy of the commons for you to want to use tools that were trained on the code of others but not your own. That is how these models get better.

laborcontract•7mo ago

I care much less about the training, much more about the data retention. I don't think it's wrong to not want my data retained, especially if the counterparty is receiving remuneration for the service. For free services, I agree with you. I've used the free Gemini liberally.

I do appreciate the transparency on their privacy page and their providing the ability to opt out. Seems like they've given it some thought.

lastdong•7mo ago

Is this similar to Gemini Diffusion? Thanks

bhaktatejas922•7mo ago

No, we use autoregressive llms. Diffusion models would be super interesting here. Mercury is doing some interesting work with diffusion in code gen but still too early to tell if it'll get good enough for production usage

scottpersinger•7mo ago

I’d just like to put a pitch in here for someone to do “smart rebase+merge” with AI. Now THAT would really speed up development, if my AI was intelligently merging code from different users in the background, based on understanding the intent behind each conflicting change.

bhaktatejas922•7mo ago

how often do you run into merge conflicts?

weird-eye-issue•7mo ago

You can do that with Claude Code. Just tell it to merge in another branch and fix the merge conflicts.

bbe327•7mo ago

I didn’t know that. Thank you for the info. It’s been annoying. Will hail Claude Code. Hoorays

FridgeSeal•7mo ago

> Raw inference speed matters more than incremental accuracy gains for dev UX

Now I can be wrong, faster!

z3ugma•7mo ago

How do I start using this on a codebase on my local computer? I'm quite confused by the quickstart. Do I use a VSCode extension? One of the Claude Code like clones but with this as a custom model?

fcpguru•7mo ago

same question! +1

joshmlewis•7mo ago

This is more of a model that VSCode would integrate, not something for end users to use.

bhaktatejas922•7mo ago

if you make an account, we'll email you when we have something easier to try!

michaelneale•7mo ago

Have been using morph for a while (I am one of the authors of goose) and was surprised when introduced at the boost it gave me (much less iteration with the main expensive LLM, and I can even make the editing process simpler to take a load off the agent). Used it with claude 3.5, 3.7, 4 and currently with a o3/openai and anthropic/claude4 + morphllm combo today.

bhaktatejas922•7mo ago

yeah it's such a better experience! - it was really the difference maker between the cursor experience and everything else previously

orge•7mo ago

It would be great to have an integration with Aider or OpenCode.

bhaktatejas922•7mo ago

working on it

golergka•7mo ago

I don't think this makes any sense as a standalone product, but I wish I had it in my claude code and aider as an intermediate step right away. The latter tool, while less popular, already supports _exactly_ this workflow in the architect mode — a good candidate to be your first integration, I think.

shifald•7mo ago

Nice! Will give it a try. Congrats on the launch

bhaktatejas922•7mo ago

thanks, let us know how it goes

joshmlewis•7mo ago

I'm going to test implementing this for my project https://promptslice.com and see how it does with text based edits. I assume it will do ok.

I'm also really curious about the XML tool calls in the documentation. I have not heard of this being the norm for tools like Cursor. Is that still the case? I feel like I'm pretty in the know about this stuff but must have missed that trend.

bhaktatejas922•7mo ago

It's true - Cursor, Cline, and many others still use xml for tool calls. In JSON, the model needs to "focus" on escaping characters correctly while also sampling from a reduced token distribution.

https://aider.chat/2024/08/14/code-in-json.html
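A small illustration of the escaping burden, using only Python's standard `json` module. The two-line snippet and the `<code>` tag name are made up for the example, and the tagged form assumes the tags are treated as loose delimiters rather than parsed as strict XML (which would need `<` and `&` escaped).

```python
import json

# A two-line code snippet containing quotes and a literal backslash.
snippet = 'print("hello")\nprint("a\\tb")'

# Inside a JSON string, every quote, newline and backslash must be escaped.
as_json = json.dumps({"code": snippet})

# Inside a loosely parsed tag, the code can be emitted exactly as written.
as_tagged = f"<code>\n{snippet}\n</code>"

print(as_json)
print(as_tagged)
print("backslashes in the raw code:", snippet.count("\\"))
print("backslashes in the JSON form:", as_json.count("\\"))
```

The JSON form carries several times more backslashes than the raw code, and the model has to produce every one of them correctly for the tool call to parse.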

weird-eye-issue•7mo ago

Seems completely broken.

I used the provided HTML example on https://morphllm.com/dashboard/playground/apply. Without editing anything at all, I pressed apply.

Your model added a bunch of CSS even though that wasn't in the update instructions at all. It also added a contact section, which again, wasn't in the update instructions that your demo provided.

bhaktatejas922•7mo ago

nice catch. the html example was using a hardcoded snippet we forgot to uncomment. fixed

weird-eye-issue•7mo ago

Thanks I will give it another try because our use case is HTML/Markdown documents, not code, and this could be interesting. I'm just hesitant to trust an LLM to do replacements and your broken example really didn't help with my confidence. Even a 1% error rate wouldn't be worth it because if find/replace doesn't work you know it doesn't work and can feed that error back into the agent to fix it (like how Claude Code recovers from its editing errors)

edit: The example is still broken. I've inspected the network request and it's definitely your backend that is broken not something being hardcoded... The CSS is not present in the request at all, but in the response it's being inserted.

bhaktatejas922•7mo ago

took a closer look, the update snippet provided is referencing css that's not defined. this is sort of an overreach example of the "semantic" nature of the edits - ie, if an update explicitly contains syntax or minor reference errors, the apply model corrects them. you can argue that this is an overstep in this example - but at the same time it's not something claude/gemini would suggest

weird-eye-issue•7mo ago

> the update snippet provided is referencing css thats not defined

That isn't how HTML works.

You mean the new HTML has classes which from what your LLM can see has no styles applied.

Maybe those styles are in a different file or maybe the developer didn't want any styles applied, the editing LLM would not know that.

It is completely broken behavior and you lied about it at first. Your model should not be writing code! It should just be applying edits given to it.

Bluestein•7mo ago

"Let's be wrong faster! It's a feature, we swear!"

bhaktatejas922•7mo ago

or lets be right faster

Bluestein•7mo ago

Yes. Or, no. Who knows.-

bhaktatejas922•7mo ago

The shoggoth knows

charcircuit•7mo ago

It's better than being right slower than I can manually make the change.

w10-1•7mo ago

(warning: outside, naive perspective)

> 1) Raw inference speed matters [most] for dev UX—agree or disagree?

Or maybe incremental content-assist and full-file problem-solving are two significantly different uses, though they're both dev UX use cases.

Because they're confusingly similar, comparing them (and denigrating full-file solutions) wastes time/energy. You muddy your own message.

Just concentrate on showing the value of what you do where and when. To wit...

In the inference case, you're really using context to provide affordances -- next steps. In the full-file case, you're starting instead from a goal statement, with context providing constraints.

I think where you want to go is to show when the tool anticipates where you *should* go; i.e., the extent to which it can lead junior developers to the next step, and senior developers to the next constraint/issue they're ignoring.

I believe just as "attention is all you need" surprised people, this kind of bottom-up approach has more legs than people expect.

I understand the naked probability model is trained on world code corpus; what would interest me is whether you can also create a model that learns the developer's biases.

Then the work is to see the issues in the context, but address them in the order and manner that the developer would. Lock-in would occur because, well, the system understands me. And it would be particularly nice when Programmer A wants to code like Programmer B. If your assistant has a model of Programmer B, the assistant could guide Programmer A in that direction.

bhaktatejas922•7mo ago

"Creating models that learn developer biases" has a great ring to it - maybe we should make that our mission statement. That's exactly what we're doing with our models. The Next-Edit completion model especially resonates with this

now if you meant one step further, meaning the literal single developer, that's probably best served in context - albeit with a model that's learned developer biases

boleary-gl•7mo ago

Kilo Code team member here

Would love to chat about integrating the models into Kilo Code if you’re interested

You can contact me at brendan [at] kilocode [dot] ai

furyofantares•7mo ago

Does Claude Code have a similar apply model? It does create diffs for you to accept/reject but then I feel like it's always using a find/replace tool to apply it rather than a model that rewrites the whole file. I don't know how the speed of this approach compares but the accuracy feels great.

bhaktatejas922•7mo ago

Claude Code uses find and replace, errors get piped back to the agent so you rarely feel them unless you hit an infinite error loop
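For reference, the fail-loudly property of find and replace can be sketched in a few lines. This is not Claude Code's actual implementation; `strict_replace` and `EditError` are hypothetical names, and the exactly-one-match rule is an assumption about how such a tool would typically be made safe.

```python
class EditError(Exception):
    """Raised when an edit cannot be applied unambiguously."""

def strict_replace(text: str, old: str, new: str) -> str:
    # Refuse to edit unless the search text occurs exactly once, so a
    # failed edit is always visible and can be reported back to the agent.
    count = text.count(old)
    if count != 1:
        raise EditError(f"expected exactly 1 match for the search text, found {count}")
    return text.replace(old, new, 1)
```

The agent loop would catch `EditError` and hand the message back to the model as the tool result, which is what makes the failure recoverable instead of silent.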

krishvs•7mo ago

Really impressive. I'm in the market for such a solution for our internal AI coding systems - how do you compare to the opensource https://huggingface.co/osmosis-ai/Osmosis-Apply-1.7B?

I am assuming your models are not opensource/openweights?

bhaktatejas922•7mo ago

You should give both a try! key difference is our models are faster and more accurate by a large margin

csomar•7mo ago

1- I agree that speed is intelligence[+], but you're suggesting we can reduce accuracy, increase speed, and get better results. I don't buy it.

2- I'm confused. Claude Code and a Neovim plugin I used both do edits/diffs. Are you saying they're actually rewriting entire files instead?

3- Aren't "simple tasks" just things you train the model on? If so, are you solving a bunch of simple tasks or offering custom training?

> No more slow full-file rewrites or brittle search-and-replace hacks.

Here's the thing - LLMs are already blazing fast. I commented the other day that you could probably write Chrome's entire code base in a couple months at average speed. The bottleneck isn't speed, it's accuracy; that's of course my opinion.

+: https://omarabid.com/claude-magic

bhaktatejas922•7mo ago

Accuracy of course comes first - errors of any sort break flow state and are massively disruptive. Claude Code does search and replace. The failure rate of search and replace is rather high (over 11%) but claude can self correct them. Fast Apply is more accurate than search and replace. Full file rewrites for files under 10k tokens can work quite well - like 98%+ but it's absurdly expensive and slow. Fast Apply is a much better option that balances speed and accuracy
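As a rough sense of scale, the throughput figures quoted in the launch post translate into seconds per file as follows. This is back-of-the-envelope arithmetic only: it assumes the apply model emits the whole file at a constant rate and ignores network latency and prompt processing, and `apply_seconds` is a made-up helper name.

```python
def apply_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to emit a file of the given size at a constant decode rate."""
    return output_tokens / tokens_per_second

# Throughput figures from the launch post; the 10k-token file size is
# the upper bound mentioned in the comment above.
for model, rate in [("morph-v3-fast", 4500), ("morph-v3-large", 2500)]:
    print(f"{model}: {apply_seconds(10_000, rate):.2f} s for a 10k-token file")
```

Under those assumptions a 10k-token file takes about 2.2 seconds on morph-v3-fast and 4 seconds on morph-v3-large.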

csomar•7mo ago

I'm having trouble getting your platform to work properly. When I tried the examples you provided, I couldn't understand what they were supposed to demonstrate. The code I wrote either breaks or just outputs your sample code instead of working correctly. My usage dashboard shows 0, so I can't tell if the platform is actually functioning. The only thing that seems to work is the payment redirect to Stripe.

I think you need to take a step back and contemplate.

bigtimegangstar•7mo ago

Gotta try this, maybe even an Obsidian plugin?

bhaktatejas922•7mo ago

say more! We're in the smart composer plugin, but the integration is out of date

dsp_person•7mo ago

2. Sometimes watching Claude do inline edits I cringe watching it delete almost the entire file line by line. I think when early in exploring a problem it's best to torch the previous version and fully re-write than to assume incremental.

bhaktatejas922•7mo ago

Yeah, incremental updates are something models struggle with, which is what we're trying to solve.

strogonoff•7mo ago

The most important question for any new company based around generative ML: how did you source training data, and did you observe the licensing (e.g., never use code under GPL or default copyright conditions)?

Very few companies can or are willing to answer that.

pcwelder•7mo ago

Impressive demo! Is there any rate limit (TPM)?

bhaktatejas922•7mo ago

for companies and startups we raise it considerably. For free tier there are rate limits + a limit of 100 req per month

Kamshak•7mo ago

It's more expensive than Gemini flash which can actually write pretty decent code (not just apply a diff). Fast AI edit application is definitely great but that's pretty expensive

Morph v3 fast: Input: $1.20 / M tokens, Output $2.70 / M tokens

Gemini 2.5 Flash: Input: $0.30 / M tokens, Output $2.50 / M tokens

(Source: OpenRouter)

bhaktatejas922•7mo ago

That's for 0 data retention - on the Morph website it's: $0.80 /1M token input, $1.20 /1M token output. We have discounts for large volumes/reserved instances as well
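To compare the quoted prices on equal footing, here is a small calculation. The per-million prices are the ones given in this thread; the workload of 1M input and 1M output tokens is an arbitrary assumption, and the dictionary labels and `cost_usd` helper are made up for the example.

```python
# USD per 1M tokens as (input, output), as quoted in the thread.
PRICES_PER_MILLION = {
    "Morph v3 fast (OpenRouter)": (1.20, 2.70),
    "Morph v3 fast (Morph website)": (0.80, 1.20),
    "Gemini 2.5 Flash (OpenRouter)": (0.30, 2.50),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    price_in, price_out = PRICES_PER_MILLION[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Hypothetical workload: 1M tokens in, 1M tokens out.
for model in PRICES_PER_MILLION:
    print(f"{model}: ${cost_usd(model, 1_000_000, 1_000_000):.2f}")
```

For that workload the totals come to $3.90, $2.00 and $2.80 respectively, so the ranking depends on which Morph price applies and on the input/output mix.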

helsinki•7mo ago

Doesn’t Apple’s new model do the same thing? https://huggingface.co/apple/DiffuCoder-7B-cpGRPO

bhaktatejas922•7mo ago

This is a coding model, not a code merge model. What we do is 1 thing, extremely well - merging code edits into original files

kordlessagain•7mo ago

This is what I use and it's free for anyone to use individually: https://github.com/kordless/gnosis-evolve/blob/main/contrib_...

agrippanux•7mo ago

How does this compare to Google Diffusion? Diffusion writes out at seemingly the speed of thought.

bhaktatejas922•7mo ago

we're quite a bit faster and specifically training for merging code edits.

Google diffusion is a swing at a generalist model. Super cool work nonetheless

fazkan•6mo ago

Can someone help me understand the value of this? I've built a vibe-coding product, and the existing models (both claude and openai) work well enough in apply. It's all a function of prompting/context.

bhaktatejas922•6mo ago

it really matters when you care deeply about the product experience. The gap between something that works sometimes and something that works great every time is what separated Cursor from the rest.

Try it out: https://morphllm.com/dashboard/playground/apply
