Here’s an example of our apply model vs. whole file edits: https://youtu.be/J0-oYyozUZw
Building reliable code agents is hard. Beyond simple prototypes, any production app doing code generation quickly runs into two problems: how do you reliably apply diffs, and how do you manage codebase context?
We're focused on solving these two problems at an order of magnitude lower price and latency.
Our first model, released in February, is the Fast Apply model: it merges code snippets into files at 4,300 tok/s. It is more reliable at this task (fewer merge errors) than Sonnet, Qwen, Llama, or any other model. Each file takes ~900ms, giving a near-instantaneous user experience while saving ~40% on Claude 4 output tokens.
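To make "applying" concrete: an apply model merges an LLM's abbreviated edit snippet, which elides unchanged regions with lazy markers like `// ... existing code ...`, back into the full file. Relace's model is a trained model; the sketch below is a purely illustrative greedy line-matching version of the task, and every name in it is hypothetical, not Relace's API:

```python
MARKER = "// ... existing code ..."

def naive_apply(original: str, snippet: str) -> str:
    """Greedy line-matching merge of a lazy edit snippet into a file.

    Illustrative only: a trained apply model handles fuzzy matches,
    reordering, and ambiguous anchors that this sketch does not.
    """
    orig = original.splitlines()
    out, pos = [], 0
    for chunk in snippet.split(MARKER):
        lines = chunk.strip("\n").splitlines()
        if not lines:
            continue  # a marker at the very start/end of the snippet
        # Anchor on chunk lines that also appear verbatim in the unread original.
        anchors = [l for l in lines if l in orig[pos:]]
        if anchors:
            start = pos + orig[pos:].index(anchors[0])
            end = pos + orig[pos:].index(anchors[-1])
            out.extend(orig[pos:start])  # unchanged code the preceding marker stood for
            pos = end + 1                # skip the original span the chunk rewrites
        out.extend(lines)                # emit the edited chunk itself
    out.extend(orig[pos:])               # a trailing marker keeps the rest of the file
    return "\n".join(out)
```

The hard part is exactly what this sketch punts on: when anchors are near-duplicates or the model's snippet paraphrases the original, naive matching silently corrupts the file, which is why a dedicated model beats string heuristics here.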
Our second model focuses on retrieval. For both vibe-coded and enterprise codebases, retrieving only the files relevant to a user request both cuts SoTA input-token cost and reduces the number of times code agents need to view files. Our reranker (evals below) can scan a million-line codebase in ~1-2s, and our embedding model outperforms every other embedding model we evaluated for retrieval on a corpus of TypeScript/React repositories.
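As a toy illustration of what the retrieval step does (not Relace's actual models, which are trained rerankers and embedders): score every file against the user request and hand only the top-k to the agent. A stdlib-only bag-of-words cosine stand-in, with hypothetical names throughout:

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Crude identifier/word tokenizer; a trained embedding model replaces all of this.
    return re.findall(r"[A-Za-z_]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_files(query: str, files: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the top_k file paths most similar to the query (bag-of-words cosine)."""
    q = Counter(tokenize(query))
    return sorted(files, key=lambda path: cosine(q, Counter(tokenize(files[path]))),
                  reverse=True)[:top_k]
```

Even this crude ranker shows the payoff: the agent reads two files instead of the whole repo. The gap between it and a trained model is largest on non-technical queries ("fix the login page"), where lexical overlap with the code is weakest.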
There are many ways to build coding agents, but reliably editing code and retrieving the most relevant parts of the codebase will be foundational to all of them. We're excited to be making this accessible to the millions of users who don't want to spend $$$ on Claude.
These models are used in production millions of times per week. If you've used Lovable, Create.xyz, Magic Patterns, Codebuff, or Tempo Labs, then you've used us!
Here's a link to try it out: https://app.relace.ai, and here are our docs: https://docs.relace.ai.
We've opened up free access for prototyping on our website to everyone, and the limits should be enough for personal coding use and building small projects (correct us if they're not). We also integrate directly with open-source IDEs like Continue.dev. Please try us out, we'd love to hear your feedback!
bigyabai•1d ago
What is your plan to beat the performance and cost of first-party models like Claude and GPT?
eborgnia•1d ago
ramoz•1d ago
What’s the differentiator or plan for arbitrary query matching?
Latency? If you think about it, it's not really a huge issue. Spend 20s and a 1M-token context mapping out an entire plan for a feature with Gemini.
Pass that to Claude Code.
At this point you want non-disruptive context moving forward, and presumably any new findings would just be redundant with what's already in the long context.
Agentic discovery is fairly powerful even without any augmentations. I think Claude Code devs abandoned early embedding architectures.
eborgnia•1d ago
For Cline or Claude Code, where there's a dev in the loop, it makes sense to spend more money on Gemini ranking or more latency on agentic discovery. Prompt-to-app companies (like Lovable) have a flood of impatient non-technical users coming in, so latency and cost become big considerations.
That's where a more traditional retrieval approach becomes relevant. Our retrieval models are meant to work really well with non-technical queries on these vibe-coded codebases. They're a supplement to agentic discovery approaches, and we're still figuring out how to integrate them in a sensible way.