Here’s an example of our apply model vs. whole file edits: https://youtu.be/J0-oYyozUZw
Building reliable code agents is hard. Beyond simple prototypes, any production app doing code generation quickly runs into two problems: how do you reliably apply diffs, and how do you manage codebase context?
We're focused on solving these two problems at an order of magnitude lower price and latency.
Our first model, released in February, is the Fast Apply model: it merges code snippets into files at 4,300 tok/s. It is more reliable at this task (fewer merge errors) than Sonnet, Qwen, Llama, or any other model. Each file takes ~900ms, giving a near-instantaneous user experience while saving ~40% on Claude 4 output tokens.
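To make "applying" concrete: an apply model merges an LLM's abbreviated edit snippet, which elides unchanged regions with lazy markers like `// ... existing code ...`, back into the full file. Relace's model is a trained model; the sketch below is a purely illustrative greedy line-matching version of the task, and every name in it is hypothetical, not Relace's API:

```python
MARKER = "// ... existing code ..."

def naive_apply(original: str, snippet: str) -> str:
    """Greedy line-matching merge of a lazy edit snippet into a file.

    Illustrative only: a trained apply model handles fuzzy matches,
    reordering, and ambiguous anchors that this sketch does not.
    """
    orig = original.splitlines()
    out, pos = [], 0
    for chunk in snippet.split(MARKER):
        lines = chunk.strip("\n").splitlines()
        if not lines:
            continue  # a marker at the very start/end of the snippet
        # Anchor on chunk lines that also appear verbatim in the unread original.
        anchors = [l for l in lines if l in orig[pos:]]
        if anchors:
            start = pos + orig[pos:].index(anchors[0])
            end = pos + orig[pos:].index(anchors[-1])
            out.extend(orig[pos:start])  # unchanged code the preceding marker stood for
            pos = end + 1                # skip the original span the chunk rewrites
        out.extend(lines)                # emit the edited chunk itself
    out.extend(orig[pos:])               # a trailing marker keeps the rest of the file
    return "\n".join(out)
```

The hard part is exactly what this sketch punts on: when anchors are near-duplicates or the model's snippet paraphrases the original, naive matching silently corrupts the file, which is why a dedicated model beats string heuristics here.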
Our second model focuses on retrieval. For both vibe-coded and enterprise codebases, retrieving only the files relevant to a user request both cuts SoTA input-token cost and reduces the number of times code agents need to view files. Our reranker (evals below) can scan a million-line codebase in ~1-2s, and our embedding model outperforms every other embedding model we evaluated for retrieval on a corpus of TypeScript/React repositories.
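As a toy illustration of what the retrieval step does (not Relace's actual models, which are trained rerankers and embedders): score every file against the user request and hand only the top-k to the agent. A stdlib-only bag-of-words cosine stand-in, with hypothetical names throughout:

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Crude identifier/word tokenizer; a trained embedding model replaces all of this.
    return re.findall(r"[A-Za-z_]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_files(query: str, files: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the top_k file paths most similar to the query (bag-of-words cosine)."""
    q = Counter(tokenize(query))
    return sorted(files, key=lambda path: cosine(q, Counter(tokenize(files[path]))),
                  reverse=True)[:top_k]
```

Even this crude ranker shows the payoff: the agent reads two files instead of the whole repo. The gap between it and a trained model is largest on non-technical queries ("fix the login page"), where lexical overlap with the code is weakest.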
There are many ways to build coding agents, but reliably editing code and retrieving the most relevant parts of the codebase will be foundational to all of them. We're excited to be making this accessible to the millions of users who don't want to spend $$$ on Claude.
These models are used in production millions of times per week. If you've used Lovable, Create.xyz, Magic Patterns, Codebuff, or Tempo Labs, then you've used us!
Here's a link to try it out: https://app.relace.ai, and here are our docs: https://docs.relace.ai.
We've opened up free access for prototyping on our website to everyone, and the limits should be enough for personal coding use and building small projects (correct us if they're not). We also integrate directly with open-source IDEs like Continue.dev. Please try us out, we'd love to hear your feedback!
bigyabai•1d ago
What is your plan to beat the performance and cost of first-party models like Claude and GPT?
eborgnia•1d ago
ramoz•1d ago
What’s the differentiator or plan for arbitrary query matching?
Latency? If you think about it, it's not really a huge issue. Spend 20s and a 1M-token context mapping out an entire plan for a feature with Gemini.
Pass that to Claude Code.
At this point you want non-disruptive context moving forward, and presumably any new findings would just be redundant with what's already in the long context.
Agentic discovery is fairly powerful even without any augmentations. I think Claude Code devs abandoned early embedding architectures.
eborgnia•1d ago
For Cline or Claude Code, where there's a dev in the loop, it makes sense to spend more money on Gemini ranking or more latency on agentic discovery. Prompt-to-app companies (like Lovable) have a flood of impatient non-technical users coming in, so latency and cost become big considerations.
That's where a more traditional retrieval approach becomes relevant. Our retrieval models are meant to work really well with non-technical queries on these vibe-coded codebases. They're a supplement to agentic discovery approaches, and we're still figuring out how to integrate them in a sensible way.