Looking at the graph, there seems to be an implicit "today" in that statement, as they do appear poised to equal or surpass Sonnet 4.5 on that same benchmark in the near future.
It’s the only coding agent I’m actually motivated to use out of the box, because it really does make me feel more productive, while the others keep messing up the project - from way-too-large changes I didn’t ask for all the way to constant syntax and request errors.
It’s the only coding agent I’ve used that feels serious about being a product rather than a prototype. Their effort in improving their stack is totally paying off.
Countless times my requests in the AI chat just hang there for 30+ seconds before I can retry them.
When I decided to give Claude Code a try (I thought I didn't need it because I used Claude in Cursor), I couldn't believe how much faster it was, and literally 100% reliable.
EDIT: given today's release, decided to give it a go. The Composer1 model _is_ fast, but on the very second new agent I started, I got this:
> Connection failed. If the problem persists, please check your internet connection or VPN
(Cursor dev)
I would be willing to bet money your issue is on your side. I am a daily user since the beginning and cannot recall when I have had issues like you describe unless it was related to my corp network.
Also, somehow magically, I’ve found Cursor’s Auto mode to be significantly faster than the specific models I’ve tried, Claude being among them.
I would agree it is not as good at lengthy work, where it takes a design all the way through implementing a feature in a single shot, but "trivial" is not a good description.
I also don’t think you’re right. 3.5 was recently deprecated and even before then, Cursor has been hitting rate limits with Anthropic. Auto is as much a token cost optimization as it is a rate limit optimization.
Can't help but notice you haven't tried Zed!
Its generation speed is not the problem or the time sink.
It's wrestling with it to get the right output.
---
And just to clarify, as maybe I misunderstood again, but people are comparing Cursor to Claude Code and Codex etc. here - isn't this whole article all Cursor, just using different models?
Also, didn't realize you worked at Cursor - I'm a fan of your work - they're lucky to have you!
Totally agree that a "smart model" is table stakes for usefulness these days.
Wow, no kidding. It is quite good!
literally a 30 day old model and you've moved the "low" goalpost all the way there haha. funny how humans work
Speed of model just isn't the bottleneck for me.
Before it I used Opus 4.1, before that Opus 4.0, and before that Sonnet 4.0 - each slightly better than the last. It's not like Sonnet 4.5 is some crazy step-function improvement (but the speed over Opus is definitely nice).
I wonder how much of the methods/systems/data transfers over; if they can pull off the same with their agentic coding model, that would be exciting.
Every time I write code myself I find myself racing the AI to get an indentation in before the AI is done... gets annoying
I run Claude Code in the background near constantly for a variety of projects, with --dangerously-skip-permissions, and review progress periodically. Tabbing is only relevant when it's totally failing to make progress and I have to manually intervene, and that to me is a failure scenario that is happening less and less often.
I'm not against YOLO vibe coding, but being against tab completion is just insane to me. At the end of the day, LLMs help you achieve goals quicker. You still need to know what goal you want to achieve, and tab completion basically lets me complete a focused goal nearly as soon as I determine what my goal is.
And it's not remotely "YOLO vibe coding". All the code gets reviewed and tested thoroughly, it's worked to specs and gated by test suites.
What I don't do is babysit the LLM until its code passes both the test suite and automated review stages, because that's a waste of time.
Others of these projects are research tasks. While I wrote this comment, Claude unilaterally fixed a number of bugs in a compiler.
Usually I'll have several Claude Code sessions running in parallel on different projects, and when one of them stops I will review the code for that project and start it again - either moving forwards or re-doing things that have issues.
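For anyone curious what that workflow can look like mechanically, here is a minimal sketch - it assumes Claude Code's non-interactive `-p` prompt mode plus the `--dangerously-skip-permissions` flag mentioned above; the project paths and prompts are made-up placeholders, not anything from the thread:

```python
# Minimal sketch of running several unattended Claude Code sessions in parallel.
# Assumes the `claude` CLI with -p (non-interactive prompt) and the
# --dangerously-skip-permissions flag; paths and prompts below are hypothetical.
import subprocess

projects = {
    "/work/compiler": "Fix the failing parser tests and keep the suite green.",
    "/work/webapp": "Implement the feature described in docs/spec.md.",
}

procs = {
    path: subprocess.Popen(
        ["claude", "-p", prompt, "--dangerously-skip-permissions"],
        cwd=path,
    )
    for path, prompt in projects.items()
}

# When a session stops, review that project's diff, then restart or redo it.
for path, proc in procs.items():
    proc.wait()
    print(f"{path}: session exited with code {proc.returncode}, review the changes")
```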
I think competition in the space is a good thing, but I'm very skeptical their model will outperform Claude.
I am an ML researcher at Cursor and worked on this project. Would love to hear any feedback you may have on the model, and I can answer questions about the blog post.
GPT-5-codex does more research before tackling a task; that gap is the biggest reason I'm not using Composer yet.
Could you provide any color on whether ACP (from zed) will be supported?
I don't use these tools that much (I tried Cursor a while ago and decided not to use it), but having played with GPT5 Codex (as a paying customer) yesterday in regular VSCode, and having had Composer1 do the exact same things just now, it's night and day.
Composer did everything better, didn't stumble where Codex failed, and most importantly, the speed makes a huge difference. It's extremely comfortable to use, congrats.
Edit: I will therefore reconsider my previous rejection
other links across the web:
Cursor Cheetah would've been amazing. Reusing the Composer name feels like the reverse OpenAI Codex move haha
(Cursor researcher)
[1] https://www.businessinsider.com/no-shoes-policy-in-office-cu...
Do you have to split the plan into parallelizable tasks that can be worked on in parallel in one codebase without breaking or confusing the other agents?
It's the most prominent part of the release post - but it's really hard to understand what exactly it's saying.
($1.25 input, $1.25 cache write, $0.13 cache read, and $10 output per million tokens)
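To put those rates in perspective, a quick back-of-the-envelope using the listed prices (the token counts below are made-up examples, not anything from the post):

```python
# Rough cost of one request at the quoted per-million-token rates;
# the token counts are hypothetical examples.
RATES = {"input": 1.25, "cache_write": 1.25, "cache_read": 0.13, "output": 10.00}

def request_cost(tokens: dict) -> float:
    """Dollars = sum over categories of (tokens / 1M) * rate."""
    return sum(RATES[kind] * count / 1_000_000 for kind, count in tokens.items())

# e.g. 20k fresh input tokens, 80k read from cache, 5k output:
print(f"${request_cost({'input': 20_000, 'cache_read': 80_000, 'output': 5_000}):.4f}")
# -> $0.0854
```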
> their own internal benchmark that they won't release
If they'd release their internal benchmark suite, it'd make it into the training set of about every LLM, which from a strictly scientific standpoint, invalidates all conclusions drawn from that benchmark from then on. On the other hand, not releasing the benchmark means they could've hand-picked the datapoints to favor them. It's a problem that can't be resolved unfortunately.
We could have third-party groups with evaluation criteria who don't make models or sell A.I. - strictly evaluators. Alternatively, they could have a different type of steady income, with the only A.I. work they do being evaluation.
Right now, it seems free when you are a Cursor Pro user, but I'd love more clarity on how much it will cost (I can't believe it'll be unlimited usage for subscribers)
It made migrating for everyone using VSCode (probably the single most popular editor) or another VSCode-forked editor (but at the time it was basically all VSCode) as simple as installing and importing settings.
I do not think Cursor would have done nearly as well as it has if it hadn't. So even though it can be subpar in some areas due to VSCode's baggage, it's probably staying that way for a while.
Maybe my complaint is that I wish vscode had more features like intellij, or that intellij was the open source baseline a lot of other things could be built on.
Intellij is not without its cruft and problems, don't get me wrong. But its git integration, search, navigation, database tools - I could go on - all of these features are just so much nicer than what vscode offers.
Still not up to Cursor standards though :)
SWE-grep was able to hit ~700 tokens/s and Cursor ~300 tokens/s; it's hard to compare the precision/recall and cost effectiveness though, considering SWE-grep also adopted a "hack" of running it on Cerebras.
I'm trying to kickstart a RL-based code search project called "op-grep" here[1], still pretty early, but looking for collaborators!
[0]: https://cognition.ai/blog/swe-grep
[1]: https://github.com/aperoc/op-grep