Improving Composer through real-time RL

https://cursor.com/blog/real-time-rl-for-composer

47•ingve•1d ago

Comments

polishdude20•1h ago

I'd love to see some data on how much it has improved through this process over the last week.

heliumtera•38m ago

It would be the same as Kimi K2.5, the underlying model.

CitrusFruits•1h ago

I've been wondering how they can be so generous with Composer usage while still making business sense. This seems to be the answer: presumably they expect to soon have a competitive advantage not just in the UX space but in the model space as well. It's a great strategy, but I do wonder whether the moat will be big enough, given how fast things are moving and how competitive the model landscape is.

ketzo•14m ago

After seeing the last few releases for GPT and Claude, I’m not sure how anyone (else) is gonna build a durable advantage on proprietary model quality.

The capabilities of the top labs’ models have improved so much in just the last few releases, and I definitely foresee a world where they gate those models away behind 1st-party harnesses/tooling.

kgeist•23m ago

>We used a Kimi base, with midtraining and RL on top. Going forward, we'll include the base used in our blog posts, that was a miss. Also, the license is through Fireworks. [0]

And still no mention of Kimi in a new blog post :)

Also, apparently their inference provider, Fireworks AI, already has a built-in API for RL-tuning Kimi [1], so I wonder which parts are Cursor's own effort and where Fireworks AI actually deserves the credit. That's especially relevant since they repeatedly brag about being able to create a new checkpoint every 5 hours, which would largely be thanks to Fireworks AI's API and training infrastructure.

I mean, I'm genuinely curious how much effort it would actually take me to produce my own finetune, going from "here, lots of user data" to "the model gains +1% on benchmarks", assuming I already use a good existing foundation model, my inference provider already handles all the tuning infrastructure and logic, and I already have a lot of usage logs.

[0] https://news.ycombinator.com/item?id=47459529

[1] https://fireworks.ai/blog/kimi-k2p5

fzysingularity•15m ago

What do you think actually happened here in the past week?

They used Kimi and failed to acknowledge it in the original Composer announcement. The Kimi team probably reached out and asked WTF? Their only recourse was to publish their whitepaper with Kimi mentioned, winning brownie points for being open about their training pipeline while placating the Kimi team.

fzysingularity•5m ago

Real-time or continuous learning is great on paper, but getting it to work without extremely expensive regression testing or catastrophic forgetting is a real challenge.

Credit to the team for taking this on, but I’d be skeptical of announcements like this without at least 3–6 months of proven production deployments. Definitely curious how this plays out.

