Scaling long-running autonomous coding

https://cursor.com/blog/scaling-agents
63•samwillis•1h ago

Comments

jphoward•1h ago
For the browser it built, the context needed for the entire project is obviously huge. They mention loads of parallel agents in the blog post, so I guess each agent is given a module to work on, and some tests? And then a 'manager' agent plugs this in without reading the code? Otherwise I can't see how you could do this, even with ChatGPT 5.2/Gemini 3. In retrospect it seems an obvious approach and akin to how humans work in teams, but it's still interesting.
simonw•1h ago
GPT-5.2-Codex has a 400,000 token window. Claude 4.5 Opus is half of that, 200,000 tokens.

It turns out to matter a whole lot less than you would expect. Coding Agents are really good at using grep and writing out plans to files, which means they can operate successfully against way more code than fits in their context at a single time.

observationist•1h ago
Get a good "project manager" agents.md and it changes the whole approach of vibe coding. For a professional environment, with each person given a little domain, arranged in the usual hierarchy of your coding team, truly amazing things can get done.

Presumably the security and validation of code still need work. I haven't read anything that indicates those are solved yet, so people still need to read and understand the code, but we're at the "can do massive projects that work" stage.

Division of labor, planning, and hierarchy are all rapidly advancing; the orchestration and coordination capabilities are going to explode in '26.

galaxyLogic•34m ago
> so I guess each agent is given a module to work on, and some tests?

Who created those agents, and who gives them the tasks to work on? Who created the tests? AI, or the humans?

simonw•1h ago
"To test this system, we pointed it at an ambitious goal: building a web browser from scratch."

I shared my LLM predictions last week, and one of them was that by 2029 "Someone will build a new browser using mainly AI-assisted coding and it won’t even be a surprise" https://simonwillison.net/2026/Jan/8/llm-predictions-for-202... and https://www.youtube.com/watch?v=lVDhQMiAbR8&t=3913s

This project from Cursor is the second attempt I've seen at this now! The other is this one: https://www.reddit.com/r/Anthropic/comments/1q4xfm0/over_chr...

cheevly•1h ago
2029? I have no idea why you would think this is so far off. More like Q2 2026.
xmprt•1h ago
You're either overestimating the capabilities of current AI models or underestimating the complexity of building a web browser. There are tons of tiny edge cases and standards to comply with where implementing one standard will break 3 others if not done carefully. AI can't do that right now.
geeunits•58m ago
because it makes him look smart when inevitably he's 'right'
gordonhart•37m ago
Web browsers are insanely hard to get right, that’s why there are only ~3 decent implementations out there currently.
mrefish•55m ago
Time to raise the bar. By 2029 someone will build a new browser using mainly AI-assisted coding and the surprise is that it was designed to be used by pelicans.
bob1029•41m ago
The goal I am currently using for long horizon coding experiments is implementation of a PDF rasterizer given an ISO32000 specification document.
sashank_1509•1h ago
Can a browser expert please skim the code the agent wrote and let us know how it is? Is it comparable to Ladybird or Servo, and can it reach that capability any time soon?
ZitchDog•1h ago
I used similar techniques to build tjs [1] - the world's fastest and most accurate JSON Schema validator, with magical TypeScript types. I learned a lot about autonomous programming. I found a similar "planner/delegate" pattern to work really well, with the use of git subtrees to fan out work [2].

I think any large piece of software with well established standards and test suites will be able to be quickly rewritten and optimized by coding agents.

[1] https://github.com/sberan/tjs

[2] /spawn-perf-agents claude command: https://github.com/sberan/tjs/blob/main/.claude/commands/spa...

trjordan•1h ago
This is going to sound sarcastic, but I mean this fully: why haven't they merged that PR.

The implied future here is _unreal cool_. Swarms of coding agents that can build anything, with little oversight. Long-running projects that converge on high-quality, complex projects.

But the examples feel thin. Web browsers, Excel, and Windows 7 exist, and they specifically exist in the LLM's training sets. The closest to real code is what they've done with Cursor's codebase ... but it's not merged yet.

I don't want to say, call me when it's merged. But I'm not worried about agents' ability to produce millions of lines of code. I'm worried about their ability to intersect with the humans in the real world, both as users of that code and developers who want to build on top of it.

dist-epoch•1h ago
Pretty much everything exists in the training sets. All non-research software is just a mishmash of various standard modules and algorithms.
galaxyLogic•38m ago
Not everything, only code-bases of existing (open-source?) applications.

But what would be the point of re-creating existing applications? It would be useful if you can produce a better version of those applications. But the point in this experiment was to produce something "from scratch" I think. Impressive yes, but is it useful?

A more practically useful task would be for Mozilla Foundation and others to ask AI to fix all bugs in their application(s). And perhaps they are trying to do that, let's wait and see.

risyachka•19m ago
>> why haven't they merged that PR.

Because it is absolutely impossible to review that code, and there are a gazillion issues in it.

The only way it can get merged is to YOLO it and then fix issues in prod for months, which kind of defeats the purpose and brings the gains close to zero.

dist-epoch•1h ago
So, who is going to compile the browser and post the binaries so we can check it out? (in a sandbox/VM obviously)
mccoyb•1h ago
Supposing agents and their organization improve, it seems like we’re approaching a point where the cost of a piece of software will be driven down to the cost of running the hardware, and the cost of the tokens required to replicate it.

The tokens were “expensive” from the minds of humans …

Daishiman•47m ago
It will be driven down to the cost of having a good project and product manager effectively understanding what the customer wants, which has been the main barrier to excellent software for a good long time.
galaxyLogic•27m ago
And not only understanding what the customer wants, but communicating that unambiguously to the AI. And who is the "customer" here? Is it the end users, or is it a client company that contracts the project manager for this task? But then the issue is still there: who in the client company decides exactly what is needed and what the (potential) users want?

I think this situation emphasizes the importance of (something like) Agile. To produce something useful can only happen via experimentation and getting feedback from actual users, and re-iterating relentlessly.

jphelan•1h ago
This looks like extremely brittle code to my eyes. Look at https://github.com/wilsonzlin/fastrender/blob/main/crates/fa...

What is `FrameState::render_placeholder`?

```
  pub fn render_placeholder(&self, frame_id: FrameId) -> Result<FrameBuffer, String> {
    let (width, height) = self.viewport_css;
    let len = (width as usize)
      .checked_mul(height as usize)
      .and_then(|px| px.checked_mul(4))
      .ok_or_else(|| "viewport size overflow".to_string())?;

    if len > MAX_FRAME_BYTES {
      return Err(format!(
        "requested frame buffer too large: {width}x{height} => {len} bytes"
      ));
    }

    // Deterministic per-frame fill color to help catch cross-talk in tests/debugging.
    let id = frame_id.0;
    let url_hash = match self.navigation.as_ref() {
      Some(IframeNavigation::Url(url)) => Self::url_hash(url),
      Some(IframeNavigation::AboutBlank) => Self::url_hash("about:blank"),
      Some(IframeNavigation::Srcdoc { content_hash }) => {
        let folded = (*content_hash as u32) ^ ((*content_hash >> 32) as u32);
        Self::url_hash("about:srcdoc") ^ folded
      }
      None => 0,
    };
    let r = (id as u8) ^ (url_hash as u8);
    let g = ((id >> 8) as u8) ^ ((url_hash >> 8) as u8);
    let b = ((id >> 16) as u8) ^ ((url_hash >> 16) as u8);
    let a = 0xFF;

    let mut rgba8 = vec![0u8; len];
    for px in rgba8.chunks_exact_mut(4) {
      px[0] = r;
      px[1] = g;
      px[2] = b;
      px[3] = a;
    }

    Ok(FrameBuffer {
      width,
      height,
      rgba8,
    })
  }
}
```

What is it doing in these diffs?

https://github.com/wilsonzlin/fastrender/commit/f4a0974594e3...

I'd be really curious to see the amount of work/rework over time, and the token/time cost for each additional actual completed test case.
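
On the `render_placeholder` question: read on its own, the quoted function appears to compute an overflow-checked RGBA8 buffer size, derive a deterministic color from the frame id and a URL hash, and fill the whole buffer with that one color. Below is a minimal standalone sketch of that reading, not the project's code. The integer types (`u32` dimensions, `u64` id, `u32` hash), the 64 MiB value for `MAX_FRAME_BYTES`, and the function names `rgba8_len`, `placeholder_color`, and `solid_frame` are all assumptions made for illustration.

```rust
/// Hypothetical cap; the real value of MAX_FRAME_BYTES is not shown in the snippet.
const MAX_FRAME_BYTES: usize = 64 * 1024 * 1024;

/// Byte length of a width x height RGBA8 buffer.
/// Returns an error if the multiplication overflows or the cap is exceeded.
fn rgba8_len(width: u32, height: u32) -> Result<usize, String> {
    let len = (width as usize)
        .checked_mul(height as usize)
        .and_then(|px| px.checked_mul(4))
        .ok_or_else(|| "viewport size overflow".to_string())?;
    if len > MAX_FRAME_BYTES {
        return Err(format!(
            "requested frame buffer too large: {width}x{height} => {len} bytes"
        ));
    }
    Ok(len)
}

/// XOR the low three bytes of the id with the low three bytes of the hash
/// to get an opaque RGBA color (types assumed, see the note above).
fn placeholder_color(id: u64, url_hash: u32) -> [u8; 4] {
    let r = (id as u8) ^ (url_hash as u8);
    let g = ((id >> 8) as u8) ^ ((url_hash >> 8) as u8);
    let b = ((id >> 16) as u8) ^ ((url_hash >> 16) as u8);
    [r, g, b, 0xFF]
}

/// Allocate the buffer and paint every pixel with the same color.
fn solid_frame(width: u32, height: u32, color: [u8; 4]) -> Result<Vec<u8>, String> {
    let mut rgba8 = vec![0u8; rgba8_len(width, height)?];
    for px in rgba8.chunks_exact_mut(4) {
        px.copy_from_slice(&color);
    }
    Ok(rgba8)
}
```

Under these assumptions, `solid_frame(2, 1, placeholder_color(0x010203, 0))` yields two pixels of `[3, 2, 1, 255]`. In other words, the function draws a solid test color rather than rendering any page content, which is consistent with its name and its "help catch cross-talk in tests/debugging" comment.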

blibble•54m ago
this is certainly an interesting way to pull out an attribute from a tag: https://github.com/wilsonzlin/fastrender/blob/main/crates/fa...
blamestross•14m ago
I suppose brittle code is fine if you have cursor to update and fix it. Ideal really, keeps you dependent.
mk599•1h ago
Define "from scratch" in "building a web browser from scratch". This thing has over 100 crates as dependencies... To implement CSS layout, it uses Taffy, a crate used by existing browser implementations...
embedding-shape•1h ago
Did anyone manage to run the tests from the repository itself? The code seems filled with errors and warnings, and as far as I can tell none of them are caused by the platform I'm on (Linux). I looked at the Actions workflow history for a few pages, and it seems CI has been failing for a while; the PRs have all been failing CI too, but were merged anyway. How exactly was this verified as something to be used as a successful example, or am I misunderstanding the point they are trying to make? They mention a screenshot, but they never actually say whether their goal was met, do they?

I'm not sure the approach of "completely autonomous coding" is the right way to go. I feel like we'll be able to use agents more effectively if we think of them as something a human uses to accomplish a task, and lean into letting the human drive, because quality spirals out of control so quickly.

micimize•37m ago
> While it might seem like a simple screenshot, building a browser from scratch is extremely difficult.

> Another experiment was doing an in-place migration of Solid to React in the Cursor codebase. It took over 3 weeks with +266K/-193K edits. As we've started to test the changes, we do believe it's possible to merge this change.

In my view, this post does not go into sufficient detail or nuance to warrant any serious discussion, and the sparseness of info mostly implies failure, especially in the browser case.

It _is_ impressive that the browser repo can do _anything at all_, but if there was anything more noteworthy than that, I feel they'd go into more detail than volume metrics like 30K commits, 1M LoC. For instance, the entire capability on display could be constrained to a handful of lines that delegate to other libs.

And, it "is possible" to merge any change that avoids regressions, but the majority of our craft asks the question "Is it possible to merge _the next_ change? And the next, and the 100th?"

If they merge the MR they're walking the walk.

If they present more analysis of the browser it's worth the talk (not that useful a test if they didn't scrutinize it beyond "it renders")

Until then, it's a mountain of inscrutable agent output that manages to compile, and that contains an execution pathway which can screenshot apple.com by some undiscovered mechanism.

embedding-shape•36m ago
> it's a mountain of inscrutable agent output that manages to compile

But is this actually true? They don't say that as far as I can tell, and it doesn't compile for me or in their own CI, it seems.

tired_and_awake•30m ago
The moment all code is interacted with through agents, I cease to care about code quality. The only things that matter are the quality of the product, the cost of maintenance, etc.: exactly the things we measure software development orgs against. It could be handy to have these projects deployed to demonstrate their utility and efficacy? Looking at PRs of agents feels wrong-headed; who cares if agents' code is hard to read if agents are managing the code base?
icedchai•6m ago
This is how we wound up with non-technical "engineering managers." Looks good to me.
visarga•6m ago
> Looking at PRs of agents feels wrong-headed

It would be walking the motorcycle.

matthewfcarlson•6m ago
It’s fascinating that many of the issues they faced I’ve seen in human software engineering teams.

Things like integration creating bottlenecks, or a lack of consistent top-down direction leading to small risk-averse changes instead of bold redesigns. All things I’ve seen before.
