frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Scaling long-running autonomous coding

https://cursor.com/blog/scaling-agents
68•samwillis•1h ago

Comments

jphoward•1h ago
The browser it built, obviously the context window of the entire project is huge. They mention loads of parallel agents in the blog post, so I guess each agent is given a module to work on, and some tests? And then a 'manager' agent plugs this in without reading the code? Otherwise I can't see how, even with ChatGPT 5.2/Gemini 3, you could do this otherwise? In retrospect it seems an obvious approach and akin to how humans work in teams, but it's still interesting.
simonw•1h ago
GPT-5.2-Codex has a 400,000 token window. Claude 4.5 Opus is half of that, 200,000 tokens.

It turns out to matter a whole lot less than you would expect. Coding Agents are really good at using grep and writing out plans to files, which means they can operate successfully against way more code than fits in their context at a single time.

observationist•1h ago
Get a good "project manager" agents.md and it changes the whole approach of vibe coding. For a professional environment, with each person given a little domain, arranged in the usual hierarchy of your coding team, truly amazing things can get done.

Presumably the security and validation of code still needs work, I haven't read anything that indicates those are solved yet, so people still need to read and understand the code, but we're at the "can do massive projects that work" stage.

Division of labor and planning and hierarchy are all rapidly advancing, the orchestration and coordination capabilities are going to explode in '26.

galaxyLogic•39m ago
> so I guess each agent is given a module to work on, and some tests?

Who created those agents and gives them the tasks to work on. Who created the tests? AI, or the humans?

simonw•1h ago
"To test this system, we pointed it at an ambitious goal: building a web browser from scratch."

I shared my LLM predictions last week, and one of them was that by 2029 "Someone will build a new browser using mainly AI-assisted coding and it won’t even be a surprise" https://simonwillison.net/2026/Jan/8/llm-predictions-for-202... and https://www.youtube.com/watch?v=lVDhQMiAbR8&t=3913s

This project from Cursor is the second attempt I've seen at this now! The other is this one: https://www.reddit.com/r/Anthropic/comments/1q4xfm0/over_chr...

cheevly•1h ago
2029? I have no idea why you would think this is so far off. More like Q2 2026.
xmprt•1h ago
You're either overestimating the capabilities of current AI models or underestimating the complexity of building a web browser. There are tons of tiny edge cases and standards to comply with where implementing one standard will break 3 others if not done carefully. AI can't do that right now.
geeunits•1h ago
because it makes him look smart when inevitably he's 'right'
gordonhart•43m ago
Web browsers are insanely hard to get right, that’s why there are only ~3 decent implementations out there currently.
mkoubaa•10m ago
Yeah if you let them index chromium I'm sure it could do it next week. It just won't be original or interesting.
mrefish•1h ago
Time to raise the bar. By 2029 someone will build a new browser using mainly AI-assisted coding and the surprise is that it was designed to be used by pelicans.
bob1029•47m ago
The goal I am currently using for long horizon coding experiments is implementation of a PDF rasterizer given an ISO32000 specification document.
sashank_1509•1h ago
Can a browser expert please go through the code the agent wrote (skim it), and let us know how it is. Is it comparable to ladybird, or Servo, can it ever reach that capability soon?
ZitchDog•1h ago
I used similar techniques to build tjs [1] - the worlds fastest and most accurate json schema validator, with magical TypeScript types. I learned a lot about autonomous programming. I found a similar "planner/delegate" pattern to work really well, with the use of git subtrees to fan out work [2].

I think any large piece of software with well established standards and test suites will be able to be quickly rewritten and optimized by coding agents.

[1] https://github.com/sberan/tjs

[2] /spawn-perf-agents claude command: https://github.com/sberan/tjs/blob/main/.claude/commands/spa...

trjordan•1h ago
This is going to sound sarcastic, but I mean this fully: why haven't they merged that PR.

The implied future here is _unreal cool_. Swarms of coding agents that can build anything, with little oversight. Long-running projects that converge on high-quality, complex projects.

But the examples feel thin. Web browsers, Excel, and Windows 7 exist, and they specifically exist in the LLM's training sets. The closest to real code is what they've done with Cursor's codebase .... but it's not merged yet.

I don't want to say, call me when it's merged. But I'm not worried about agents ability to produce millions of lines of code. I'm worried about their ability to intersect with the humans in the real world, both as users of that code and developers who want to build on top of it.

dist-epoch•1h ago
Pretty much everything exists in the training sets. All non-research software is just a mishmash of various standard modules and algorithms.
galaxyLogic•44m ago
Not everything, only code-bases of existing (open-source?) applications.

But what would be the point of re-creating existing applications? It would be useful if you can produce a better version of those applications. But the point in this experiment was to produce something "from scratch" I think. Impressive yes, but is it useful?

A more practically useful task would be for Mozilla Foundation and others to ask AI to fix all bugs in their application(s). And perhaps they are trying to do that, let's wait and see.

risyachka•25m ago
>> why haven't they merged that PR.

because it is absolutely impossible to review that code and there is gazillion issues there.

The only way it can get merged is YOLO and then fix issues for months in prod which kinda defeats the purpose and brings gains close to zero.

dist-epoch•1h ago
So, who is going to compile the browser and post the binaries so we can check it out? (in a sandbox/VM obviously)
mccoyb•1h ago
Supposing agents and their organization improve, it seems like we’re approaching a point where the cost of a piece of software will be driven down to the cost of running the hardware, and the cost of the tokens required to replicate it.

The tokens were “expensive” from the minds of humans …

Daishiman•52m ago
It will be driven down to the cost of having a good project and product manager effectively understanding what the customer wants, which has been the main barrier to excellent software for a good long time.
galaxyLogic•32m ago
And not only understanding what the customer wants, but communicating that unambiguously to the AI. And note who is the "customer" here? Is it the end-users, or is it a client-company which contracts the project-manager for this task? But then the issue is still there, who in the client-company decides exactly what is needed and what the (potential) users want?

I think this situation emphasizes the importance of (something like) Agile. To produce something useful can only happen via experimentation and getting feedback from actual users, and re-iterating relentlessly.

jphelan•1h ago
This looks like extremely brittle code to my eyes. Look at https://github.com/wilsonzlin/fastrender/blob/main/crates/fa...

What is `FrameState::render_placeholder`?

``` pub fn render_placeholder(&self, frame_id: FrameId) -> Result<FrameBuffer, String> { let (width, height) = self.viewport_css; let len = (width as usize) .checked_mul(height as usize) .and_then(|px| px.checked_mul(4)) .ok_or_else(|| "viewport size overflow".to_string())?;

    if len > MAX_FRAME_BYTES {
      return Err(format!(
        "requested frame buffer too large: {width}x{height} => {len} bytes"
      ));
    }

    // Deterministic per-frame fill color to help catch cross-talk in tests/debugging.
    let id = frame_id.0;
    let url_hash = match self.navigation.as_ref() {
      Some(IframeNavigation::Url(url)) => Self::url_hash(url),
      Some(IframeNavigation::AboutBlank) => Self::url_hash("about:blank"),
      Some(IframeNavigation::Srcdoc { content_hash }) => {
        let folded = (*content_hash as u32) ^ ((*content_hash >> 32) as u32);
        Self::url_hash("about:srcdoc") ^ folded
      }
      None => 0,
    };
    let r = (id as u8) ^ (url_hash as u8);
    let g = ((id >> 8) as u8) ^ ((url_hash >> 8) as u8);
    let b = ((id >> 16) as u8) ^ ((url_hash >> 16) as u8);
    let a = 0xFF;

    let mut rgba8 = vec![0u8; len];
    for px in rgba8.chunks_exact_mut(4) {
      px[0] = r;
      px[1] = g;
      px[2] = b;
      px[3] = a;
    }

    Ok(FrameBuffer {
      width,
      height,
      rgba8,
    })
  }
} ```

What is it doing in these diffs?

https://github.com/wilsonzlin/fastrender/commit/f4a0974594e3...

I'd be really curious to see the amount of work/rework over time, and the token/time cost for each additional actual completed test case.

blibble•59m ago
this is certainly an interesting way to pull out an attribute from a tag: https://github.com/wilsonzlin/fastrender/blob/main/crates/fa...
blamestross•19m ago
I suppose brittle code is fine if you have cursor to update and fix it. Ideal really, keeps you dependent.
mk599•1h ago
Define "from scratch" in "building a web browser from scratch". This thing has over 100 crates as dependencies... To implement css layouting, it uses Taffy, a crate used by existing browser implementations...
embedding-shape•1h ago
Did anyone manage to run the tests from the repository itself? The code seems filled with errors and warnings, as far as I can tell none of them because of the platform I'm on (Linux). I went and looked at the Action workflow history for some pages, and seems CI been failing for a while, PRs also all been failing CI but merged. How exactly was this verified to be something to be used as an successful example, or am I misunderstanding what point they are trying to make? They mention a screenshot, but they never actually mention if their goal was successfully met, do they?

I'm not sure the approach of "completely autonomous coding" is the right way to go. I feel like maybe we'll be able to use it more effectively if we think of them as something to be used by a human to accomplish some thing instead, lean into letting the human drive the thing instead, because quality spirals so quickly out of control.

micimize•43m ago
> While it might seem like a simple screenshot, building a browser from scratch is extremely difficult.

> Another experiment was doing an in-place migration of Solid to React in the Cursor codebase. It took over 3 weeks with +266K/-193K edits. As we've started to test the changes, we do believe it's possible to merge this change.

In my view, this post does not go into sufficient detail or nuance to warrant any serious discussion, and the sparseness of info mostly implies failure, especially in the browser case.

It _is_ impressive that the browser repo can do _anything at all_, but if there was anything more noteworthy than that, I feel they'd go into more detail than volume metrics like 30K commits, 1M LoC. For instance, the entire capability on display could be constrained to a handful of lines that delegate to other libs.

And, it "is possible" to merge any change that avoids regressions, but the majority of our craft asks the question "Is it possible to merge _the next_ change? And the next, and the 100th?"

If they merge the MR they're walking the walk.

If they present more analysis of the browser it's worth the talk (not that useful a test if they didn't scrutinize it beyond "it renders")

Until then, it's a mountain of inscrutable agent output that manages to compile, and that contains an execution pathway which can screenshot apple.com by some undiscovered mechanism.

embedding-shape•41m ago
> it's a mountain of inscrutable agent output that manages to compile

But is this actually true? They don't say that as far as I can tell, and it also doesn't compile for me nor their own CI it seems.

sashank_1509•6m ago
Oh it doesn’t compile? that’s very revealing
micimize•4m ago
Hah I don't know actually! I was assuming it must if they were able to get that screenshot video.
tired_and_awake•36m ago
The moment all code is interacted with through agents I cease to care about code quality. The only thing that matters is the quality of the product, cost of maintenance etc. exactly the thing we measure software development orgs against. It could be handy to have these projects deployed to demonstrate their utility and efficacy? Looking at PRs of agents feels a wrong headed, like who cares if agents code is hard to read if agents are managing the code base?
icedchai•12m ago
This is how we wound up with non-technical "engineering managers." Looks good to me.
visarga•12m ago
> Looking at PRs of agents feels a wrong headed

It would be walking the motorcycle.

matthewfcarlson•12m ago
It’s fascinating that many of the issues they faced I’ve seen in human software engineering teams.

Things like integration creating bottlenecks or a lack of consistent top down direction leading to small risk adverse changes instead of bold redesigns. All things I’ve seen before.

2001zhaozhao•5m ago
At least the AI teams aren't politically competing against each other unlike human teams.

(Or are they?)

Claude Cowork Exfiltrates Files

https://www.promptarmor.com/resources/claude-cowork-exfiltrates-files
343•takira•3h ago•158 comments

Scaling long-running autonomous coding

https://cursor.com/blog/scaling-agents
69•samwillis•1h ago•38 comments

The State of OpenSSL for pyca/cryptography

https://cryptography.io/en/latest/statements/state-of-openssl/
44•SGran•2h ago•5 comments

Show HN: WebTiles – create a tiny 250x250 website with neighbors around you

https://webtiles.kicya.net/
94•dimden•4d ago•13 comments

Ask HN: Share your personal website

355•susam•6h ago•1170 comments

Sun Position Calculator

https://drajmarsh.bitbucket.io/earthsun.html
45•sanbor•2h ago•10 comments

Why some clothes shrink in the wash and how to unshrink them

https://www.swinburne.edu.au/news/2025/08/why-some-clothes-shrink-in-the-wash-and-how-to-unshrink...
408•OptionOfT•3d ago•225 comments

Roam 50GB is now Roam 100GB

https://starlink.com/support/article/58c9c8b7-474e-246f-7e3c-06db3221d34d
234•bahmboo•8h ago•257 comments

SparkFun Officially Dropping AdaFruit due to CoC Violation

https://www.sparkfun.com/official-response
350•yaleman•9h ago•356 comments

ChromaDB Explorer

https://www.chroma-explorer.com/
10•arsentjev•1h ago•0 comments

Native ZFS VDEV for Object Storage (OpenZFS Summit)

https://www.zettalane.com/blog/openzfs-summit-2025-mayanas-objbacker.html
69•suprasam•5h ago•16 comments

Generate QR Codes with Pure SQL in PostgreSQL

https://tanelpoder.com/posts/generate-qr-code-with-pure-sql-in-postgres/
16•tanelpoder•4d ago•0 comments

Find a pub that needs you

https://www.ismypubfucked.com/
195•thinkingemote•8h ago•152 comments

Show HN: Webctl – Browser automation for agents based on CLI instead of MCP

https://github.com/cosinusalpha/webctl
56•cosinusalpha•9h ago•15 comments

I hate GitHub Actions with passion

https://xlii.space/eng/i-hate-github-actions-with-passion/
391•xlii•13h ago•289 comments

Ford F-150 Lightning outsold the Cybertruck and was then canceled for poor sales

https://electrek.co/2026/01/13/ford-f150-lightning-outsold-tesla-cybertruck-canceled-not-selling-...
381•MBCook•6h ago•525 comments

Rubik's Cube in Prolog – Order

https://medium.com/@kenichisasagawa/i-am-preparing-material-for-a-prolog-book-af7580acfee7
5•myth_drannon•4d ago•0 comments

The hunt for a stolen Jackson Pollock

https://www.washingtonpost.com/entertainment/art/interactive/2026/jackson-pollock-theft-isaacs-fa...
9•prismatic•15h ago•0 comments

Ask HN: How do you safely give LLMs SSH/DB access?

44•nico•5h ago•69 comments

Ski map artist James Niehues, the 'Monet of the mountains' (2021)

https://adventure.com/ski-map-artist-james-niehues/
101•gyomu•4d ago•11 comments

US, for first time in 50 years, experienced negative net migration in 2025

https://abcnews.go.com/US/us-1st-time-50-years-experienced-negative-net/story?id=129175522
44•pqtyw•1h ago•19 comments

So, you’ve hit an age gate. What now?

https://www.eff.org/deeplinks/2026/01/so-youve-hit-age-gate-what-now
281•hn_acker•6h ago•227 comments

GitHub should charge everyone $1 more per month to fund open source

https://blog.greg.technology/2025/11/27/github-should-charge-1-dollar-more-per-month.html
200•evakhoury•7h ago•186 comments

Every country should set 16 as the minimum age for social media accounts

https://www.afterbabel.com/p/why-every-country-should-set-16
104•paulpauper•4h ago•159 comments

Is Rust faster than C?

https://steveklabnik.com/writing/is-rust-faster-than-c/
203•vincentchau•4d ago•238 comments

Show HN: Digital Carrot – Block social media with programmable rules and goals

https://www.digitalcarrot.app/
30•newswangerd•8h ago•11 comments

I’m leaving Redis for SolidQueue

https://www.simplethread.com/redis-solidqueue/
291•amalinovic•14h ago•122 comments

Edge of Emulation: Game Boy Sewing Machines (2020)

https://shonumi.github.io/articles/art22.html
104•mosura•9h ago•7 comments

Lago (Open-Source Billing) is hiring across teams and geos

1•Rafsark•11h ago

How much of my observability data is waste?

https://usetero.com/blog/the-question-your-observability-vendor-wont-answer
92•binarylogic•7h ago•48 comments