Cursor's latest “browser experiment” implied success without evidence

https://embedding-shapes.github.io/cursor-implied-success-without-evidence/

724•embedding-shape•3w ago

Related: Scaling long-running autonomous coding - https://news.ycombinator.com/item?id=46624541 - Jan 2026 (174 comments)

Comments

embedding-shape•3w ago

I'm eager to find out if this was actually successfully compiled at one point (otherwise how did they get the screenshots?), so I'm running `cargo check` for each of the last 100 commits to see if anything works. Will update here with the results once it's ready.

Edit: As mentioned, I ran `cargo check` on each of the last 100 commits, and it seems every single one of them failed in some way: https://gist.github.com/embedding-shapes/f5d096dd10be44ff82b...
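
For anyone who wants to repeat the check, a minimal Python sketch of that loop could look like the following. It assumes `git` and `cargo` are on the PATH and that it runs from the root of a clean checkout; the function names (`last_commits`, `cargo_check_passes`, `check_commits`, `summarize`) are made up for illustration, and this is not the script behind the gist.

```python
import subprocess
from typing import Callable, Dict, List


def last_commits(n: int = 100) -> List[str]:
    """Return the hashes of the last n commits reachable from HEAD, newest first."""
    out = subprocess.run(
        ["git", "rev-list", f"--max-count={n}", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def cargo_check_passes(commit: str) -> bool:
    """Check out one commit (detached HEAD) and report whether `cargo check` exits with 0."""
    subprocess.run(["git", "checkout", "--quiet", "--detach", commit], check=True)
    return subprocess.run(["cargo", "check", "--quiet"], capture_output=True).returncode == 0


def check_commits(commits: List[str], passes: Callable[[str], bool]) -> Dict[str, bool]:
    """Map each commit hash to whether the supplied check passed for it."""
    return {commit: passes(commit) for commit in commits}


def summarize(results: Dict[str, bool]) -> str:
    """One-line summary of how many commits passed."""
    ok = sum(results.values())
    return f"{ok}/{len(results)} commits passed"


if __name__ == "__main__":
    results = check_commits(last_commits(100), cargo_check_passes)
    for commit, ok in results.items():
        print(commit[:10], "ok" if ok else "FAILED")
    print(summarize(results))
```

Note that this leaves the repository on a detached HEAD at the oldest commit it checked, so it is best run in a throwaway clone.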

techpression•3w ago

I wouldn’t be surprised if any form of screenshot is fake (as in not made the way it claims). In my experience, Occam’s razor tends to lead that way when extraordinary claims are made regarding LLMs.

leerob•3w ago

Should compile now: https://news.ycombinator.com/item?id=46650998

embedding-shape•3w ago

> Yeah, seems latest commit does let `cargo check` successfully run. I'm gonna write an update blog post once they've made their statement, because I'm guessing they're about to say something.

> Something fishy is happening in their `git log`, it doesn't seem like it was the agents who "autonomously" actually made things compile in the end. Notice the git username and email addresses switching around, even a commit made inside an EC2 instance managed to get in there: https://gist.github.com/embedding-shapes/d09225180ea3236f180...

Gonna need to look closer into it when I have time, but seems they manually patched it up in the end, so the original claim still doesn't stand :/
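
A rough way to see that kind of identity switching is to count commits per author name and email. The sketch below is only an illustration under assumptions: it expects to run inside the repository, the function names are invented, and it is not how the linked gist was produced.

```python
import subprocess
from collections import Counter
from typing import List, Tuple

# %x1f makes git print the ASCII unit separator between name and email,
# a byte that is unlikely to appear in either field.
LOG_FORMAT = "%an%x1f%ae"


def read_identities(n: int = 100) -> str:
    """Return one 'name<US>email' line per commit for the last n commits."""
    return subprocess.run(
        ["git", "log", f"--max-count={n}", f"--format={LOG_FORMAT}"],
        check=True, capture_output=True, text=True,
    ).stdout


def count_identities(log_output: str) -> List[Tuple[Tuple[str, str], int]]:
    """Count commits per (author name, author email), most frequent first."""
    counts: Counter = Counter()
    for line in log_output.split("\n"):
        if not line.strip():
            continue
        name, _, email = line.partition("\x1f")
        counts[(name, email)] += 1
    return counts.most_common()


if __name__ == "__main__":
    for (name, email), commits in count_identities(read_identities(100)):
        print(f"{commits:4d}  {name} <{email}>")
```

A long tail of one-off identities near the commits that finally compile would be the thing to look at more closely.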

saghm•3w ago

> otherwise how did they get the screenshots

Their AI is probably better at producing images than writing code

josefritzishere•3w ago

Key phrase: "They never actually claim this browser is working and functional". This is what most AI "successes" turn out to be when you apply even a modicum of scrutiny.

embedding-shape•3w ago

In my personal experience, Codex and Claude Code are definitively capable tools when used in certain ways.

What Cursor did with their blogpost seems intentionally and outright misleading, since I'm not able to even run the thing. With Codex/Claude Code it's relatively easy to download it and run it to try for yourself.

netdevphoenix•3w ago

"definitively capable tools when used in certain ways". This sounds like "if it doesn't work for you is because you don't use in the right way" imo.

Reminds me of SAP/Salesforce.

embedding-shape•3w ago

Yes, many tools work like that, especially professional tools.

You think you can just fire up Ableton, Cubase or whatever and make as great music as an artist who has done that for a long time? No, it requires practice and understanding. Every tool works like this, some with different difficulties, some with different skill levels, but all of them have it in some way.

immibis•3w ago

Not even the Ableton marketing team is telling me I can just fire up Ableton and make great music and if I can't do that I must be a brainwashed doomer.

embedding-shape•3w ago

The argument isn't what OpenAI/Anthropic are selling their users, what I said was:

> are definitively capable tools when used in certain ways

Which I received pushback on. My reply is to that pushback, defending what I said, not what others told you.

Edit: Besides the point, but Ableton (and others) constantly tell people how to learn how to use the tool, so they use it the right way. There is a whole industry of people (teachers) who specialize in specific software/hardware and teaching others "how to hold the tool correctly".

Xorakios•3w ago

or the iPhone...

Capricorn2481•3w ago

> Besides the point, but Ableton (and others) constantly tell people how to learn how to use the tool, so they use it the right way

It's just an odd comparison to begin with. You said

> You think you can just fire up Ableton, Cubase or whatever and make as great music as an artist who has done that for a long time

I don't think you have to be good at Ableton at all to make good music. I don't think you can even argue it would benefit your music to learn Ableton. There's a crap ton of people who are wizards with their DAW making mediocre music. A DAW can be fun to learn, and that can help me keep my flow state. But it's not literally going to make better music, and the fundamentals of production don't change at all from DAW to DAW.

That's a totally separate thing from LLMs. We are constantly told that if we learn the magic way to use LLMs, we can spit out functioning code a lot faster. But in reality, people are just generating code faster than they can verify it.

embedding-shape•3w ago

> That's a totally separate thing from LLMs. We are constantly told that if we learn the magic way to use LLMs, we can spit out functioning code a lot faster. But in reality, people are just generating code faster than they can verify it.

I don't see it that way. LLMs are not magically gonna make you be able to produce high-quality software, just like Ableton isn't magically gonna make you be able to produce high-quality music. But if you learn the tool, it gets a lot easier to use effectively. And the better you are at "producing high quality music/code", probably the more use you can make of Ableton/LLMs, compared to someone who isn't good at those things already.

Again, what you're being told by other people, I don't know, and frankly don't really care. OpenAI sold Codex to me as a tool that can help me, a programmer, do programming, and that's exactly what that tool gives me.

Cursor, in their article, tried to sell their tool with the claim that "Hundreds of agents can work together on a single codebase for weeks, making real progress on ambitious projects", which, as I claim in TFA, doesn't seem to be true.

deathanatos•3w ago

This is the company making the tool that is holding the tool, in this case, claiming that "[they] built a browser" when, if TFA's assertions are correct, they did not "build a browser" by any reasonable interpretation of those words.

(I grant that you're speaking from your experience, about different tools, two replies up, but this claim is just paper-rock-scissorable through these various AI tools. "Oh, this tool's authors are just hype, but this tool works totes-mc-oates…". Fool me once, and all.)

embedding-shape•3w ago

Yes, and apparently in a horrible way, because they've obviously failed to produce a functioning browser. But since I'm the author of TFA, I guess I'm kind of biased in this discussion.

Codex was sold to me as a tool that can help me do programming. I tried it, evaluated it, found it helpful, continued using it. Based on my experience, it definitively helps with some tasks. Apparently it also does not work for others, for some not at all. I know the tool works for me, and I accept the claim that it doesn't for others, so what am I left to believe? That the tool doesn't actually work, even though my own experience and usage of it says otherwise?

Codex is still an "AI success", regardless if it could build an entire browser by itself, from scratch, or whatever. It helps as it is today, I wouldn't need it to get better to continue using it.

But even with this perspective, which I'd say is "nuanced" (others would claim "AI zealot" probably), I'm trying to see if what Cursor claims is actually true, that they managed to build a browser in that way. When it doesn't seem true, I call it out. I still disagree with "This is what most AI "successes" turn out to be when you apply even a modicum of scrutiny", and I'm claiming what Cursor is doing here is different.

airstrike•3w ago

FWIW IMHO Windsurf is better than Cursor. Claude Code is better than both for many tasks, but not all.

epolanski•3w ago

> if it doesn't work for you is because you don't use in the right way

That's an almost universal truth that you need to learn how to use any non trivial tool.

Kiro•3w ago

> "definitively capable tools when used in certain ways". This sounds like "if it doesn't work for you is because you don't use in the right way" imo.

Yes, because that's what it is. If you seriously can't get Gemini 3 or Opus 4.5 to work you're either using it wrong or coding on something extremely esoteric.

falloutx•3w ago

> Codex and Claude Code are definitively capable tools when used in certain ways.

They definitely can make some things better and you can do some things faster, but all the efficiency is gonna get sucked up by companies trying to drop more slop.

hexbin010•3w ago

No you see you just need to prompt it to implement functional and working code. You're just inexperienced and holding it wrong

falloutx•3w ago

$200/month tool (real cost could be $1000/month), but you have to babysit it.

hexbin010•3w ago

Yes that's completely expected. Just like any other tool or service.

It's just like a chisel. Well the chisel company didn't promise to let you become a master craftsman overnight but anyway it's just like a chisel in that you have to learn how to use it. And people expect a chisel to actually chisel through wood out the box but anyway it's exactly like a chisel.

ares623•3w ago

Last I checked the chisel industry promised way less and didn’t hold the entire planet’s economy hostage

falloutx•3w ago

chisel works exactly as advertised but Claude thinks I can use it to cure cancer by Tuesday.

7777777phil•3w ago

I wonder who they actually tried to impress with that? People who understand and appreciate the difficulty of building a browser from scratch would surely be interested to understand what you (or your Agent) did to a degree that they would understand if you didn’t.

koolala•3w ago

It worked to impress people on Twitter...

thewhitetulip•3w ago

You'll see countless posts on LinkedIn about how great LLMs are. Nobody goes in depth these days - just superficial posts

Snuggly73•3w ago

Not sure - if it works, then who needs Cursor (and all other IDEs). You just ask for a browser and it comes out of the thin air.

fernandotakai•3w ago

>I wonder who they actually tried to impress with that?

investors?

paulus_magnus2•3w ago

The blog[0] is worded rather conservatively but on Twitter [2] the claim is pretty obvious and the hype effect is achieved [2]

CEO stated "We built a browser with GPT-5.2 in Cursor"

instead of

"by dividing agents into planners and workers we managed to get them busy for weeks creating thousands of commits to the main branch, resolving merge conflicts along the way. The repo is 1M+ lines of code but the code does not work (yet)"

[0] https://cursor.com/blog/scaling-agents

[1] https://x.com/kimmonismus/status/2011776630440558799

[2] https://x.com/mntruell/status/2011562190286045552

[3] https://www.reddit.com/r/singularity/comments/1qd541a/ceo_of...

embedding-shape•3w ago

So clearly someone, at some point, managed to run this, surely? That's where the screenshots come from? I just don't understand how, given the code is riddled with errors.

deeth_starr_v•3w ago

Maybe they just asked an AI to create an image of a rendered webpage?

nicoburns•3w ago

Somebody managed to get it to compile https://x.com/CanadaHonk/status/2011612084719796272

But apparently "some pages take a literal minute to load"

embedding-shape•3w ago

> to be clear those 2 hours were fixing compile errors and bugs, not compile time

Seems like "I had to do the last mile myself", not "autonomous coding" which was Cursor's claim here.

deng•3w ago

Even then, "resolving merge conflicts along the way" doesn't mean anything, as there are two trivial merge strategies that are always guaranteed to work ('ours' and 'theirs').

fzzzy•3w ago

That’s not guaranteed to work. Other parts of the codebase that didn’t conflict could depend on the discarded code.

formerly_proven•3w ago

Well they did mention the code doesn't work.

nyeah•3w ago

Where did Cursor say that?

logicallee•3w ago

It's implied by the fact that early in the post they say:

>"To test this system, we pointed it at an ambitious goal: building a web browser from scratch."

and then near the end, they say:

>"Hundreds of agents can work together on a single codebase for weeks, making real progress on ambitious projects."

This means they only make progress toward it, but do not "build a web browser from scratch".

If you're curious, the State of Utopia (will be available at https://stateofutopia.com ) did build a web browser from scratch, though it used several packages for the networking portion of it.

See my other comments and posts for links.

madeofpalk•3w ago

The point is that the merge conflict was resolved, regardless of whether there was a working product at the end. Which there apparently isn’t.

paulus_magnus2•3w ago

Haha. True, CI success was not part of PR accept criteria at any point.

If you view the PRs, they bundle multiple fixes together, at least according to the commit messages. The next hurdle will be to guardrail agents so that they only implement one task and don't cheat by modifying the CI pipeline.

formerly_proven•3w ago

If I had a nickel for every time I've seen a human dev disable/xfail/remove a failing test "because it's wrong" and then proceed to break production I would have several nickels, which is not much, but does suggest that deleting failing tests, like many behaviors, is not LLM-specific.

vizzier•3w ago

> but does suggest that deleting failing tests, like many behaviors, is not LLM-specific.

True, but it is shocking how often claude suggests just disabling or removing tests.

ciaranmca•3w ago

100%. I'm trying a bit of an experiment like this (similar in that I mostly just care about playing around with different agents, techniques etc.) and it has built out literally hundreds of tests, dozens of which were almost pointless as it decided to mock APIs. When the number of failed tests exceeded 40 it just started disabling tests.

icedchai•3w ago

To be fair, many human developers are fond of pointless tests that mock everything to the extent that no real code is actually exercised. At least the tests are fast though.

falkensmaize•3w ago

Citing the absolute worst practices from terrible developers as a way to exonerate or legitimize LLM code production issues is something we need to stop doing in my opinion. I would not excuse a day-one junior on my team who wrote pointless tests or, worse yet, removed tests to get the CI to pass, nor would I expect one to do that.

If LLMs do this it should be seen as an issue and should not be overlooked with “people do it too…”. Professional developers do not do this. If we’re going to use AI for creating production code we need to be honest about its deficiencies.

icedchai•3w ago

I agree, but if LLMs are trained on common practices, best or worst, what do you expect?

Testing, specifically, is heavily opinionated among professional developers.

zephen•3w ago

> it is shocking how often claude suggests just disabling or removing tests.

Arguably, Claude is simply successfully channeling what the developers who wrote the bulk of its training data would do. We've already seen how bad behavior injected into LLMs in one domain causes bad behavior in other domains, so I don't find this particularly shocking.

The next frontier in LLMs has to be distinguishing good training data from bad training data. The companies have to do this, even if only in self defense against the new onslaught of AI-generated slop, and against deliberate LLM poisoning.

If the models become better at critically distinguishing good from bad inputs, particularly if they can learn to treat bad inputs as examples of what not to do, I would expect one benefit of this is that the increased ability of the models to write working code will then greatly increase the willingness of the models to do so, rather than to simply disable failing tests.

icedchai•3w ago

A coworker opened a PR full of AI slop. One of the first things I do is check if the tests pass. Of course, they didn't. I asked them to fix the tests, since there's no point in reviewing broken code.

"Fix the tests." This was interpreted literally, and `assert status == 200` got changed to `assert status == 500` in several locations. Some tests required more complex edits to make them "pass."

Inquiries about the tests went unanswered. Eventually the 2000 lines of slop was closed without merging.

saghm•3w ago

After a certain point the response to low effort vibe code has to be vibe reviews. Failing tests? Bad vibes, close without merging. Much more efficient than vibe coding too, since no AI is needed.

ewoodrich•3w ago

The sneaky move that I hate most (and it does seem to mostly be a Claude-ism; I haven’t encountered it on GPT Codex or GLM) is when, dealing with an external data source (API, locally polling hardware, etc.), Claude returns fake data in the shape of the expected output as a “helpful” fallback on failures, so that the rest of the code “works”.

Latest example: I recently vibe coded a little Python MQTT client for a UPS connected to a spare Raspberry Pi to use with Home Assistant, and with just a few turns back and forth I got this extremely cool bespoke tool, which felt really fun.

So I spent a while customizing how the data displayed on my Home Assistant dashboard and noticed every single data point was unchanging. It took a while to realize because the available data points wouldn’t be expected to change a whole lot on a fully charged UPS but the voltage and current staying at the exact same value to a decimal place for three hours raised my suspicions.

After reading the code I discovered it had just used one of the sample command line outputs from the UPS tool I gave it to write the CLI parsing logic. When an exception occurred in the parser function it instead returned the sample data so the MQTT portion of the script could still “work”.

Tbf Claude did eventually get it over the finish line once I clarified that yes, using real data from the actual UPS was in fact an important requirement for me in a real time UPS monitoring dashboard…
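
For contrast, here is roughly what failing loudly looks like instead of falling back to canned data. This is a sketch under assumptions: the `key: value` line format and the names `UpsParseError` and `parse_ups_output` are hypothetical, not the actual UPS tool's output or the script from this story.

```python
from typing import Dict


class UpsParseError(Exception):
    """Raised when the UPS tool's output cannot be parsed."""


def parse_ups_output(text: str) -> Dict[str, float]:
    """Parse hypothetical `key: value` lines into floats, raising on bad input.

    There is deliberately no fallback to sample data: a parse failure should
    surface as an error, not as plausible-looking readings.
    """
    readings: Dict[str, float] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise UpsParseError(f"unexpected line: {line!r}")
        try:
            readings[key.strip()] = float(value)
        except ValueError as exc:
            raise UpsParseError(f"non-numeric value in line: {line!r}") from exc
    if not readings:
        raise UpsParseError("no readings found")
    return readings
```

The caller can then decide what to publish on failure (nothing, or an explicit "unavailable" state), which keeps a dashboard from showing stale values as if they were live.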

teiferer•3w ago

Always check the code.

It's similar to early versions of autonomous driving. You'd not want to sit in the back seat with nobody at the wheel. That would get you killed guaranteed.

DonHopkins•3w ago

And how is that not good for humanity in an evolutionary sense (as long as it doesn't kill or maim anyone else)?

Tesla owner keeps using Autopilot from backseat—even after being arrested:

https://mashable.com/article/tesla-autopilot-arrest-driving-...

duskdozer•3w ago

Sounds to me like more evidence in favor of the idea that they're meant to play the golden retriever engineer reporting to you, the extremely intelligent manager.

dullcrisp•3w ago

If I had a nickel for every time I’ve seen a human being pull down their pants and defecate in the middle of the street I’d have a couple nickels. That’s not a lot but it suggests that this behavior is not LLM specific.

mickdarling•3w ago

Had humans not been doing this already, I would have walked into Samsung with the demo application that was working an hour before my meeting, rather than the android app that could only show me the opening logo.

There are a lot of really bad human developers out there, too.

moregrist•3w ago

> Entrepreneur, CEO and founder of Tomorrowish a social media DVR

So you flubbed managing a project and are now blaming your employees. Classy.

DonHopkins•3w ago

Nice blog post, gp serial entrepreneur founder bro -- what did your investors think of that?

http://www.mickdarling.com/2019/07/26/busy-summer/

  An embedded page at landr-atlas.com says:

  Attention!

  MacOS Security Center has identified that your system is under threat. 
  Please scan your MacOS as soon as possible to avoid more damage.
  Don't leave this page until you have undertaken all the suggested steps 
  by authorised Antivirus.

  [OK]

mickdarling•2w ago

Thank you for the note. It's not a site I used all that often.

Whether you had anything to do with it or not, I have no idea. And, since you didn't follow best practices and tell me directly rather than trying to score points here, there's really no way of knowing whether you're the one who caused the problem in the first place.

I built a new site without Wordpress. That took less than a day.

I don't imagine you will alter your behavior to align with general best security practices anytime soon.

DonHopkins•2w ago

> Whether you had anything to do with it or not, I have no idea. And, since you didn't follow best practices and tell me directly rather than trying to score points here, there's really no way of knowing whether you're the one who caused the problem in the first place.

Are you actually accusing me (slyly couched in weasel words, but still explicitly) of hacking your wordpress blog, then pointing it out on Hacker News to score points?

Yeah, you have a point /s: there's really no way to tell if I hacked your blog or not, nor any way of knowing whether any statement is true or not if you're nihilistic enough, but you're going to have to take my word that I didn't, and clean up your own mess without shifting the blame to me, or demanding I should have helped you. You're the one who chose to use wordpress, not me. FYI, "general best security practices" include DON'T USE WORDPRESS.

What possible evidence or delusional reasons do you have to imply that I hacked your wordpress blog? Is your security really that lax and password that easy to guess? And even if I did, then why would I post about it publicly or notify you privately? You sound pathologically paranoid and antisocially aggressive to make such baseless accusations out of the blue, to try to shift the blame to me for your own mistakes. That makes me glad I didn't try to contact you directly. Funny thing for you to complain about when you don't even openly publish your contact email address on your blog or hn profile like I do, though.

mickdarling•3w ago

Wasn't my project to manage. That was a consulting gig. And I fired the client right after this.

teiferer•3w ago

Where I work, that's exceptionally rare, to the point of being practically non-existent.

Tade0•3w ago

If anything, the LLMs had to learn that from somewhere, so they're just copying human behaviour.

aspenmartin•3w ago

I'm definitely in the camp that this browser implementation is shit, but just a reminder: agent training does involve human coding data in early stages of training to bootstrap it but in its reinforcement learning phase it does not -- it learns closer to the way AlphaGo did, self play and verifiable rewards. This is why people are very bullish on agents, there is no limit technically to how well they can learn (unlike LLMs) and we know we will reach superhuman skill, and the crucial crucial reason for this is: verifiable rewards. You have this for coding, you do not have this for e.g. creative tasks etc.

So agents will actually be able to build a {browser, library, etc} that won't be an absolute slopfest, but the real crucial question is when. You need better and more efficient RL training, further scaling (Amodei thinks really scaling is the only thing you technically need here and we have about 3-4 orders of magnitude of headroom left before we hit insurmountable limits), bigger context windows (that models actually handle well) and possibly continual learning paradigms, but solutions to these problems are quite tangible now.

PunchyHamster•3w ago

So, AI agent battle royale

anonzzzies•3w ago

We use Claude Code a lot for updating systems to a newer minor/major version. We have our own 'base' framework for clients which is, by now, a very large codebase that does 'everything you can possibly need'; so not only auth, but payments, billing, support tickets, email workflows, email wysiwyg editing, landing page editor, blogging, cms, AI/agent workflows etc etc (across our client base, we collect features that are 'generic' enough and create those in the base). It gets many updates from the product lead working on it (a senior using Claude Code) but we cannot just update our clients (whose versions are sometimes extremely customised/diverging) at the same pace; some do not want updates outside security, some want them once a year etc.

In this case AI has really been a productivity booster. Our framework was always quite fast moving before AI too, when we had 3.5 FTE on it (client teams are generally much larger, especially the first years), but merging, meaning including the new features and improvements from the new framework version in the client version without breaking/removing changes on the client side, was a very painful process that took a lot of time and at least 2 people for an extended period; one from the client team, one from the framework team.

With CC it is much less painful: it will merge them (it is not allowed, by hooks, to touch the tests), run the client tests and the new framework tests, and report the difference. That difference is usually evaluated by someone from the client team, who will then merge and fix the tests (mostly manually) to reflect the new reality and test the system manually. Claude misses things (especially if functionalities are very similar but not exactly the same, it cannot really pick which to take, so it usually does nothing) but the bulk of the work is done quickly and usually without causing issues.
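
As an illustration of keeping tests off-limits, here is a sketch of a plain git pre-commit check that refuses staged changes to test files. It is an assumption-laden example, not the hook setup described above: the path conventions in `is_test_path` (a `tests/` directory, `test_*` or `*_test` file names) and all function names are invented.

```python
import subprocess
import sys
from pathlib import PurePosixPath
from typing import Iterable, List


def is_test_path(path: str) -> bool:
    """Assumed convention: anything under a `tests/` directory, or named test_* / *_test."""
    p = PurePosixPath(path)
    return "tests" in p.parts[:-1] or p.stem.startswith("test_") or p.stem.endswith("_test")


def touched_tests(paths: Iterable[str]) -> List[str]:
    """Return the subset of paths that look like test files."""
    return [p for p in paths if is_test_path(p)]


def staged_paths() -> List[str]:
    """List the paths staged for the next commit."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()


if __name__ == "__main__":
    blocked = touched_tests(staged_paths())
    if blocked:
        print("refusing to commit changes to tests:", *blocked, sep="\n  ")
        sys.exit(1)
```

Saved as an executable `.git/hooks/pre-commit`, a non-zero exit aborts the commit, so a human has to make test changes deliberately.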

efreak•3w ago

`git add .; git merge --continue` also "solves" the conflict

nyeah•3w ago

The link [0] implies that the browser worked. Can you help me understand what's "conservative" about that?

DonHopkins•3w ago

> Can you help me understand what's "conservative" about that?

It's the gaslighting.

emp17344•3w ago

This is why AI skeptics exist. We’re now at the point where you can make entirely unsubstantiated claims about AI capability, and even many folks on HN will accept it with a complete lack of discernment. The hype is out of control.

embedding-shape•3w ago

> folks on HN will accept it with a complete lack of discernment

Well, I'm a heavy LLM user, I "believe" LLM helps me a lot for some tasks, but I'm also a developer with decades of experience, so I'm not gonna claim it'll help non-programmers to build software, or whatever. They're tools, not solutions in themselves.

But even us "folks on HN" who generally keep up with where the ecosystem is going, have a limit I suppose. You need to substantiate what you're saying, and if you're saying you've managed to create a browser, better let others verify that somehow.

emp17344•3w ago

Take a look at this thread regarding the original claim: https://news.ycombinator.com/item?id=46624541

The top comment is indeed baseless hype without a hint of skepticism.

embedding-shape•3w ago

The second top comment is my own (skeptical) comment, with 20 points at this moment. Thanks to those 20 people, I felt compelled to write the blog-post in this submission, and try to ask a bit clearer "what is going on?", since apparently we're at least 20 people who are wondering about this.

There are also clearly a lot of other skeptical people in that submission. Also, simonw (from that top comment) told me themselves "it's not clear that what they built even runs": https://bsky.app/profile/simonwillison.net/post/3mckgw4mxoc2...

blibble•3w ago

> The top comment is indeed baseless hype without a hint of skepticism.

and he wonders why people call him a shill

accepting everything some shit company tells you as gospel is not the default position of a "researcher"

he better hope he's on the right side of history here, as otherwise he will have burnt his reputation

emp17344•3w ago

I certainly don’t think Simon is a shill. He’s obviously a highly talented person, who in my opinion just doesn’t exercise appropriate discernment in some cases.

Edit: Of course, this isn’t a trait unique to Simon either. Everybody has blind spots, and it’s reasonable to be excited when new tech is released. On an unrelated note, my intent is to push back against some of the people here who try to shut down skepticism. Obviously, this doesn’t describe Simon, but I’ve seen others here who try to silence skeptical voices. This comes across as highly controlling and insecure.

simonw•3w ago

See comment here: https://news.ycombinator.com/item?id=46646777#46650837

I do not think you are reacting to what I said in good faith.

> he better hope he's on the right side of history here, as otherwise he will have burnt his reputation

That's something I've actually given quite a lot of thought to. My reputation and credibility matters a great deal to me. If it turns out this entire LLM thing was an over-hyped scam I'll take a very big hit to that reputation, and I'll deserve it.

(If AI rises up and tries to kill or enslave us all I'll be too busy fighting back to care.)

simonw•3w ago

As usual, I was careful with my words:

> This project from Cursor is the second attempt I've seen at this now!

I used the word "attempt" very deliberately, to avoid suggesting that either of these two projects had achieved the goal.

I don't see how you can get to "baseless hype without a hint of skepticism" there unless you've already decided to take anything I say in bad faith.

habinero•3w ago

C'mon. Your comment starts with you hyping your own prediction and crowing, "See, it's coming true!"

"But I didn't say this exact word!" and then accusing the other person of bad faith is some textbook DARVO.

simonw•3w ago

Right, because it IS coming true (gotta be careful with that word choice - "coming true" and "has come true" mean different things.)

There are already multiple attempts at building a from-scratch browser with LLM assistance. Unsurprisingly none of them have achieved full working browser status yet, several weeks after their attempts started.

ben_w•3w ago

> but I'm also a developer with decades of experience, so I'm not gonna claim it'll help non-programmers to build software, or whatever. They're tools, not solutions in themselves.

Also with decades experience, I'd say that it depends how big the non-programmer is dreaming:

To agree with you: A well-meaning friend sent an entrepreneur my direction, whose idea was "Uber for aircraft". I tried to figure out exactly what they meant, ending the conversation when I realised all answers were rephrasings of that vague three-word pitch, and that they didn't really know what they wanted to do in any specific enumerable sense.

LLMs can't solve the problem when even the person asking doesn't know what they want.

But on the other end of the scale, I've been asked to give an estimate for an app which, in its entirety, would've been one day's work even with the QA and acceptance testing and going through the Apple App Store upload process. Like, I kept asking if there was any other hidden complexity, and nope, the entire pitch was what you'd give as a pre-interview code-challenge.

An LLM would've spat out the solution to that in less time than I spent with the people who'd asked me to estimate it.

geooff_•3w ago

I think the original post was just headline bait. There is such a fast news cycle around AI that many people would take "Thousands of AI agents collaborate to make a web browser" at face value.

embedding-shape•3w ago

At least I now have something to link to, when this inevitably gets mentioned in some off-hand HN comment, about how "now AI agents can build whole browsers from scratch".

gusmally•3w ago

It's a great post, I will use it for the same. Thank you.

buggy6257•3w ago

Literally happened at work. Breathless thread of people saying how insane it was and then we got to link this and it immediately 180-ed and everyone was like “holy shit that’s messed up”

embedding-shape•3w ago

And I haven't even published part 2 yet!

fernandotakai•3w ago

yup, same. slack thread with a lot of comments with people praising it.

others were quite skeptical, especially the ones that actually perused the code.

themafia•3w ago

A fast news cycle around projects that don't actually work. It's a real bummer that "fake news" became politically charged because it's a perfect description of this segment.

nindalf•3w ago

The CEO said

> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.

"From scratch" sounds very impressive. "custom JS VM" is as well. So let's take a look at the dependencies [1], where we find

- html5ever

- cssparser

- rquickjs

That's just servo [2], a Rust based browser initially built by Mozilla (and now maintained by Igalia [3]) but with extra steps. So this supposed "from scratch" browser is just calling out to code written by humans. And after all that it doesn't even compile! It's just plain slop.

[1] - https://github.com/wilsonzlin/fastrender/blob/main/Cargo.tom...

[2] - https://github.com/servo/servo

[3] - https://blogs.igalia.com/mrego/servo-2025-stats/

zipy124•3w ago

Honestly as soon as I saw browser in rust I assumed it had just reproduced the servo source code in part, or utilised its libraries.

nindalf•3w ago

I thought they'd plagiarise, not import. Importing servo's code would make it obvious because it's so easy to look at their dependencies file. And yet ... they did. I really think they thought no one would check?

satvikpendem•3w ago

> And yet ... they did. I really think they thought no one would check?

I doubt even they checked, given they say they just let the agents run autonomously.

bonesss•3w ago

Hypothetically: what if they did check, only in order to ‘check’ they asked the LLM instead of manually verifying and were told a story? Or, perhaps, they did check manually but sometime after the files were subtly changed despite no incentive or reason to do so outside of a passing test? …

Humans who are bad and also bad at coding have predictable, comprehensible, failure modes. They don’t spontaneously sabotage their career and your project because Lord Markov twitched one of its many tails. They also lie for comprehensible reasons with attempts at logical manipulations of fact. They don’t spontaneously lie claiming not to have a nose, apologize for lying and promise to never do it again, then swear they have no nose in the next breath while maintaining eye contact.

Semi-autonomous to autonomous is a doozy of a step.

dormento•3w ago

You know, a good test would be to tell it to write a browser using a custom programming language, or at least some language for which there are no web browsers written.

embedding-shape•3w ago

Write a browser without any access to the internet, is what I'd attempt if I were running this experiment. Just seed it with a bunch of local HTML, CSS and JS files from the various testing suites that exist.

koolala•3w ago

You would want to download all the W3C and WHATWG specifications first.

shermantanktop•3w ago

Some of them practically have pseudocode just waiting to be picked up.

computerex•3w ago

I think that's too restrictive; agents should be allowed to reference the internet like we do.

semi-extrinsic•3w ago

Fortran 90 should fit the bill nicely.

g947o•3w ago

Sounds like it's finally the time to put my matlab license up for good use.

louthy•3w ago

Good idea, I propose Brainfuck

nicoburns•3w ago

Also selectors and taffy.

It's also using weirdly old versions of some dependencies (e.g. wgpu 0.17 from June 2023 when the latest is 28 released in December 2025)

satvikpendem•3w ago

That is because I've noticed the AI just edits the version management files (package.json, cargo.toml, etc) directly instead of using the build tool (npm add, cargo add), so it always hallucinates a random old version that's found in its training set. I explicitly have to tell the AI to use the build tool whenever I use AI.

bn-l•3w ago

It’s interesting that they don’t even know this

notatallshaw•3w ago

I assume lock and dependency files are in the training data, so the version number tokens they contain have high probabilities associated with them.

computerex•3w ago

I was LITERALLY thinking the other day of a niche tool for engineers to help them discover and fix this in the future because at the rate I have seen models version lock dependencies I thought this is going to be a big problem in the future.

satvikpendem•3w ago

Just use Dependi or similar VSCode extensions, they'll tell you if dependencies are outdated.

mikestorrent•3w ago

Bigger companies have vulnerability and version management toolsets like Snyk, Cycode, etc. to help keep things up to date at scale across lots of repos.

ljm•3w ago

You can do prompt injection through versions. The LLM would go back to GitHub in its endless attempt to people please, but dependency managers would ignore it for being invalid.

solid_fuel•3w ago

No need to build a tool for it, engineers can avoid the whole issue by simply avoiding slop-spewing code generation tools. Hell, just never allow an LLM to modify the dependency configuration - if you want to use a library, choose and import it yourself. Like an engineer.

callc•3w ago

Proposal to not tarnish the good name of actual engineers: slopgineers.

Maybe LLemgineers? Slopgrammers?

f311a•3w ago

Yeah, it's

- Servo's HTML parser

- Servo's CSS parser

- QuickJS for JS

- selectors for CSS selector matching

- resvg for SVG rendering

- egui, wgpu, and tiny-skia for rendering

- tungstenite for WebSocket support

And all of that has 3M of lines!

avaer•3w ago

To be fair, even if "from scratch" means "download and build Chromium", that's still nontrivial to accomplish. And with how complicated a modern browser is, you can get into Ship of Theseus philosophy pretty fast.

I wouldn't particularly care what code the agents copied, the bigger indictment is the code doesn't work.

So really, they failed to meet the bar of "download and build Chromium" and there's no point to talk about the code at all.

koolala•3w ago

The whole point of Servo is it's not impossible to use like Chromium.

leerob•3w ago

> The JS engine used a custom JS VM being developed in vendor/ecma-rs as part of the browser, which is a copy of my personal JS parser project vendored to make it easier to commit to.

https://news.ycombinator.com/item?id=46650998

singron•3w ago

It looks like there are two JS backends: quickjs and vm-js (vendor/ecma-rs/vm-js), based on a brief skim of the code. There is some logic to select between the two. I have no idea if either or both of them work.

themafia•3w ago

> is just calling out to code written by humans

Well at least it's not outright ripping them off like it usually does.

wmf•3w ago

Is it using Servo's layout code or did Cursor write its own layout? That's one of the hardest parts.

brabel•3w ago

Cursor didn't write anything, they used ChatGPT 5.2.

afishhh•3w ago

It seemingly did but after I saw it define a VerticalAlign twice in different files[1][2][3] I concluded that it's probably not coherent enough to waste time on checking the correctness.

Would be interesting if someone who has managed to run it tries it on some actually complicated text layout edge cases (like RTL breaking that splits a ligature necessitating re-shaping, also add some right-padding in there to spice things up).

[1] https://github.com/wilsonzlin/fastrender/blob/main/src/layou...

[2] https://github.com/wilsonzlin/fastrender/blob/main/src/layou...

[3] Neither being the right place for defining a struct that should go into computed style imo.

nicoburns•3w ago

It's using layout code from my library (Taffy) for Flexbox and CSS Grid. Servo uses Taffy for CSS Grid, and another open source engine that I work on (Blitz) uses it for Flexbox, CSS Grid, Block and float layout.

The older block/inline layout modes seem to be custom code that looks to me similar but not exactly the same as Servo code. But I haven't compared this closely.

I would note that the AI does not seem to have matched either Servo or Blitz in terms of layout: both can layout Google.com better than the posted screenshot.

brabel•3w ago

Why would they think it's a great idea to claim they implemented CSS and JS from scratch when the first thing any programmer would do is to look at the code and immediately find out that they're just using libraries for all of that?! They can't be as dumb as thinking no one would've noticed?!

I guess the answer is that most people will see the claim, read a couple of comments about "how AI can now write browsers, and probably anything else" from people who are happy to take anything at face value if it supports their view (or business) and move on without seeing any of the later commotion. This happens all the time with the news. No one bothers to check later if claims were true, they may live their whole lives believing things that later got disproved.

bflesch•3w ago

I'm actually impressed by their ignorance. I could never sleep at night knowing my product is built on such brazen lies.

Bullshitting and fleecing investors is a skill that needs to be nurtured and perfected over the years.

I wonder how long this can go on.

Who is the dumb money here? Are VCs fleecing "stupid" pension funds until they go under?

Or is it symptom of a larger grifting economy in the US where even the president sells vaporware, and people are just emulating him trying to get a piece of the cake?

moregrist•3w ago

> Why would they think it's a great idea to claim they implemented CSS and JS from scratch when the first thing any programmer would do is to look at the code and immediately find out that they're just using libraries for all of that?!

Programmers were not the target audience for this announcement. I don’t 100% know who was, but you can kind of guess that it was a mix of: VC types for funding, other CEOs for clout, AI influencers to hype Cursor.

Over-hyping a broken demo for funding is a tale as old as time.

That there’s a bit of a fuck-you to us pleb programmers is probably a bonus.

never_inline•3w ago

I don't think he intentionally lied. He just didn't know how to check that and AI wrote

   - [tick mark emoji] implemented CSS and JS rendering from scratch - **no dependencies**.

estearum•3w ago

I mean... Cursor is the CEO's first non-internship job. And it was a VSCode Extension that caught fire atop the largest technological groundswell in a few decades.

The default assumption should be that this is a moderately bright, very inexperienced person who has been put way out over his skis.

ben_w•3w ago

That tracks with what I'm seeing.

Unfortunately for them, I've seen things go very very wrong in this situation. It's very easy to mistake luck-based financial success for skill-based, especially when it happens fresh out of university.

autoexec•3w ago

> They can't be as dumb as thinking no one would've noticed?!

Maybe they're just hoping that there's an investor out there who is exactly that dumb.

ben_w•3w ago

> They can't be as dumb as thinking no one would've noticed?!

With over 20 years of experience as an adult, and more years of noticing dumb mistakes of adults when I was a teen, I can absolutely assure you that even before LLMs were blowing smoke up their user's backsides and flattering their user's intelligence, plenty of people are dumb enough to make mistakes like this without noticing anything was wrong.

For example, I'm currently dealing with customer support people that can't seem to handle two simultaneous requests or read the documents they send me, even after being ordered to pay compensation by an Ombudsman. This kind of person can, of course, already be replaced by an LLM.

vrighter•2w ago

It's because the AI said it did, and nobody bothered to actually check the code.

That, or they have some incentive to lie about it.

I'm not sure which one of these is false (if any)

adamrezich•3w ago

Very sad to see Paul Graham boosting this slop on X.

levocardia•3w ago

I'm reminded of the viral tweet along the lines of "Claude just one-shotted a 10k LOC web app from scratch, 10+ independent modules and full test coverage. None of it works, but it was beautiful nonetheless."

wilsonzlin•3w ago

Thanks for the feedback. I've addressed similar feedback at [0] and provided some more context at [1].

I do want to briefly note that the JS VM is custom and not QuickJS. It also implemented subsystems like the DOM, CSS cascade, inline/block/table layouts, paint systems, text pipeline, and chrome, and I'd push back against the assertion that it merely calls out to external code. I addressed these points in more detail at [0].

[0] https://news.ycombinator.com/item?id=46650998 [1] https://news.ycombinator.com/item?id=46655608

nindalf•3w ago

> I do want to briefly note that the JS VM is custom and not QuickJS

It's hard to verify because your project didn't actually compile. But now that you've fixed the compilation manually, can you demonstrate the javascript actually executing? Some of the people who got the slop compiling claimed credibly that it isn't executing any JavaScript.

You merely have to compile your code, run the binary and open this page - http://acid3.acidtests.org. Feel free to post a video of yourself doing this. Try to avoid the embellishment that has characterised this effort so far.

Snuggly73•3w ago

This is from the "official" build - https://imgur.com/fqGLjSA

The "in progress" build has a slightly different rendering but the same result

nindalf•3w ago

Yeah, it's not executing any JavaScript. Hey Mr. Wilson! You've spent millions creating this worthless slop. How about making sure that the code is actually being executed? Or is that not necessary to raise millions more in VC funding?

user432678•3w ago

Are you telling me AI bros lying about their products? No way that ever happened…

m00dy•3w ago

Cursor CEO got grilled in HN for a good reason.

deng•3w ago

If you look at the original Cursor post, they say they are currently running similar experiments, for instance, this Excel clone:

https://github.com/wilson-anysphere/formula

The Actions overview is impressive: There have been 160,469 workflow runs, of which 247 succeeded. The reason the workflows are failing is because they have exceeded their spending limit. Of course, the agents couldn't care less.
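For scale, the success rate implied by those two numbers can be worked out directly. This is plain arithmetic on the figures quoted above, nothing more:

```python
runs_total = 160_469     # workflow runs quoted above
runs_succeeded = 247     # successful runs quoted above

runs_failed = runs_total - runs_succeeded
success_rate = runs_succeeded / runs_total

print(f"failed runs: {runs_failed}")
print(f"success rate: {success_rate:.3%}")  # roughly 0.154%
```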

felipeerias•3w ago

IMHO people are missing the forest for the trees. The point of this experiment is not to build a functional browser but to develop ways to make agents create large codebases from scratch over a very long time span. A Web browser is just a convenient target because there are lots of documentation, specs and tests available.

noodletheworld•3w ago

...but it didn't develop ways of doing that did it?

Any idiot can have cursor run for 2 weeks and produce a pile of crap that doesn't compile.

You know the brilliant insight they came out with?

> A surprising amount of the system's behavior comes down to how we prompt the agents. Getting them to coordinate well, avoid pathological behaviors, and maintain focus over long periods required extensive experimentation. The harness and models matter, but the prompts matter more.

i.e. It's kind of hard and we didn't really come up with a better solution than 'make sure you write good prompts'.

Wellll, geeeeeeeee! Thanks for that insight guys!

Come on. This was complete BS. Planners and workers. Cool. Details? Any details? Annnnnnnyyyyy way to replicate it? What sort of prompts did you use? How did you solve the pathological behaviours?

Nope. The vagueness in this post... it's not an experiment. It's just fund raising hype.

chrisandchris•3w ago

IMHO, this whole thing could be read with "human" instead of "agent" and would make the exact same amount of sense.

"We put 200 human in a room and gave them instructions how to build a browser. They coded for hours, resolving merge conflicts and producing code that did not build in the end without intervention of seniors []. We think, giving them better instructions leads to better results"

So they actually invented humans? And will it come down to either "managing humans" or "managing agents"? One of both will be more reliable, more predictable and more convenient to work with. And my guess is, it is not an agent...

As it seemed in the git log, something is weird.

saghm•3w ago

The point is to learn how to make very large codebases that don't compile? Why do you need tests and specs if it's not going to even run, much less run correctly?

felipeerias•3w ago

As discussed elsewhere, it is apparently possible to compile and run this particular project. It seems that whatever process they followed allows commits to break the build pretty often.

Nevertheless, IMHO what’s interesting about this is not the browser itself but rather that AI companies (not just Cursor) are building systems where humans can be out of the loop for days or weeks.

embedding-shape•3w ago

> As discussed elsewhere, it is apparently possible to compile and run this particular project.

After a human stepped in to fix it, yes. You can see it yourself here: https://github.com/wilsonzlin/fastrender/issues/98

> Nevertheless, IMHO what’s interesting about this is not the browser itself but rather that AI companies (not just Cursor) are building systems where humans can be out of the loop for days or weeks.

But that's not what they demonstrated here. What they demonstrated, so far, is that you can let agents write millions of lines of code, and eventually if you actually need to run it, some human need to "merge the latest snapshot" or do some other management to actually put together the system into a workable state.

Very different from what their original claims were.

thegeomaster•3w ago

I actually ran this one. It measures some 700k lines of code, and seems to contain things like a full VBA implementation, complex currency and date parsing, etc. But the UI is extremely basic, doesn't seem to expose any of this advanced functionality, and is buggy to the point of being unusable. Focus will jump around as you type, cells will reset to old values, it will stop responding to keyboard events, etc.

Matthyze•3w ago

Out of curiosity, what is the most difficult thing about building a browser?

MobiusHorizons•3w ago

The very long task list.

Browsers contain several high complexity pieces, each of which could take a while to build on its own, and interconnect them with reasonably verbose APIs that need to be implemented or at least stubbed out for code to not crash. There is also the difficulty of matching existing implementations quirk for quirk.

I guess the complexity is on-par with operating systems, but with the added compatibility problems that in order to be useful it doesn't just have to load sites intended to be compatible with it, it has to handle sites people actually use on the internet, and those are both a moving target, and tend to use lots of high complexity features that you have to build or at least stub out before the site will even work.

asadotzler•3w ago

In all sincerity, this question is almost identical to "what's the most difficult thing about building an operating system" as a modern browser is tens of millions of lines of code that can run sophisticated applications. It has a network stack, half a dozen parsers, frame construction and reflow modules, composite, render and paint components, front end UI components, an extensibility framework, and more. Each one of these must enable supporting backward compatibility for 30 year old content as well as ridiculously complex contemporary web apps. And it has to load and render sites that a completely programming illiterate fool like me wrote. It must do this all in a performant and secure way using minimal system resources. Also, it probably also must run on Mac, Windows, Linux, Android, iOS, and maybe more.

potamic•3w ago

Check out the list of all CSS specifications [1], and then open any one of them and see how lengthy and elaborate each is. Then do the same for each version of the spec published over the last thirty years. Before you can start, you must read and understand all of this at a great level of depth. Still, specifications never tell the complete story. You must be aware of all the nuances that are implied by each requirement in the spec and know how to handle the zillion corner cases that will crop up inevitably.

And this is just one part. Not even considering the fully sandboxed, mini operating system for running webapps.

[1] https://www.w3.org/Style/CSS/specs.en.html

Pinus•3w ago

I haven’t studied the project that this is a comment on, but: The article notices that something that compiles, runs, and renders a trivial HTML page might be a good starting point, and I would certainly agree with that when it’s humans writing the code. But is it the only way? Instead of maintaining “builds and runs” as a constant and varying what it does, can it make sense to have “a decent-sized subset of browser functionality” as a constant and varying the “builds and runs” bit? (Admittedly, that bit does not seem to be converging here, but I’m curious in more general terms.)

madeofpalk•3w ago

...What use is code if it doesn't build and run? What other way is there to build a browser that doesn't involve 'build and run'?

Writing junk in a text file isn't the hard part.

Pinus•3w ago

Obviously, it has to eventually build and run if there’s to be any point to it, but is it necessary that every, or even any, step along the way builds and runs? I imagine some sort of iterative set-up where one component generates code, more or less "intelligently", and others check it against the C, HTML, JavaScript, CSS and what-have-you specs, and the whole thing iterates until all the checking components are happy. The components can’t be completely separate, of course, they’d have to be more or less intermingled or convergence would be very slow (like when lcamtuf had his fuzzer generate a JPEG out of an empty file), but isn’t that basically what (large) neural networks are; tangled messes of interconnected functions that do things in ways too complicated for anyone to bother figuring out?

malfist•3w ago

How do you iteratively improve a broken codebase that doesn't compile with more than 3 million lines of code?

brabel•3w ago

I don't want to defend the AI slop, but it's common for me to go on for a few weeks without being able to compile everything when doing something really big. I can still compile individual modules and run their tests, but not the full application (which puts all modules together)... but it may take a lot of time until all modules can come together and actually run the app.

fwip•3w ago

Human brains are big, tangled messes of interconnected neurons that do things in ways too complicated to figure out.

That doesn't mean we can usefully build software that is a big, tangled mess.

johntb86•3w ago

In theory you could generate a bunch of code that seems mostly correct and then gradually tweak it until it's closer and closer to compiling/working, but that seems ill-suited to how current AI agents work (or even how people work). AI agents are prone to make very local fixes without an understanding of wider context, where those local fixes break a lot of assumptions in other pieces of code.

It can be very hard to determine if an isolated patch that goes from one broken state to a different broken state is on net an improvement. Even if you were to count compile errors and attempt to minimize them, some compile errors can demonstrate fatal flaws in the design while others are minor syntax issues. It's much easier to say that broken tests are very bad and should be avoided completely, as then it's easier to ensure that no patch makes things worse than it was before.
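The "no patch makes things worse" rule can be sketched as a simple ratchet over test results: a patch is accepted only if every test that passed before still passes after. The test names and result dictionaries below are hypothetical, and this is just one way to express the idea, not a description of any harness mentioned in the thread.

```python
def accept_patch(before, after):
    """Accept a patch only if no previously passing test now fails.

    `before` and `after` map test name -> True (passed) / False (failed).
    A test missing from `after` is treated as failing.
    Returns (accepted, list of regressed test names).
    """
    regressions = [
        name for name, passed in before.items()
        if passed and not after.get(name, False)
    ]
    return (not regressions, regressions)


# Hypothetical results for illustration.
before = {"parse_html": True, "layout_block": True, "run_js": False}
fixes_js = {"parse_html": True, "layout_block": True, "run_js": True}
breaks_layout = {"parse_html": True, "layout_block": False, "run_js": True}

print(accept_patch(before, fixes_js))       # accepted, no regressions
print(accept_patch(before, breaks_layout))  # rejected, layout_block regressed
```

The appeal over counting compile errors is that the check is binary per test, so there is no need to judge whether one broken state is "better" than another.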

eloisius•3w ago

> generate a bunch of code that seems mostly correct and then gradually tweak it until it's closer and closer to compiling/working

The diffusion model of software engineering

rsynnott•3w ago

> an it make sense to have “a decent-sized subset of browser functionality” as a constant and varying the “builds and runs” bit?

I mean by definition something that doesn't build and run doesn't have any browser-like functionality at all.

thedelanyo•3w ago

These are stories that solely exist just to sell shovels and would cause one uninformed CEO to lay off actual humans.

AIorNot•3w ago

Lesson 1:

Always take any pronouncement from an AI company (heavily dependent on VC and public sentiment on AI) with a heavy grain of salt..

hype over reality

I’m building an AI startup myself and I know that world and it’s full of hypesters and hucksters unfortunately - also social media communication + low attention span + AI slop communication is a blight upon today’s engineering culture

Snuggly73•3w ago

The latest commit now builds and runs (at least on my Mac). It’s tragically broken and the code is…dunno…something. 3m lines of something.

I couldn’t make it render the apple page that was on the Cursor promo. Maybe they’ve used some other build.

embedding-shape•3w ago

Yeah, seems latest commit does let `cargo check` successfully run. I'm gonna write an update blog post once they've made their statement, because I'm guessing they're about to say something.

Something fishy is happening in their `git log`, it doesn't seem like it was the agents who "autonomously" actually made things compile in the end. Notice the git username and email addresses switching around, even some commits made inside an EC2 instance managed to get in there: https://gist.github.com/embedding-shapes/d09225180ea3236f180...
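For anyone wanting to repeat that kind of check, a minimal sketch: dump the author identities with `git log --format='%an <%ae>'` and tally them. The sample log below is entirely made up for illustration (it is not the project's real history), and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical output of `git log --format='%an <%ae>'`, newest commit first.
# Names and addresses are invented for the example.
SAMPLE_LOG = """\
agent-bot <agent@example.com>
agent-bot <agent@example.com>
Some Human <human@example.com>
ec2-user <ec2-user@ip-10-0-0-1.internal>
agent-bot <agent@example.com>
"""


def identities(log_text):
    """Return the non-empty identity lines in log order."""
    return [line.strip() for line in log_text.splitlines() if line.strip()]


def count_identities(log_text):
    """Count commits per 'name <email>' identity."""
    return Counter(identities(log_text))


def identity_switches(log_text):
    """Count how often consecutive commits have different identities."""
    ids = identities(log_text)
    return sum(1 for prev, cur in zip(ids, ids[1:]) if prev != cur)


print(count_identities(SAMPLE_LOG).most_common())
print(identity_switches(SAMPLE_LOG))
```

A tally like this only shows who is recorded as the author; it cannot tell you whether a given commit was written by an agent or a person.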

Snuggly73•3w ago

Noticed that as well - I think it was “manual”

torginus•3w ago

I am not an expert AI user, but one typical 'failure mode' I see constantly is the AI reimplementing features that already exist in the codebase, or breaking existing ones.

lifetimerubyist•3w ago

> company claims they "built a browser" from scratch

> looks inside

> completely useless and busted

30 billion dollar VS Code fork everyone. When do we start looking at these people for what they are: snake oil salesmen?

They slop laundered the FOSS Servo code into a broken mess and called it a browser, but dumbasses with money will make line go up based on lies. EFF right off.

bn-l•3w ago

30. Billion. Dollars.

Man

lifetimerubyist•3w ago

It's absolutely absurd.

ben_w•3w ago

When I tried it last year, the mac download of this particular VS Code fork was only available as an Intel build. This… was suggestive.

ravenstine•3w ago

I recently was convinced by colleagues to try Cursor at work for project planning. At first I was excited for something even better than Copilot Chat, but then I quickly realized it's just a shitty VS Code fork with maybe a few improvements that are by no means a moat for that company. Worse yet, it makes lots of bizarre mistakes, deletes lines for no good reason, and sometimes even deletes the entire project plan. The scroll on the chat history gets laggy very quickly, and I've found the whole program to be glitchy as hell.

The hype around Cursor is unreal. My suspicion is this company is little more than a parasite with no qualms about getting you to waste your tokens. It's one thing when a product is new, but Cursor has been around since 2023 I guess? The quality of their software is unacceptable.

pavlov•3w ago

The comment that points out that this week-long experiment produced nothing more than a non-functional wrapper for Servo (an existing Rust browser) should be at the top:

https://news.ycombinator.com/item?id=46649046

pera•3w ago

Has anyone tried to rewrite some popular open source project with AI? I imagine modern LLMs can be very effective at license-washing/plagiarizing dependencies, it could be an interesting new benchmark too

benhoyt•3w ago

Not me personally, but a GitHub user wrote a replacement for Go's regexp library that was "up to 3-3000x+ faster than stdlib": https://github.com/coregx/coregex ... at first I was impressed, so started testing it and reporting bugs, but as soon as I ran my own benchmarks, it all fell apart (https://github.com/coregx/coregex/issues/29). After some mostly-bot updates, that issue was closed. But someone else opened a very similar one recently (https://github.com/coregx/coregex/issues/79) -- same deal, "actually, it's slower than the stdlib in my tests". Basically AI slop with poor tests, poor benchmarks, and way oversold. How he's positioning these projects is the problematic bit, I reckon, not the use of AI.

Same user did a similar thing by creating an AWK interpreter written in Go using LLMs: https://github.com/kolkov/uawk -- as the creator of (I think?) the only AWK interpreter written in Go (https://github.com/benhoyt/goawk), I was curious. It turns out that if there's only one item in the training data (GoAWK), AI likes to copy and paste freely from the original. But again, it's poorly tested and poorly benchmarked.

I just don't see how one can get quality like this, without being realistic about code review, testing, and benchmarking.

CuriouslyC•3w ago

To be fair, good benchmarking is hard, most people get it wrong. Scientific training helps.

dragonwriter•3w ago

> up to 3-3000x+ faster than stdlib

Note that this is semantically exactly equivalent to "up to 3000x faster than stdlib" and doesn't actually claim any particular actual speedup since "up to" denotes an upper bound, not a lower bound or expected value. It’s standard misleading-but-not-technically-false marketing language to create a false impression because people tend to focus on the number and ignore the "up to".

supriyo-biswas•3w ago

Reminds me of https://xkcd.com/870/

arcticbull•3w ago

With the “up to 3-3000x+” language the plus leaves us with the entire number line.

Dylan16807•3w ago

When you say "up to" about a list of data points, it's not just a bound. At least one has to reach that amount or it's a lie.

nkrisc•3w ago

Saying “up to” means that bound is the maximum value of the data set. It may be far from the median value, but it is included (or you’re lying). With any other interpretation the phrase has no meaning whatsoever.
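To make the gap between the headline number and the typical case concrete, here is a small sketch with invented benchmark ratios (they are not measurements of any library discussed here). The "up to" figure is the maximum of the set; the median can sit below 1x or barely above it.

```python
from statistics import median

# Hypothetical per-benchmark speedup ratios (new / old throughput); made up for illustration.
speedups = [0.6, 0.9, 1.2, 3.0, 3000.0]

headline = max(speedups)       # what "up to Nx faster" reports
typical = median(speedups)     # what most workloads would actually see
slower = sum(1 for s in speedups if s < 1.0)

print(f"up to {headline:g}x, median {typical:g}x, {slower} of {len(speedups)} cases slower")
```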

nkrisc•3w ago

I will concede, proactively, that "up to" could refer to some maximum possible bound, even if the current set doesn't include a value at that bound, though I would argue that's likely deceptive wording. For example, you could say that each carton of eggs on a pallet contains up to 12 eggs, because that's the maximum capacity of the carton, even if none of the actual cartons on this pallet actually have 12 eggs in them.

DonHopkins•3w ago

3000x Faster Optimized Random Number Generator: https://xkcd.com/221/

AlexeyBelov•2w ago

Oh yeah, I recognize this guy. The author of most commits in coregex posted his vibecoded projects to Reddit.

I've looked at his other repos and it's the same shit. Responses are also quite funny, does he not realize this reads like the worst of AI?

gorkaerana•3w ago

I think it's fair enough to consider porting a subset of rewriting, in which case there are several successful experiments out there:

- JustHTML [1], which in practice [2] is a port of html5ever [3] to Python.

- justjshtml, which is a port of JustHTML to JavaScript :D [4].

- MiniJinja [5] was recently ported to Go [6].

All three projects have one thing in common: comprehensive test suites which were used to guardrail and guide AI.

References:

1. https://github.com/EmilStenstrom/justhtml

2. https://friendlybit.com/python/writing-justhtml-with-coding-...

3. https://github.com/servo/html5ever

4. https://simonwillison.net/2025/Dec/15/porting-justhtml/

5. https://github.com/mitsuhiko/minijinja

6. https://lucumr.pocoo.org/2026/1/14/minijinja-go-port/

daxfohl•3w ago

Interesting, IIUC the transformer architecture / attention mechanism were initially designed for use in the language translation domain. Maybe after peeling back a few layers, that's still all they're really doing.

nathan_compton•3w ago

This has long been how I have explained LLMs to non-technical people: text transformation engines. To some extent, many common, tedious, activities basically constitute a transformation of text into one well known form from another (even some kinds of reasoning are this) and so LLMs are very useful. But they just transform text between well known forms.

daxfohl•3w ago

And while it appears that lots of problems can be contorted into translation, "if all you have is a hammer, everything looks like a nail". Maybe we do hit a brick wall unless we can come up with a model that more closely aligns with actual human reasoning.

EmilStenstrom•3w ago

As the author, it's a stretch to say that JustHTML is a port of html5ever. While you're right that this was part of the initial prompt, the code is very different, which is typically not what counts as a "port". Your mileage may vary.

MrJohz•3w ago

Note that it's not clear that any of the JustHTML ports were actually ports per se, as in the end they all ended up with very different implementations. Instead, it might just be that an LLM generated roughly the same library several different times.

See https://felix.dognebula.com/art/html-parsers-in-portland.htm...

DonHopkins•3w ago

More vibe coded browser modules:

V8 => H8 - JavaScript engine that hates code, misunderstands equality, sponsored by Brendan Eich and "Yes on Prop H8".

Expat => Vexpat - An annoying, irritating rewrite of an XML parser.

libxml2 => libxmlpoo - XML parsing, same quality as the spec.

libxslt => libxsalt - XSLT transforms with extra salt in the wound.

Protobuf => Probabuf - Probably serializes correctly, probably not, fuzzy logic.

Cap'n Proto => Crap'n Proto - Zero-copy, zero quality.

cURL => cHURL - Throws requests violently serverward, projectile URLemitting.

SDL => STD - Sexually Transmitted Dependency. It never leaves and spreads bugs to everything you touch.

Servo => Swervo - Drunk, wobbly layout that can't stay on the road.

WebKit => WebShite - British pronunciation, British quality control.

Blink => Blinkered - Only renders pages it agrees with politically.

Taffy => Daffy - Duck typed Flexbox layout that's completely unhinged. "You're dethpicable!"

html5ever => html5never - Servo's HTML parser that never finishes tokenizing.

Skia => SkAI - AI-generated graphics that hallucinates extra pixels and fingers.

FreeType => FreeTypo - Introduces typos during keming and rasterization.

Firefox => Foxfire - Burns through your battery in 12 minutes, while molesting children.

WebGL => WebGLitch - Shader compilation errors as art.

WebGPU => WebGPUke - Makes your GPU physically ill.

SQLite => SQLHeavy - Embedded database, 400MB per query.

Vulkan => Vulcan't - Low-level graphics that can't.

Clang => Clanger - Drops errors loudly at runtime.

libevent => liebevent - Event library that lies about readiness.

Opus => Oops - Audio codec, "oops, your audio's gone."

All modules now available on GitPub:

GitHub => GitPub - Microsoft's vibe control system optimized for the Ballmer Peak. Commit quality peaks at 0.129% BAC, mass reverts at 0.15%.

hedgehog•3w ago

I used one of the assistants to reverse and rewrite a browser-hosted JS game-like app to desktop Rust. It required a lot of steering but it was pretty useful.

quotemstr•3w ago

Negative results are great. When you publish them on purpose, it's honorable. When you reveal them by accident, it's hilarious. Cheers to Cursor for today's entertainment.

gjsman-1000•3w ago

What the hell?

I was seeing screenshots and actually getting scared for my job for a second.

It’s broken and there’s no browser engine? Cursor should be tarred and feathered.

autoexec•3w ago

A lie like this seems like it should be considered fraud

thrwaway55•2w ago

FSD next year! Our safeguards are a joke so it's not surprising to see this behavior

AstroBen•3w ago

Apparently this person actually got it to compile: https://xcancel.com/CanadaHonk/status/2011612084719796272#m

observationist•3w ago

https://x.com/CanadaHonk/status/2011612084719796272 as well.

I went through the motions. There are various points in the repo history where compilation is possible, but it's obscure. They got it to compile and operate prior to the article, but several of the PRs since that point broke everything, and this guy went through the effort of fixing it. I'm pretty sure you can just identify the last working commit and pull the version from there, but working out when looks like a big pain in the butt for a proof of concept.

embedding-shape•3w ago

> but several of the PRs since that point broke everything, and this guy went through the effort of fixing it. I'm pretty sure you can just identify the last working commit and pull the version from there, but working out when looks like a big pain in the butt for a proof of concept.

I went through the last 100 commits (https://news.ycombinator.com/item?id=46647037) and nothing there was working (yet/since). Seems now after a developer corrected something it managed to pass `cargo check` without errors, since commit 526e0846151b47cc9f4fcedcc1aeee3cca5792c1 (Jan 16 02:15:02 2026 -0800)
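A minimal sketch of that kind of per-commit check, assuming a local clone with `git` and `cargo` on the PATH. The function names and the injectable `run` parameter are illustrative choices made here, not the script actually used for the linked results.

```python
import subprocess
from typing import Callable, List, Tuple

# A runner takes a command and returns (exit code, stdout).
Runner = Callable[[List[str]], Tuple[int, str]]


def shell_run(cmd: List[str]) -> Tuple[int, str]:
    """Run a command in the current directory and capture its output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def check_recent_commits(n: int, run: Runner = shell_run) -> List[Tuple[str, bool]]:
    """Check out each of the last n commits and record whether `cargo check` passes.

    Results are ordered newest first. The working tree is left on a detached
    HEAD at the oldest commit checked, so run this in a throwaway clone.
    """
    code, out = run(["git", "rev-list", f"--max-count={n}", "HEAD"])
    if code != 0:
        raise RuntimeError("git rev-list failed")

    results = []
    for sha in out.split():
        checkout_code, _ = run(["git", "checkout", "--quiet", "--detach", sha])
        if checkout_code != 0:
            raise RuntimeError(f"could not check out {sha}")
        check_code, _ = run(["cargo", "check", "--quiet"])
        results.append((sha, check_code == 0))
    return results
```

Calling `check_recent_commits(100)` inside the clone would give one `(sha, passed)` pair per commit; passing a fake `run` makes the loop testable without a repository.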

observationist•3w ago

There are conversations elsewhere - I'd have to go look through them, but at some point about an hour before the article was published, it could be compiled, and then things got pushed that broke it again? There's no central discussion, I had to piece together information from multiple threads.

Sorry, I should have taken notes, lol. At any rate, it was so much digging around I just gave up, I didn't want to invest more effort into it. I figured they'd get a stable version for others to try and I'd return to it at some point.

mvdtnz•3w ago

Why is the top comment on this item just a link to another comment on this same story?

M4v3R•3w ago

It’s not just a wrapper for Servo, the linked poster just checked the dependencies in the Cargo file and proclaimed that without checking anything further.

In reality this project does indeed implement a functioning custom JS Engine, Layout engine, painting etc. It does borrow the CSS selectors package from Servo but that’s about it.

oefrha•3w ago

Yeah there's more to a browser than a couple of out-of-tree servo components, otherwise https://github.com/servo/servo wouldn't have 300k+ lines of Rust code, 400k+ if you count comments and blanks (I cloned the repo, nuked the tests directory, then did a count).

Plus that linked comment doesn't even say it's "nothing more than a non-functional wrapper for Servo". It disputes the "from scratch" claim.

Most people aren't interested in a nuanced take though. Someone said something plausible sounding and was voted to top by other people? Good enough for me, have another vote. Then twist and exaggerate a little and post it to another comment section. Get more votes. Rinse and repeat.

pera•3w ago

"Borrow" is an interesting choice of word, see for example this:

    /// The quirks mode of the document.
    #[inline]
    pub fn quirks_mode(&self) -> QuirksMode {
        self.quirks_mode
    }

https://github.com/wilsonzlin/fastrender/blob/3e5bc78b075645...

And then this:

    /// The quirks mode of the document.
    pub fn quirks_mode(&self) -> QuirksMode {
        self.stylist.quirks_mode()
    }

https://github.com/servo/stylo/blob/71737ad5c8b29c143a6c992a...

It seems ChatGPT is still copying segments of code almost verbatim, although sometimes it does weird things, compare these for example:

https://github.com/wilsonzlin/fastrender/blob/3e5bc78b075645...

https://github.com/servo/stylo/blob/71737ad5c8b29c143a6c992a...

torginus•3w ago

Interesting, I remembered that when trying out Stable Diffusion, once I ventured outside of the realm of anime waifus, the images ended up being so similar to existing sources, that image search could find the references.

Which is also kinda crazy since superficially there was very little in common between the 2 images, but I guess AI models used for image search converge on embeddings similar to the ones used for AI generation.

Snuggly73•3w ago

Well, could it be because it was instructed to kinda "study" Servo?

https://github.com/wilsonzlin/fastrender/blob/3e5bc78b075645...

nindalf•3w ago

In your hurry to defend this slop you didn't do your due diligence. You know that 1 million LoC JS VM? Yeah, it isn't actually running - https://imgur.com/fqGLjSA. And you can tell this is actually the case because it's been brought up a few times on this thread and that guy has ducked around it.

wilsonzlin•3w ago

I've responded to this claim in more detail at [0], with additional context at [1].

Briefly, the project implemented substantial components, including a JS VM, DOM, CSS cascade, inline/block/table layout, paint systems, text pipeline, and chrome, and is not merely a Servo wrapper.

[0] https://news.ycombinator.com/item?id=46650998

[1] https://news.ycombinator.com/item?id=46655608

embedding-shape•3w ago

Could you somewhere make clear exactly how much of the code was "autonomously" built vs how much was steered by humans? Because at this point it's clear that it wasn't 100% autonomous as originally claimed, but right now it's not clear if this was just the work of an engineer running Cursor vs "autonomously organised a fleet of agents".

pera•3w ago

Just for context, this was the original claim by Cursor's CEO on Twitter:

> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.

> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.

> It kind of works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.

https://xcancel.com/mntruell/status/2011562190286045552#m

delusional•3w ago

I cannot make these two statements true at the same time in my head:

> Briefly, the project implemented substantial components, including a JS VM

and from the linked reply:

> vendor/ecma-rs as part of the browser, which is a copy of my personal JS parser project vendored to make it easier to commit to.

If it's using a copy of your personal JS parser that you decided it should use, then it didn't implement it "autonomously". The references you're linking don't summarize to the brief you've provided.

What the fuck is going on?

nonima•3w ago

It's funny how their whole grift hinges on people not reading clearly.

Roark66•3w ago

Does any of it actually work? Can you build that JS VM separately and run serious JS on it? That would be an accomplishment.

Looking at the comments and claims (I've not got the time to review a large code base just to check this claim) I get an impression _something_ was created, but none of it actually builds and no one knows what is the actual plan.

Did your process not involve recursive planning stages? (These ALWAYS have big architectural errors and gotchas in my experience, unless you're doing a small toy project or something the AI has seen thousands of already.)

I find agents doing pretty well once you have a human correct their bad assumptions and architectural errors. But this assumes the human has absolute understanding of what is being done down to the tiniest component. There will be errors that agents left to their own devices will discover only at the very end, after spending dozens of millions of tokens; then they will try the next idea they hallucinated, spend another few dozen million tokens, and so on. Perhaps after 10 iterations like this they may arrive at something fine, or more likely they will descend into hallucination hell.

This is what happens when the complexity, the size, or the novelty of the task (often a mix of all 3) exceeds the capability of the agents.

The true way to success is the way of a human-ai hybrid, but you absolutely need a human that knows their stuff.

Let me give you a small example from the systems field. The other day I wanted to design an AI observability system with the following spec:

- use existing OS components, with no code or as little code as possible

- ideally runs on stateless pods on an air-gapped k3s cluster (preferably uses one of the existing DBs, but ClickHouse is acceptable)

- able to proxy OpenAI, Anthropic (both API and Claude Max), Google (Vercel + Gemini), DeepInfra and OpenRouter, including client auth (so it is completely transparent to the client)

- reconstructs streaming responses and recognises tool calls and reasoning content; nice to have: the ability to define own session/conversation recognition rules

I used Gemini 3 and Opus 4.5 for the initial planning/comparison of OS projects that could be useful. Both converged on Helicone as being supposedly the best. Then, towards the very end of implementation, it turned out Helicone has pretty much zero docs for properly setting up a self-hosted platform; it tries redirecting to their web page for auth, and the agents immediately went into rewriting parts of the source, attempting to write their own auth and fixing imaginary bugs that were really misconfiguration.

Then another product was recommended (I forget which). There, after very detailed questioning and requests to re-confirm the actual configs for multiple supposedly supported features, it turned out it didn't pass through auth for Claude Max.

Eventually I chose litellm+langfuse (which was turned down initially in favour of Helicone), and I needed to make a few small code changes so Claude Max auth could be read, additional headers could be passed through, and within a single endpoint it could send Claude telemetry as pure pass-through and real LLM API traffic through its "models" engine (so it recognised tool calls and so on).
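To make the "reconstruct streaming responses, recognise tool calls" item of the spec concrete, here is a rough sketch of what that step can look like. It assumes the OpenAI-style chat completions stream (server-sent `data:` lines carrying `choices[].delta` fragments, ending with `data: [DONE]`) and a single choice per request; other providers use different event shapes and would need their own handling. The function name is hypothetical and this is not how litellm or langfuse implement it.

```python
import json
from typing import Dict, Iterable


def reassemble_stream(lines: Iterable[str]) -> Dict:
    """Merge OpenAI-style streaming chunks into one message.

    Assumes one choice per chunk; text and tool-call fragments are
    concatenated in arrival order.
    """
    text_parts = []
    tool_calls: Dict[int, Dict[str, str]] = {}

    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)

        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
            # Tool calls arrive as fragments keyed by index: the name
            # comes once, the JSON arguments string comes in pieces.
            for call in delta.get("tool_calls") or []:
                slot = tool_calls.setdefault(
                    call.get("index", 0), {"name": "", "arguments": ""}
                )
                fn = call.get("function") or {}
                slot["name"] += fn.get("name") or ""
                slot["arguments"] += fn.get("arguments") or ""

    return {
        "content": "".join(text_parts),
        "tool_calls": [tool_calls[i] for i in sorted(tool_calls)],
    }
```

A proxy would feed the raw response lines through something like this while still forwarding them unchanged to the client, so observability does not alter what the client sees.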

kxbnb•2w ago

This spec reads like what we've been building at toran.sh - transparent proxy for AI API calls with observability.

The core idea: you create a "toran" (read-only inspection endpoint) bound to a single upstream. Point your client at the toran URL instead of the API directly. No SDK changes, no code changes - just swap the base URL. It shows exactly what went over the wire in real time.

For the multi-provider setup you're describing (OpenAI, Anthropic, Google, etc.), you'd create separate torans for each upstream. Auth passthrough works because the toran is transparent - it just forwards headers.

We're still early (focused on the "see what's happening" problem before tackling rate limiting/policy), but if the visibility piece would help with your setup, happy to give you access and hear how it compares to litellm+langfuse for your use case.

csomar•3w ago

Did you actually review these implementations and compare them to Servo (and WebKit)? Can you point to a specific part or component that was fully created by the LLM but doesn't clearly resemble anything in existing browser engines?

nindalf•3w ago

You're claiming that the JS VM was implemented. Is it actually running? Because this screenshot shows that the ACID3 benchmark is requesting that you enable JavaScript (https://imgur.com/fqGLjSA). Why don't you upload a video of you loading this page?

Your slop is worthless except to convince gullible investors to give you more money.

holoduke•3w ago

A bit off topic, but fun for people having lots of Claude credits. Auto Claude is a nice open-source repo that lets Claude generate an entire application from just one prompt. Lots of YOLO vibing here, but nevertheless impressive. Last week I asked it in one sentence to create a full-blown hotel website including all the software tools for the back office. It took almost 4 days with 4 Claude accounts. It actually created a working thing.

chaosprint•3w ago

I really doubt this marketing approach is effective. Isn't this just shooting themselves in the foot? My actual experience with Cursor has been: their design is excellent and the UX is great—it handles frontend work reasonably well. But as soon as you go deeper, it becomes very prone to serious bugs. While the addition of Claude's new models has helped somewhat, the results are still not as good as Google's Antigravity (despite its poor UX and numerous bugs). What's worse, even with this much-hyped Claude model, you can easily blow through the $20 subscription limit in just a few days. Maybe they're betting on models becoming 10x better and 10x cheaper, but that seems unlikely to happen anytime soon.

bonesss•3w ago

Hitting my head into buggy apps made by these AI companies and seeing them all be amazed in parallel that skills/MCP would be necessary for real work has me pretty relaxed about ‘our jobs’.

OpenAI's business-model floundering, degenerating inline to ads soon (lol), shows what can be done with infini-LLM, infini-capital, and all the smarts & connections on Earth… broadly speaking, I think the geniuses at Google who invented a lot of this shizz understand it and were leveraging it appropriately before ChatGPT blew up.

thewhitetulip•3w ago

We use MCP at work. Due to some typo, the model ran absolutely random queries on our database in most cases. We had initially kept it open-ended, but after that we wrote custom tools that took an input and gave an output, and that was strictly mentioned in the prompt. Only then did it work fine.
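The difference between an open-ended "run any SQL" tool and a narrow one can be sketched roughly like this. The `orders` table, the status values, and the function name are all made up for illustration; the point is only that the model supplies one validated input and never writes the query itself.

```python
import sqlite3

# Hypothetical set of values the tool accepts; anything else is rejected
# before the database is touched.
ALLOWED_STATUSES = {"open", "shipped", "cancelled"}


def count_orders_by_status(conn: sqlite3.Connection, status: str) -> int:
    """Narrow tool: one validated input, one fixed parameterized query, one integer out."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unsupported status: {status!r}")
    row = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status = ?", (status,)
    ).fetchone()
    return row[0]
```

Exposed as a tool, a typo or a confused model can at worst produce a `ValueError`, rather than an arbitrary query against the database.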

ryanisnan•3w ago

The amount of negativity in the original post was astounding.

People were making all sorts of statements like:

- “I cloned it and there were loads of compiler warnings”

- “the commit build success rate was a joke”

- “it used 3rd party libs”

- “it is AI slop”

What they all seem to be just glossing over is how the project unfolded: without human intervention, using computers, in an exceptionally accelerated time frame, working 24hr/day.

If you are hung up on commit build quality, or code quality, you are completely missing the point, and I fear for your job prospects. These things will get better; they will get safer as the workflows get tuned; they will scale well beyond any of us.

Don’t look at where the tech is. Look where it’s going.

embedding-shape•3w ago

As mentioned elsewhere (I'm the author of this blogpost), I'm a heavy LLM user myself, use it everyday as a tool, get lots of benefits from it. It's not a "hit post" on using LLM tools for development, it's a post about Cursor making grand claims without being able to back them up.

No one is hung up on the quality, but whether something "compiles" or "doesn't" is a ground fact. No one is gonna claim a software project was successful if the end artifact doesn't compile.

ryanisnan•3w ago

I think for the point of the article, it appeared to, at some point, render homepages for select well known sites. I certainly did not expect this to be a serious browser, with any reliability or legs. I don’t think that is dishonest.

embedding-shape•3w ago

> I certainly did not expect this to be a serious browser, with any reliability or legs.

Me neither, and I note so twice in the submission article. But I also didn't expect a project that for the last 100+ commits couldn't reliably be built and therefore tested and tried out.

ryanisnan•3w ago

My apologies - my point(s) were more about the original submission for the Cursor blog post, not your post itself.

I did read your post, and agree with what you're saying. It would be great if they pushed the agents to favour reliability or reproducibility, instead of just marching forwards.

svieira•3w ago

> What they all seem to be just glossing over is how the project unfolded: without human intervention, using computers, in an exceptionally accelerated time frame, working 24hr/day.

Correct, but Gas Town [1] already happened and what's more _actually worked_, so this experiment is both useless (because it doesn't demonstrate working software) _and_ derivative (because we've already seen that you can set up a project where with spend similar to the spend of a single developer you can churn out more code than any human could read in a week).

[1]: https://github.com/steveyegge/gastown

serial_dev•3w ago

It is hard to look at where it is going when there are so many lies about where the tech is today. There are extraordinary claims made on Twitter all the time about the technology, but when you look into things, it’s all just smoke and mirrors, the claims misrepresent the reality.

jcims•3w ago

People that spend time poking holes in random vendor claims remind me of folks you see video of standing on the beach during a tsunami warning. Their eyes fixed on the horizon looking for a hundred foot wave, oblivious to the shore in front of them rapidly being gobbled up by the sea.

gordonhart•3w ago

> oblivious to the shore in front of them rapidly being gobbled up by the sea

Am I misunderstanding this metaphor? Tsunamis pull the sea back before making landfall.

alfalfasprout•3w ago

What a silly take. Where the tech is is extremely relevant. The reality of this blog post is it shows the tech is clearly not going anywhere better either, as they seem to imply. 24 hours of useless code is still useless code.

This idea that quality doesn't matter is silly. Quality is critical for things to work, scale, and be extensible. By either LLMs or humans.

malfist•3w ago

>If you are hung up on commit build quality

I'm sorry but what? Are you really trying to argue that it doesn't matter that nothing works, that all it produced is garbage and that what is really important is that it made that garbage really quickly without human oversight?

That's.....that's not success.

ryanisnan•3w ago

Quality absolutely matters, but it's hyper context dependent.

Not everything needs to, or should have the same quality standards applied to them. For the purposes of the Cursor post, it doesn't bother me that most of the commits produced failed builds. I assume, from their post, that at some points, it was capable of building, and rendering the pages shown in the video on the post. That alone, is the thing that I think is interesting.

Would I use this browser? Absolutely not. Do I trust the code? Not a chance in hell. Is that the point? No.

malfist•3w ago

"Quality" here isn't if A is better than B. It's "Does this thing actually work at all?"

Sure, I don't care too much if the restaurant serves me food with silverware that is 18/10 vs 18/0 stainless steel, but I absolutely do care if I order a pizza and they just dump a load of gravel onto my plate and tell me it's good enough, and after all, quality isn't the point.

dragonwriter•3w ago

> Quality absolutely matters, but it's hyper context dependent.

There are very few software development contexts where the quality metric of “does the project build and run at all” doesn’t matter quite a lot.

falkensmaize•3w ago

Software that won’t compile and doesn’t do anything is not software, it’s just a collection of text files. A computer that won’t boot isn’t a computer anymore, it’s a paperweight. A car that won’t start isn’t a car anymore, it’s scrap metal.

I can bang on a keyboard for a week and produce tons of text files - but if they don’t do anything useful, would you consider me a programmer?

array_key_first•3w ago

Spending 24h/day to build nothing isn't impressive - it's really, really bad. That's worse than spending 8h/day to build nothing.

If the piece of shit can't even compile, it's equivalent to 0 lines of code.

> Don’t look at where the tech is. Look where it’s going.

Given that the people making the tech seem incapable of not lying, that doesn't give me hope for where it's going!

Look, I think AI and LLMs in particular are important. But the people actively developing them do not give me any confidence. And, neither do comments like these. If I wanted to believe that all of this is in vain, I would just talk to people like you.

ben_w•3w ago

> What they all seem to be just glossing over is how the project unfolded: without human intervention, using computers, in an exceptionally accelerated time frame, working 24hr/day.

The reason I have yet to publish a book is not because I can't write words. I got to 120k words or so, but they never felt like the right words.

Nobody's giving me (nor should they give me) a participation trophy for writing 120k words that don't form a satisfying novel.

Same's true here. We all know that LLMs can write a huge quantity of code. Thing is, so does:

  yes 'printf("Hello World!");'

The hard part, the entire reason to either be afraid for our careers or thrilled we can switch to something more productive than being code monkeys for yet-another-CRUD-app (depending on how we feel), that's the specific test that this experiment failed at.

noosphr•3w ago

If this is what makes the AI bubble pop I'll laugh so hard.

only-one1701•3w ago

Wishful thinking. They’re trying to do (and maybe succeeding at) a military-industrial complex style thing with AI.

noosphr•3w ago

Probably, but this is one of the few cases where instead of being told how amazing some AI tool is we are shown just what it can do.

mikojan•3w ago

Dear god please let AI get forever stuck at this point because it would be so funny

themafia•3w ago

Just view the "input cost" vs "output accuracy" graph.

It _is_ stuck at this point.

There's so much money involved no one wants to admit it out loud.

They have no path to the necessary exponential gains and no one is actually working on it.

bn-l•3w ago

The greatest grift of all time.

I don’t mean the tech itself—-which is kind of useful. I mean the 99% of the value inflation of a kind of useful tool (if you know what you’re doing).

tyre•3w ago

AI is not a bigger grift than crypto. Crypto produced basically nothing of value. If all model improvement stops today, Opus 4.5 with Claude Code is a huge leap in productivity building certain types of software.

mirsadm•3w ago

I would disagree on the huge boost to productivity but it is a very useful tool.

Kiro•3w ago

Hilarious thing to say when we've just had some of the biggest leaps ever with Gemini 3 and Opus 4.5.

lifetimerubyist•3w ago

Just one more new model bro the next one is AGI bro just give me a trillion dollars and I’ll build the datacenters and everything will be perfect bro I promise bro please

Kiro•3w ago

Even if it doesn't see any improvements beyond this point it wouldn't be a big deal. It's good enough for most programmers and any improvements are just a bonus.

mikojan•3w ago

The masters of mankind are yearning to replace expensive tech workers with this. With agentic versions of LLMs we are at a point now where they can (and should) certainly try and create a more hilarious world

ankit219•3w ago

Like it or not, it's a fundraising strategy. They have followed it multiple times (e.g. earlier vague posts about how much code their in-house model is writing, online RL, lines of code, etc.) and it was less vague before. They released a model and did not give us the exact benchmarks or even tell us the base model for the same. This is not to imply there is no substance behind it, but they are not as public about their findings as one would like them to be. Not a criticism, just an observation.

alfalfasprout•3w ago

Unfortunately all the major LLM companies have realized the truth doesn't really matter anymore. We even saw this with the GPT-5 launch with obviously vibe coded + nebulous metrics.

Diminishing returns are starting to really set in and companies are desperate for any illusion to the contrary.

themafia•3w ago

I don't like it. It's lying in order to capture more market value than they're entitled to. The ends do not justify the means. This is a criticism.

emp17344•3w ago

Basically, fraud. Low-level fraud, but still fraud.

csomar•3w ago

Low-level fraud? It’s used to raise billions that could have been used for other purposes.

nerdponx•3w ago

Fraud is just marketing in the 2020s now.

skciva•3w ago

I'm not a fan of this either but I fail to see how it's much different than the happy path tech demos of old.

drawfloat•3w ago

The happy path was functional.

horsawlarway•3w ago

Mmm, as someone forced to write a lot of last minute demos for a startup right out of school that ended up raising ~100MM, there's a fair bit of wiggle room in "Functional".

Not that I would excuse Cursor if they're fudging this either - My opinion is that a large part of the growing skepticism and general disillusionment that permeates among engineers in the industry (ex - the jokes about exiting tech to be a farmer or carpenter, or things like https://imgur.com/6wbgy2L) comes from seeing first hand that being misleading, abusive, or outright lying are often rewarded quite well, and it's not a particularly new phenomenon.

drawfloat•3w ago

But this isn’t wiggle room, it flat out doesn’t compile or run.

horsawlarway•2w ago

Yes. Very naive to assume the demos do.

The worst of them are literal mockups of a feature in the same vein as figma... a screenshot with a hotzone that when clicked shows another screenshot that implies a thing was done, when no such thing was done.

Jcampuzano2•3w ago

Never releasing the benchmarks or being openly benched unlike literally every other model provider always irked me.

I think they know they're on the backfoot at the moment. Cursor was hot news for a long time but now it seems terminal based agents are the hot commodity and I rarely see cursor mentioned. Sure they already have enterprise contracts signed but even at my company we're about to swap from a contract with cursor to Claude code because everyone wants to use that instead now - especially since it doesn't tie you to one editor.

So I think they're really trying to get "something" out there that sticks and puts them in the limelight. Long context/sessions are one of the hot things especially with Ralph being the hot topic so this lines up with that.

Also I know cursor has its own cli but I rarely see mention of it.

PlatoIsADisease•3w ago

I used to hate this, I've seen Apple do it with claims of security and privacy, I've seen populist demagogues do this with every proposal they make. Now I realize this is just the reality of the world.

It's just a reminder not to trust, instead verify. It's more expensive, but trust only leads to pain.

callc•3w ago

“Lying is just the reality of the world” is a cop-out

Don’t give them, or anyone, a free pass for bad behavior.

pessimizer•3w ago

The reality of the world is that nobody needs a pass from you.

autoexec•3w ago

Fraud, lies, and corruption are so often the reality of the world right now because people keep getting away with it. The moment they're commonly and meaningfully held accountable for lying to the public we'll start seeing it happen less often. This isn't something that can't be improved, it just takes enough people willing to work together to do something about it.

nerdponx•3w ago

Several major world powers right now are at the endgame of a decades-long campaign to return to a new Gilded Age and prevent it from ending any time soon. Destroying the public's belief in objective truth and fact is part of the plan. A side effect is that fraud in general becomes normalized. "We are cooked" as the kids say.

autoexec•3w ago

Fraud is not a very innovative fundraising strategy, but sadly it does sometimes work

ironbound•3w ago

Devin 2.0

heliumtera•3w ago

Making it compile will considerably decrease productivity. PR number go up

logicallee•3w ago

(this has been fixed)

embedding-shape•3w ago

Thank you for telling me about the email, it had a typo :( Been fixed now.

Regarding the downvotes, I think it's because it's feeling like you're pushing your project although it isn't really super relevant to the topic. The topic is specifically about Cursor failing to live up to their claims.

jonathanstrange•3w ago

I think it's only a matter of time until this becomes reality. It's almost inevitable.

My prediction last year was already that in the distant future - more than 10 years into the future - operating systems will create software on the fly. It will be a basic function of computers. However, there might remain a need for stable, deterministic software; the two human-machine interaction models can live together. There will be a need for software that does exactly what one wants in a dumb way, and there will be a need for software that does complex things on the fly in an overall less reliable, ad hoc way.

falkensmaize•3w ago

We might cure cancer in 10 years. We could have Martian colonists in the next decade. Everyone might be commuting to work with a jet pack. Literally anything could happen, especially given a long enough time horizon.

jonathanstrange•3w ago

You do realize that AI can already today write fairly complex software autonomously, don't you? It's not as if I haven't tested that. It works quite well for certain tasks and with certain programming languages.

Anyone who knows history knows that people initially tend to underestimate the impact of technologies, yet few people learn something from that lesson.

falkensmaize•2w ago

What fairly complex software has it written autonomously for you?

ares623•3w ago

Can’t help but draw parallels to what working with AI feels like. Your coworker opens a giant, impressive-looking PR and marks it ready for review. Meanwhile it’s up to someone else on the team to do the actual work of checking. The PR author gets patted on the back by management for being forward-thinking and proactive, while everyone else is “nitpicky” and holding progress back.

callc•3w ago

I’m dealing with similar issues.

It’s reasonable to come up with team rules like:

- “if the reviewer finds more than 5 issues the PR shall be rejected immediately for the submitter to rework”

- “if the reviewer needs to take more than 8 hours to thoroughly review the PR it must be rejected and sent back to split up into manageable change sets”

Etc etc. let’s not make externalizing work for others appropriate behavior.
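
A minimal sketch of how rules like these could be written down as an automatic gate, just to make them concrete. The thresholds come from the two example rules above; the `ReviewOutcome` type, the `triage` function, and the returned strings are all hypothetical names made up for illustration, not part of any real review tool.

```python
from dataclasses import dataclass

# Hypothetical thresholds, taken from the two example rules above.
MAX_ISSUES = 5
MAX_REVIEW_HOURS = 8.0


@dataclass
class ReviewOutcome:
    """What the reviewer has observed so far (illustrative fields)."""
    issues_found: int
    review_hours: float


def triage(outcome: ReviewOutcome) -> str:
    """Apply the team rules in order and return a decision string."""
    # Rule 1: more than 5 issues means immediate rejection for rework.
    if outcome.issues_found > MAX_ISSUES:
        return "reject: rework"
    # Rule 2: more than 8 hours of review means the PR must be split up.
    if outcome.review_hours > MAX_REVIEW_HOURS:
        return "reject: split"
    return "continue review"
```

Both rules use "more than", so a PR with exactly 5 issues or exactly 8 hours of review still passes in this sketch; a real team would pick its own numbers.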

tyre•3w ago

Eight hours to review! Girlie how big are these PRs?

I can’t imagine saying, “ah, only six hours of heads down time to review this. That’s reasonable.”

A combination of peer reviewed architecture documentation and incremental PRs should prevent anything taking nearly 8 hours of review.

embedding-shape•3w ago

Agreed, if it takes 8 hours to review a PR, then the process is broken and you need to start talking before anyone starts writing code. I'd put the max window at maybe 30 minutes for a PR, otherwise we're doing something else, not a "last pass before merge into production".

thewhitetulip•3w ago

Not to mention the fact that juniors can now put the entire problem statement into an AI chatbot, which spits out _some_ code. Those juniors then don't understand half the code, run it, and raise the PR. They don't get a pat on the back, but this leads to countless bugs later on. This is much worse, as they don't develop skills on their own. They blindly copy from AI.

LegitShady•3w ago

AI hype is just lying until you get caught

jadenpeterson•3w ago

For my 11th or 12th birthday, I got a pet porcupine and I was ecstatic. It was my first pet, and I spent hours researching what they eat, what habitats they like, etc. I carefully curated my room to accommodate him (him being 'Sonic'), even keeping it clean for the first time in forever so I wouldn't lose him amidst the mess of soiled undergarments and such. He loved it, and I loved him. Of course, it made no difference when my uncle sat on him on Christmas morning. We rushed him to the vet, but they told us his scans showed fractures on several vertebrae or something like that. We took him home, and waited for him to die, but the waiting was too painful. I'll spare the details, but what transpired next involved my dad, his shovel, and a lot of tears.

About an hour later, we got a call from the vet - they'd misread the scan, and Sonic was gonna be fine. I think I was traumatized at the time, but the whole thing later became an inside joke (?) for my family - "Don't kill your porcupine before the vet calls" (a la "Don't count your chickens before they hatch").

I guess my point, as it pertains to Cursor, its AI offerings, and other corporations in the space is that we shouldn't jump the gun before a reasonable framework exists to evaluate such open-ended technologies. Of course Cursor reported this as a success, the incentive structure demands they do so. So remember - don't kill your porcupine before the vet calls.

callc•3w ago

Welcome to HN, thanks for sharing. That’s a very sad story, I hope you aren’t traumatized still.

A reasonable framework does exist. Since the claim is “we made a web browser from scratch” the framework is:

1. Does it actually f*** work?

2. Is it actually from scratch?

It fails on both counts. Further, even when compiled successfully, as others have pointed out, it takes more than a minute to load some pages which is a fail for #1.

Shaanie•3w ago

If it loads pages, then it clearly works. Nobody claims it's a practical, competitive browser.

callc•3w ago

“I built a car from scratch”

…

“Nobody said it has brakes.”

Taken at face value, everyone assumes when you say statement #1 that you are not speaking like a lawyer.

thewhitetulip•3w ago

> other corporations in the space is that we shouldn't jump the gun before a reasonable framework exists to evaluate such open-ended technologies

How else will they raise a Bajillion $ for the next model?

mslate•3w ago

No one's killing a porcupine here.

swyx•3w ago

i'm sorry for your porcupine :(

solid_fuel•3w ago

This is par for the course with this AI slop. Most of the big claims about LLM productivity have completely lacked any backing evidence. Big claims require big evidence, but all I've seen so far is loud assertions and pathetic results.

callc•3w ago

I’m happy that this shows that hard work, understanding your codebase, having performant software, having actually working software, and rigorously measuring and proving results still matter.

There’s a huge difference between using LLMs to offload all the hard work and using LLMs for some assistance while you stay in control and take ownership of the output.

Unfortunately, the general public probably didn’t try a git clone and cargo build, and took the article at face value.

sidgarimella•3w ago

there’s a curve where the conservative middle of AI marketing stunts is held to a higher level of criticism than headlines at either extreme

wilsonzlin•3w ago

Hey, Wilson here, author of the blog post and the engineer working on this project. I've been reading the responses here and appreciate the feedback. I've posted some follow up context on Twitter/X[0], which I'll also write here:

The repo is a live incubator for the harness. We are actively researching the behavior of collaborative long running agents, and may in the future make the browser and other products this research produces more consumable by end users and developers, but it's not the goal for now. We made it public as we were excited by the early results and wanted to share; while far off from feature parity with the most popular production browsers today, we think it has made impressive progress in the last <1 week of wall time.

Given the interest in trying out the current state of the project, I've merged a more up-to-date snapshot of the system's progress that resolves issues with builds and CI. The experimental harness can occasionally leave the repo in an incomplete state but does converge, which was the case at the time of the post.

I'm here to answer any further questions you have.

[0] https://x.com/wilsonzlin/status/2012398625394221537?s=20

eloisius•3w ago

That doesn’t really address much of the criticism in this thread. No one is shocked that it’s not as good as production web browsers. It’s that it was billed as “from scratch” but upon deeper inspection it looks like it’s just gluing together Servo and some other dependencies, so it’s not really as impressive or interesting because the “agents” didn’t really create a browser engine.

M4v3R•3w ago

Upon deeper inspection? Someone checked the Cargo file and proclaimed it was just Servo and QuickJS glued together without actually bothering to look if these dependencies are even being used.

In reality, while the project does indeed have Servo in its dependencies, it only uses it for HTML tokenization, CSS selector matching and some low-level structures. JavaScript parsing and execution, the DOM implementation and the layout engine were written from scratch, with one exception: Flexbox and Grid layouts are implemented using Taffy, a Rust layout library.

So while “from scratch” is debatable it is still immensely impressive to me that AI was able to produce something that even just “kinda works” at this scale.

acdha•3w ago

> So while “from scratch” is debatable it is still immensely impressive to me that AI was able to produce something that even just “kinda works” at this scale.

“From scratch” is inarguably wrong given how much third-party code it depends on. There’s a reasonable debate about how much original content there is but if I was a principal at a company whose valuation hinges on the ability to actually deliver “from scratch” for real, I would be worried about an investor suing for material misrepresentation of the product if they bought now and the value went down in the future.

wilsonzlin•3w ago

Thanks for the feedback. I agree that for some parts that use dependencies, the agent could have implemented them itself. I've begun the process of removing many of these and developing them within the project alongside the browser. A reasonable goal for "from scratch" may be "if other major browsers use a dependency, it's fine to do so too". For example: OpenSSL, libpng, HarfBuzz, Skia.

I'd push back on the idea that all the agents did was glue dependencies together — the JS VM, DOM, CSS cascade, inline/block/table layouts, paint systems, text pipeline, chrome, and more are all being developed by agents as part of this project. There are real complex systems being engineered towards the goal of a browser engine, even if not fully there yet.

RandyOrion•3w ago

Hi, there. Two questions about this repo [0].

Can you show us what you did after people failed to compile that project [1]?

There are also questions about the attribution of these commits [2]. Can you share some information?

[0] https://github.com/wilsonzlin/fastrender

[1] https://github.com/wilsonzlin/fastrender/issues/98

[2] https://gist.github.com/embedding-shapes/d09225180ea3236f180...

realharo•3w ago

Make it port Firefox's engine to iOS, that's something people would actually use (in countries where Apple is forced to allow other browser engines).

devmor•3w ago

I am just so utterly tired of AI companies lying about everything, constantly without end.

The things that modern machine learning can do are absolutely incredible, mindblowing and have myriad uses. But this culture of startup scams to siphon money out of the economy and into the bank accounts of a few investment firms and a couple "visionaries" has just turned what should be an exciting field full of technical advancement into a deluge of mental sewage that's constantly pumped into our faces.

DeathArrow•3w ago

So they prove that if you have enough money to burn you can use AI to generate terabytes of useless junk?

Who would have thought of that?

nubskr•3w ago

That's actually the state of autonomous coding in 2026: scale the output, skip the verification.

thewhitetulip•3w ago

Also, since Firefox is FOSS and any model has most likely been trained on the code base of at least Firefox, if not also Chromium, it's not a shock that agents are able to generate similar code!

motbus3•3w ago

If it just forked Chromium because it found it on the web, it would also claim it made a browser from scratch. An LLM does not know. It is not a person, it is a thing, just an algorithm.

orourke•3w ago

I feel that getting anywhere into the neighborhood of “kind of working” for a project like this is noteworthy and a huge milestone. Maybe a better headline would be, however: Agents almost create a working browser.

embedding-shape•3w ago

Yes, if Cursor claimed "We let autonomous agents run for weeks, and they produced millions of lines of code, and it kind of looks like a browser, and it kind of runs", then I wouldn't have written and published TFA.

But their claim wasn't so nuanced, it was "hundreds of agents can work on a single codebase autonomously for weeks and build an entire browser from scratch that works (kinda)". Considering the hand-holding that seems to have been required to get it to compile, this claim doesn't seem to hold up to scrutiny.

Snuggly73•3w ago

I've watched them today work in the new repo - https://github.com/wilson-anysphere/fastrender/tree/main , adding another 50k lines trying to optimize scroll/rendering performance (spoiler: not really)

At this point, it's 1.5M LOC without the vendored crates (so basically excluding the JS engine etc). If you compare that to Servo/Ladybird, which are 300k LOC each and actually happen to work, agents do love slinging slop.

elzbardico•3w ago

I think that the companies that have the mindset "Let's give engineers tools that can leverage their strengths and eliminate toil" have way more success than those scammy "get-rich-fast let's automate software development and stop paying those sv salaries, invest in us!!!" gigs like Cursor and Devin.

Their whole attitude leads to them wasting time with those Wile E. Coyote plans instead of building good products like Amp.

embedding-shape•3w ago

Huge distinction between the two, one is about "Augmenting the human intellect" and the other is about "Get rich quick", but unfortunately it seems it's hard even for software developers to see which is which sometimes.

utopiah•3w ago

That's kind of hilarious (...ly sad) to read knowing that I have on my desk https://browser.engineering so I literally went the opposite direction some months ago.

Not only did I actually build a Web browser myself, from scratch (OK, of course with a working OS and Python, and its libraries ;) but mine did work! It took me, what, a few hours, maybe a few days if you add it all together. Not only did it work (namely, I did browse my own website with it), but I had fun with it (!), I learned quite a bit (including the provable fact that I can indeed build a Web browser, woohoo!), and I did it all on... I want to say a few kilowatt-hours at most, including my computer (obviously) but also myself and the food I ate along the way.

So... to each their own ¯\_(ツ)_/¯

simonw•2w ago

They fixed FastRender so that CI passes and added build instructions to the README. I've tried it and it works surprisingly well - screenshots here: https://gist.github.com/simonw/53a725811db8e34f4f99226e8f456...

izucken•2w ago

You people clearly don't understand how important lines of code are. Three million is a lot of lines of code even if it's broken, and you can't even appreciate that number. Clearly you are weak software developers who write very few lines of code, and can't even steal others' lines of code to keep up. I am very glad we are back to reporting results in lines of code, which is a very informative metric, so now I can get my many lines of code appreciated.
