Most of the big ones are things like skia, harfbuzz, wgpu - all totally reasonable IMO.
The two that stand out for me as more notable are html5ever for parsing HTML and taffy for handling CSS grids and flexbox - that's vendored with an explanation of some minor changes here: https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...
Taffy is a solid library choice, but it's probably the most robust ammunition for anyone who wants to argue that this shouldn't count as a "from scratch" rendering engine.
I don't think it detracts much if at all from FastRender as an example of what an army of coding agents can help a single engineer achieve in a few weeks of work.
I think this kind of approach is interesting, but it's a bit sad that Cursor didn't discuss how they close the feedback loop: testing/verification. As generating code becomes cheaper, I think effort will shift to how we can more cheaply and reliably determine whether an arbitrary piece of code meets a desired specification. For example, did they use https://web-platform-tests.org/, fuzz testing (e.g. feed in random webpages and inform the LLM when the fuzzer finds crashes), etc.? I would imagine truly scaling long-running autonomous coding would put an emphasis on this.
Of course Cursor may well have done this, but it wasn't super deeply discussed in their blog post.
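To make the fuzzing idea concrete, here's a minimal sketch of what a crash harness could look like. Everything in it is hypothetical: `render_html` stands in for whatever entry point the engine actually exposes, and the toy document generator would in practice be replaced by mutated real-world pages or a grammar-based fuzzer (cargo-fuzz/libFuzzer).

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::panic;

// Placeholder for the real engine entry point (parse + style + layout + paint).
fn render_html(html: &str) {
    let _ = html;
}

// Toy generator of random-ish documents, driven by a simple LCG.
fn random_document(seed: u64) -> String {
    let tags = ["div", "span", "table", "td", "p", "li"];
    let mut html = String::from("<!doctype html><body>");
    let mut state = seed;
    for _ in 0..64 {
        state = state.wrapping_mul(6364136223846793005).wrapping_add(1);
        let tag = tags[(state >> 33) as usize % tags.len()];
        html.push_str(&format!("<{tag} style=\"width:{}px\">x</{tag}>", state % 10_000));
    }
    html
}

fn main() {
    let mut log = OpenOptions::new()
        .create(true)
        .append(true)
        .open("fuzz_crashes.log")
        .unwrap();
    for seed in 0..1_000u64 {
        let html = random_document(seed);
        // Any panic inside the renderer becomes a concrete failing input
        // that can be fed back to the coding agent (or a human).
        if panic::catch_unwind(|| render_html(&html)).is_err() {
            writeln!(log, "panic while rendering document with seed {seed}").unwrap();
        }
    }
}
```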
I really enjoy reading your blog and it would be super cool to see you look at approaches people have to ensuring that LLM-produced code is reliable/correct.
To leverage AI to build a working browser you would IMO need the following:
- A team of humans with some good ideas on how to improve on existing web engines.
- A clear architectural story written not by agents but by humans. Architecture does not mean high-level diagrams only. At each level of abstraction, you need humans to decide what makes sense and only use the agent to bang out slight variations.
- A modular and human-overseen agentic loop approach: one agent can keep running to try to fix a specific CSS feature (like grid), with a human expert reviewing the work at some interval (not sure how fine-grained it should be). This is actually very similar to running an open-source project: you have code owners and a modular review process, not just an army of contributors committing whatever they want. And a "judge agent" is not the same thing as a human code owner acting as reviewer.
Example on how not to do it: https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...
This rendering loop architecture makes zero sense, and it does not implement web standards.
> in the HTML Standard, requestAnimationFrame is part of the frame rendering steps (“update the rendering”), which occur after running a task and performing a microtask checkpoint
> requestAnimationFrame callbacks run on the frame schedule, not as normal tasks.
This is BS: "update the rendering" is specified as just another task, which means it needs to be followed by a microtask checkpoint. See https://html.spec.whatwg.org/multipage/#event-loop-processin...
Following the spec doesn't mean you cannot optimize rendering tasks in some way vs other tasks in your implementation, but the above is not that, it's classic AI bs.
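For what it's worth, the ordering being argued about is not hard to express. Here's a minimal sketch (hypothetical types, paraphrasing rather than quoting the spec) of tasks, microtask checkpoints and "update the rendering" interleaving on a single event loop, with requestAnimationFrame callbacks running as part of the rendering update rather than on some separate "frame schedule":

```rust
use std::collections::VecDeque;

type Task = Box<dyn FnOnce()>;

struct EventLoop {
    tasks: VecDeque<Task>,
    microtasks: VecDeque<Task>,
    raf_callbacks: Vec<Task>,
}

impl EventLoop {
    fn microtask_checkpoint(&mut self) {
        while let Some(microtask) = self.microtasks.pop_front() {
            microtask();
        }
    }

    fn has_rendering_opportunity(&self) -> bool {
        // Placeholder: a real engine would consider vsync / frame budget here.
        !self.raf_callbacks.is_empty()
    }

    fn spin(&mut self) {
        while let Some(task) = self.tasks.pop_front() {
            task();                      // 1. run the oldest runnable task
            self.microtask_checkpoint(); // 2. perform a microtask checkpoint

            if self.has_rendering_opportunity() {
                // 3. "Update the rendering": rAF callbacks run here, followed
                //    by style, layout and paint -- on this same loop.
                for raf in std::mem::take(&mut self.raf_callbacks) {
                    raf();
                }
            }
        }
    }
}

fn main() {
    let mut event_loop = EventLoop {
        tasks: VecDeque::from([Box::new(|| println!("task")) as Task]),
        microtasks: VecDeque::from([Box::new(|| println!("microtask")) as Task]),
        raf_callbacks: vec![Box::new(|| println!("rAF callback")) as Task],
    };
    event_loop.spin(); // prints: task, microtask, rAF callback
}
```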
Understanding Web standards and translating them into an implementation requires human judgement.
Don't use an agent to draft your architecture; an expert in web standards with an interest in agentic coding is what is required.
Message to Cursor CEO: next time, instead of lighting up those millions on fire, reach out to me first: https://github.com/gterzian
How much effort would it take for a group of humans to do it?
But in general, my guess at an answer (supported by the results of the experiment discussed in this thread) is that:
- GenAI left unsupervised cannot write a browser/engine, or any other complex software. What you end up with is just chaos.
- A group of humans using GenAI and supervising its output could write such an engine (or any other complex software), and in theory be more productive than a group of humans not using GenAI: the humans could focus on the conceptual bottlenecks, and the AI could bang out the features that require only the translation of already-established architectural patterns.
When I write conceptual bottlenecks I don't mean standing in front of a whiteboard full of diagrams. What I mean is any work that gives proper meaning and functionality to the code: it can be at the level of an individual function, or the project as a whole. It can also be outside of the code itself, such as when you describe the desired behavior of (some part of) a program in TLA+.
For an example, see: https://medium.com/@polyglot_factotum/on-writing-with-ai-87c...
“This is a clear indication that while the AI can write the code, it cannot design software”
To clarify what I mean by a product: if we want to design a browser system (engine + chrome) from scratch to optimize the human-computer symbiosis (Licklider), what would be the best approach? Who should take the roles of making design decisions, implementation decisions, engineering decisions and supervision?
We can imagine a whole system built with humans out of the loop; that would be one huge unit-and-integration-test exercise with no real application.
Then humans can study it and learn from it.
Or the other way around: we have already made a huge mess of engineering beasts, and the machine will learn to fix our mess, or make it worse by an order of magnitude.
I don’t have an answer.
I used to be a big fan of TDD and now I am not; the testing system is a big mess by itself.
Thanks.
> what would be the best approach?
I don't know but it sounds like an interesting research topic.
Features that I'd normally never have considered building because they weren't worth the added time and complexity are now just a few well-structured prompts away.
But how much will it cost to maintain those features in the future? So far the answer appears to be a whole lot less than I would previously budget for, but I don't have any code more than a few months old that was built ~100% by coding agents, so it's way too early to judge how maintenance is going to work over a longer time period.
Very little if they have good specs and tests.
Essentially a bet that the rate of model improvement is going to be faster than the rate of decay from bad coding.
Now this hurts me personally to see, as someone who actually enjoys having quality code, but I don't see why it doesn't have a decent chance of holding.
What it should have been willing to do is go off and look for free external assets on the Web that it could download and integrate.
I think the current models are at a capability level that could create a decent 3D game. The challenges are creating graphic assets and debugging/QA. The debugging problem is that you need to figure out a good harness to let the model understand when something is working, or how it is failing.
Also, graphics acceleration makes it hard to do from scratch rather than using the 3D APIs, but I guess you could in principle go bare iron on hardware that has published specs, such as AMD's, or just do software-only rendering.
I've reached out to the engineer who seemed to have run the experiment, who hopefully can shed some more light on it and (hopefully) my update to https://news.ycombinator.com/item?id=46646777 will include the replies and more investigations.
It's hard to imagine a human developer misses something so obvious.
AI makes it cheap (eventually almost free) to traverse the already-discovered and reach the edge of uncharted territory. If we think of a sphere, where we start at the center, and the surface is the edge of uncharted territory, then AI lets you move instantly to the surface.
If anything solved becomes cheap to re-instantiate, does R&D reach a point where it can’t ever pay off? Why would one pay for the long-researched thing when they can get it for free tomorrow? There will be some value in having it today, just like having knowledge about a stock today is more valuable than the same knowledge learned tomorrow. But does value itself go away for anything digital, and only remain for anything non-copyable?
The volume of a sphere grows faster than the surface area. But if traversing the interior is instant and frictionless, what does that imply?
In a stage interview (a bit after the "Sparks of AGI in GPT-4" paper came out) he made 3 statements:
a) LLMs can't do math. They can trick us with poems and subjective prose, but at objective math they fail.
b) they can't plan
c) by the nature of their autoregressive architecture, errors compound. so a wrong token will make their output irreversibly wrong, and spiral out of control.
I think we can safely say that all of these turned out to be wrong. It's very possible that he meant something more abstract, and technical at its core, but in real life all of these things were overcome. So, not a luddite, but also not a seer.
The harnesses have helped in training the models themselves (i.e. every good trace was "baked into" the model) and have improved in enabling test-time compute. But at the end of the day this is all put back into the models, and they become better.
The simplest proof of this is on benchmarks like terminalbench and swe-bench with simple agents. The current top models are much better than their previous versions when put in a loop with just a "bash tool". There's a ~100 LoC harness called mini-swe-agent [1] that does just that.
So current models + minimal loop >> previous gen models with human written harnesses + lots of glue.
> Gemini 3 Pro reaches 74% on SWE-bench verified with mini-swe-agent!
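For illustration, the whole shape of such a minimal harness fits in a few dozen lines. This is a rough sketch, not mini-swe-agent itself (which is a small Python program); the `Model` trait is a hypothetical stand-in for an actual LLM API client, and the only "tool" is a shell:

```rust
use std::process::Command;

trait Model {
    /// Given the transcript so far, return either a shell command to run
    /// or `None` when the model decides it is done.
    fn next_command(&mut self, transcript: &str) -> Option<String>;
}

fn run_agent(model: &mut dyn Model, task: &str, max_steps: usize) -> String {
    let mut transcript = format!("TASK: {task}\n");
    for _ in 0..max_steps {
        let Some(cmd) = model.next_command(&transcript) else { break };
        // The only tool: run the command in a shell and feed the output back.
        let output = Command::new("sh").arg("-c").arg(&cmd).output();
        let observation = match output {
            Ok(o) => format!(
                "exit={:?}\nstdout:\n{}\nstderr:\n{}",
                o.status.code(),
                String::from_utf8_lossy(&o.stdout),
                String::from_utf8_lossy(&o.stderr)
            ),
            Err(e) => format!("failed to run command: {e}"),
        };
        transcript.push_str(&format!("\n$ {cmd}\n{observation}\n"));
    }
    transcript
}
```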
They can be and are improved (papered over) over time, for example by improving and tweaking the training data. Adding in new data sets is the usual fix. A prime example: "count the number of R's in strawberry" caused quite a debacle at a time when LLMs were meant to be intelligent. Because they aren't, they can trip up over simple problems like this. Continue to use an army of people to train them and these edge cases may become smaller over time. Fundamentally the LLM tech hasn't changed.
I am not saying that LLMs aren't amazing, they absolutely are. But WHAT they are is an understood thing, so let's not confuse ourselves.
This is orthogonal to the issue of whether all ideas are essentially "remixes." For the record I agree that they are.
You know this is a false dichotomy right? You can treat and consider LLMs statistical parrots and at the same time take advantage of them.
The subsequent argument that "LLMs only remix" => "all knowledge is a remix" seems absurd, and I'm surprised to have seen it now more than once here. Humanity didn't get from discovering fire to launching the JWST solely by remixing existing knowledge.
[1] http://bactra.org/notebooks/nn-attention-and-transformers.ht...
[2] Well, smoothing/estimation but the difference doesn't matter for my point.
Even acknowledging it is interpolation, models can extrapolate slightly without making things up, within the range where the model still applies. Who's to say what this range is for an LLM operating in thousand-dimensional space? As far as I can tell, the main limiters to LLM creativity are the guardrails we put in place for safety and usefulness.
And what exactly is your proof that human ingenuity is not just pattern matching? I'm sure a hypothesis can be put forward that fire was discovered by just adding up all the known facts the people of those times knew and stumbling on something that put it all together. Sounds like knowledge remix + slight extrapolation to me.
It's a hypothesis at this stage, but I'm going to have a go at making it more quantitative. It seems the obvious explanation for "hallucinations", and it seems like it should also be rather straightforward to attribute particular inference results to the training data that influenced them. I'm expecting to encounter difficulties, though, since the idea seems so obvious it's vanishingly unlikely it hasn't been tried.
> And what exactly is your proof that human ingenuity is not just pattern matching?
Firstly, I'm not the one making a strong claim that needs to be "proved". Secondly, "pattern matching" is ill-defined and not what I'm saying human intelligence isn't. I'm saying human intelligence isn't a kernel smoothing algorithm run over a corpus of text. This seems rather obvious. What's your proof that it is that?
Not every solution needs to be unique, in many cases "remixing" existing solutions in an unique way is better and faster.
It's nearly frictionless, not frictionless because someone has to use the output (or at least verify it works). Also, why do you think the "shape" of the knowledge is spherical? I don't assume to know the shape but whatever it is, it has to be a fractal-like, branching, repeating pattern.
At a minimum:
1. You've got an incredibly clearly defined problem at the high level.
2. Extremely thorough tests for every part that build up in complexity.
3. Libraries, APIs, and tooling that are all compatible with one another because all of these technologies are built to work together already.
4. It's inherently a soft problem, you can make partial progress on it.
5. There's a reference implementation you can compare against.
6. You've got extremely detailed documentation and design docs.
7. It's a problem that inherently decomposes into separate components in a clear way.
8. The models are already trained not just on examples for every module, but on example browsers as a whole.
9. The done condition for this isn't a working browser, it's displaying something.
This isn't a realistic setup for anything that 99.99% of people work on. It's not even a realistic setup for what actual developers of browsers do who must implement new or fuzzy things that aren't in the specs.
Note 9. That's critical. Getting to the point where you can show simple pages is one thing. Getting to the point where you have a working production browser engine, that's not just 80% more work, it's probably considerably more than 100x more work.
Whether it is the best case scenario in terms of benchmark, I am not so sure.
The Web is indeed standardized and there are many open-source implementations out there. But implementing the Web in a novel way by definition means you are trying to solve some perceived problem with existing implementations.
So I would rephrase your statement as such: rewriting an existing engine in another language without any novelty might be the best case scenario for autonomous coding agents.
As an example of approaching the problem in a novel way: the Fastrender code seems obsessed with metering of resources. Implementing the Web with that constraint in mind would be an interesting problem and not obvious at all. That's not what the project is doing so far, by the way, since the code is quite frankly a bunch of spaghetti that does not follow Web standards at all (in a way that is unrelated to the metering story, so the divergence from the specs is not novel, it's just wrong).
Not saying that this only happens with LLMs; in fact it should be compared against, e.g., a dev team of 4-5.
Is it? Use dollar cost of salary and cost for the AI. That wraps up all those things you mentioned.
Its ability to pattern match its way through a code base is impressive until it's not, and you always have to pull it back to reality when it goes astray.
Its ability to plan ahead is so limited and its way of "remembering" is so basic. Every day it's a bit like 50 First Dates.
Nonetheless, seeing what can be achieved with this pseudo-intelligence tool makes me feel a little in awe. It's the contrast between it not being intelligent and it achieving clearly useful outcomes if steered correctly, and the feeling that we have just started to understand how to interact with this alien.
Because that's exactly what they are. An LLM is just a big optimization function with the objective "return the most probabilistically plausible sequence of words in a given context".
There is no higher thinking. They were literally built as a mimicry of intelligence.
Maybe real intelligence also is a big optimization function? The brain isn't magical; there are rules that govern our intelligence, and I wouldn't be terribly surprised if our intelligence in fact turned out to be a kind of returning the most plausible thoughts. It might as well be something else, of course - my point is that "it's not intelligence, it's just predicting the next token" doesn't make sense to me - it could be both!
LLMs do not think, understand, reason, reflect, or comprehend, and they never shall. I have commented elsewhere but this bears repeating.
If you had enough paper and ink, and the patience to go through it, you could take all the training data and manually step through and train the same model. Then once you have trained the model you could use even more pen and paper to step through the correct prompts and arrive at the answer. All of this would be a completely mechanical process. This really does bear thinking about. It's amazing the results that LLMs are able to achieve. But let's not kid ourselves and start throwing about terms like AGI or emergence just yet. It makes a mechanical process seem magical (as do computers in general).
I should add it also makes sense as to why it would; just look at the volume of human knowledge (the training data). It's the training data, with quite literally the mass of mankind's knowledge, genius, logic, inferences, language and intellect, that does the heavy lifting.
But you could make the exact same argument for a human mind? (could just simulate all those neural interactions with pen and paper)
The only way to get out of it is to basically admit magic (or some other metaphysical construct with a different name).
It would be an argument and you are free to make it. What the human mind is, is an open scientific and philosophical problem many are working on.
The point is that LLMs are NOT the same, because we DO know what LLMs are. Please see the myriad of "write an LLM from scratch" tutorials.
But we have no idea how many "essential" differences there are (if any!).
Dismissing LLMs as avenues toward intelligence just because they are simpler and easier to understand than our minds is a bit like looking at a modern phone from a 19th century point of view and dismissing the notion that it could be "just a Turing machine": Sure, the phone is infinitely more complex, but at its core those things are the same regardless.
Many people are throwing around that they don't "think", that they aren't "conscious", that they don't "reason", but I don't see those people sharing interesting heuristics to use LLMs well. The "they don't reason" people tend to, in my opinion/experience, underestimate them by a lot, often claiming that they will never be able to do <thing that LLMs have been able to do for a year>.
To be fair, the "they reason/are conscious" people tend to, in my opinion/experience, overestimate how much an LLM being able to "act" a certain way in a certain situation says about the LLM/LLMs as a whole ("act" is not a perfect word here; another way of looking at it is that they visit only the coast of a country and conclude that the whole country must be sailors and have a sailing culture).
It's an algorithm and a completely mechanical process which you can quite literally copy time and time again. Unless of course you think 'physical' computers have magical powers that a pen and paper Turing machine doesn't?
> Many people are throwing around that they don't "think", that they aren't "conscious", that they don't "reason", but I don't see those people sharing interesting heuristics to use LLMs well.
My digital thermometer doesn't think. Imbuing LLMs with thought will start leading to some absurd conclusions.
A cursory read of basic philosophy would help elucidate why casually saying LLMs think, reason, etc. is not good enough.
What is thinking? What is intelligence? What is consciousness? These questions are difficult to answer. There is NO clear definition. Some things are so hard to define (and people have tried for centuries) e.g. what is consciousness? That they are a problem set within themselves please see Hard problem of consciousness.
What kind of absurd conclusions? And what kind of non-absurd conclusions can you make when you follow your, let's call it, "mechanistic" view?
>It's an algorithm and a completely mechanical process which you can quite literally copy time and time again. Unless of course you think 'physical' computers have magical powers that a pen and paper Turing machine doesn't?
I don't, just like I don't think a human or animal brain has any magical power that imbues it with "intelligence" and "reasoning".
>A cursory read of basic philosophy would help elucidate why casually saying LLM's think, reason etc is not good enough.
I'm not saying they do or they don't; I'm saying that, from what I've seen, having a strong opinion about whether they think or not seems to lead people to weird places.
>What is thinking? What is intelligence? What is consciousness? These questions are difficult to answer. There is NO clear definition.
You seem pretty certain that whatever those three things are, an LLM isn't doing them, a paper and pencil aren't doing them even when manipulated by a human, and the system of a human manipulating a paper and pencil isn't doing them.
[0] http://www.catb.org/~esr/jargon/html/N/neats-vs--scruffies.h...
But you can automate much of that work by having good tests. Why vibe-test AI code when you can code-test it? Spend your extra time thinking how to make testing even better.
The other day I found [beads](https://github.com/steveyegge/beads) and thought maybe that could be a good improvement over my current state.
But I'm quite hesitant, because I also have seen these AGENTS.md files become stale, and then there is also the question of how much information is too much, especially with limited context windows.
Probably all things that could again just be solved by leveraging AI more and I'm just an LLM noob. :D
I've used it quite a bit, but now that Gas Town is a thing, Beads is getting a bit bloated and they're adding new features left and right, dunno why.
Might have to steal the best bits of Beads (the averaged-out CLI experience and JSONL for storing issues in the repo + a local SQLite cache) and build my own with none of the extra bells and whistles.
"8 unit tests? Great, I'll code up 8 branches so all your tests pass!" Of course that neglects the fact that there's now actually 2^8 paths through your code.
Perhaps more advanced llms + specifications + better tests.
The reality of things is, AI still can't handle long-running tasks without blowing $500k worth of tokens on an end result that doesn't work, and further work is another $100k worth of tokens to get nothing novel.
Although I dissented on the decision, we banned the use of AI. Outside of the project I've been enjoying agentic coding and I do think it can be used already today to build production-grade software of browser-like complexity.
But this project shows that autonomous agents without human oversight are not the way forward.
Why? Because the generated code makes little sense from a conceptual perspective and does not provide a foundation on which to eventually build an entire web engine.
For example, I've just looked into the IndexedDB implementation, which happens to be what I am working on at the moment in Servo.
Now, my work in Servo is incomplete, but conceptually the code that is in place makes sense and there is a clear path towards eventually implementing the thing as a whole.
In Fastrender, you see an Arc<Mutex<Database>> which is never going to work, because by definition a production browser engine will have to involve multiple processes. That doesn't mean you need the IPC in a prototype, but you certainly should not have shared state--some simple messaging between threads or tasks would do.
The above is an easy coding fix for the AI, but it requires input from a human with a pretty good idea of what the architecture should look like.
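A minimal sketch of the "simple messaging" shape I mean, with hypothetical Database/request types rather than FastRender's actual ones: one thread owns the database, everything else talks to it over a channel, and that channel boundary is where real IPC could later slot in.

```rust
use std::sync::mpsc;
use std::thread;

struct Database; // placeholder for the IndexedDB backend

impl Database {
    fn get(&mut self, key: String) -> Option<Vec<u8>> {
        let _ = key;
        None
    }
}

enum DbRequest {
    Get { key: String, reply: mpsc::Sender<Option<Vec<u8>>> },
    // Put, Delete, transactions, ... would follow the same pattern.
}

fn spawn_db_thread() -> mpsc::Sender<DbRequest> {
    let (tx, rx) = mpsc::channel::<DbRequest>();
    thread::spawn(move || {
        // The database is owned here: no Arc<Mutex<_>> shared across the engine.
        let mut db = Database;
        for msg in rx {
            match msg {
                DbRequest::Get { key, reply } => {
                    let _ = reply.send(db.get(key));
                }
            }
        }
    });
    tx
}

fn main() {
    let db = spawn_db_thread();
    let (reply_tx, reply_rx) = mpsc::channel();
    db.send(DbRequest::Get { key: "example".into(), reply: reply_tx }).unwrap();
    println!("value: {:?}", reply_rx.recv().unwrap());
}
```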
For comparison, when I look at the code in Ladybird, yet another browser project, I can immediately find my way around what for me is a stranger codebase: not just a single file but across large swaths of the project and understand things like how their rendering loop works. With Fastrender I find it hard to find my way around, despite all the architectural diagrams in the README.
So what do I propose instead of long-running autonomous agents? The focus should shift towards demonstrating how AI can effectively assist humans in building well-architected software. The AI is great at coding, but you eventually run into what I call conceptual bottlenecks, which can be overcome with human oversight. I've written about this elsewhere: https://medium.com/@polyglot_factotum/on-writing-with-ai-87c...
There is one very good idea in the project: adding the web standards directly in the repo so it can be used as context by the AI and humans alike. Any project can apply this by adding specs and other artifacts right next to the code. I've been doing this myself with TLA+, see https://medium.com/@polyglot_factotum/tla-in-support-of-ai-c...
To further ground the AI code output, I suggest telling it to document the code with the corresponding lines from the spec.
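As a small illustration of that (hypothetical code; the spec language in the comments is paraphrased from the DOM Standard rather than quoted):

```rust
// A toy flat-array DOM, just enough to hang the spec comments on.
struct Node {
    parent: Option<usize>, // index of the parent node, if any
}

/// DOM Standard, `Node.contains(other)`: return true if `other` is an
/// inclusive descendant of this node; otherwise false. (paraphrased)
fn contains(nodes: &[Node], this: usize, other: usize) -> bool {
    // Spec: an inclusive descendant is the node itself or one of its
    // descendants, so walk `other`'s ancestor chain and look for `this`.
    let mut current = Some(other);
    while let Some(index) = current {
        if index == this {
            return true;
        }
        current = nodes[index].parent;
    }
    false
}
```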
Back in early 2025 when we had those discussions in Servo about whether to allow some use of AI, I wrote this guide https://gist.github.com/gterzian/26d07e24d7fc59f5c713ecff35d... which I think is also the kind of context you want to give the AI. Note that this was back in the days of accepting edits with tabs...
That said, it's possible that none of that code even gets executed at run time, and the only code that is actually run is some translated glue code, with the other million lines essentially dead, so who knows.
You're right that lots of code appears to be used only in unit tests, of which there is an enormous amount (making it hard to tell whether what is being tested makes sense). In Servo we don't have a single line of unit tests in the script component, because all of it is covered by the WPT integration test suite shared with all other engines...
Just made some last edits above so not sure which version you saw. I toned it down a bit and clarified some stuff...
> When I made my 2029 prediction this is more-or-less the quality of result I had in mind.
There seems to be a lot of compensation and leniency from the author here.
So, it is seemingly impressive that someone was able to use agents to build a browser.
But they used trillions of tokens? This equates to millions of dollars of spend. Are we really happy with this?
The browser itself is not fully complete. There are rendering glitches noted in the article. So millions of dollars for something that has obvious bugs.
This is also pure agent code. Can a code base like this ever be maintained by a team of humans? Are you vendor-locked into a specific model if you want to build more features? How will support work? How will releases work? The lack of reflection on the rest of the software lifecycle beyond building is shocking.
So I'm not sure after reflecting, whether any of this is impressive outside of "someone with unlimited tokens built a browser using ai agents". It's the same class of problem being solved over and over again. Nothing new is really being done here.
Maybe it's just me but there's much more to software than just building.
This isn't a POC web engine; it's throw-away code that can never scale to a full web engine.
So instead of wasting millions on this autonomous run, they should have put together a small team of people with some ideas on how to improve on existing web engines, and then given that team a large token development budget. You could get a nice POC after a couple of weeks, and after a year or two of further iterations you might have something really interesting.
So this is a great example of how AI fails when left unsupervised; a more interesting experiment would be about how a small team can leverage AI to leapfrog Chromium; not in one week but in a year or two.
Yes, arguably 5 million is a fair price and cheaper than what it would take to pay humans.
EDIT: About the rendering speed. It doesn't really make sense to compare it with a fully functioning browser, as you could potentially drop features or make bogus optimisations to go faster.
$ time target/release/fetch_and_render "https://www.lauf-goethe-lauf.de/"
real 0m0,685s
user 0m0,548s
sys 0m0,070s
$ time chromium --headless --disable-gpu --screenshot=out.png --window-size=1200,800 https://www.lauf-goethe-lauf.de/
real 0m1,099s
user 0m0,927s
sys 0m0,692s
# edit: with a hot-standby Chrome and a running Node instance I can reach 0,369s here
I think good abstraction design and a good test suite will make or break the success of future coding projects.