Super-Flat ASTs

100•mmphosis•2mo ago

Comments

mitchellh•1mo ago

For a good example of this sort of pattern in the real world, take a look at the Zig compiler source code. I'm sure others might do it but Zig definitely does. I have a now very outdated series on some of the Zig internals: https://mitchellh.com/zig/parser And Andrew's old DoD talk is very good and relevant to this: https://vimeo.com/649009599

More generally, I believe it's fair to call this a form of handle-based design: https://en.wikipedia.org/wiki/Handle_(computing) Handles are EXTREMELY useful for a variety of reasons and imo woefully underused above the lowest system level.

sestep•1mo ago

My hypothesis is that handles are underused because programming languages make it very easy to dereference a pointer (you just need the pointer) whereas "dereferencing" a handle requires also having the lookup table in hand at the same time, and that little bit of extra friction is too much for most people. It's not that pointers don't require extra machinery to be dereferenced, it's just that that machinery (virtual memory) is managed by the operating system, and so it's invisible in the language.

My current research is about how to make handles just as convenient to use as pointers are, via a form of context: like a souped-up version of context in Odin or Jai if one is familiar with those, or like a souped-up version of coeffects if one has a more academic background.
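To make that friction concrete, here is a minimal Rust sketch. All names (`ExprTable`, `ExprId`, `eval_handle`, and so on) are hypothetical and not taken from the article or from Zig. The pointer version needs only the node, while the handle version has to thread the table through every call.

```rust
/// A handle is just an index; it means nothing without its table.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ExprId(u32);

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Add(ExprId, ExprId),
}

/// The lookup table that every "dereference" needs in hand.
#[derive(Default)]
struct ExprTable {
    nodes: Vec<Expr>,
}

impl ExprTable {
    fn push(&mut self, e: Expr) -> ExprId {
        let id = ExprId(self.nodes.len() as u32);
        self.nodes.push(e);
        id
    }

    fn get(&self, id: ExprId) -> &Expr {
        &self.nodes[id.0 as usize]
    }
}

/// Pointer version: the Box alone is enough to reach the children.
enum BoxedExpr {
    Num(i64),
    Add(Box<BoxedExpr>, Box<BoxedExpr>),
}

fn eval_boxed(e: &BoxedExpr) -> i64 {
    match e {
        BoxedExpr::Num(n) => *n,
        BoxedExpr::Add(l, r) => eval_boxed(l) + eval_boxed(r),
    }
}

/// Handle version: the table is an extra argument at every level.
fn eval_handle(table: &ExprTable, id: ExprId) -> i64 {
    match table.get(id) {
        Expr::Num(n) => *n,
        Expr::Add(l, r) => eval_handle(table, *l) + eval_handle(table, *r),
    }
}
```

The extra `table` parameter is the whole cost here, but it spreads to every function that touches a node, which is roughly the friction described above.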

densh•1mo ago

Great summary and I think your argument is sound.

faresahmed•1mo ago

I think that it's a generic programming problem. Pointers are easier because both the type of the pointee (a deref away) and its location (memory) are easy to get. With index-based handles into containers, you can no longer say that, given a handle `H` (type H = u32), you can use it to get a `T`. You've also introduced the notion of "where": even if each type `T` has a unique handle type `H`, you don't know which container instance that handle belongs to. What you need is a unique handle type per container instance, so "Handle of Pool<T>" != "Handle of Pool<T>" unless the Pool is bound to the same variable.

As far as I know no language allows expressing that kind of thing.
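A small Rust sketch of the gap being described, with hypothetical `Pool` and `Handle` types: a phantom type parameter recovers the element type `T`, but nothing ties the handle to a particular pool instance.

```rust
use std::marker::PhantomData;

/// Typed handle: records the element type, but not which pool it came from.
struct Handle<T> {
    index: u32,
    _marker: PhantomData<T>,
}

// Manual impls so that handles are Copy even when T is not.
impl<T> Clone for Handle<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Handle<T> {}

struct Pool<T> {
    items: Vec<T>,
}

impl<T> Pool<T> {
    fn new() -> Self {
        Pool { items: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Handle<T> {
        let index = self.items.len() as u32;
        self.items.push(value);
        Handle { index, _marker: PhantomData }
    }

    /// Accepts any Handle<T>, including one issued by a different Pool<T>.
    fn get(&self, h: Handle<T>) -> Option<&T> {
        self.items.get(h.index as usize)
    }
}
```

With two pools of the same element type, say `names` and `keywords`, a handle issued by `names.insert(..)` is accepted by `keywords.get(..)` and returns whatever sits at that index. Passing it to a `Pool<i64>` would be rejected at compile time, so the type half of the problem is covered and the instance half is not.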

sestep•1mo ago

I think actually Scala does exactly this style of inferring the container instance from its type: https://docs.scala-lang.org/scala3/book/ca-context-parameter...

But from what I understand (being a nonexpert on Scala), this scheme actually causes a lot of problems. I think I've even heard that it adds more undecidability to the type system? So I'm exploring ways of managing context that don't depend on inferring backward from the type.

debugnik•1mo ago

> What you need is a unique handle type per container instance.

You can do this with path-dependent types in Scala, or more verbosely with modules in OCaml. The hard part is keeping the container name in scope wherever these handle types are used: many type definitions will need to reference the container handle types. I'm currently trying to structure code this way in my pet compiler written in OCaml.

pedrozieg•1mo ago

What I like about this writeup is that it surfaces a tension most “let’s build a compiler” tutorials skip: the AST is both a data structure and a UX boundary. Super-flat layouts are fantastic for cache and memory, but they’re hostile to all the things humans care about (debuggable shapes, easy instrumentation, ad-hoc traversals, “just print this node and its children” in a debugger). A lot of production compilers quietly solve that by having two tiers: a nice, inefficient tree for diagnostics and early passes, and increasingly flattened / interned / arena-allocated forms as you move toward optimization and codegen.

The interesting question for me is where the crossover is now that IDEs and incremental compilation dominate the workload. If your front-end is effectively a long-running service, it might be worth keeping a friendlier AST around and only using a super-flat representation for hot paths like analysis passes or bulk refactors. Otherwise you risk saving a few hundred MB while spending engineer-months making every new pass fight the layout.

loeg•1mo ago

What about this representation is hostile to humans and ad-hoc traversals? Don't convenience "getters" basically solve usability?

uaksom•1mo ago

(author here) If you run the parser under a debugger like lldb, then attempt to inspect the AST of a program, it appears as an array of u64. Not very useful, unless you work on special support for debuggers (such as a python script to unpack it in lldb). Compare that to a tree of pointers, you can "expand" nodes without any extra effort.
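For illustration, a convenience layer of the kind loeg mentions might look like the sketch below. The bit layout (an 8-bit tag plus two 28-bit child indices) and the `PackedNode` name are assumptions made for this example, not the article's actual encoding. A `Debug` impl like this helps print-style debugging only; a debugger such as lldb would still show the raw `u64` unless it is given its own formatter.

```rust
use std::fmt;

/// Hypothetical packing: tag in the top 8 bits, payload in the low 56 bits.
#[derive(Clone, Copy, PartialEq)]
struct PackedNode(u64);

const TAG_NUM: u64 = 0;
const TAG_ADD: u64 = 1;
const PAYLOAD_MASK: u64 = (1 << 56) - 1;
const CHILD_MASK: u64 = (1 << 28) - 1;

impl PackedNode {
    /// Literal: the payload is the value itself.
    fn num(value: u64) -> Self {
        PackedNode((TAG_NUM << 56) | (value & PAYLOAD_MASK))
    }

    /// Addition: the payload holds two 28-bit child indices.
    fn add(lhs: u32, rhs: u32) -> Self {
        let l = (lhs as u64 & CHILD_MASK) << 28;
        let r = rhs as u64 & CHILD_MASK;
        PackedNode((TAG_ADD << 56) | l | r)
    }

    fn tag(self) -> u64 {
        self.0 >> 56
    }

    fn payload(self) -> u64 {
        self.0 & PAYLOAD_MASK
    }
}

/// Unpacks the raw word into something readable.
impl fmt::Debug for PackedNode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let p = self.payload();
        match self.tag() {
            TAG_NUM => write!(f, "Num({})", p),
            TAG_ADD => write!(f, "Add(#{}, #{})", p >> 28, p & CHILD_MASK),
            other => write!(f, "Unknown(tag={}, raw={:#x})", other, self.0),
        }
    }
}
```

Printing a slice of these with `{:?}` gives entries such as `Num(2)` and `Add(#0, #1)` instead of bare integers.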

loeg•1mo ago

I guess that makes sense. But I don't ever look at AST in a debugger, and if I needed to, I'd just write some python helpers.

munificent•1mo ago

It looks like, overall, this design gets the parser about twice as fast as a simple one that creates tree-like ASTs.

That's not nothing. But a parser is rarely the most time-intensive part of a production compiler. And the parser does get iterated on a lot in languages that are evolving and adding new syntax.

Given that, I'd be inclined to take the performance hit and stick with a simpler AST representation if that yields a more hackable, maintainable compiler front end.

exyi•1mo ago

Usually yes, but it's still a neat trick to be aware of. For interpreted scripting languages, parsing can actually be a significant slowdown. Even more so when we start going into text-based network protocols, which also need a parser (is CSS a programming language or a network protocol? :) )

benhoyt•1mo ago

That's a good caution. However, traversing a flat AST (iterating a "struct of arrays" rather than a pointer-based tree) is also going to be faster. So the next steps of the compiler, say type checking and code emitting, will also be faster. But how much, or whether it's worth it even then, I'm not sure.
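As a rough sketch of what iterating a "struct of arrays" can look like (the `Ast` columns and the `Kind` enum are hypothetical, and no speedup is measured here): when children are always pushed before their parents, a later pass can be a single forward loop over the columns with no recursion.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Kind {
    Num,
    Add,
    Mul,
}

/// Hypothetical struct-of-arrays AST: one column per field, indexed by node id.
#[derive(Default)]
struct Ast {
    kinds: Vec<Kind>,
    lhs: Vec<u32>, // Num: the literal value; Add/Mul: left child index
    rhs: Vec<u32>, // Num: unused; Add/Mul: right child index
}

impl Ast {
    fn push(&mut self, kind: Kind, lhs: u32, rhs: u32) -> u32 {
        let id = self.kinds.len() as u32;
        self.kinds.push(kind);
        self.lhs.push(lhs);
        self.rhs.push(rhs);
        id
    }

    /// Evaluates every node in index order. This relies on the assumption
    /// that children were pushed before their parents.
    fn eval_all(&self) -> Vec<i64> {
        let mut values: Vec<i64> = Vec::with_capacity(self.kinds.len());
        for i in 0..self.kinds.len() {
            let (l, r) = (self.lhs[i] as usize, self.rhs[i] as usize);
            let v = match self.kinds[i] {
                Kind::Num => self.lhs[i] as i64,
                Kind::Add => values[l] + values[r],
                Kind::Mul => values[l] * values[r],
            };
            values.push(v);
        }
        values
    }

    /// A pass that needs only one field touches only that column.
    fn count(&self, kind: Kind) -> usize {
        self.kinds.iter().filter(|k| **k == kind).count()
    }
}
```

Whether this is faster in practice depends on the workload, which is the open question raised above.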

munificent•1mo ago

True, but that does also depend on where you store semantic information. Zipping past a nicely packed AST won't buy you much if for every node you have to look up its type or other semantic information somewhere else in memory through some slow process.
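One way to keep that lookup cheap, sketched here with hypothetical names (`TypeTable`, `Ty`), is a dense side table indexed by the same node id as the AST, so that fetching a node's type is a plain array index.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Ty {
    Unknown,
    Int,
    Bool,
}

/// Dense side table: slot i holds the type of node i.
struct TypeTable {
    types: Vec<Ty>,
}

impl TypeTable {
    /// Starts with every node's type unknown.
    fn for_nodes(node_count: usize) -> Self {
        TypeTable { types: vec![Ty::Unknown; node_count] }
    }

    fn set(&mut self, node: u32, ty: Ty) {
        self.types[node as usize] = ty;
    }

    fn get(&self, node: u32) -> Ty {
        self.types[node as usize]
    }
}
```

This assumes node ids are small, dense integers, which is what a flat AST provides. A sparse or pointer-keyed map would reintroduce the scattered lookups described above.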

Joker_vD•1mo ago

Another small trick is to use a "reversed" bump allocator that starts handing out memory from the end, at the larger addresses. Since AST structures are almost always created bottom-up, children before parents, with the root node created at the very end on return from the parseProgram/parseModule/etc. function, you end up with an AST that has most of its pointers pointing forward. This means that during AST walks you'll be going from lower addresses to higher ones, which I think is actually somewhat faster than the reverse order.
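A simplified sketch of the idea, using slots in a fixed-capacity `Vec` instead of raw bytes (`ReverseArena` and `Node` are hypothetical names, and a real bump allocator would deal with sizes and alignment): each allocation takes the next lower slot, so nodes created later, such as parents, sit below their children.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Node {
    Empty, // unused slot
    Num(i64),
    Add(usize, usize), // slots of the two children
}

/// Hypothetical fixed-capacity arena that hands out slots from the end.
struct ReverseArena {
    slots: Vec<Node>,
    next: usize, // lowest slot handed out so far; counts down
}

impl ReverseArena {
    fn with_capacity(cap: usize) -> Self {
        ReverseArena { slots: vec![Node::Empty; cap], next: cap }
    }

    /// Takes the next lower slot; returns None when the arena is full.
    fn alloc(&mut self, node: Node) -> Option<usize> {
        if self.next == 0 {
            return None;
        }
        self.next -= 1;
        self.slots[self.next] = node;
        Some(self.next)
    }

    /// The live nodes, lowest slot first, so the last-allocated node leads.
    fn live(&self) -> &[Node] {
        &self.slots[self.next..]
    }
}
```

Allocating two leaves and then their parent in an arena of capacity 8 yields slots 7, 6 and 5, so the parent's child references both point to higher slots.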

uaksom•1mo ago

(author here) I agree that it's a lot of complexity, and I acknowledge this in the article: You can get quite far with just a bump allocator.

I didn't go into this at all, but the main benefit of this design is how well it interacts with CPU cache. This has almost no effect on the parser, because you're typically just writing the AST, not reading it. I believe that subsequent stages benefit much more from faster traversal.

(By the way, I am a huge fan of your work. Crafting Interpreters was my introduction to programming languages!)

mediumdeviation•1mo ago

For anyone confused about why the text says the performance is improving between each graph but the lines don't seem to show that: the color for each key and the scale change between graphs.

loeg•1mo ago

FWIW I think Clang IR does something like this in a lot of places. It is relatively common to see child nodes stored inline following parent nodes. The APIs more or less abstract this away from consumers like static analysis tools, though.

E.g., https://github.com/llvm/llvm-project/blob/62e00a03fba029f82d...

and

https://github.com/llvm/llvm-project/blob/62e00a03fba029f82d...

torginus•1mo ago

Personally I think this is a neat trick to organize memory, but don't these kinds of objects packed together in flat buffers bypass the entire lifetime and safety mechanism of Rust?

I mean, if you make an off-by-one error on indices, you are essentially reading the pointer of another node.

loeg•1mo ago

This is a common argument about Rust. Unlike pointer confusion, with index confusion, you still get bounds checking in the containing collection, and you also avoid type confusion (the wrong index element will still have the same type as the object you intended to access). So there are still some benefits.
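A minimal sketch of both properties, with a hypothetical `Node` type and `node_at` helper: an out-of-range index is caught by the collection, while an in-range but wrong index still yields a well-typed `Node`.

```rust
#[derive(Debug, PartialEq)]
enum Node {
    Num(i64),
    Neg(u32), // index of the operand
}

/// Bounds-checked lookup: an out-of-range index yields None
/// instead of reading arbitrary memory.
fn node_at(nodes: &[Node], index: u32) -> Option<&Node> {
    nodes.get(index as usize)
}
```

Given `[Num(10), Num(20), Neg(1)]`, index 3 returns `None`, whereas an off-by-one lookup of index 0 instead of 1 returns `Some(&Num(10))`: a valid node of the right type, just not the intended one.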

torginus•1mo ago

> This is a common argument about Rust.

Because it is a correct one. std::vector can do this in debug mode, Zig and a bunch of other languages do as well. But that's not the point of why memory safety's important. You opt out of all aliasing and lifetime checks this way, which means an easy off-by-one indexing bug (which would be a compiler error otherwise) silently propagates through the system, and you can reference wrong nodes, invalid nodes, uninitialized nodes.

It's even worse in some respects than a dangling pointer, because you can quickly tell a dangling pointer contains garbage data, but here, the stuff looks plausible.

I am not sure this is a critique of Rust - this certainly goes against the grain of how the language has been designed - which in the case of Rust, might make things easier, since lifetimes are not checked for individual elements, but also less safe.

loeg•1mo ago

I don't think this argument is useful to rehash here. That said:

> you can reference wrong nodes, invalid nodes, uninitialized nodes.

No, you cannot access uninitialized nodes in safe Rust.

shoo•1mo ago

It'd also have been interesting to see some overall profiling data of the initial program & some discussion of which optimisations to investigate based on that profiling data.

When investigating performance issues it's often very helpful to run with profiling instrumentation enabled and start by looking at some top-down "cumulative sum" profiler output to get a big-picture view of which functions/phases are consuming most of the running time, to see where it may be worth spending some effort.

Getting familiar with Linux's perf [1] tool is also helpful, both for interpreting summary statistics from perf stat (instructions per cycle, page faults, cache misses, etc.) that can give clues about what to focus on, and for annotating source line by line with time spent.

I'm not familiar with Rust, but e.g. the rustc compiler dev guide has a tutorial on how to profile rustc using perf [2]

[1] Brendan Gregg's Linux perf examples is an excellent place to start https://www.brendangregg.com/perf.html

[2] https://rustc-dev-guide.rust-lang.org/profiling/with_perf.ht...

vatsachak•1mo ago

We need to flatten everything. Thanks for the great write up

jauntywundrkind•1mo ago

I want to find some way to link this work to Sea of Nodes representations, but I'm a bit out of my depth to try to do so. https://v8.dev/blog/leaving-the-sea-of-nodes

MrNet32823•1mo ago

Why hasn't Mike Acton's data-oriented programming caught on outside game dev and niche languages?
