In between "easiest to get started with" and "what production-grade systems use", there is "easy to actually finish a medium-sized project with." I think LR parsers still defend that middle ground pretty well.
That was part of my question, I think. I wouldn't have been able to tell you that the dominant paradigm being argued against was LR parsers, because I've never come across even one that I'm aware of (I've heard of them, but that's about it). Perhaps it's academia where they're popular?
LR, however, is more powerful, though this mostly matters if you don't have access to automatic grammar rewriting for your LL. More significantly, there's probably more good tooling for LR (or perhaps: you can assume that if LR tooling exists, it is good at what it is designed for); one problem with LL being so "simple" is that there's a lot of bad tooling out there.
The important things are 1. that you meaningfully eliminate ambiguities (which is easy to enforce for LR and doable for LL if your tooling is good), and 2. that you keep linear time complexity. Any parser other than LL/LR should be rejected because it fails at least one of these, and often both.
Within the LL and LR families there are actually quite a few members. SLR(1) is strong enough to be interesting but too weak for anything I would call a "language". LALR(1) is probably fine; I have never encountered a useful language that must resort to LR(1) (though note that modern tooling can do an optimistic fallback, avoiding the massive blowups of ancient LR tools). SLL(1) I'm not personally familiar with. X(k), where X is one of {SLL, LL, SLR, LALR, LR} and where k > 1, are not very useful; k=1 suffices. LL(*), however, should be avoided due to backtracking, but in some cases consider whether you can parse token trees first (this is currently poorly represented in the literature; you want to be doing some form of this for error recovery anyway - automated error recovery is a useless lie) and/or defer the partial ambiguity until the AST is built (often better for error messages anyway, independent of using token trees).
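To make "token trees" concrete, here is a minimal Python sketch of one way to read that idea (the function name and bracket set are mine, not from any particular paper or tool): group a flat token stream into nested groups by matching brackets before any grammar-level parsing happens, so later stages and error recovery can work on whole groups.

```python
# Hypothetical sketch: build "token trees" by matching brackets.
# Nested groups become nested lists; other tokens pass through unchanged.

OPEN = {"(": ")", "[": "]", "{": "}"}

def token_trees(tokens):
    stack = [[]]   # stack of partially built groups; stack[0] is the top level
    closers = []   # expected closing bracket for each currently open group
    for tok in tokens:
        if tok in OPEN:
            stack.append([tok])            # start a new group
            closers.append(OPEN[tok])
        elif closers and tok == closers[-1]:
            group = stack.pop()
            group.append(tok)
            closers.pop()
            stack[-1].append(group)        # attach the finished group to its parent
        elif tok in OPEN.values():
            raise SyntaxError(f"unbalanced {tok!r}")   # cheap, local error report
        else:
            stack[-1].append(tok)
    if closers:
        raise SyntaxError(f"unclosed group, expected {closers[-1]!r}")
    return stack[0]

# token_trees(["f", "(", "a", "+", "b", ")", "*", "c"])
# -> ['f', ['(', 'a', '+', 'b', ')'], '*', 'c']
```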
The idea that you're going to hand-roll a parser generator, then use it to generate a parser, and that the result is going to be less buggy than just hand-rolling a recursive descent parser screams "I've never written code outside of an academic context".
SQLite, perhaps the most widely deployed software system, takes this approach.
> The Lemon LALR(1) Parser Generator
> The SQL language parser for SQLite is generated using a code-generator program called "Lemon".
> ...
> Lemon was originally written by D. Richard Hipp (also the creator of SQLite) while he was in graduate school at Duke University between 1987 and 1992.
Here are the grammars, if you're curious.
But I do think the wider point is still true: there can be real benefit to implementing two proper layered abstractions rather than one broader abstraction where the complexity can span across more of the problem domain.
Your comment is quite funny as hand-rolling a recursive descent parser is the kind of thing that is often accused of being a) bug-prone, b) only done in academic environments.
Accused by who? Literal idiots? Most parsers used in production are hand-rolled recursive descent parsers.
It seems to be mainly academics and others interested in parsing theory, and those who like complexity for the sake of complexity.
In OCaml, a language highly suited for developing languages in, that de facto standard is the Menhir LR parser generator. It's a modern Yacc with many convenient features, including combinator-like library functions. I honestly enjoy the work of mastering Menhir, poring over the manual, which is all one page: https://gallium.inria.fr/~fpottier/menhir/manual.html
What makes OCaml suited for that?
I get really annoyed when people still complain about YACC while ignoring the four decades of practical improvement that Bison has given us if you bother to configure it.
https://pypi.org/project/pybison/ , or its predecessors such as https://pypi.org/project/ply/ ?
But yes, the decidedly non-traditional https://github.com/pyparsing/pyparsing/ is certainly more popular.
1. Replace any expression that's within parentheses by its parse tree by using recursion
2. Find the lowest precedence operator, breaking ties however you'd like. Call this lowest precedence operator OP.
3. View the whole unparsed expression as `x OP y`
4. Generate a parse tree for x and for y. Call them P(x) and P(y).
5. Return ["OP", P(x), P(y)].
It's easy to speed up step 2 by keeping a table of all the operators in an expression, sorted by their precedence levels. For this table to work properly, the positions of all the tokens must never change.
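Here is a rough Python sketch of the procedure described above, assuming a pre-tokenized list of strings and binary infix operators only; the precedence table and the tie-breaking rule (rightmost lowest-precedence operator, which gives left associativity) are illustrative choices. Rather than pre-replacing parenthesized subexpressions, this version skips over them while scanning for the operator, which amounts to the same recursion.

```python
# Illustrative precedence table; higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse(tokens):
    # Base case: a single atom (number or name).
    if len(tokens) == 1:
        return tokens[0]

    # Step 2: find the lowest-precedence operator at paren depth 0,
    # breaking ties by taking the rightmost one (=> left associativity).
    depth, best_i, best_prec = 0, None, None
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif depth == 0 and tok in PRECEDENCE:
            if best_prec is None or PRECEDENCE[tok] <= best_prec:
                best_i, best_prec = i, PRECEDENCE[tok]

    # Step 1 (as a special case): no top-level operator means the whole
    # expression is parenthesized, so strip the parens and recurse.
    if best_i is None:
        return parse(tokens[1:-1])

    # Steps 3-5: view the expression as `x OP y` and recurse on both sides.
    op = tokens[best_i]
    return [op, parse(tokens[:best_i]), parse(tokens[best_i + 1:])]

# parse(["a", "+", "b", "*", "(", "c", "-", "d", ")"])
# -> ['+', 'a', ['*', 'b', ['-', 'c', 'd']]]
```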
fjfaase•4d ago
Recursive descent parsers can simply be implemented with recursive functions. Implementing semantic checks becomes easy with additional parameters.
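Concretely, something like this toy Python sketch (the grammar and names are made up for illustration): one recursive function per grammar rule, with an extra parameter carrying the semantic context used for checks.

```python
# Grammar (illustrative):
#   expr -> term (('+' | '-') term)*
#   term -> NUMBER | NAME | '(' expr ')'
# `env` is the "additional parameter": here, the set of declared names.

def parse_expr(tokens, pos, env):
    node, pos = parse_term(tokens, pos, env)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1, env)
        node = [op, node, right]
    return node, pos

def parse_term(tokens, pos, env):
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_expr(tokens, pos + 1, env)
        if tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    if tok.isidentifier():
        if tok not in env:                 # semantic check via the extra parameter
            raise NameError(f"undeclared name {tok!r}")
        return tok, pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

# parse_expr(["x", "+", "(", "2", "-", "y", ")"], 0, env={"x", "y"})
# -> (['+', 'x', ['-', 2, 'y']], 7)
```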
WalterBright•8h ago
What a waste of time. I failed miserably.
However, I also realized that the only semantic information needed was to keep track of typedefs. That made recursive descent practical and effective.
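A toy Python sketch of that idea, not the actual code: the parser maintains a set of typedef names and consults it to decide whether an identifier names a type, which is the classic "lexer hack" workaround for C's grammar.

```python
# Illustrative only: the single piece of semantic feedback a C-like
# recursive descent parser needs is the set of names declared via typedef.

typedef_names = {"size_t"}   # seeded with whatever the headers declared

def classify(identifier):
    """Tell the parser whether an identifier is a type name or an ordinary name."""
    return "TYPE_NAME" if identifier in typedef_names else "IDENTIFIER"

def on_typedef_declaration(new_name):
    """Called by the parser when it finishes parsing `typedef ... new_name;`."""
    typedef_names.add(new_name)

# With this, `foo * bar;` parses as a declaration when classify("foo") is
# "TYPE_NAME", and as a multiplication expression otherwise.
```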
ufo•8h ago
[1] https://en.wikipedia.org/wiki/Lexer_hack
fjfaase•1h ago