In between "easiest to get started with" and "what production-grade systems use", there is "easy to actually finish a medium-sized project with." I think LR parsers still defend that middle ground pretty well.
That was part of my question, I think. I wouldn't have been able to tell you that the dominant paradigm being argued against was LR parsers, because I've never come across even one that I'm aware of (I've heard of them, but that's about it). Perhaps it's academia where they're popular?
LR, however, is more powerful, though this mostly matters if you don't have access to automatic grammar rewriting for your LL. More significantly, there's probably more good tooling for LR (or perhaps: you can assume that if LR tooling exists, it is good at what it is designed for); one problem with LL being so "simple" is that there's a lot of bad tooling out there.
The important things are 1. that you meaningfully eliminate ambiguities (which is easy to enforce for LR and doable for LL if your tooling is good), and 2. that you keep linear time complexity. Any parser other than LL/LR should be rejected because it fails at least one of these, and often both.
Within the LL and LR families there are actually quite a few members. SLR(1) is strong enough to be interesting but too weak for anything I would call a "language". LALR(1) is probably fine; I have never encountered a useful language that must resort to LR(1) (though note that modern tooling can do an optimistic fallback, avoiding the massive blowups of ancient LR tools). SLL(1) I'm not personally familiar with. X(k), where X is one of {SLL, LL, SLR, LALR, LR} and where k > 1, are not very useful; k=1 suffices. LL(*), however, should be avoided due to backtracking, but in some cases consider whether you can parse token trees first (this is currently poorly represented in the literature; you want to be doing some form of this for error recovery anyway - automated error recovery is a useless lie) and/or defer the partial ambiguity until the AST is built (often better for error messages anyway, independent of using token trees).
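To make "token trees" concrete, here is a minimal Python sketch of one way to read that idea (the function name and bracket set are mine, not from any particular paper or tool): group a flat token stream into nested groups by matching brackets before any grammar-level parsing happens, so later stages and error recovery can work on whole groups.

```python
# Hypothetical sketch: build "token trees" by matching brackets.
# Nested groups become nested lists; other tokens pass through unchanged.

OPEN = {"(": ")", "[": "]", "{": "}"}

def token_trees(tokens):
    stack = [[]]   # stack of partially built groups; stack[0] is the top level
    closers = []   # expected closing bracket for each currently open group
    for tok in tokens:
        if tok in OPEN:
            stack.append([tok])            # start a new group
            closers.append(OPEN[tok])
        elif closers and tok == closers[-1]:
            group = stack.pop()
            group.append(tok)
            closers.pop()
            stack[-1].append(group)        # attach the finished group to its parent
        elif tok in OPEN.values():
            raise SyntaxError(f"unbalanced {tok!r}")   # cheap, local error report
        else:
            stack[-1].append(tok)
    if closers:
        raise SyntaxError(f"unclosed group, expected {closers[-1]!r}")
    return stack[0]

# token_trees(["f", "(", "a", "+", "b", ")", "*", "c"])
# -> ['f', ['(', 'a', '+', 'b', ')'], '*', 'c']
```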
The idea that you're going to hand-roll a parser generator, then use it to generate a parser, and that the result is going to be less buggy than just hand-rolling a recursive descent parser screams "I've never written code outside of an academic context".
SQLite, perhaps the most widely deployed software system, takes this approach.
> The Lemon LALR(1) Parser Generator
> The SQL language parser for SQLite is generated using a code-generator program called "Lemon".
> ...
> Lemon was originally written by D. Richard Hipp (also the creator of SQLite) while he was in graduate school at Duke University between 1987 and 1992.
Here are the grammars, if you're curious.
But I do think the wider point is still true: there can be real benefit to implementing two proper layered abstractions rather than one broader abstraction where the complexity can span across more of the problem domain.
Your comment is quite funny as hand-rolling a recursive descent parser is the kind of thing that is often accused of being a) bug-prone, b) only done in academic environments.
Accused by who? Literal idiots? Most parsers used in production are hand-rolled recursive descent parsers.
It seems to be mainly academics and others interested in parsing theory, and those who like complexity for the sake of complexity.
In OCaml, a language highly suited for developing languages in, that de facto standard is the Menhir LR parser generator. It's a modern Yacc with many convenient features, including combinator-like library functions. I honestly enjoy the work of mastering Menhir, poring over the manual, which is all one page: https://gallium.inria.fr/~fpottier/menhir/manual.html
What makes OCaml suited for that?
I get really annoyed when people still complain about YACC while ignoring the four decades of practical improvement that Bison has given us if you bother to configure it.
https://pypi.org/project/pybison/ , or its predecessors such as https://pypi.org/project/ply/ ?
But yes, the decidedly non-traditional https://github.com/pyparsing/pyparsing/ is certainly more popular.
1. Replace any expression that's within parentheses by its parse tree by using recursion
2. Find the lowest precedence operator, breaking ties however you'd like. Call this lowest precedence operator OP.
3. View the whole unparsed expression as `x OP y`
4. Generate a parse tree for x and for y. Call them P(x) and P(y).
5. Return ["OP", P(x), P(y)].
It's easy to speed up step 2 by keeping a table of all the operators in an expression, sorted by their precedence levels. For this table to work properly, the positions of all the tokens must never change.
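Here is a rough Python sketch of the procedure described above, assuming a pre-tokenized list of strings and binary infix operators only; the precedence table and the tie-breaking rule (rightmost lowest-precedence operator, which gives left associativity) are illustrative choices. Rather than pre-replacing parenthesized subexpressions, this version skips over them while scanning for the operator, which amounts to the same recursion.

```python
# Illustrative precedence table; higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse(tokens):
    # Base case: a single atom (number or name).
    if len(tokens) == 1:
        return tokens[0]

    # Step 2: find the lowest-precedence operator at paren depth 0,
    # breaking ties by taking the rightmost one (=> left associativity).
    depth, best_i, best_prec = 0, None, None
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif depth == 0 and tok in PRECEDENCE:
            if best_prec is None or PRECEDENCE[tok] <= best_prec:
                best_i, best_prec = i, PRECEDENCE[tok]

    # Step 1 (as a special case): no top-level operator means the whole
    # expression is parenthesized, so strip the parens and recurse.
    if best_i is None:
        return parse(tokens[1:-1])

    # Steps 3-5: view the expression as `x OP y` and recurse on both sides.
    op = tokens[best_i]
    return [op, parse(tokens[:best_i]), parse(tokens[best_i + 1:])]

# parse(["a", "+", "b", "*", "(", "c", "-", "d", ")"])
# -> ['+', 'a', ['*', 'b', ['-', 'c', 'd']]]
```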
fjfaase•4d ago
Recursive descent parsers can simply be implemented with recursive functions. Implementing semantic checks becomes easy with additional parameters.
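Concretely, something like this toy Python sketch (the grammar and names are made up for illustration): one recursive function per grammar rule, with an extra parameter carrying the semantic context used for checks.

```python
# Grammar (illustrative):
#   expr -> term (('+' | '-') term)*
#   term -> NUMBER | NAME | '(' expr ')'
# `env` is the "additional parameter": here, the set of declared names.

def parse_expr(tokens, pos, env):
    node, pos = parse_term(tokens, pos, env)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1, env)
        node = [op, node, right]
    return node, pos

def parse_term(tokens, pos, env):
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_expr(tokens, pos + 1, env)
        if tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    if tok.isidentifier():
        if tok not in env:                 # semantic check via the extra parameter
            raise NameError(f"undeclared name {tok!r}")
        return tok, pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

# parse_expr(["x", "+", "(", "2", "-", "y", ")"], 0, env={"x", "y"})
# -> (['+', 'x', ['-', 2, 'y']], 7)
```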
WalterBright•8h ago
What a waste of time. I failed miserably.
However, I also realized that the only semantic information needed was to keep track of typedefs. That made recursive descent practical and effective.
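A toy Python sketch of that idea, not the actual code: the parser maintains a set of typedef names and consults it to decide whether an identifier names a type, which is the classic "lexer hack" workaround for C's grammar.

```python
# Illustrative only: the single piece of semantic feedback a C-like
# recursive descent parser needs is the set of names declared via typedef.

typedef_names = {"size_t"}   # seeded with whatever the headers declared

def classify(identifier):
    """Tell the parser whether an identifier is a type name or an ordinary name."""
    return "TYPE_NAME" if identifier in typedef_names else "IDENTIFIER"

def on_typedef_declaration(new_name):
    """Called by the parser when it finishes parsing `typedef ... new_name;`."""
    typedef_names.add(new_name)

# With this, `foo * bar;` parses as a declaration when classify("foo") is
# "TYPE_NAME", and as a multiplication expression otherwise.
```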
ufo•8h ago
[1] https://en.wikipedia.org/wiki/Lexer_hack
fjfaase•1h ago