I use tree-sitter for developing a custom programming language, you still need an extra step to get from CST to AST, but the overall DevEx is much quicker that hand-rolling the parser.
Could you elaborate on what this involves? I'm also looking at using tree-sitter as a parser for a new language, possibly to support multiple syntaxes. I'm thinking of converting its parse trees to a common schema, that's the target language.
I guess I don't quite get the difference between a concrete and abstract syntax tree. Is it just that the former includes information that's irrelevant to the semantics of the language, like whitespace?
AST is just CST minus range info and simplified/generalised lexical info (in most cases).
Hmm, the strong reason could be latency and layout stability. Tree-sitter parses on the main thread (or a close worker) typically in sub-ms timeframes, ensuring that syntax coloring is synchronous with keystrokes. LSP semantic tokens are asynchronous by design. If you rely solely on LSP for highlighting, you introduce a flash of unstyled content or color-shifting artifacts every time you type, because the round-trip to the server (even a local one) and the subsequent re-tokenization takes longer than the frame budget.
The ideal hygiene could be something like -> tree-sitter provides the high-speed lexical coloring (keywords, punctuation, basic structure) instantly and LSP paints the semantic modifiers (interfaces vs classes, mutable vs const) asynchronously like 200ms later. Relying on LSP for the base layer makes the editor feel sluggish.
Tree-sitter has okay error correction, and that along with speed (as you mentioned) and its flexible query language makes it a winner for people to quickly iterate on a working parser but also obviously integration into an actual editor.
Oh, and some LSPs use tree-sitter to parse.
- i got a hint of language server and tree sitter thanks to this wonderfully written post but it is still missing a lot of details like how does the protocol actually look like, what does a standard language server or tree sitter implementation looks like
- what are the other building blocks?
> Language servers are powerful because they can hook into the language’s runtime and compiler toolchain to get semantically correct answers to user queries. For example, suppose you have two versions of a pop function, one imported from a stack library, and another from a heap library. If you use a tool like the dumb-jump package in Emacs and you use it to jump to the definition for a call to pop, it might get confused as to where to go because it’s not sure what module is in scope at the point. A language server, on the other hand, should have access to this information and would not get confused.
You are correct that a language server will generally provide correct navigation/autocomplete, but a language server doesn’t necessarily need to hook into an existing compiler: a language server might be a latency-sensitive re-implementation of an existing compiler toolchain (rust-analyzer is the one I’m most familiar with, but the recent crop of new language servers tend to take this direction if the language’s compiler isn’t query-oriented).
> It is possible to use the language server for syntax highlighting. I am not aware of any particularly strong reasons why one would want to (or not want to) do this.
Since I spent a lot of time in Rust, I’ll use Rust as an example: you can highlight a binding if it’s mutable or style an enum/struct differently. It’s one of those small things that makes a big impact once you get used to it: editors without semantic syntax highlighting (as it is called in the LSP specification) feel like they’re naked to me.
tetris11•1h ago
taeric•1h ago
codethief•1h ago
woodruffw•1h ago
[1]: https://github.com/tree-sitter-grammars/tree-sitter-yaml
_ache_•1h ago
Since it comes from `tree-sitter-grammars/tree-sitter-yaml`, it may be quick to integrate the official repo.
matthew-craig•1h ago
awk bash bibtex blueprint c c-sharp clojure cmake commonlisp cpp css dart dockerfile elixir glsl gleam go gomod heex html janet java javascript json julia kotlin latex lua magik make markdown nix nu org perl proto python r ruby rust scala sql surface toml tsx typescript typst verilog vhdl vue wast wat wgsl yaml
johanvts•1h ago
jasonjmcghee•1h ago
For others, this is a sub optimal answer, but I’ve played with generating grammars with latest llms and they are surprisingly good at doing this (in a few shots).
That being said, if you’re doing something more serious than syntax highlighting or shipping it in a product, you’ll want to spend more time on it.
zokier•1h ago
https://github.com/tree-sitter/tree-sitter/wiki/List-of-pars...