Felt LSP servers were too complex. Bash tools alone, too brutish. Wanted to see what happens if tree-sitter is a first-class tool. Ran a benchmark over 10 large codebases [bitcoin, django, rails, redis, ...] at 5 levels of exploration complexity each. That's 150 context-isolated runs over the last few days. Sharing the results with full transparency: all scripts, the Docker image scripts, and all transcripts. There is a TL;DR, but I hope you don't leave it at that. It has been quite a bit of work. Repo links are on the site.
Comments
6thbit•1h ago
This is nicely put together. It does make sense that LSPs help more as complexity grows, because they make navigation across symbols easier.
I hope someone with a large budget can reproduce these with the latest Opus/GPT.
My gut feeling is that higher-reasoning models tend to use grep more effectively. But intuitively, LSP should still win there.
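For readers unfamiliar with what "navigation across symbols" means at the protocol level, here is a minimal sketch of the message an agent's LSP client would send to ask a server where a symbol is defined. It assumes the standard LSP base protocol (a `Content-Length` header followed by a JSON-RPC 2.0 body); the function name `definition_request` and the file URI are hypothetical, chosen only for illustration.

```python
import json


def definition_request(request_id: int, uri: str, line: int, character: int) -> bytes:
    """Frame an LSP textDocument/definition request (hypothetical helper)."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/definition",
        "params": {
            "textDocument": {"uri": uri},
            # LSP positions are zero-based; "character" counts UTF-16 code
            # units unless another position encoding was negotiated.
            "position": {"line": line, "character": character},
        },
    }
    body = json.dumps(payload).encode("utf-8")
    # Base protocol header: Content-Length is the byte length of the body.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


# Example: ask where the symbol at line 42, column 9 (1-based) is defined.
msg = definition_request(1, "file:///src/app.py", 41, 8)
```

The server answers with one or more locations (URI plus range), which is what makes cross-file jumps cheap compared with repeated grepping, at the cost of running and warming a language server per codebase.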
bonigv•57m ago
You are absolutely right about what we feel intuitively: LSPs should beat the shit out of the competition. But surprisingly, they did not. Across 10 different LSP servers and 5 levels of prompt complexity, they did not. Mind you, I painstakingly warmed up the LSP servers that needed warming. Some preferred a cold start, and those fared equally unimpressively. The pattern I saw was that the LLM (sonnet w.6 with cc) was very clever at using whatever it had to reach a verifiable answer. It could do it with just bash, for sure, but as the prompt complexity grew, so did the cost.
Tree-sitter sits in a sweet spot here: a brainy LLM can find the shortest path, with high quality, using tree-sitter and a few bash calls.
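To make "tree-sitter as a first-class tool" concrete, here is a minimal sketch of the kind of structural outline such a tool could return: every class and function with its line number and qualified name. It is not the benchmark's actual tool. Python's standard-library `ast` module is swapped in as a stand-in for a tree-sitter parser so the sketch runs without extra packages; a real tool would load a tree-sitter grammar per language. The function name `outline` is hypothetical.

```python
import ast


def outline(source: str) -> list[tuple[int, str, str]]:
    """Return sorted (line, kind, qualified_name) tuples for classes and functions.

    Stand-in for a tree-sitter query: uses Python's ast module, so it only
    handles Python source (hypothetical helper, not the benchmark's tool).
    """
    results: list[tuple[int, str, str]] = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                name = prefix + child.name
                results.append((child.lineno, "class", name))
                visit(child, name + ".")  # methods get "Class.method" names
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                name = prefix + child.name
                results.append((child.lineno, "def", name))
                visit(child, name + ".")  # nested functions get qualified names
            else:
                visit(child, prefix)  # descend into if/try/with blocks, etc.

    visit(ast.parse(source), "")
    return sorted(results)


src = "\nclass Cache:\n    def get(self, key):\n        return None\n\n    async def refresh(self):\n        pass\n\ndef main():\n    pass\n"
print(outline(src))
# [(2, 'class', 'Cache'), (3, 'def', 'Cache.get'), (6, 'def', 'Cache.refresh'), (9, 'def', 'main')]
```

Unlike a plain `grep -n "def "`, the syntax tree ignores matches inside strings and comments and attaches each method to its class, which gives the model a compact map to follow up with a few targeted bash reads.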