Show HN: Is grep enough? A transparent benchmark for agentic code navigation

https://entelligentsia.github.io/is-grep-enough/

2•bonigv•1h ago

Felt LSP servers were too complex, and bash tools alone too brutish. Wanted to see what happens when tree-sitter is a first-class tool. Ran a bench over 10 large codebases [bitcoin, django, rails, redis,...] at 5 levels of exploration complexity each. That's 150 context-isolated runs over the last few days. Sharing the results with full transparency: all scripts, Docker image scripts, and all transcripts. There is a TL;DR, but I hope you don't leave it at that. It has been quite a bit of work. Repo links are on the site.

Comments

6thbit•1h ago

This is nicely put together. It makes sense that LSPs help more as complexity grows, because they make navigation across symbols easier.

I hope someone with a large budget can reproduce these with the latest Opus/GPT.

My gut feeling is that higher reasoning models tend to use grep more effectively. But intuitively lsp should still win there.

bonigv•57m ago

You are absolutely right about what we feel intuitively: LSPs should beat the shit out of the competition. But surprisingly, they did not. Across 10 different LSP servers and 5 levels of prompt complexity, they did not. Mind you, I painstakingly warmed up the LSP servers that needed warming. Some liked it cold, and they fared equally unimpressively. The pattern I saw was that the LLM (sonnet w.6 with cc) was very clever at using whatever it had to get to a verifiable answer. It could certainly do it with bash alone. But as the prompt complexity grew, the cost also rose.

Tree-sitter sits in a sweet spot here. A brainy LLM can find the shortest path to a high-quality answer with tree-sitter and a few bash calls.
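
To illustrate the distinction (this is not taken from the benchmark itself): a text search for a name returns every line that mentions it, while a syntax-tree query can return only the definition. The sketch below uses Python's built-in `ast` module as a stand-in for tree-sitter; the sample source and the helper names `grep_lines` and `definition_lines` are hypothetical.

```python
import ast

# Hypothetical sample source; not from any benchmarked repository.
SOURCE = '''\
def connect(host):
    return host

def main():
    # connect to the server
    conn = connect("localhost")
    print("connect ok", conn)
'''


def grep_lines(source, needle):
    """Plain text search: line numbers of every line containing the needle."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if needle in line
    ]


def definition_lines(source, name):
    """Structural search: line numbers where a function with this name is defined."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name == name
    ]


text_hits = grep_lines(SOURCE, "connect")
def_hits = definition_lines(SOURCE, "connect")
print("text matches on lines:", text_hits)
print("definitions on lines:", def_hits)
```

The text search also hits the comment, the call site, and the string literal, which an agent then has to read and discard; the structural query goes straight to the definition.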
