I kept hitting Claude Code's rate limits. The usual workaround is feeding file contents into the conversation to help the agent
find what it needs — but you hit the ceiling faster, and the repo starts accumulating context files that exist purely to compensate
for bad search.
I wanted to try something different. Instead of giving the agent files to read, give it a proper search tool. Describe what you're
looking for in plain English, get back the exact function — not a file to skim, not 80 grep matches to filter through.
The approach: split every tracked file on function and class boundaries with tree-sitter, embed each chunk, store the vectors in SQLite, and search with KNN. The result lands directly in context: already extracted, ready to use.
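The chunk → embed → store → KNN loop can be sketched in a few lines. Everything here is an assumption for illustration, not the tool's actual schema: a toy hashed bag-of-words function stands in for a real embedding model, and the tree-sitter step is replaced by pre-chunked snippets so the sketch is self-contained.

```python
# Minimal index/search sketch. The real tool chunks with tree-sitter and
# uses a real embedding model; embed() below is a toy stand-in.
import sqlite3
import numpy as np

DIM = 64

def embed(text: str) -> np.ndarray:
    """Toy hashed bag-of-words 'embedding' (stand-in for a real model)."""
    v = np.zeros(DIM)
    for tok in text.lower().split():
        v[hash(tok) % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (path TEXT, name TEXT, vec BLOB)")

# In the real pipeline these rows come from tree-sitter function/class nodes.
chunks = [
    ("auth.py", "verify_token", "check jwt signature and expiry"),
    ("db.py", "connect_pool", "open a pooled database connection"),
]
for path, name, body in chunks:
    db.execute("INSERT INTO chunks VALUES (?, ?, ?)",
               (path, name, embed(body).astype(np.float32).tobytes()))

def search(query: str, k: int = 1):
    """Brute-force KNN: cosine similarity against every stored vector."""
    q = embed(query)
    rows = db.execute("SELECT path, name, vec FROM chunks").fetchall()
    scored = [(float(q @ np.frombuffer(v, dtype=np.float32)), p, n)
              for p, n, v in rows]
    return sorted(scored, reverse=True)[:k]

print(search("jwt signature check"))
```

With a few thousand chunks, brute-force scan over SQLite blobs is fast enough that no ANN index is needed.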
To avoid re-indexing on every machine, embeddings live on a Git orphan branch that mirrors your source tree. One person indexes and
pushes it, the whole team fetches and searches. No API key on the other end, no re-embedding.
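The orphan-branch layout can be reproduced with plain git commands. A sketch, driven from Python for self-containment; the branch name `semantic-index` and the `.vec` file convention are my guesses, not the tool's actual naming:

```python
# Sketch: keep embeddings on an orphan branch that mirrors the source tree.
# Branch/file names ("semantic-index", "*.vec") are hypothetical.
import pathlib
import subprocess
import tempfile

def git(*args, cwd):
    return subprocess.run(("git",) + args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

repo = tempfile.mkdtemp()
git("init", "-q", cwd=repo)
git("config", "user.email", "you@example.com", cwd=repo)
git("config", "user.name", "you", cwd=repo)

# A normal source commit.
pathlib.Path(repo, "app.py").write_text("def hello(): pass\n")
git("add", "app.py", cwd=repo)
git("commit", "-qm", "source", cwd=repo)

# Orphan branch: starts with no parent commit, so the vectors share no
# history with the source branches and never clutter their log or diffs.
git("checkout", "-q", "--orphan", "semantic-index", cwd=repo)
git("rm", "-q", "--cached", "app.py", cwd=repo)
pathlib.Path(repo, "app.py").unlink()

# One vector file per source file, mirroring the tree layout.
pathlib.Path(repo, "app.py.vec").write_bytes(b"\x00" * 8)  # placeholder
git("add", "app.py.vec", cwd=repo)
git("commit", "-qm", "embeddings", cwd=repo)
```

Teammates then just `git fetch` that branch and search locally; no embedding API key needed on their end.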
Some searches work surprisingly well. Others miss. The bigger unsolved problem is getting coding agents to actually use it — they
default to grep because that's what they know. I can inject a rule into CLAUDE.md telling them to use semantic search instead, but
subagents spawn with fresh context and don't consistently follow it.
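For concreteness, the injected rule looks something like this (the wording, and the `git semantic search` command name, are hypothetical, not the tool's actual text):

```markdown
## Code search
- To locate code by behavior, prefer `git semantic search "<plain-English description>"` over grep.
- Fall back to grep only when semantic search returns nothing relevant.
```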
That's kind of the whole point of the experiment — I don't know yet if this is meaningfully better than what agents do today. I need
more people using it across different codebases to find out. If you try it, I'm curious: does it actually help, or does it just shift
the problem?
→ https://github.com/ccherrad/git-semantic
cchrrd•2h ago