I'm Camillo, the maker of Symdex-100: semantic fingerprints for fast, token-efficient codebase search.
Symdex-100 indexes every function in your repo into a small SQLite sidecar (`.symdex/index.db`). Instead of an opaque embedding, each function gets a structured, roughly 20-byte "Cypher" (e.g. `SEC:VAL_TOKEN--ASY` = security domain, validates a token, async). You search by intent, e.g. "where do we validate user tokens", and get ranked results from the index in under a second. Source files are never modified.
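To make the Cypher idea concrete, here is a minimal sketch that parses a Cypher such as `SEC:VAL_TOKEN--ASY` into its fields and stores it in a SQLite table. The grammar (`DOMAIN:ACTION_OBJECT--PATTERN`, read off the example), the table layout, the helper names `parse_cypher` and `build_index`, and the sample rows are all assumptions for illustration, not Symdex's actual schema or API.

```python
import re
import sqlite3
from dataclasses import dataclass

# Assumed grammar, inferred from the example: DOMAIN:ACTION_OBJECT--PATTERN.
# The object part may itself contain underscores (e.g. USER_TOKEN).
CYPHER_RE = re.compile(
    r"^(?P<domain>[A-Z]+):(?P<action>[A-Z]+)_(?P<obj>[A-Z_]+)--(?P<pattern>[A-Z]+)$"
)

@dataclass(frozen=True)
class Cypher:
    domain: str   # e.g. SEC (security)
    action: str   # e.g. VAL (validates)
    obj: str      # e.g. TOKEN
    pattern: str  # e.g. ASY (async)

def parse_cypher(text: str) -> Cypher:
    """Split a Cypher string into its four fields, or raise ValueError."""
    m = CYPHER_RE.match(text)
    if m is None:
        raise ValueError(f"not a Cypher: {text!r}")
    return Cypher(**m.groupdict())

def build_index(rows) -> sqlite3.Connection:
    """Hypothetical sidecar schema: one row per function, Cypher fields split out."""
    db = sqlite3.connect(":memory:")  # pass a file path instead for an on-disk sidecar
    db.execute(
        "CREATE TABLE functions (file TEXT, name TEXT, cypher TEXT,"
        " domain TEXT, action TEXT, obj TEXT, pattern TEXT)"
    )
    for file, name, cypher in rows:
        c = parse_cypher(cypher)
        db.execute(
            "INSERT INTO functions VALUES (?, ?, ?, ?, ?, ?, ?)",
            (file, name, cypher, c.domain, c.action, c.obj, c.pattern),
        )
    # Index the fields that intent queries filter on.
    db.execute("CREATE INDEX idx_fields ON functions (domain, action, obj)")
    return db

# Hypothetical example data.
db = build_index([
    ("auth/tokens.py", "validate_token", "SEC:VAL_TOKEN--ASY"),
    ("billing/invoice.py", "send_invoice", "BIL:SND_INVOICE--SYN"),
])
rows = db.execute(
    "SELECT file, name FROM functions WHERE domain = ? AND action = ?",
    ("SEC", "VAL"),
).fetchall()
print(rows)  # [('auth/tokens.py', 'validate_token')]
```

Splitting the Cypher into columns lets plain indexed SQL equality filters do the "security + validates" lookup, with no text scanning of source files.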
Why: grep and full-text search scale poorly. They suffer from keyword noise and have no notion of what a function actually does. AI agents burn 5k+ tokens reading 10 files to find one function. Symdex compresses function semantics into a queryable index, so both humans and agents can go straight to the right place. We see up to ~50x fewer tokens for agent code exploration and ~100x faster index lookups than grepping the same codebase.
Tech (short): Python AST → per-function metadata; an LLM (or a rule-based fallback) assigns each function a Cypher from a fixed taxonomy (`DOMAIN:ACTION_OBJECT--PATTERN`). Retrieval combines tiered Cypher patterns (tight/medium/broad) with multiple lanes (exact, domain wildcard, action, tags, name) over SQLite, with a cap on candidates. The call graph is indexed too (callers/callees/trace). An MCP server lets Cursor/Claude call `search_codebase("validate token")` and get one precise hit instead of reading half the repo.
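The pipeline above can be sketched roughly as follows, under several loud assumptions: a toy keyword table (`ACTIONS`, `DOMAINS`) stands in for the LLM / rule-based fallback; the tight/medium/broad tiers are read as exact, any-domain, and action-only glob patterns; only those pattern lanes plus a name lane are shown (no tags lane); and the candidate cap simply truncates the ranked list. Every name here (`extract_functions`, `rule_cypher`, `search`, `callers_of`) is hypothetical; this is not Symdex's implementation.

```python
import ast
import fnmatch
from dataclasses import dataclass, field

@dataclass
class FunctionInfo:
    name: str
    lineno: int
    is_async: bool
    calls: set[str] = field(default_factory=set)  # callee names (call-graph edges)
    cypher: str = ""

def extract_functions(source: str) -> list[FunctionInfo]:
    """Walk the module AST and collect per-function metadata and direct callees."""
    out = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            info = FunctionInfo(node.name, node.lineno, isinstance(node, ast.AsyncFunctionDef))
            # ast.walk also descends into nested defs, so their calls are attributed here too.
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call):
                    if isinstance(sub.func, ast.Name):
                        info.calls.add(sub.func.id)
                    elif isinstance(sub.func, ast.Attribute):
                        info.calls.add(sub.func.attr)
            out.append(info)
    return out

# Hypothetical keyword rules standing in for the LLM / rule-based fallback.
ACTIONS = {"validate": "VAL", "send": "SND", "load": "LOD"}
DOMAINS = {"token": "SEC", "invoice": "BIL"}

def rule_cypher(fn: FunctionInfo) -> str:
    """Build DOMAIN:ACTION_OBJECT--PATTERN from the function name and async-ness."""
    words = fn.name.lower().split("_")
    action = next((ACTIONS[w] for w in words if w in ACTIONS), "UNK")
    obj = "_".join(w.upper() for w in words if w not in ACTIONS) or "ANY"
    domain = next((DOMAINS[w] for w in words if w in DOMAINS), "GEN")
    return f"{domain}:{action}_{obj}--{'ASY' if fn.is_async else 'SYN'}"

def tiered_patterns(domain: str, action: str, obj: str) -> list[tuple[int, str]]:
    """(weight, glob) pairs for tight, medium, broad (an assumed reading of the tiers)."""
    return [
        (3, f"{domain}:{action}_{obj}--*"),  # tight: exact domain/action/object
        (2, f"*:{action}_{obj}--*"),         # medium: domain wildcard
        (1, f"*:{action}_*"),                # broad: action only
    ]

def search(functions, domain, action, obj, name_hint, cap=50):
    """Score candidates across pattern lanes plus a name lane; return at most `cap`."""
    scores: dict[str, int] = {}
    for weight, pattern in tiered_patterns(domain, action, obj):
        for fn in functions:
            if fnmatch.fnmatchcase(fn.cypher, pattern):  # case-sensitive on every OS
                scores[fn.name] = max(scores.get(fn.name, 0), weight)
    for fn in functions:  # name lane: a substring hit adds a bonus point
        if name_hint in fn.name:
            scores[fn.name] = scores.get(fn.name, 0) + 1
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:cap]

def callers_of(functions, target: str) -> list[str]:
    """Reverse call-graph lookup: which functions call `target`?"""
    return sorted(fn.name for fn in functions if target in fn.calls)

SOURCE = '''
async def validate_token(tok):
    return check_signature(tok)

def check_signature(tok):
    return bool(tok)

def validate_invoice(inv):
    return inv is not None

async def send_invoice(inv):
    await validate_token(inv.token)
'''

fns = extract_functions(SOURCE)
for fn in fns:
    fn.cypher = rule_cypher(fn)

print(search(fns, "SEC", "VAL", "TOKEN", "token"))  # ['validate_token', 'validate_invoice']
print(callers_of(fns, "validate_token"))            # ['send_invoice']
```

Storing only callees per function is enough for both directions of the call graph: callees are read directly, callers are the reverse lookup, and a trace is repeated application of either. The weighted tiers let a tight match outrank a broad one while still surfacing related functions when nothing matches exactly.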
Try it: for now it only works locally, via a clone and `pip install -e ".[all]"` (coming soon to PyPI as `pip install symdex`).
Next, set `ANTHROPIC_API_KEY`, then run `symdex index .` and `symdex search "validate user tokens"`. OpenAI and Gemini work too, or set `SYMDEX_CYPHER_FALLBACK_ONLY=1` to run without an API key (rule-based Cyphers only). Interfaces: CLI, Python API, and MCP (stdio/Streamable HTTP), plus a Docker image for remote MCP.
Repo: https://github.com/symdex-100/symdex/
Docs: the README covers architecture, benchmarks, and an FAQ.