The pipeline is query expansion → BM25/phrase/vector retrieval → RRF fusion → optional Qwen reranking. Each stage is independently tunable.
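For readers unfamiliar with RRF: it merges ranked lists from the different retrievers by summing reciprocal ranks, so a document ranked well by several retrievers floats to the top without any score calibration. A minimal sketch (the `rrf_fuse` helper and k=60 default are illustrative, not sift's actual API):

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).

    k=60 is the conventional default; it damps the influence of
    top-ranked outliers from any single retriever.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["d3", "d1", "d2"]     # hypothetical BM25 ranking
vector_results = ["d1", "d2", "d3"]   # hypothetical vector ranking
fused = rrf_fuse([bm25_results, vector_results])
# → ['d1', 'd3', 'd2']  (d1 is ranked high by both retrievers)
```

Because RRF only uses ranks, BM25 scores and cosine similarities never need to be on the same scale, which is what makes the stages independently tunable.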
The part I found most interesting to build: the caching layer is modeled after Zig's build system. A BLAKE3 manifest store tracks filesystem metadata so sift knows which files changed without re-reading them. A content-addressable blob store holds pre-extracted text, BM25 term frequencies, and pre-embedded vectors — so repeat queries skip neural inference entirely and go straight to dot-product scoring. Identical files across projects share a single blob entry.
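The manifest/blob split can be sketched in a few lines. This is a toy reconstruction, not sift's code: it assumes mtime and size are the tracked metadata, and uses stdlib `blake2b` as a stand-in since Python ships no BLAKE3.

```python
import hashlib
import os


def content_digest(path):
    # Content hash for the blob store. sift uses BLAKE3;
    # blake2b is a stdlib stand-in for this sketch.
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


class CacheSketch:
    """Manifest: path -> (mtime_ns, size, digest).
    Blob store: digest -> extracted artifacts (text, term freqs, vectors).
    Content-addressing means identical files share one blob entry."""

    def __init__(self):
        self.manifest = {}
        self.blobs = {}

    def lookup(self, path, extract):
        st = os.stat(path)
        meta = (st.st_mtime_ns, st.st_size)
        entry = self.manifest.get(path)
        if entry is not None and entry[:2] == meta:
            # Metadata unchanged: trust the manifest, skip re-reading the file.
            return self.blobs[entry[2]]
        digest = content_digest(path)  # new or changed file: re-hash it
        if digest not in self.blobs:   # extract only if the content is new
            self.blobs[digest] = extract(path)
        self.manifest[path] = (*meta, digest)
        return self.blobs[digest]
```

The payoff is in the second branch: a renamed or copied file re-hashes but never re-runs extraction or embedding, because the blob store is keyed by content, not path.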
Benchmarked on SciFact (5,185 docs): vector search hits 0.826 nDCG@10 with perfect recall at ~26 ms p50. BM25 alone runs in ~5 ms if latency is the constraint.
Repo: github.com/rupurt/sift