Everyone frames this as "save tokens." Wrong framing. The real issue is reliability—same workflow, same data, different results every run.
You can't prompt your way out of bad input.
Distill fixes the input:
1. Over-fetch from the vector DB (50 chunks)
2. Agglomerative clustering groups similar chunks
3. Select the best representative from each cluster
4. MMR reranking for diversity (rough sketch below)
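For the reranking step, here's a minimal self-contained Go sketch of MMR. This is not Distill's code; the Chunk type, mmrSelect, and lambda = 0.4 are illustrative. Each round picks the candidate maximizing lambda*sim(chunk, query) - (1-lambda)*max sim(chunk, already selected), so near-duplicates of chunks already chosen get pushed down:

    package main

    import (
        "fmt"
        "math"
    )

    // Chunk is a retrieved passage with its embedding vector.
    type Chunk struct {
        ID        string
        Text      string
        Embedding []float64
    }

    // cosine returns the cosine similarity of two equal-length vectors.
    func cosine(a, b []float64) float64 {
        var dot, na, nb float64
        for i := range a {
            dot += a[i] * b[i]
            na += a[i] * a[i]
            nb += b[i] * b[i]
        }
        if na == 0 || nb == 0 {
            return 0
        }
        return dot / (math.Sqrt(na) * math.Sqrt(nb))
    }

    // mmrSelect greedily picks k chunks. Each round scores every remaining candidate
    // as lambda*sim(candidate, query) - (1-lambda)*max sim(candidate, selected),
    // so chunks that merely repeat what is already selected score poorly.
    func mmrSelect(query []float64, candidates []Chunk, k int, lambda float64) []Chunk {
        selected := make([]Chunk, 0, k)
        remaining := append([]Chunk(nil), candidates...)

        for len(selected) < k && len(remaining) > 0 {
            bestIdx, bestScore := 0, math.Inf(-1)
            for i, c := range remaining {
                relevance := cosine(query, c.Embedding)
                redundancy := 0.0
                for _, s := range selected {
                    if sim := cosine(c.Embedding, s.Embedding); sim > redundancy {
                        redundancy = sim
                    }
                }
                if score := lambda*relevance - (1-lambda)*redundancy; score > bestScore {
                    bestIdx, bestScore = i, score
                }
            }
            selected = append(selected, remaining[bestIdx])
            remaining = append(remaining[:bestIdx], remaining[bestIdx+1:]...)
        }
        return selected
    }

    func main() {
        query := []float64{1, 0}
        candidates := []Chunk{
            {ID: "a", Text: "quarterly revenue was up", Embedding: []float64{0.9, 0.1}},
            {ID: "b", Text: "revenue rose this quarter", Embedding: []float64{0.88, 0.12}}, // near-duplicate of "a"
            {ID: "c", Text: "headcount stayed flat", Embedding: []float64{0.6, 0.8}},
        }
        // Prints "a" then "c": the near-duplicate "b" is penalized for redundancy.
        for _, chosen := range mmrSelect(query, candidates, 2, 0.4) {
            fmt.Println(chosen.ID)
        }
    }

A greedy loop like this depends only on the input embeddings and parameters, which is what makes the selection deterministic run to run.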
Result: 8-12 diverse chunks. ~12ms overhead. Zero LLM calls. Deterministic.
Written in Go. Works with Pinecone; support for others like Qdrant and Weaviate is coming soon. Runs post-retrieval, pre-inference.
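To make "post-retrieval, pre-inference" concrete, here's a rough placement sketch; queryVectorDB, reduceContext, and callLLM are placeholder names for illustration, not Distill's API:

    package main

    import "fmt"

    // Chunk is a retrieved passage. Every function below is a stub whose name
    // is a placeholder for illustration, not Distill's actual API.
    type Chunk struct{ ID, Text string }

    // queryVectorDB stands in for the vector DB call (e.g. Pinecone) that over-fetches ~50 chunks.
    func queryVectorDB(question string, topK int) []Chunk {
        return []Chunk{{ID: "a"}, {ID: "b"}, {ID: "c"}}
    }

    // reduceContext stands in for the deterministic post-retrieval step:
    // cluster, pick representatives, MMR-rerank. No LLM call happens here.
    func reduceContext(question string, chunks []Chunk) []Chunk {
        if len(chunks) > 2 {
            return chunks[:2] // placeholder for the real selection logic
        }
        return chunks
    }

    // callLLM stands in for the inference step that receives the reduced context.
    func callLLM(question string, context []Chunk) string {
        return fmt.Sprintf("answer generated from %d chunks", len(context))
    }

    func main() {
        q := "what changed in the Q3 report?"
        raw := queryVectorDB(q, 50)      // 1. over-fetch from the vector DB
        reduced := reduceContext(q, raw) // 2. post-retrieval, pre-inference reduction
        fmt.Println(callLLM(q, reduced)) // 3. the model only ever sees the reduced set
    }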
GitHub: https://github.com/Siddhant-K-code/distill
Playground: https://distill.siddhantkhare.com
Happy to discuss the algorithms, tradeoffs, or use cases.