The solution: I built a tool that analyzes your staged git changes, clusters them by semantic similarity using embeddings + hierarchical clustering, and creates separate commits for each cluster.
How it works:
- Parses diffs into semantic chunks using tree-sitter
- Generates embeddings (OpenAI text-embedding-3-small)
- Clusters with single-linkage hierarchical clustering
- Interactive dendrogram UI to adjust cluster threshold
- Creates one commit per cluster
Currently Mac-only (uses self-written https client with kqueue), but happy to add cross-platform support if there's interest. Also planning local LLM/embedding support.
Looking for feedback!