The key insight: most photo sorting time is spent reviewing near-duplicates. Professional photographers shoot in bursts, so you get 10+ shots of the same moment.
PicPick uses CLIP embeddings to cluster visually similar photos, then adds face recognition to keep groups coherent (so you don't mix up "bride with parents" and "bride with friends" just because they look similar).
Tech stack:
- CLIP for semantic similarity (not just perceptual hashing)
- face_recognition (dlib) for person detection
- DBSCAN clustering on combined features
- FastAPI + vanilla JS for the UI
- SQLite for everything
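For anyone curious what "combined features" could look like in practice, here is a minimal sketch, not PicPick's actual code. It assumes 512-d CLIP embeddings, 128-d dlib face encodings (an all-zero vector when no face is found), and capture timestamps in seconds. The `combine_features` helper and its `face_weight`, `time_weight`, and `time_scale_s` parameters are hypothetical names chosen for illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length; all-zero rows stay all-zero."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def combine_features(clip_emb, face_vec, timestamps,
                     face_weight=0.5, time_weight=0.2, time_scale_s=60.0):
    """Concatenate weighted CLIP, face, and time features, one row per photo.

    The weights are illustrative assumptions: they set how much each signal
    contributes to the Euclidean distance the clustering step will see.
    """
    clip_part = l2_normalize(np.asarray(clip_emb, dtype=np.float64))
    face_part = face_weight * l2_normalize(np.asarray(face_vec, dtype=np.float64))
    t = np.asarray(timestamps, dtype=np.float64)
    # Express time as "minutes since the first shot" so bursts sit close together.
    time_part = time_weight * ((t - t.min()) / time_scale_s)[:, None]
    return np.hstack([clip_part, face_part, time_part])


# Toy input: 4 photos, random stand-in embeddings, no faces detected.
rng = np.random.default_rng(0)
features = combine_features(
    clip_emb=rng.normal(size=(4, 512)),
    face_vec=np.zeros((4, 128)),
    timestamps=[0, 1, 2, 600],
)
print(features.shape)  # (4, 641)
```

Normalizing each block before weighting keeps one signal from dominating just because its raw values happen to be larger.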
It reduced my review set from 5,000 photos to ~1,000 clusters, which I then filtered down to 300 for the album in a few hours instead of days.
The clustering parameters are tunable - tighter for professional shoots with many duplicates, looser for casual photos.
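To illustrate what "tighter" and "looser" mean for DBSCAN, here is a small sketch using scikit-learn. The `cluster_photos` helper, the toy 2-D features, and the `eps` values are assumptions for illustration only, not PicPick's real settings. A smaller `eps` only merges near-identical shots, while a larger one also merges neighboring bursts.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def cluster_photos(features: np.ndarray, eps: float, min_samples: int = 1) -> np.ndarray:
    """Return one cluster label per photo.

    min_samples=1 makes every photo a core point, so one-off shots become
    singleton clusters instead of being labelled as noise (-1).
    """
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean")
    return model.fit_predict(features)


# Toy features: two bursts of three near-identical shots each.
features = np.array([
    [0.00, 0.00], [0.02, 0.00], [0.00, 0.02],  # burst A
    [0.30, 0.00], [0.32, 0.00], [0.30, 0.02],  # burst B
])

tight = cluster_photos(features, eps=0.05)  # only near-duplicates merge
loose = cluster_photos(features, eps=0.35)  # neighboring bursts merge too

print(len(set(tight.tolist())), len(set(loose.tolist())))  # 2 1
```

On real embeddings a sensible `eps` depends on how the features are scaled, so it usually has to be found by trying a few values on a sample shoot.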
Open to feedback! Especially around:
1. Better clustering algorithms (currently DBSCAN on CLIP embeddings + timestamps + face vectors)
2. UI improvements for rapid reviewing
3. Handling photos without faces (landscapes, food, etc.)
Works entirely offline, no cloud uploads needed.