I built a duplicate file finder for my brother, a photography enthusiast who constantly runs out of storage because he resizes a lot of photos and ends up with many duplicates lying around.
Notes:

- Incremental hashing: Instead of loading entire files into memory, I hash files in chunks. Files with identical sizes get grouped and progressively hashed until they diverge or match completely.
- Perceptual hashing: For images, I use perceptual hashing (pHash) that generates a fingerprint based on visual content rather than bytes. Similar images have similar hashes.
- BK-Tree indexing: To efficiently search for similar hashes, I implemented a BK-tree that organizes hashes by Hamming distance. This lets me query "find all images within distance N" without comparing against every single hash.
- Configurable similarity: Users can adjust the Hamming distance threshold (1-15) to control how strict the matching should be.
- macOS Services integration: You can right-click any folder in Finder and select "Scan for Duplicates".
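To illustrate the BK-tree idea, here is a minimal Python sketch. It is not the app's actual code: the `BKTree` class, the `hamming` helper, and the use of plain integers as stand-ins for perceptual hashes are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class BKTree:
    """Illustrative BK-tree over integer hashes, keyed by Hamming distance."""

    def __init__(self) -> None:
        # Each node is a (hash, children) pair; children maps
        # "distance to this node" -> child node.
        self.root: Optional[Tuple[int, dict]] = None

    def add(self, h: int) -> None:
        if self.root is None:
            self.root = (h, {})
            return
        node = self.root
        while True:
            value, children = node
            d = hamming(h, value)
            if d == 0:
                # Identical hash already stored. A real tool would keep a
                # list of file paths per hash instead of dropping it.
                return
            if d not in children:
                children[d] = (h, {})
                return
            node = children[d]

    def query(self, h: int, max_dist: int) -> List[Tuple[int, int]]:
        """Return sorted (distance, hash) pairs within max_dist of h."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = hamming(h, value)
            if d <= max_dist:
                results.append((d, value))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - max_dist, d + max_dist] can contain matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for sample in (0b0000, 0b0001, 0b0011, 0b1111):
    tree.add(sample)

print(tree.query(0b0000, 1))  # [(0, 0), (1, 1)]
```

The `max_dist` argument plays the role of the user-adjustable threshold: a larger value prunes fewer subtrees and returns looser matches.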
The app has a free trial (10 scans or 7 days, whichever comes first) and then requires a license. I'm using Dodo Payments for licensing.
I'd love feedback from the community, especially on:

- Performance optimizations I might have missed
- Better UX patterns for the results view
- Edge cases in the similarity detection
- More feature suggestions
REQUIREMENTS: macOS 26.0.1 (Tahoe) and an Apple Silicon Mac
Happy to answer questions about the implementation or architecture!