I set out to find duplicate photos and accidentally built a local S3 server.

Then I realized: if an object store knows every file's content hash, duplicates are just a GROUP BY. So I built a tool that serves local files via the S3 API:

docker run -it -v ~/Photos:/data -p 9000:9000 ghcr.io/deepjoy/shoebox /data

Credentials are auto-generated and printed on startup. It works with rclone, the AWS CLI, and any S3 SDK. Files stay exactly where they are. The duplicate detection I wanted is one query. But the real surprise: every S3 tool just works.
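The "duplicates are just a GROUP BY" idea can be sketched with the Python standard library. This illustrates the approach only, not shoebox's actual schema: the `files` table, its columns, and the `find_duplicates` helper are made up for the example.

```python
import hashlib
import os
import sqlite3
import tempfile

def find_duplicates(root):
    """Hash every file under root, then group by content hash."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (path TEXT, sha256 TEXT)")
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            db.execute("INSERT INTO files VALUES (?, ?)", (path, digest))
    # The one query: any hash appearing more than once is a duplicate set.
    return db.execute(
        "SELECT sha256, GROUP_CONCAT(path) FROM files "
        "GROUP BY sha256 HAVING COUNT(*) > 1"
    ).fetchall()

# Demo on a throwaway directory containing two identical files.
with tempfile.TemporaryDirectory() as d:
    for name, data in [("a.jpg", b"pixels"), ("b.jpg", b"pixels"), ("c.jpg", b"other")]:
        with open(os.path.join(d, name), "wb") as f:
            f.write(data)
    for digest, paths in find_duplicates(d):
        print(digest[:12], paths)
```

Hashing whole files is fine for a sketch; a real scan would stream large files in chunks.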
What it is: an S3-compatible object store for local filesystems. Rust, axum, SQLite for metadata. MIT licensed. Run it via the Docker image above, or build from source with `cargo install`.
What it isn't: not distributed, not for petabyte scale, not a MinIO replacement. MinIO is built for production clusters. This is built for the NAS in your closet. Single node, single process. If you need multi-machine replication, use MinIO or SeaweedFS.
Tested on real hardware this past week. A few early users ran it against their own NAS setups on btrfs, ext4, and ZFS. Built a companion webapp to browse duplicates visually (duplicate detection is non-standard S3, so terminal demos weren't compelling).
GitHub: https://github.com/deepjoy/shoebox
Companion webapp: https://deepjoy.github.io/shoebox-webapp/