Existing tools like AWS CLI (aws s3 sync --dryrun) and s5cmd (--dry-run) are great for many workflows, but we had slightly different requirements for deeper and more flexible comparisons. That led us to build this tool, and we’ve now open-sourced it for broader use.
Key Features:
1. Bidirectional diff: Given two buckets A and B, the tool reports both (A−B) and (B−A).
2. Shallow Check Mode: Compares object name + ETag to determine presence.
3. Deep check mode: Compares tags, metadata, and ObjectLock/WORM via HeadObject.
4. Resumability: Checkpointing allows long-running jobs to resume seamlessly. For example, a 10M-object run interrupted at 8M continues from where it left off.
Performance (500K objects): Shallow mode: ~3,900 objects/sec Deep mode: ~1,345 objects/sec (25 worker processes)
The tool works across any two S3-compatible buckets.
Happy to discuss any query regarding the tool!