Sairo is a single Docker container that indexes your bucket into SQLite FTS5 and gives you full-text search in 2.4ms (p50) across 134K objects / 38 TB. No external databases, no microservices, no message queues.
What it does:
- Instant search across all your objects (SQLite FTS5, 1,300 objects/sec indexing)
- File preview for 45+ formats — Parquet schemas, CSV tables, PDFs, images, code
- Password-protected share links with expiration
- Version management — browse, restore, purge versions and delete markers
- Storage analytics with growth trend charts
- RBAC, 2FA, OAuth, LDAP, audit logging
- CLI with 24 commands (brew install ashwathstephen/sairo/sairo)
Works with AWS S3, MinIO, Cloudflare R2, Wasabi, Backblaze B2, Ceph, and any S3-compatible endpoint.
docker run -d -p 8000:8000 \
-e S3_ENDPOINT=https://your-endpoint.com \
-e S3_ACCESS_KEY=xxx -e S3_SECRET_KEY=xxx \
stephenjr002/sairo
Site: https://sairo.dev
GitHub: https://github.com/AshwathStephen/sairo

I'd love honest feedback — what's missing, what would make you actually switch to this?
ashwathstephen•2h ago
I manage ~160 TB of Apache Iceberg table data across multiple S3-compatible backends (Leaseweb object storage, not AWS). The AWS console and mc CLI were the only options for browsing, and both are painfully slow for large buckets — 14 seconds to search in the console, 3 minutes to enumerate with mc.
The core idea is simple: a background crawler indexes every object key into SQLite FTS5 (about 1,300 objects/sec), and then search is just a local full-text query. No external database needed — each bucket gets its own SQLite file in WAL mode.
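The per-bucket FTS5 design described above can be sketched in a few lines. This is a minimal illustration, not Sairo's actual code: the table name, tokenizer, and sample keys are all assumptions; the real crawler would stream keys from S3 listings in batches.

```python
import sqlite3

# One SQLite file per bucket, WAL mode so the crawler can write while
# search queries read concurrently (hypothetical schema for illustration).
db = sqlite3.connect("bucket-index.db")
db.execute("PRAGMA journal_mode=WAL")
db.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS objects "
    "USING fts5(key, tokenize='unicode61')"
)

# The crawler would feed object keys in batches; a few hand-written rows here.
keys = [
    "warehouse/sales/year=2024/part-0001.parquet",
    "warehouse/sales/year=2024/part-0002.parquet",
    "logs/app/2024-06-01.json.gz",
]
with db:
    db.executemany("INSERT INTO objects(key) VALUES (?)", [(k,) for k in keys])

# Search is then a purely local full-text query — no S3 round trip.
rows = db.execute(
    "SELECT key FROM objects WHERE objects MATCH ? ORDER BY rank", ("sales",)
).fetchall()
print([r[0] for r in rows])
```

The unicode61 tokenizer splits on `/`, `=`, `-`, and `.`, so path segments like `sales` become individually searchable tokens without any custom key parsing.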
A few things I'm particularly happy with:
- Parquet/ORC/Avro schema preview without downloading the file (reads just the footer bytes via range requests)
- Version scanner that finds hidden delete markers and ghost objects that the S3 API doesn't surface in normal listings
- Works the same across AWS, MinIO, R2, Wasabi, B2, Ceph — tested against all of them
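For the Parquet case in the list above, the footer-only read works because the format puts its metadata at the end of the file: the last 8 bytes are a 4-byte little-endian footer length followed by the magic `PAR1`. A rough sketch, with `ranged_read` standing in for an S3 GetObject call with a `Range` header (here it just slices an in-memory blob, and the "footer" is a fake placeholder rather than real Thrift-encoded metadata):

```python
import struct

def ranged_read(blob: bytes, start: int, length: int) -> bytes:
    # In production this would be roughly:
    #   s3.get_object(Bucket=b, Key=k, Range=f"bytes={start}-{start+length-1}")
    return blob[start:start + length]

def footer_metadata(blob: bytes) -> bytes:
    size = len(blob)
    trailer = ranged_read(blob, size - 8, 8)      # footer length + magic
    assert trailer[4:] == b"PAR1", "not a Parquet file"
    footer_len = struct.unpack("<I", trailer[:4])[0]
    # Second range request fetches just the Thrift-encoded FileMetaData,
    # which contains the full schema — no need to download the data pages.
    return ranged_read(blob, size - 8 - footer_len, footer_len)

# Toy blob standing in for a multi-GB object: header magic, some data,
# a fake footer, then the length + magic trailer.
fake_footer = b"thrift-encoded-metadata"
blob = (b"PAR1" + b"column data..." + fake_footer
        + struct.pack("<I", len(fake_footer)) + b"PAR1")
meta = footer_metadata(blob)
```

So two small range requests (8 bytes, then typically a few KB of footer) replace a full download, which is what makes schema preview cheap even on terabyte-scale tables.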
What I'm still figuring out: how to handle buckets with 10M+ objects efficiently. The current crawler works well up to ~500K but I'd love ideas on scaling the indexing beyond that.
Happy to answer questions about the architecture or S3 provider quirks.