Sairo is a single Docker container that indexes your bucket into SQLite FTS5 and gives you full-text search in 2.4ms (p50) across 134K objects / 38 TB. No external databases, no microservices, no message queues.
What it does:
- Instant search across all your objects (SQLite FTS5, 1,300 objects/sec indexing)
- File preview for 45+ formats — Parquet schemas, CSV tables, PDFs, images, code
- Password-protected share links with expiration
- Version management — browse, restore, purge versions and delete markers
- Storage analytics with growth trend charts
- RBAC, 2FA, OAuth, LDAP, audit logging
- CLI with 24 commands (brew install ashwathstephen/sairo/sairo)
Works with AWS S3, MinIO, Cloudflare R2, Wasabi, Backblaze B2, Ceph, and any S3-compatible endpoint.
docker run -d -p 8000:8000 \
-e S3_ENDPOINT=https://your-endpoint.com \
-e S3_ACCESS_KEY=xxx -e S3_SECRET_KEY=xxx \
stephenjr002/sairo
Site: https://sairo.dev
GitHub: https://github.com/AshwathStephen/sairo

I'd love honest feedback — what's missing, what would make you actually switch to this?
ashwathstephen•2h ago
I manage ~160 TB of Apache Iceberg table data across multiple S3-compatible backends (Leaseweb object storage, not AWS). The AWS console and mc CLI were the only options for browsing, and both are painfully slow for large buckets — 14 seconds to search in the console, 3 minutes to enumerate with mc.
The core idea is simple: a background crawler indexes every object key into SQLite FTS5 (about 1,300 objects/sec), and then search is just a local full-text query. No external database needed — each bucket gets its own SQLite file in WAL mode.
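The per-bucket FTS5 design described above can be sketched in a few lines. This is a minimal illustration, not Sairo's actual code: the table name, tokenizer, and sample keys are all assumptions; the real crawler would stream keys from S3 listings in batches.

```python
import sqlite3

# One SQLite file per bucket, WAL mode so the crawler can write while
# search queries read concurrently (hypothetical schema for illustration).
db = sqlite3.connect("bucket-index.db")
db.execute("PRAGMA journal_mode=WAL")
db.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS objects "
    "USING fts5(key, tokenize='unicode61')"
)

# The crawler would feed object keys in batches; a few hand-written rows here.
keys = [
    "warehouse/sales/year=2024/part-0001.parquet",
    "warehouse/sales/year=2024/part-0002.parquet",
    "logs/app/2024-06-01.json.gz",
]
with db:
    db.executemany("INSERT INTO objects(key) VALUES (?)", [(k,) for k in keys])

# Search is then a purely local full-text query — no S3 round trip.
rows = db.execute(
    "SELECT key FROM objects WHERE objects MATCH ? ORDER BY rank", ("sales",)
).fetchall()
print([r[0] for r in rows])
```

The unicode61 tokenizer splits on `/`, `=`, `-`, and `.`, so path segments like `sales` become individually searchable tokens without any custom key parsing.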
A few things I'm particularly happy with:
- Parquet/ORC/Avro schema preview without downloading the file (reads just the footer bytes via range requests)
- Version scanner that finds hidden delete markers and ghost objects that the S3 API doesn't surface in normal listings
- Works the same across AWS, MinIO, R2, Wasabi, B2, Ceph — tested against all of them
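For the Parquet case in the list above, the footer-only read works because the format puts its metadata at the end of the file: the last 8 bytes are a 4-byte little-endian footer length followed by the magic `PAR1`. A rough sketch, with `ranged_read` standing in for an S3 GetObject call with a `Range` header (here it just slices an in-memory blob, and the "footer" is a fake placeholder rather than real Thrift-encoded metadata):

```python
import struct

def ranged_read(blob: bytes, start: int, length: int) -> bytes:
    # In production this would be roughly:
    #   s3.get_object(Bucket=b, Key=k, Range=f"bytes={start}-{start+length-1}")
    return blob[start:start + length]

def footer_metadata(blob: bytes) -> bytes:
    size = len(blob)
    trailer = ranged_read(blob, size - 8, 8)      # footer length + magic
    assert trailer[4:] == b"PAR1", "not a Parquet file"
    footer_len = struct.unpack("<I", trailer[:4])[0]
    # Second range request fetches just the Thrift-encoded FileMetaData,
    # which contains the full schema — no need to download the data pages.
    return ranged_read(blob, size - 8 - footer_len, footer_len)

# Toy blob standing in for a multi-GB object: header magic, some data,
# a fake footer, then the length + magic trailer.
fake_footer = b"thrift-encoded-metadata"
blob = (b"PAR1" + b"column data..." + fake_footer
        + struct.pack("<I", len(fake_footer)) + b"PAR1")
meta = footer_metadata(blob)
```

So two small range requests (8 bytes, then typically a few KB of footer) replace a full download, which is what makes schema preview cheap even on terabyte-scale tables.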
What I'm still figuring out: how to handle buckets with 10M+ objects efficiently. The current crawler works well up to ~500K but I'd love ideas on scaling the indexing beyond that.
Happy to answer questions about the architecture or S3 provider quirks.