Show HN: ZeroFS – A log-structured filesystem for S3

https://www.zerofs.net/

126•Eikon•1d ago

Comments

abtinf•1d ago

Entrusting data storage to a vibe coded filesystem seems imprudent.

Eikon•1d ago

Is it? :)

Ask me anything!

wyager•1d ago

FYI it looks like some of your comments are getting auto-flagged by the HN moderation system and marked as dead

abtinf•1d ago

Here is my unsolicited advice:

If one of your goals is to get others to adopt the software, I recommend you redo the marketing page and readme from scratch. Delete them without looking at them again, then hand write the content for them. Once you have the content, you can tell an LLM to format it into a nice landing page, but strictly keep your wording without changes.

Eikon•1d ago

That's fair advice, thanks.

dan_sbl•1d ago

> The test suites run in public CI.

> Each card links to the CI pipeline.

Thanks for being explicit, AI written marketing site. Wouldn't have been able to figure that out! Every currently maintained and reasonably popular open source project either runs CI in public or makes the tests extremely easy to run.

xx_ns•1d ago

I got the same vibe from

> These are asciinema recordings of real terminal sessions, rendered as text rather than video. Playback caps idle pauses at two seconds and changes nothing else.

Thanks? This sounds like it's the LLM's response to the prompter, not something you should display on the page itself...

dizhn•1d ago

I feel bad for actually liking that part now. Capping pauses at 2 seconds would show you where it hung 2+ seconds without wasting your time. Smart I thought.

progbits•1d ago

I see this all the time in code reviews at work. Extremely verbose comments that teach the clueless author how things work but have no place in the final code: aside from the codebase not being a coding tutorial, they are also incredibly specific and would become stale and incorrect in a matter of weeks.

Eikon•1d ago

Thank you for the feedback, the idea behind this was to say "We make claims that are backed by workflows you can verify". I'll improve the phrasing.

abtinf•1d ago

Why does this landing page load js from merklemap.com?

xx_ns•1d ago

Both projects have the same author.

Eikon•1d ago

Just a self hosted plausible instance :)

toastal•1d ago

The page doesn’t load anything for me… I block JS by default, & something that should be informational is hiding its content behind scripts for some reason.

breckognize•1d ago

Under the hood, S3's storage nodes are also built on a log-structured file system: https://cdn.amazon.science/77/5e/4a7c238f4ce890efdc325df8326...

(Not posix compliant because it doesn't need to be.)

iamalizaidi•1d ago

Seems purely vibecoded

lukewarm707•1d ago

wonder when we get agents good enough that we can't say vibecode any more and have to say 'code'.

there was slop with ai jesus but now gpt image is just a photo with hidden watermark

tmach32•1d ago

See also: JuiceFS, S3FS, and quite a few others.

We have done loads of research into using object storage wherever we can (given how cheap it is compared to SSDs), and so far it seems like making your application object store-aware is a far surer bet than abstracting S3 behind the file system. The behavior is just too different.

I'm more interested in applications that cleverly use object storage, e.g. AutoMQ, which is quite compatible with Kafka APIs but needs no HDDs.

the8472•1d ago

s3fs doesn't provide posix semantics. It's good enough™ for some uses, but not comparable to what this one is ostensibly providing.

nyc_pizzadev•1d ago

I agree that most usecases are best suited just using S3 directly, but I would check this out if you want an S3 based filesystem: https://fiberfs.io/

coxley•1d ago

From the docs:

> ZeroFS fetches object data in 128 KiB parts

Read/write operations in object storage are _far more_ expensive than stored bytes. I'm always afraid of anything that abstracts over S3/GCS access specifically for that reason.
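
To put rough numbers on that concern, here is a minimal sketch that counts how many 128 KiB parts a byte range touches. It assumes the worst case of one GET per part with no caching or coalescing, which may well not be how ZeroFS behaves in practice. The function names and the `price_per_1000` parameter are hypothetical, not taken from ZeroFS or any provider's price list.

```python
CHUNK_SIZE = 128 * 1024  # 128 KiB, the part size quoted from the docs


def chunks_touched(offset: int, length: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Count the fixed-size chunks overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return last - first + 1


def request_cost(num_requests: int, price_per_1000: float) -> float:
    """Cost of num_requests at a caller-supplied (hypothetical) price per 1000 requests."""
    return num_requests / 1000 * price_per_1000


if __name__ == "__main__":
    one_gib = 1 << 30
    n = chunks_touched(0, one_gib)
    print(f"Reading 1 GiB touches {n} parts of {CHUNK_SIZE} bytes")
```

Under that worst-case assumption, a sequential 1 GiB read maps to 8192 requests; plug your provider's actual GET price into `request_cost` to turn that into money.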

throw1234567891•1d ago

Especially since the “one fetch” is who knows how many reads and retries under the hood.

karakanb•1d ago

One of the reasons why ZeroFS seems interesting is they use SlateDB under the hood, which optimizes the requests that hit S3 behind the scenes.

preetham_rangu•1d ago

How does this compare to JuiceFS or SeaweedFS in terms of metadata latency? The LSM tree approach is interesting but compaction pauses on a remote-backed store seem like they could be painful.

tribal808•1d ago

I’ve seen things like this before; your key differentiator needs to be efficiency and safety compared to other options.

felooboolooomba•1d ago

OP this is the best advice here.

Since you are harnessing the sorcery of AI, have it write really good benchmarks, run tests and comparisons against competing products (and publish them), look up common pitfalls and often-requested features, and run security analysis.

Also with marketing texts, write it yourself first and then you can ask AI to hone it or give you feedback. AI slopped marketing text is visible from miles away and really, really puts people off. Even if the product itself would be fine, there is so much slop slushing around in the pipes at the moment.

I really like this project and want to see it succeed! Don't let naysayers wear you down.

Eikon•1d ago

Thank you, appreciated!

ChocolateGod•1d ago

I believe the first version of this required the metadata to be stored on the ZeroFS server, making HA kinda hard.

Has this changed, so that if I stop the server and create a new instance with the same configuration file, it'll pick up the existing metadata from the bucket?

Eikon•1d ago

> I believe the first version of this required the metadata to be stored on the ZeroFS server, making HA kinda hard.

Metadata has always been in the bucket itself.

For HA, there's now a "replicated mode" if you want automatic failover:

https://www.zerofs.net/docs/high-availability

ChocolateGod•1d ago

The HA certainly makes this more of an alternative to JuiceFS now.

Would need something like HAProxy to correctly handle failover for NFS though?

aniketsaini777•1d ago

The 128 KiB chunk size is an interesting tradeoff point — small enough to avoid wasting bandwidth on partial reads, but you're still paying per-request overhead on S3 (both cost and latency) for anything that reads across many chunks. Curious how ZeroFS handles read-ahead/prefetching for sequential access patterns, since that's usually where these abstractions either save you or quietly rack up request costs. Tools like JuiceFS and SeaweedFS handle this differently (local metadata cache + larger block coalescing) — would be interesting to see a head-to-head on request volume for the same workload.
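
As a generic illustration of block coalescing, the sketch below merges consecutive chunk indices into runs so that each run could be served by a single ranged request. This is not how ZeroFS, JuiceFS, or SeaweedFS are known to implement it; `coalesce` and `byte_range` are hypothetical helper names, and the 128 KiB chunk size is the figure quoted in the thread.

```python
from typing import Iterable, List, Tuple

CHUNK_SIZE = 128 * 1024  # 128 KiB, as quoted in the thread


def coalesce(chunk_indices: Iterable[int]) -> List[Tuple[int, int]]:
    """Merge chunk indices into inclusive (first, last) runs of consecutive chunks."""
    runs: List[Tuple[int, int]] = []
    for idx in sorted(set(chunk_indices)):
        if runs and idx == runs[-1][1] + 1:
            # Extend the current run by one chunk.
            runs[-1] = (runs[-1][0], idx)
        else:
            # Start a new run.
            runs.append((idx, idx))
    return runs


def byte_range(run: Tuple[int, int], chunk_size: int = CHUNK_SIZE) -> Tuple[int, int]:
    """Inclusive byte offsets covered by a run, in the style of an HTTP Range header."""
    first, last = run
    return first * chunk_size, (last + 1) * chunk_size - 1


if __name__ == "__main__":
    wanted = [0, 1, 2, 5, 6, 9]
    for run in coalesce(wanted):
        start, end = byte_range(run)
        print(f"chunks {run[0]}..{run[1]} -> bytes={start}-{end}")
```

Here six chunk reads collapse into three ranged requests. A read-ahead policy would go further and speculatively extend the last run, trading some wasted bandwidth for fewer requests on sequential access.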

rockwotj•1d ago

The claim of sub-millisecond writes with data in S3 is false and impossible. If you look at the benchmark, the fsync is not timed, so this is just the latency of either the network or in-kernel file operations, depending on the mount settings.

xyzzy_plugh•1d ago

I hate it when databases celebrate their performance without synchronous flushing. You should be clear about data loss window (which should be zero for committed transactions by default!) and the flushing interval to persistent storage.

I'm okay if you batch writes, I'm okay if you offer a low-latency mode with less durability, but by being unclear about this it just feels like a scam.

rockwotj•1d ago

Yeah, in this case the footnote to the write latency specifically says “at rest in S3”, which is what caused me to go look at the source. To be clear, I have no problem with ZeroFS only flushing on fsync.

I am very excited for object-storage-first systems like this to leverage low-latency zonal storage for write-ahead logs, keeping the disaggregated storage but greatly reducing write latency. That ends up being more expensive, but is likely a good tradeoff in lots of cases I have seen.

Eikon•1d ago

ZeroFS aims to be a POSIX filesystem, the semantics here are the standard ones (ext4, xfs behave the same): write() is buffered (that's the batching) and "committed" maps to fsync(), which returns only once data is durable.
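
The distinction being argued about can be seen on any POSIX filesystem by timing the two calls separately. A minimal sketch, assuming a local temporary file rather than a ZeroFS mount; `time_write_and_fsync` is a hypothetical helper, not part of any ZeroFS benchmark:

```python
import os
import tempfile
import time
from typing import Optional, Tuple


def time_write_and_fsync(payload: bytes, directory: Optional[str] = None) -> Tuple[float, float]:
    """Return (seconds spent in write(), seconds spent in fsync()) for one payload."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        t0 = time.perf_counter()
        written = 0
        while written < len(payload):
            # os.write may write fewer bytes than requested, so loop until done.
            written += os.write(fd, payload[written:])
        t1 = time.perf_counter()
        os.fsync(fd)  # returns once the OS reports the data as durable
        t2 = time.perf_counter()
    finally:
        os.close(fd)
        os.unlink(path)
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    write_s, fsync_s = time_write_and_fsync(b"x" * 4096)
    print(f"write(): {write_s * 1e6:.1f} us, fsync(): {fsync_s * 1e6:.1f} us")
```

Pointing `directory` at a mount of the filesystem under test shows how much of the latency sits in the buffered write and how much in the flush. A benchmark that reports only the first number measures buffering, not durability.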

rockwotj•1d ago

Nothing wrong with that, but you should remove the “at rest in S3” footnote from the write latency on the frontpage of the website, because that is not what is measured

chillfox•1d ago

I don’t get why they went for NFSv3. v4 is quite old, and I can’t think of any reason why you would choose v3 over v4.

Eikon•1d ago

NFSv4 is a hard beast to implement correctly, with a lot of protocol surface (state, compound ops, delegations) for benefits ZeroFS mostly gets through 9P with extensions, over a much simpler protocol: https://www.zerofs.net/docs/9p-extensions. NFSv3 stayed in ZeroFS mostly for client compatibility.

chillfox•18h ago

Thanks for explaining it.

rockwotj•1d ago

Because it’s harder. It’s a stateful protocol and has a lot more features

amelius•1d ago

That name makes it sound like your files end up in /dev/null.

sqquima•1d ago

So, can I run this on top of RustFS? And RustFS on top of this?

Eikon•1d ago

Kind-of should work :)

rapatel0•1d ago

I prototyped something like this for fun a long time ago. Treating S3 like a bucket of blocks seemed like an intuitive way to build a scalable filesystem. Arguably Ceph and Lustre are doing something similar, except with separate metadata servers to serve the hotter content.

I think the critical thing you will need to explain is durability and the loss window. Making some guarantees on failure modes would go a long way towards making me believe I can run operations on something like this.

With AI you should be able to do some exhaustive testing for load, power loss, server loss, etc. Anxious to see the potential results.

nyc_pizzadev•1d ago

Worth mentioning FiberFS: https://fiberfs.io/

I believe it just recently launched.

dijksterhuis•1d ago

previous Show HNs

- https://news.ycombinator.com/item?id=48496242 -- 11 points | 20 days ago | 7 comments

- https://news.ycombinator.com/item?id=45174724 -- 64 points | 9 months ago | 40 comments

blog post 3 days ago: https://news.ycombinator.com/item?id=48712122

jbverschoor•1d ago

I’m more interested in a generic (offline) caching layer for something like SMB or NFS, preferably with conflict resolution like Dropbox.

dangoodmanUT•23h ago

If you actually bench it locally (local S3, actually writes to disk for "staged" operations), ZeroFS performs horrifically. Ceph blows it out of the water. I don't have the exact numbers, but when I was building a toy CoW distributed block device and filesystem I did a perf matrix, and ZeroFS (even with an hour of codex tuning it) was never within 1 or 2 orders of magnitude of Ceph perf-wise.

Sure, Ceph isn't "S3-backed", but when you're talking about an actual filesystem or block device (the thing that does lots of small-IO), you care more about io-perf than sequential.

Large sequential jobs that everyone is targeting now (ie AI workloads) can use s3 directly just fine, because they don't have decades of code built on top of the filesystem.

neverartful•22h ago

There are a couple of different ways of using Ceph with a filesystem: (1) CephFS, or (2) mounting an RBD (Ceph block device) volume and then creating a filesystem on it. Historically, the RBD approach would likely have been the more common of the 2. Which of these 2 ways were you referring to?

gdevenyi•22h ago

There is some kind of advertising push for this going on. I got Twitter ads for this repo.
