S3 Files and the changing face of S3

https://www.allthingsdistributed.com/2026/04/s3-files-and-the-changing-face-of-s3.html

104•werner•2h ago

Comments

themafia•1h ago

> we locked a bunch of our most senior engineers in a room and said we weren’t going to let them out till they had a plan that they all liked.

That's one way to do it.

> When you create or modify files, changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT. Sync runs in both directions, so when other applications modify objects in the bucket, S3 Files automatically spots those modifications and reflects them in the filesystem view automatically.

That sounds about right given the above. I have trouble seeing this as something other than a giant "hack." I already don't enjoy projecting costs for new types of S3 access patterns, and I feel like this has the potential to double the complication I already experience here.

Maybe I'm too frugal, but I've been in the cloud for a decade now, and I've worked very hard to prevent any "surprise" bills from showing up. This seems like a great feature, if you don't care what your AWS bill is each month.

avereveard•1h ago

There is a staggering number of users doing this with extra steps using FSx for Lustre; their lives are greatly simplified today (unless they use GPUDirect Storage, I guess).

themafia•1h ago

Good point. There's a wide gulf between being able to design your workflow for S3 and trying to map an existing workflow to it.

DenisM•1h ago

TLDR: Eventually consistent file system view on top of s3 with read/write cache.

mgaunard•1h ago

Zero mention of s3fs, which has already done this for decades.

luke5441•1h ago

A more solid (especially when it comes to caching) solution would be appreciated.

I thought that would be their https://github.com/awslabs/mountpoint-s3 . But no mention of this one either.

S3 Files does have the advantage of having a "shared" cache via EFS, but then that would probably also make the cache slower.

PunchyHamster•37m ago

I'd assume you can still have local cache in addition to that.

rowanG077•1h ago

I was thinking: "No way this has existed for decades". But the earliest I can find it existing is 2008. Strictly speaking not decades but much closer to it than I expected.

huntaub•59m ago

This is pretty different than s3fs. s3fs is a FUSE file system that is backed by S3.

This means that all of the non-atomic operations that you might want to do on S3 (including edits to the middle of files, renames, etc.) are run on the machine running s3fs. As a result, if your machine crashes, it's not clear what's going to show up in your S3 bucket or whether it would corrupt things.

s3fs is also slow because the next stop after your machine is S3, which isn't suitable for many file-based applications.

What AWS has built here is different: using EFS as the middle layer means that there's a safe, durable place for your file system operations to go while they're being assembled into object operations. It also means that the performance should be much better than s3fs (it's talking to SSDs where data is 1ms away instead of HDDs where data is 30ms away).

ChocolateGod•50m ago

You can also use something like JuiceFS to make using S3 as a shared filesystem more sane, but you're moving all the metadata to a shared database.

CrzyLngPwd•1h ago

If there is ever a post that needs a TLDR or an AI summary it is that one.

Sell the benefits.

I have around 9 TB in 21m files on S3. How does this change benefit me?

jz-amz•1h ago

Check out the "what's new": https://aws.amazon.com/about-aws/whats-new/2026/04/amazon-s3...

dijksterhuis•55m ago

not everything should or needs to be some article geared towards the audience's convenience, or selling something to the audience. pretty much all allthingsdistributed articles are long form articles covering highly technical systems and contain a decent whack of detail/context. in my mind, they veer closer to "computer scientist does blog posts" compared to "5 ways React can boost your page visits" listicles.

edited slightly ... i really need to turn 10 minute post delay back on.

nvartolomei•1h ago

> changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT

Single PUT per file I assume?

LazyMans•1h ago

Based on docs, correct.
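
To make the "single PUT per file" reading concrete, here is a toy sketch of write aggregation. It is not how S3 Files is implemented; the `WriteAggregator` class and its methods are hypothetical names, and the roughly 60 second timer from the quote is replaced by an explicit `flush()` call so the behavior is easy to follow.

```python
class WriteAggregator:
    """Toy model: many writes to a file collapse into one upload per flush."""

    def __init__(self):
        # path -> latest contents written since the last flush
        self.dirty = {}

    def write(self, path, data):
        # Later writes to the same path overwrite earlier ones in the buffer.
        self.dirty[path] = data

    def flush(self):
        # Stand-in for the periodic sync: emit one (path, data) "PUT"
        # per dirty file, then clear the buffer.
        puts = sorted(self.dirty.items())
        self.dirty.clear()
        return puts


agg = WriteAggregator()
agg.write("logs/app.log", "line 1\n")
agg.write("logs/app.log", "line 1\nline 2\n")
agg.write("report.csv", "a,b\n")
puts = agg.flush()  # three writes, but only two files were touched
```

Under this model the number of PUTs per sync window scales with the number of files modified, not with the number of write calls.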

gonzalohm•1h ago

I cannot 100% confirm this, but I believe AWS insisted a lot on NOT using S3 as a file system. Why the change now?

LazyMans•1h ago

They found a way to make money on it by putting a cache in front of it. Less load for them, better performance for you. Maybe you save money, maybe you don't.

yandie•1h ago

It appears that they put an actual file system in front of S3 (AWS EFS basically) and then perform transparent syncing. The blog post discusses a lot of caveats, such as consistency and object naming (inconsistencies are emitted as events to customers).

Having been a fan of S3 for such a long time, I'm really a fan of the design. It's a good compromise and kudos to whoever managed to push through the design.

PunchyHamster•56m ago

Because people will use it as a filesystem regardless of the original intent, because it is a very convenient abstraction. So might as well do it in an optimal and supported way, I guess?

jitl•26m ago

Because without significant engineering effort (see the blog post), the mismatch between object store semantics and file semantics means you will probably Have A Bad Time. In much earlier eras of S3, there were also some implementation specifics like throughput limits based on key prefixes (that one vanished circa 2016) that made it even worse to use for hierarchical directory shapes.

gervwyk•57m ago

any recommendations for a lambda based sftp server setup?

ovaistariq•49m ago

TLDR: EFS as an eventually consistent cache in front of S3.

PunchyHamster•44m ago

Eagerly awaiting the first blog post where developers didn't read the eventually consistent part, lost the data, and made some "genius" workaround with the help of the LLM that got them into that spot in the first place

MontyCarloHall•38m ago

This is essentially S3FS using EFS (AWS's managed NFS service) as a cache layer for active data and small random accesses. Unfortunately, this also means that it comes with some of EFS's eye-watering pricing:

— All writes cost $0.06/GB, since everything is first written to the EFS cache. For write-heavy applications, this could be a dealbreaker.

— Reads hitting the cache get billed at $0.03/GB. Large reads (>128kB) get directly streamed from the underlying S3 bucket, which is free.

— Cache is charged at $0.30/GB/month. Even though everything is written to the cache (for consistency purposes), it seems like it's only used for persistent storage of small files (<128kB), so this shouldn't cost too much.
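
A rough way to turn the rates quoted above into a monthly number is sketched below. The three rates are the ones listed in this comment, not figures checked against the AWS pricing page; the function name, its parameters, and the example workload are made up for illustration. Large reads streamed straight from S3 are treated as free on the file system side, as described above, and ordinary S3 storage and request charges are left out.

```python
# Rates as quoted in the comment above (USD); treat them as assumptions.
WRITE_PER_GB = 0.06            # every write goes through the EFS-backed cache
CACHED_READ_PER_GB = 0.03      # reads served from the cache
CACHE_PER_GB_MONTH = 0.30      # data resident in the cache


def estimate_monthly_cost(written_gb, cached_read_gb, cache_resident_gb):
    """Estimate the file-system-side monthly cost for a hypothetical workload."""
    writes = written_gb * WRITE_PER_GB
    reads = cached_read_gb * CACHED_READ_PER_GB
    storage = cache_resident_gb * CACHE_PER_GB_MONTH
    return {
        "writes": round(writes, 2),
        "cached_reads": round(reads, 2),
        "cache_storage": round(storage, 2),
        "total": round(writes + reads + storage, 2),
    }


# Hypothetical month: 1000 GB written, 500 GB read from cache,
# 50 GB of small files kept resident in the cache.
example = estimate_monthly_cost(1000, 500, 50)
print(example)
```

For that example workload, writes dominate: 60 of the 90 dollars come from the write charge, which is why write-heavy applications are the ones to check first.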

rdtsc•38m ago

Synchronization bits is what I was wondering about: https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-fil...

> For example, suppose you edit /mnt/s3files/report.csv through the file system. Before S3 Files synchronizes your changes back to the S3 bucket, another application uploads a new version of report.csv directly to the S3 bucket. When S3 Files detects the conflict, it moves your version of report.csv to the lost and found directory and replaces it with the version from the S3 bucket.

> The lost and found directory is located in your file system's root directory under the name .s3files-lost+found-file-system-id.
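
The quoted behavior can be modeled in a few lines. This is an in-memory toy, not the service's actual mechanism: `ToyMount`, its version counter, and the `fs-example` id are all hypothetical, and conflict detection here is simply "the bucket version changed after the local edit started".

```python
from dataclasses import dataclass, field


@dataclass
class ToyMount:
    """In-memory stand-in for a file system view over a bucket."""
    fs_id: str                                    # hypothetical file system id
    bucket: dict = field(default_factory=dict)    # key -> (version, data)
    files: dict = field(default_factory=dict)     # path -> data seen via the mount
    pending: dict = field(default_factory=dict)   # path -> bucket version the edit is based on

    @property
    def lost_and_found(self):
        return f".s3files-lost+found-{self.fs_id}"

    def put_object(self, key, data):
        # Direct upload to the bucket, bypassing the file system view.
        version = self.bucket.get(key, (0, None))[0] + 1
        self.bucket[key] = (version, data)

    def edit(self, path, data):
        # Local edit through the mount; remember which bucket version it started from.
        base_version = self.bucket.get(path, (0, None))[0]
        self.files[path] = data
        self.pending.setdefault(path, base_version)

    def sync(self):
        for path, base_version in list(self.pending.items()):
            current_version, current_data = self.bucket.get(path, (0, None))
            if current_version != base_version:
                # Conflict: keep the local copy in lost+found, take the bucket version.
                self.files[f"{self.lost_and_found}/{path}"] = self.files[path]
                self.files[path] = current_data
            else:
                self.put_object(path, self.files[path])
            del self.pending[path]


mount = ToyMount("fs-example")
mount.put_object("report.csv", "v1")
mount.files["report.csv"] = "v1"
mount.edit("report.csv", "local edit")           # edit through the file system
mount.put_object("report.csv", "remote upload")  # another app writes to the bucket
mount.sync()
```

After `sync()`, the mount shows the bucket's version at `report.csv` and the local edit sits under the lost and found directory, so the bucket wins and nothing is silently discarded.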

mbana•29m ago

Werner Vogels is awesome. I first discovered his writing when I learnt about DynamoDB.

koolba•28m ago

If you thought locking semantics over NFS were wonky, just wait till we throw a remote S3 backend in the mix!

nyc_pizzadev•22m ago

This is very close to its first official release: https://fiberfs.io/

Built in cache, CDN compatible, JSON metadata, concurrency safe and it targets all S3 compatible storage layers.

jitl•19m ago

I wish they offered some managed bridging to local NVMe storage. AWS NVMe is super fast compared to EBS, and EBS (node-exclusive access as block device) is faster than EFS (multi-node access). I imagine this can go fast if you put some kind of further-cache-to-NVMe FS on top, but a completely vertically integrated option would be much better.

mritchie712•15m ago

tldr: this caches your S3 data in EFS.

we run datalakes using DuckLake and this sounds really useful. GCP should follow suit quickly.

minutesmith•9m ago

The pricing math is the actual story here. What AWS is doing is moving the "decision point" from "should I use S3?" to "what's my read/write ratio and cache hit rate?".

For most applications, this is actually a better outcome than the old model. It forces you to think about your actual access patterns instead of choosing a service based on name recognition.

The dangerous case is when teams deploy this without properly profiling. If you provision a large EFS cache assuming "everything will be cached efficiently" without validating that assumption, the bill surprise is real.

This is an architectural pattern that works great once, and then becomes a gotcha once. AWS's fault here isn't the product, it's the documentation — they need a pricing impact section that shows representative costs for different workload types, not just the raw rates.

Same lesson as many AWS products: the service works well when you've thought it through. It works badly when you haven't.

