But I can't find anything addressing the use case of highly available (multi-AZ), scalable, production infrastructure: specifically, a unified and consistent cache across geos (AZs in the AWS case, since this seems to be targeted at S3).
Without it, you're increasing costs somewhere in your organization: cross-AZ networking, larger caches in each AZ for availability, extra compute and coherency traffic to keep the caches in sync across AZs, etc.
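To put rough numbers on the networking piece (purely illustrative: the traffic volume is made up, and the rate is the commonly cited ~$0.01/GB-per-direction AWS inter-AZ figure):

```rust
// Back-of-envelope, not a real bill: AWS charges roughly $0.01/GB in each
// direction for inter-AZ traffic, i.e. ~$0.02/GB round trip. The traffic
// volume below is a hypothetical placeholder.
fn main() {
    let gb_per_day = 1_000.0_f64; // assumed cache-fill traffic per standby AZ
    let standby_azs = 2.0;        // extra AZs kept warm for availability
    let rate_per_gb = 0.02;       // $/GB, both directions combined
    let monthly_cost = gb_per_day * 30.0 * standby_azs * rate_per_gb;
    println!("extra cross-AZ transfer: ~${monthly_cost:.0}/month"); // ~$1200
}
```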
Any insight from the authors on how they handle these issues on their production systems at scale?
I think that assumes decoupled compute and storage. If instead I couple compute and storage, I can shard the input, and then I don't need to share the cache across instances. I don't think there is one approach that wins every time.
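A minimal sketch of what I mean by sharding the input (all names hypothetical): hash each key to a fixed owner node, so every instance only ever caches its own shard and there is nothing to keep coherent across AZs.

```rust
// Hypothetical sketch: deterministic key -> node routing. Each node caches
// only the keys it owns, so no cross-instance cache coherency is needed.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn owner_node(key: &str, num_nodes: u64) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish() % num_nodes
}

fn main() {
    let num_nodes = 3;
    for key in ["orders/2024/01.parquet", "orders/2024/02.parquet"] {
        println!("{key} -> node {}", owner_node(key, num_nodes));
    }
}
```

The tradeoff is the obvious one: losing a node loses its shard's cache, and adding nodes reshuffles ownership (consistent hashing softens that).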
As for egress fees, that is an orthogonal concern.
Edit: Now I catch your drift; it would indeed be cool. ZeroFS requires a commitment to the SlateDB LSM data format.
PLEASE, if someone from the team sees this: I would pay so much for an ephemeral object store using your same edge protocol (seen in the sensor example from your blog).
Cheers!