If I could pipe text content to my terminal with confidence, I would.
Even a single query will likely run on multiple nodes, with distributed workers gathering and processing data from the storage layer; that is the whole idea behind MapReduce, after all.
It will depend on your needs, though, since some use cases won't want to trade away S3's ability to serve arbitrary amounts of throughput.
You are limited anyway by the network capacity of the instance you are fetching the data from.
We built an S3 read-through cache service for s2.dev so that multiple clients could share a Foyer hybrid cache with key affinity: https://github.com/s2-streamstore/cachey
Being able to set an S3 client’s endpoint to proxy traffic straight through this would be quite useful.
It is also possible to stop requiring the header, but I think it would complicate the design around coalescing reads – the layer above foyer would have to track concurrent requests to the same object.
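To make the coalescing concern concrete: the usual fix is a single-flight layer, where concurrent requests for the same key share one in-flight fetch. A minimal sketch in Rust with tokio (illustrative only, not cachey's actual implementation; the SingleFlight type and its API are invented for this example):

    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};
    use tokio::sync::broadcast;

    // Illustrative single-flight guard (not cachey's actual code): concurrent
    // gets for the same key share one in-flight fetch instead of each hitting S3.
    #[derive(Clone, Default)]
    struct SingleFlight {
        inflight: Arc<Mutex<HashMap<String, broadcast::Sender<Arc<Vec<u8>>>>>>,
    }

    impl SingleFlight {
        async fn get<F, Fut>(&self, key: &str, fetch: F) -> Arc<Vec<u8>>
        where
            F: FnOnce() -> Fut,
            Fut: std::future::Future<Output = Vec<u8>>,
        {
            let waiter = {
                let mut map = self.inflight.lock().unwrap();
                match map.get(key) {
                    // Someone is already fetching this object: subscribe and wait.
                    Some(tx) => Some(tx.subscribe()),
                    None => {
                        let (tx, _) = broadcast::channel(1);
                        map.insert(key.to_string(), tx);
                        None
                    }
                }
            };
            if let Some(mut rx) = waiter {
                return rx.recv().await.expect("leader completed the fetch");
            }
            // We are the leader: fetch once, then broadcast to any waiters.
            let value = Arc::new(fetch().await);
            let tx = self.inflight.lock().unwrap().remove(key).unwrap();
            let _ = tx.send(value.clone()); // Err only means nobody was waiting.
            value
        }
    }

The leader performs the fetch and broadcasts the result; every caller that arrives while it is in flight just awaits the broadcast, so a popular object generates one S3 GET instead of N.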
I wonder. Given how cheap S3 GET requests are, you need a massive number of requests before provisioning and maintaining a cache server becomes cheaper than the alternative.
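As a rough sketch of that break-even, assuming roughly $0.40 per million GETs (approximate us-east-1 list price, worth re-checking) and a placeholder instance cost:

    // Napkin math: when does a dedicated cache node pay for itself purely on
    // saved GET charges? All prices below are assumptions (rough us-east-1
    // list prices; re-check before relying on this).
    fn main() {
        let get_cost_per_million = 0.40_f64; // USD per 1M S3 GET requests (assumed)
        let instance_cost_per_month = 70.0_f64; // placeholder cache node cost (assumed)
        let hit_rate = 0.9_f64; // fraction of GETs the cache absorbs (assumed)

        // Requests/month needed before saved GET charges cover the instance.
        let breakeven = instance_cost_per_month / (get_cost_per_million * hit_rate) * 1e6;
        let per_second = breakeven / (30.0 * 24.0 * 3600.0);
        println!("break-even: {breakeven:.0} GETs/month (~{per_second:.0}/s sustained)");
    }

At those assumed numbers you need on the order of 200M requests a month (~75/s sustained) before the saved GET charges alone cover the instance, which ignores the latency and bandwidth wins that usually motivate the cache in the first place.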
Materialize.com switched from explicit disk cache management to “just use swap” and saw a substantial performance improvement. https://materialize.com/blog/scaling-beyond-memory/
To get good performance from this strategy, your memory layout already needs to be aligned to page boundaries and otherwise sympathetic to the underlying swap system, and you can explicitly force pages to stay in memory with a few syscalls.
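The syscalls in question are mlock(2)/mlockall(2). A minimal sketch via the libc crate (simplified; real code should munlock when done and mind RLIMIT_MEMLOCK):

    // Pin a hot region so the kernel never swaps it out, while the rest of
    // the process stays eligible for "just use swap". Sketch only: mlock can
    // fail with EPERM/ENOMEM if RLIMIT_MEMLOCK is low (the default is often
    // only 64 KiB).
    fn pin_region(buf: &[u8]) -> std::io::Result<()> {
        // The kernel rounds the range out to page boundaries itself.
        let rc = unsafe { libc::mlock(buf.as_ptr() as *const libc::c_void, buf.len()) };
        if rc != 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(())
    }

    fn main() -> std::io::Result<()> {
        let hot_index = vec![0u8; 16 * 1024 * 1024]; // 16 MiB that must stay resident
        pin_region(&hot_index)?;
        // ... hot_index is now exempt from swap; everything else can page ...
        Ok(())
    }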
Actually, there are already two. The other one: just read from disk (and let the OS manage caches).
1. Leverage asynchronous capabilities: Foyer exposes async interfaces, so while one lookup is waiting on IO the worker can still make progress on other tasks, increasing overall throughput. With swap, a page fault causes a synchronous wait that blocks the whole worker thread, degrading performance (see the sketch after this list).
2. Fine-grained control: as a dedicated cache system, foyer understands better than a general-purpose system like the operating system's page cache which data should be cached and which should not. This is also why foyer has supported direct I/O since day one, to avoid duplicating the page cache's work. Foyer can use its own strategies to decide earlier when data should be cached or evicted.
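For point 1, a generic illustration of why the async interface matters (this is not foyer's actual API, just a stand-in miss path):

    use tokio::task::JoinSet;

    // Not foyer's actual API: a stand-in async miss path. tokio::fs offloads
    // the blocking read to a separate pool, so the async workers stay free.
    async fn get(key: u64) -> Vec<u8> {
        tokio::fs::read(format!("/tmp/cache/{key}")).await.unwrap_or_default()
    }

    #[tokio::main]
    async fn main() {
        let mut lookups = JoinSet::new();
        for key in 0..64 {
            // All 64 lookups are in flight at once on a small thread pool.
            lookups.spawn(get(key));
        }
        // With swap instead, each miss would be a page fault that parks the
        // faulting worker thread until the page is read back in.
        while let Some(result) = lookups.join_next().await {
            let _bytes = result.expect("lookup task panicked");
        }
    }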
> foyer draws inspiration from Facebook/CacheLib, a highly-regarded hybrid cache library written in C++, and ben-manes/caffeine, a popular Java caching library, among other projects.
https://github.com/chroma-core/chroma/blob/2cb5c00d2e97ef449...
For example, the application may decide that all files are read-only until they expire a few days later.
I'm not clear about the write cache. My guess is that you will want some sort of redundancy when caching writes, so this goes beyond a library and becomes a service. Unless the domain can absolve you of this concern by having redundancy elsewhere in the system (e.g., feed data from a durable store and replay if you lose some S3 writes).
CacheLib requires entries to be copied into CacheLib-managed memory when they are inserted. That simplified some design trade-offs, but it can hurt overall throughput when the in-memory cache is exercised more than the NVM cache. FYI: https://cachelib.org/docs/Cache_Library_User_Guides/Write_da...
Foyer only requires entries to be serialized/deserialized when writing to or reading from disk. The in-memory cache doesn't force a deep memory copy.
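A sketch of the difference (neither library's real internals; the types here are invented for illustration):

    use std::collections::HashMap;
    use std::sync::Arc;

    // Copy-on-insert (CacheLib-style): the cache owns its own byte copy.
    struct CopyingCache {
        map: HashMap<String, Vec<u8>>,
    }

    impl CopyingCache {
        fn insert(&mut self, key: String, value: &[u8]) {
            self.map.insert(key, value.to_vec()); // deep copy into cache-owned memory
        }
    }

    // Shared-ownership (foyer-style in memory): insert moves an Arc, get
    // clones a pointer. Serialization happens only on the disk path.
    struct SharingCache {
        map: HashMap<String, Arc<Vec<u8>>>,
    }

    impl SharingCache {
        fn insert(&mut self, key: String, value: Arc<Vec<u8>>) {
            self.map.insert(key, value); // no byte copy, just refcounting
        }
        fn get(&self, key: &str) -> Option<Arc<Vec<u8>>> {
            self.map.get(key).cloned() // cheap pointer clone, value is shared
        }
    }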
Eikon•4mo ago
The only quirk I've experienced is that the in-memory and hybrid modes don't share the same invalidation behavior. In hybrid mode, there's no way to await a value actually being discarded after deletion, whereas in-memory mode deletes immediately.
mmastrac•4mo ago
Some napkin math suggests this could be a few dollars a month to keep a few TB of precious data nearline.
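For reference, at Glacier Deep Archive's list price of roughly $0.00099 per GB-month (an assumption worth re-checking against current pricing):

    // Rough storage-cost check for keeping a few TB nearline.
    fn main() {
        let gb_month = 0.00099_f64; // USD per GB-month, assumed Deep Archive price
        let tbs = 3.0_f64;
        let monthly = tbs * 1024.0 * gb_month;
        println!("{tbs} TB ≈ ${monthly:.2}/month"); // ~ $3.04/month
    }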
Restore costs are pricey, but hopefully that's only hit in case of a true disaster. Are there any techniques for reducing egress on restore?