Sticking something with a 2-second lifespan on disk to shoehorn it into the AWS serverless paradigm created problems and costs out of thin air here.
Moving at least partially to an in-memory solution is a good fix, though.
Regardless, I enjoyed the article and I appreciate that people are still finding ways to build systems tailored to their workflows.
Without cloud, saving a file is as simple as "with open(...) as f: f.write(data)" + adding a record to the DB. And no weird network issues to debug.
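For illustration, a minimal sketch of that flow, using sqlite3 as a stand-in for "the DB" (the table and file names are made up):

    import sqlite3

    def save_file(path: str, data: bytes, db_path: str = "app.db") -> None:
        # Write the bytes to local disk.
        with open(path, "wb") as f:
            f.write(data)
        # Record the file's location in the database (commits on exit).
        with sqlite3.connect(db_path) as conn:
            conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY)")
            conn.execute("INSERT OR REPLACE INTO files (path) VALUES (?)", (path,))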
Save where? With what redundancy? With what access policies? With what backup strategy? With what network topology? With what storage equipment and file system and HVAC system and...
Without on-prem, saving a file is as simple as s3.put_object()!
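Which, with boto3, is roughly this (bucket and key names are placeholders):

    import boto3

    s3 = boto3.client("s3")
    # One call; AWS handles the redundancy, topology, and HVAC.
    s3.put_object(Bucket="my-bucket", Key="uploads/file.bin", Body=b"...data...")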
> Save where? With what redundancy? With what access policies? With what backup strategy? With what network topology? With what storage equipment and file system and HVAC system and...
Wow, that's a lot to learn before using S3... I wonder how much that costs in salaries.
> With what network topology?
You don't need to care about this when using SSDs/HDDs.
> With what access policies?
Whichever are defined in your code; no restrictions, unlike with S3. No need to study complicated AWS documentation or navigate multiple consoles (this also costs you in salaries, by the way). No risk of leaking files due to misconfigured cloud services.
> With what backup strategy?
Automatically backed up with the rest of your server data; no need to spend time on this.
You do need to care when you move beyond a single server in a closet that runs your database, webserver and storage.
> No risk of leaking files due to misconfigured cloud services.
One misconfigured .htaccess file, for example, could result in leaking files.
> Save where? With what redundancy? With what access policies? With what backup strategy? With what network topology? With what storage equipment and file system and HVAC system and...
Most of these concerns can be addressed with ZFS[0] provided by FreeBSD systems hosted in triple-A data centers.
See also iSCSI[1].

[0] https://www.techradar.com/pro/security/the-south-korean-gove...
Question: How do you save a small fortune in cloud savings?
Answer: First start with a large fortune.
I think you mean a small fraction of 3 engineers. And small fractions aren't that small.
The end of the article has this:
> Consider custom infrastructure when you have both: sufficient scale for meaningful cost savings, and specific constraints that enable a simple solution. The engineering effort to build and maintain your system must be less than the infrastructure costs it eliminates. In our case, specific requirements (ephemeral storage, loss tolerance, S3 fallback) let us build something simple enough that maintenance costs stay low. Without both factors, stick with managed services.
Seems they were well aware of the tradeoffs.
Also, just take an old phone from your drawer full of old phones, slap some free camera app on it, zip tie a car phone mount to the crib, and boom you have a free baby monitor.
We found that implementing proper data durability (3+ replicas, corruption detection, automatic repair) added ~40% overhead to our initial estimates. The engineering time spent building and maintaining custom tooling for multi-region replication, access controls, and monitoring ended up being substantial - about 1.5 FTE over 18 months.
For high-throughput workloads (>500 req/s), we actually saw better cost efficiency with S3 due to their economies of scale on bandwidth. The breakeven point seems to be around 100-200TB of relatively static data with predictable access patterns. Below that, the operational overhead of running your own storage likely exceeds S3's markup.
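Back-of-envelope check on that breakeven claim; the price here is an assumption (roughly S3 Standard's us-east-1 list price), so verify against current AWS pricing:

    s3_per_gb_month = 0.023                    # ~S3 Standard list price; an assumption
    data_gb = 150 * 1000                       # midpoint of the 100-200 TB range
    annual_storage = data_gb * s3_per_gb_month * 12
    print(f"${annual_storage:,.0f}/year")      # ~$41,400/year, storage alone

At that midpoint the S3 storage bill alone is around $41k/year, so the ~1.5 FTE of engineering time mentioned above can easily exceed the S3 markup below that scale.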
The key is to be really honest about your use case. Are you truly at scale? Do you have the engineering resources to build AND maintain this long-term? Sometimes paying the AWS premium is worth it for the operational simplicity.
That said, the article seems to be more about optimizing their pipeline to reduce S3 usage by holding some objects in memory instead. That's very different from trying to build your own object store to replace S3.
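For illustration, a minimal sketch of that pattern as I read it: short-lived objects held in memory, with S3 as the loss-tolerant fallback. The class, TTL, and key names are my own invention, not the article's code:

    import time
    import boto3

    class EphemeralStore:
        def __init__(self, bucket: str, ttl_seconds: float = 2.0):
            self.bucket = bucket
            self.ttl = ttl_seconds
            self.cache = {}  # key -> (expiry_timestamp, bytes)
            self.s3 = boto3.client("s3")

        def put(self, key: str, data: bytes) -> None:
            # Hold the object in memory for its short lifespan; no disk, no PUT fees.
            self.cache[key] = (time.time() + self.ttl, data)

        def get(self, key: str) -> bytes:
            entry = self.cache.get(key)
            if entry is not None:
                expiry, data = entry
                if expiry > time.time():
                    return data
                del self.cache[key]  # expired; drop it
            # Fallback: fetch from S3 if the object was persisted there
            # (raises botocore.exceptions.ClientError if it never was).
            obj = self.s3.get_object(Bucket=self.bucket, Key=key)
            return obj["Body"].read()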
Have you ever thought of using a PostgreSQL DB (also on AWS) to store those files and using CDC to publish messages about those files to a Kafka topic? With your original approach, you need three AWS services: S3, Lambda, and SQS. This way, you need two: PostgreSQL and Kafka. I'm not sure how well this method works though :-)
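A rough sketch of the write path you're describing, assuming a bytea column and a Debezium-style connector watching the table (all names illustrative):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # connection string is illustrative
    with conn, conn.cursor() as cur:
        cur.execute(
            "CREATE TABLE IF NOT EXISTS files (key TEXT PRIMARY KEY, body BYTEA)"
        )
        cur.execute(
            "INSERT INTO files (key, body) VALUES (%s, %s) "
            "ON CONFLICT (key) DO UPDATE SET body = EXCLUDED.body",
            ("some/key", psycopg2.Binary(b"...payload...")),
        )
    # A CDC connector (e.g. Debezium) can then stream each insert from the
    # WAL to a Kafka topic, replacing the S3-event -> Lambda -> SQS chain.

One caveat: large binary payloads in Postgres tend to bloat the WAL and backups, which is part of why object stores exist in the first place.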