
Yt-dlp: Upcoming new requirements for YouTube downloads

https://github.com/yt-dlp/yt-dlp/issues/14404
282•phewlink•2h ago•141 comments

US Airlines Push to Strip Away Travelers' Rights by Rolling Back Key Protections

https://www.travelandtourworld.com/news/article/american-joins-delta-southwest-united-and-other-u...
242•duxup•2h ago•217 comments

That Secret Service SIM farm story is bogus

https://cybersect.substack.com/p/that-secret-service-sim-farm-story
524•sixhobbits•6h ago•261 comments

Just Let Me Select Text

https://aartaka.me/select-text.html
30•ayoisaiah•39m ago•20 comments

EU age verification app not planning desktop support

https://github.com/eu-digital-identity-wallet/av-doc-technical-specification/issues/22
161•sschueller•2h ago•99 comments

Learning Persian with Anki, ChatGPT and YouTube

https://cjauvin.github.io/posts/learning-persian/
38•cjauvin•1h ago•12 comments

How to Lead in a Room Full of Experts

https://idiallo.com/blog/how-to-lead-in-a-room-full-of-experts
27•jnord•1h ago•2 comments

My Ed(1) Toolbox

https://aartaka.me/my-ed.html
30•mooreds•2h ago•10 comments

Rights groups urge UK PM Starmer to abandon plans for mandatory digital ID

https://bigbrotherwatch.org.uk/press-releases/rights-groups-urge-starmer-to-abandon-plans-for-man...
86•Improvement•2h ago•64 comments

S3 scales to petabytes a second on top of slow HDDs

https://bigdata.2minutestreaming.com/p/how-aws-s3-scales-with-tens-of-millions-of-hard-drives
86•todsacerdoti•4h ago•27 comments

Preparing for the .NET 10 GC

https://maoni0.medium.com/preparing-for-the-net-10-gc-88718b261ef2
34•benaadams•3h ago•27 comments

Huntington's disease treated for first time

https://www.bbc.com/news/articles/cevz13xkxpro
139•_zie•2h ago•44 comments

My game's server is blocked in Spain whenever there's a football match on

https://old.reddit.com/r/gamedev/comments/1np6kyn/my_games_server_is_blocked_in_spain_whenever/
236•greazy•4h ago•110 comments

Exploring GrapheneOS secure allocator: Hardened Malloc

https://www.synacktiv.com/en/publications/exploring-grapheneos-secure-allocator-hardened-malloc
37•r4um•4h ago•0 comments

I Spent Three Nights Solving Listen Labs Berghain Challenge (and Got #16)

https://kuber.studio/blog/Projects/How-I-Spent-Three-Nights-Solving-Listen-Labs-Berghain-Challenge
28•kuberwastaken•3d ago•8 comments

Everyone's trying vectors and graphs for AI memory. We went back to SQL

20•Arindam1729•2d ago•17 comments

Find SF parking cops

https://walzr.com/sf-parking/
776•alazsengul•20h ago•417 comments

Baldur's Gate 3 Steam Deck – Native Version

https://larian.com/support/faqs/steam-deck-native-version_121
541•_JamesA_•14h ago•378 comments

Deep researcher with test-time diffusion

https://research.google/blog/deep-researcher-with-test-time-diffusion/
62•simonpure•3d ago•10 comments

WiGLE: Wireless Network Mapping

https://wigle.net/index
15•dp-hackernews•2h ago•2 comments

Libghostty is coming

https://mitchellh.com/writing/libghostty-is-coming
764•kingori•1d ago•239 comments

Qwen3-VL

https://qwen.ai/blog?id=99f0335c4ad9ff6153e517418d48535ab6d8afef&from=research.latest-advancement...
396•natrys•17h ago•130 comments

How Neural Super Sampling Works: Architecture, Training, and Inference

https://semiengineering.com/how-neural-super-sampling-works-architecture-training-and-inference/
14•PaulHoule•3d ago•0 comments

Markov chains are the original language models

https://elijahpotter.dev/articles/markov_chains_are_the_original_language_models
423•chilipepperhott•4d ago•149 comments

Getting AI to work in complex codebases

https://github.com/humanlayer/advanced-context-engineering-for-coding-agents/blob/main/ace-fca.md
431•dhorthy•1d ago•360 comments

Top Programming Languages 2025

https://spectrum.ieee.org/top-programming-languages-2025
223•jnord•14h ago•345 comments

From Rust to reality: The hidden journey of fetch_max

https://questdb.com/blog/rust-fetch-max-compiler-journey/
236•bluestreak•17h ago•50 comments

Podman Desktop celebrates 3M downloads

https://podman-desktop.io/blog/3-million
206•twelvenmonkeys•17h ago•60 comments

A webshell and a normal file that have the same MD5

https://github.com/phith0n/collision-webshell
81•shlomo_z•3d ago•39 comments

New study shows plants and animals emit a visible light that expires at death

https://pubs.acs.org/doi/10.1021/acs.jpclett.4c03546
150•ivewonyoung•11h ago•122 comments

S3 scales to petabytes a second on top of slow HDDs

https://bigdata.2minutestreaming.com/p/how-aws-s3-scales-with-tens-of-millions-of-hard-drives
86•todsacerdoti•4h ago

Comments

EwanToo•2h ago
I think a more interesting article on S3 is "Building and operating a pretty big storage system called S3"

https://www.allthingsdistributed.com/2023/07/building-and-op...

giancarlostoro•2h ago
Really nice read, thank you for that.
enether•1h ago
Author of the 2minutestreaming blog here. Good point! I'll add this as a reference at the end. I loved that piece. My goal was to be more concise and focus on the HDD aspect
littlesnugblood•17m ago
Andy Warfield is a narcissistic asshole. I speak from experience.
gostsamo•11m ago
Can you share some anecdotes?
crabique•2h ago
Is there an open source service designed with HDDs in mind that achieves similar performance? I know none of the big ones work that well with HDDs: MinIO, Swift, Ceph+RadosGW, SeaweedFS; they all suggest flash-only deployments.

Recently I've been looking into Garage and liking the idea of it, but it seems to have a very different design (no EC).

giancarlostoro•1h ago
Doing some light googling, aside from Ceph being listed, there's one called Gluster as well. It hypes itself as "using common off-the-shelf hardware you can create large, distributed storage solutions for media streaming, data analysis, and other data- and bandwidth-intensive tasks."

It's open source / free to boot. I have no direct experience with it myself, however.

https://www.gluster.org/

a012•56m ago
I’ve used GlusterFS before because I had tens of old PCs, and it worked very well for me. It was more of a PoC to see how it works than a production setup, though.
epistasis•9m ago
A decade ago, where I worked, we used gluster for ~200TB of shared file system on a SLURM compute cluster, as a much better clustered version of NFS. And we used ceph for its S3 interface (RadosGW) for tens of petabytes of backing storage after the high-IO stages of compute were finished.

We would occasionally try cephFS, the POSIX shared network filesystem, but it couldn't match our gluster performance for our workload. But also, we built the ceph long-term storage to maximize TB/$, so it was at a disadvantage compared to our gluster install. Still, I never heard of cephFS being used anywhere despite it being the original goal in the papers back at UCSC. Keep an eye on CERN for news about one of the bigger ceph installs with public info.

I love both of the systems, and am glad to see that gluster is still around.

elitepleb•1h ago
Any of them will work just as well, but only with many datacenters' worth of drives, which very few deployments can target.

It's the classic horizontal/vertical scaling trade-off; that's why flash tends to be more space/cost-efficient for speedy access.

bayindirh•1h ago
Lustre and ZFS can do similar speeds.

However, if you need high IOPS, you need flash on MDS for Lustre and some Log SSDs (esp. dedicated write and read ones) for ZFS.

crabique•1h ago
Thanks, but I forgot to specify that I'm interested in S3-compatible servers only.

Basically, I have a single big server with 80 high-capacity HDDs and 4 high-endurance NVMes, and it's the S3 endpoint that gets a lot of writes.

So yes, for now my best candidate is ZFS + Garage: this way I can get away with replica=1 and rely on ZFS RAIDz for data safety, and the NVMes can be sliced and diced to act as the fast metadata store for Garage, the "special" device/small-records store for ZFS, the ZIL/SLOG device, and so on.

Currently it's a bit of a Frankenstein's monster: XFS+OpenCAS as the backing storage for an old version of MinIO (containerized to run as 5 instances). I'm looking to replace it with a simpler design and hopefully get better performance.
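A minimal sketch of the pool layout that plan implies, assuming eight 10-wide RAIDz2 vdevs plus mirrored NVMe partitions for the special and log devices; every device name, vdev width, and threshold below is a placeholder, not a recommendation:

```shell
# Hypothetical layout for an 80-HDD / 4-NVMe box (only two of the
# eight raidz2 vdevs shown; the rest are elided). All device names
# are placeholders -- adjust widths, spares, and partitions to taste.
zpool create tank \
  raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj \
  raidz2 sdk sdl sdm sdn sdo sdp sdq sdr sds sdt \
  special mirror nvme0n1p1 nvme1n1p1 \
  log mirror nvme2n1p1 nvme3n1p1

# Steer metadata and records up to 64K to the special (NVMe) vdev
zfs set special_small_blocks=64K tank
```

The remaining NVMe partitions could then back Garage's metadata store, as described above.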

foobarian•1h ago
Do you know if some of these systems have components to periodically checksum the data at rest?
bayindirh•43m ago
ZFS/OpenZFS can do scrubs and block-level recovery. I'm not sure about Lustre, but since petabyte-sized storage is its natural habitat, there should be at least one way to handle that.
bayindirh•41m ago
It might not be the most ideal solution, but did you consider installing TrueNAS on that thing?

TrueNAS can handle the OpenZFS (zRAID, Caches and Logs) part and you can deploy Garage or any other S3 gateway on top of it.

It can be an interesting experiment, and an 80-disk server is not too big for a TrueNAS installation.

creiht•20m ago
It is probably worth noting that most of the listed storage systems (including S3) are designed to scale not only in hard drives but horizontally, across many servers in a distributed system. They really are not optimized for a single-node use case. There are also other things to consider that can limit performance, like what the storage backplane looks like for those 80 HDDs and how much throughput you can effectively push through it. Then there is the network connectivity, which will also be a limiting factor.
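A back-of-the-envelope check of those limits for the single-node case; the per-drive and NIC figures below are assumptions for illustration, not measurements:

```python
# Rough bandwidth bounds for one 80-HDD node.
# All figures are assumed/illustrative; substitute your own hardware specs.
HDD_SEQ_MBPS = 150       # sequential throughput of one HDD, MB/s (assumed)
NUM_HDDS = 80
NIC_GBPS = 25            # network link speed, Gbit/s (assumed)

disk_mbps = HDD_SEQ_MBPS * NUM_HDDS     # aggregate raw disk bandwidth
nic_mbps = NIC_GBPS * 1000 / 8          # NIC bandwidth converted to MB/s
bottleneck = min(disk_mbps, nic_mbps)   # the node can't serve faster than this

print(f"disks: {disk_mbps} MB/s, NIC: {nic_mbps:.0f} MB/s, bound: {bottleneck:.0f} MB/s")
```

With these made-up numbers the 25 Gbit link, not the 80 spindles, is the ceiling, which is exactly the kind of non-disk limit the comment is pointing at.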
olavgg•50m ago
SeaweedFS has evolved a lot over the last few years, with RDMA support and EC.
nerdjon•1h ago
So is any of S3 powered by SSDs?

I honestly figured the standard tier must be SSD-backed, with the slower tiers using HDDs or slower systems.

MDGeist•1h ago
I always assumed the really slow tiers were tape.
hobs•14m ago
Not even the higher tiers of Glacier were tape afaict (at least when it was first created), just the observation that hard drives are much bigger than you can reasonably access in useful time.
wg0•1h ago
Does anyone know what is the technology stack of S3? Monolith or multiple services?

I assume would have lots of queues, caches and long running workers.

jyscao•1h ago
> conway’s law and how it shapes S3’s architecture (consisting of 300+ microservices)
Twirrim•1h ago
Amazon biases towards a Systems Oriented Architecture approach that sits in the middle ground between monolith and microservices.

It biases away from lots of small services in favour of larger ones that handle more of the work, so that as much as possible you avoid the costs and latency of preparing, transmitting, receiving, and processing requests.

I know S3 has changed since I was there nearly a decade ago, so this is outdated. Off the top of my head it used to be about a dozen main services at that time. A request to put an object would only touch a couple of services en route to disk, and similar on retrieval. There were a few services that handled fixity and data durability operations, the software on the storage servers themselves, and then stuff that maintained the mapping between object and storage.

hnexamazon•39m ago
I was an SDE on the S3 Index team 10 years ago, but I doubt much of the core stack has changed.

S3 consists primarily of layers of Java-based web services. The hot paths (object get / put / list) are all served by synchronous API servers, with no queues or workers. It is the best example I've seen in my career of how many transactions per second a pretty standard Java web service stack can handle.

For a get call, you first hit a fleet of front-end HTTP API servers behind a set of load balancers. Partitioning is based on the key name prefixes, although I hear they’ve done work to decouple that recently. Your request is then sent to the Indexing fleet to find the mapping of your key name to an internal storage id. This is returned to the front end layer, which then calls the storage layer with the id to get the actual bits. It is a very straightforward multi-layer distributed system design for serving synchronous API responses at massive scale.
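The layered get path described above can be sketched as a toy lookup chain; all names and data here are invented stand-ins, not actual S3 internals:

```python
# Toy model of the described get path: front end -> index -> storage.
# Keys, ids, and bytes are illustrative placeholders.
index_fleet = {"mybucket/photo.jpg": "storage-id-0042"}   # key -> internal storage id
storage_fleet = {"storage-id-0042": b"<jpeg bytes>"}      # storage id -> actual bits

def get_object(key: str) -> bytes:
    # 1. A front-end API server receives the request (load balancing elided).
    # 2. The indexing fleet maps the key name to an internal storage id.
    storage_id = index_fleet[key]
    # 3. The front end calls the storage layer with that id for the bytes.
    return storage_fleet[storage_id]

print(get_object("mybucket/photo.jpg"))   # b'<jpeg bytes>'
```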

The only novel bit is that all the backend communication uses a home-grown, stripped-down HTTP variant called STUMPY, if I recall. It was a dumb idea not to just use HTTP, but the service is ancient and was originally built back when principal engineers were allowed to YOLO their own frameworks and protocols, so now they are stuck with it. They may have done the massive lift to replace STUMPY with HTTP since my time.

js4ever•23m ago
"It is the best example of how many transactions per second a pretty standard Java web service stack can handle that I’ve seen in my career."

Can you give some numbers? Or at least a ballpark?

hnexamazon•11m ago
Tens of thousands of TPS per node.
dgllghr•55m ago
I enjoyed this article but I think the answer to the headline is obvious: parallelism
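The arithmetic behind that one-word answer, with illustrative inputs (the per-drive figure is an assumption; the fleet size is a round number matching the article's "tens of millions of hard drives"):

```python
# Aggregate throughput from many slow drives read in parallel.
# Both inputs are illustrative assumptions.
per_drive_mbps = 150          # one HDD's sequential throughput, MB/s
drives = 20_000_000           # "tens of millions of hard drives"

aggregate_mbps = per_drive_mbps * drives
aggregate_pbps = aggregate_mbps / 1e9     # 1 PB = 1e9 MB (decimal units)
print(f"{aggregate_pbps:.1f} PB/s in aggregate")   # 3.0 PB/s
```

No individual drive gets faster; spreading (and erasure-coding) objects across enough spindles is what turns 150 MB/s drives into petabytes per second.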