There's a massive "it depends" here; it's not all about raw performance.
Are you planning to publish CH benchmarks (TPC-C and TPC-H combined)? I'd expect Aurora to perform much worse on CH than on TPC-C/H. That's because Aurora pushes WAL logs to replicated shared storage. Since you only need a quorum on a write, you get a fast ack (TPC-C). The way you've run TPC-H doesn't modify the data much, so you also get baseline Postgres performance there.
However, when you're pushing writes while also running sequential scans over the data, Aurora needs to reconcile the WAL writes, manage locks, etc. The CH benchmark exercises exactly that path, and I'd expect it to slow Aurora down notably.
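The fast-ack behavior described above can be sketched roughly as follows. This is a hypothetical illustration of quorum acknowledgment (function names and the 4-of-6 quorum are assumptions loosely based on Aurora's published design, not its actual code): the writer returns success as soon as enough replicas confirm, without waiting for stragglers.

```python
import concurrent.futures

def quorum_write(replicas, record, quorum=4):
    """Send a WAL record to all replicas; ack once `quorum` confirm.

    `replicas` is a list of callables, each returning True on a durable
    write. Hypothetical sketch of quorum acknowledgment, not Aurora's
    real protocol.
    """
    acks = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(r, record) for r in replicas]
        for fut in concurrent.futures.as_completed(futures):
            if fut.result():
                acks += 1
            if acks >= quorum:
                return True  # fast ack: don't wait for the slow minority
    return False

# Six storage segments, one of them failed:
replicas = [lambda rec: True] * 5 + [lambda rec: False]
print(quorum_write(replicas, b"wal-record"))  # True once 4 of 6 ack
```

The point is that write latency is set by the 4th-fastest replica, not the slowest, which is why pure write workloads like TPC-C look good; the reconciliation cost only shows up when readers need a consistent view of those writes.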
(Disclaimer: ex-Citus and current Ubicloud founder)
We plan to publish CH benchmark results in a follow-up blog post. We didn't want to do it yet, though, to avoid putting out misleading results.
I've always held that if you want resilience, you just cannot rely on local storage. No matter how many times you've got data replicated locally, you're still at risk of the whole machine failing - best case falling off the network, worst case trashing all its disks in some weird failure state as the RAID firmware decides today is the day to Just Not. And while you might technically still be able to recover the data, you're still offline.
You just need your data to be off the machine already when that happens. Not to say that all access needs to go over the network - local caching ought to go a long way here - but the default should be to switch to another machine and recycle the failed one.
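The "data already off the machine, local cache for speed" idea can be sketched like this. Everything here is a hypothetical illustration (class and variable names are mine, and the dict-backed "remote" stands in for any replicated network storage): writes land on remote storage before being acknowledged, reads prefer the local cache, and a replacement machine can attach to the same remote store after a failure.

```python
class DurableStore:
    """Writes go to remote storage first (the source of truth), then to
    a local cache; reads prefer the cache. If this machine dies, another
    one attaches to the same `remote` and rebuilds its cache on demand.
    Hypothetical sketch, not any specific storage system.
    """

    def __init__(self, remote):
        self.remote = remote   # dict-like; survives machine failure
        self.cache = {}        # local, fast, disposable

    def write(self, key, value):
        self.remote[key] = value  # data is off the machine before we ack
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]   # local-speed read path
        value = self.remote[key]     # cold read, e.g. after failover
        self.cache[key] = value
        return value

remote = {}
node_a = DurableStore(remote)
node_a.write("k", "v")
# node_a fails; a replacement attaches to the same remote storage:
node_b = DurableStore(remote)
print(node_b.read("k"))  # "v" -- nothing was lost with the machine
```

The design choice is that the local disk is treated as a cache you can throw away, so "switch to another machine and recycle the failed one" costs you warm-up time, not data.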
Relevant to the article, this is independent of the speed and reliability of the actual hardware. It was true in 2010, it's true now.
So were they using the GD instances? (With XXgd instances, local NVMe-based SSDs are physically connected to the host server.) Or something else?
Besides that, IMVHO the future of the cloud (meaning the modern mainframe) is the cluster, or decentralized applications run from homes and sheds with PV and local storage, with the desktop at the center. We can't live with such centralization; we can't evolve, and we can't even remain democracies under a model where information at that level of detail is in the hands of so few. With FTTH, PV, energy storage, and continued IT development, we could come back to the original interconnected-desktop model, sparing resources instead of consuming ever more.
It turns out that local drives continue to be faster than remote drives. Who would have thought?