I know it's a server but I'd be so ready to use all of that as a RAM disk. Crazy amount at a crazy high speed. Even 1% would be enough just to play around with something.
This has been the basic pattern for ages, particularly with large C++ projects. Since the introduction of multi-CPU and multi-core systems, C++ builds have become IO-bound workflows, especially during linking.
Creating RAM disks to speed up builds is one of the most basic, lowest-effort strategies for improving build times, and I think it was the main driver for a few commercial RAM drive apps.
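A minimal sketch of the idea on Linux - the tmpfs size, mount point, and the CMake build used here are illustrative assumptions rather than anything from this thread, and the mount needs root:

    # Mount a tmpfs "RAM disk" and point the build output at it.
    import os
    import subprocess

    MOUNT_POINT = "/mnt/ramdisk"  # hypothetical mount point
    SIZE = "16G"                  # hypothetical size

    os.makedirs(MOUNT_POINT, exist_ok=True)
    subprocess.run(
        ["mount", "-t", "tmpfs", "-o", f"size={SIZE}", "tmpfs", MOUNT_POINT],
        check=True,
    )
    # Keep sources on disk, but do the compiling and linking in RAM:
    subprocess.run(["cmake", "-S", ".", "-B", f"{MOUNT_POINT}/build"], check=True)
    subprocess.run(["cmake", "--build", f"{MOUNT_POINT}/build", "-j"], check=True)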
The RAMsan line, for example, started in 2000 with a 64GB DRAM-based SSD with up to 15 1Gbit FC interfaces, providing a shared SAN SSD for multiple hosts (very well utilized by some of the beefier clustered SQL databases like Oracle RAC), but the company itself had been providing high-speed specialized DRAM-based SSDs since 1978.
Last time I saw one was with a mainframe, which kind of makes sense if adding cheaper third party memory to the machine would void warranties or breach support contracts. People really depend on company support for those machines.
A fast scratch pad that can be shared between multiple machines can be ideal at times.
Still seems like a kludge - The One Right Way to do it would be to add that memory directly to CPU-addressable space rather than across a SCSI (or channel, or whatever) link. Might as well add it to the RAM in the storage server and let it manage the memory optimally (with hints from the host).
You are arguing hypotheticals, whereas for decades the world had to deal with practicals. I recommend you spend a few minutes looking into how to create RAM drives on, say, Windows, and think through how to achieve that when your build workstation has 8GB of RAM and you need a scratchpad memory of, say, 16GB of RAM.
Recommended reading: https://en.wikipedia.org/wiki/RAM_drive
These are only for when the OS and the machine itself can't deal with the extra memory and wouldn't know what to do with it, things you buy when you run out of sensible options (such as adding more memory to your machine and/or configuring a RAM disk).
A) this technique precedes the existence of Linux.
B) Linux is far from the most popular OS in use today.
C) some software projects are developed on and target non-Linux platforms (see Windows)
I assume the same would be true for any project that is configure-heavy.
Nowadays NVMe drives might indeed be able to get close - but we'd probably still need to span multiple SSDs (reducing the cost savings), and the developers there are incredibly sensitive to build times. If a 5-minute build suddenly takes 30 seconds more we have some unhappy developers.
Another reason is that it'd eat SSDs like candy. Current enterprise SSDs have something like a 10000 TBW rating, which we'd exceed in the first month. So we'd either get cheap consumer SSDs and replace them every few days, or enterprise SSDs and replace them every few months - or stick with the RAM setup, which over the life of the build system will be cheaper than constantly buying SSDs.
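For scale, a rough check of what "exceed 10000 TBW in the first month" implies as a sustained write rate (assuming a 30-day month):

    TBW_RATING_TB = 10_000  # endurance rating from the comment above
    DAYS = 30               # "first month"

    tb_per_day = TBW_RATING_TB / DAYS
    gb_per_s = tb_per_day * 1e12 / 86_400 / 1e9
    print(f"{tb_per_day:.0f} TB/day, ~{gb_per_s:.1f} GB/s of sustained writes")
    # -> 333 TB/day, ~3.9 GB/s of sustained writes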
Wow. What’s your use case?
We actually did try with SSDs about 15 years ago, and had a lot of dead SSDs in a very short time. After that we went with estimating the data written instead; it's cheaper. While SSD durability has increased a lot since then, everything else has gotten faster as well - so SSDs would last a bit longer now (back then it was a weekly thing), but still nowhere near the point where it'd be a sensible thing to do.
They sound incredibly spoiled. Where should I send my CV?
They indeed are quite spoiled - and that's not necessarily a good thing. Part of the issue is that our CI was good and fast enough that at some point a lot of the new hires never bothered to figure out how to build the code - so for quite a few the workflow is "commit to a branch, push it, wait for CI, repeat". And as they often just work on a single problem, the "wait" is time lost for them, which leads to unhappiness if we are too slow.
Running the numbers to verify: a read-write-mixed enterprise SSD will typically have 3 DWPD (drive writes per day) across its 5-year warranty. At 2TB, that would be 10950 TBW, so that sort of checks out. If endurance were a concern, upgrading to a higher capacity would linearly increase the endurance. For example the Kioxia CD8P-V. https://americas.kioxia.com/en-us/business/ssd/data-center-s...
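The conversion behind that figure:

    DWPD = 3            # drive writes per day
    CAPACITY_TB = 2     # drive capacity in TB
    WARRANTY_YEARS = 5

    tbw = DWPD * CAPACITY_TB * 365 * WARRANTY_YEARS
    print(tbw)  # 10950 (TBW), matching the number above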
Finding it a bit hard to imagine build machines working that hard, but I could believe it!
I don't know where you're buying your NVMe drives, but mine usually respond within a hundred microseconds.
this kit? https://www.newegg.com/nemix-ram-1tb/p/1X5-003Z-01930
I also have an M920Q with an 8500T, an HP ProDesk with a 10500T, and a Lenovo P520 -> these three are truly for home purposes.
If I were to do the pricetracker machine again, I'd go much smaller and get a JBOD plus probably a P520.
So those components alone would be just over $12k.
That's just from regular consumer shops, and includes 25% VAT. Without the VAT it's about $9800.
The problem for consumers is that just about all the shops that sell such gear, and that you might get a deal from, are geared towards companies and not interested in dealing with consumers due to consumer protection laws.
I found a used server with 768 GB DDR4 and dual Intel Gold 6248 CPUs for $4200 including 25% VAT.
That's a complete 2U server, the CPUs are a bit weak but not too bad all in all.
That's 300GB/s slower than my old Mac Studio (M1 Ultra). Memory speeds in 2025 remain thoroughly unimpressive outside of high-end GPUs and fully integrated systems.
The M1 Ultra doesn't have 800GB/s because it's "integrated", it simply has 16 channels of DDR5-6400, which it could have whether it was soldered or not. And none of the more recent Apple chips have any more than that.
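A quick check of the ~800GB/s figure using the numbers in this comment - the 64-bit channel width is an assumption (LPDDR5 is organized as narrower channels, but the total bus width comes out the same):

    CHANNELS = 16
    TRANSFERS_PER_S = 6400e6   # 6400 MT/s
    BYTES_PER_CHANNEL = 8      # assumed 64-bit channels

    bandwidth = CHANNELS * TRANSFERS_PER_S * BYTES_PER_CHANNEL / 1e9
    print(f"{bandwidth:.0f} GB/s")  # ~819 GB/s, i.e. the quoted ~800GB/s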
It's the GPUs that use integrated memory, i.e. GDDR or HBM. That actually gets you somewhere -- the RTX 5090 has 1.8TB/s with GDDR7, the MI300X has 5.3TB/s with HBM3. But that stuff is also more expensive which limits how much of it you get, e.g. the MI300X has 192GB of HBM3, whereas normal servers support 6TB per socket.
And it's the same problem with Apple even though there's no great reason for it to be. The 2019 Intel Xeon Mac Pro supported 1.5TB of RAM -- still in slots -- but the newer ones barely reach a third of that at the top end.
The M1 Ultra has LPDDR5, not DDR5. And the M1 Ultra was running its memory at 6400MT/s about two and a half years before any EPYC or Xeon parts supported that speed - due in part to the fact that the memory on an M1 Ultra is soldered down. And as far as I can tell, neither Intel nor AMD has shipped a CPU socket supporting 16 channels of DRAM; they're having enough trouble with 12 channels per socket, which often means you need the full width of a 19-inch rack for DIMM slots.
Existing servers typically have 12 channels per socket, but they also have two DIMMs per channel, so you could double the number of channels per socket without taking up any more space for slots. You could also use CAMM which takes up less space.
They don't currently use more than 12 channels per socket, even though they could, because that's enough not to be a constraint for most common workloads, more channels increase costs, and people with workloads that need more can get systems with more sockets. Apple only uses more because they're using the same memory for the GPU, and that is often constrained by memory bandwidth.
Usually this comes at a pretty sizable hit to MHz available. For example STH notes that their Zen5 ASRock Rack EPYC4000D4U goes from DDR5-5600 down to DDR5-3600 with the second slot populated, a 35% drop in throughput. https://www.servethehome.com/amd-epyc-4005-grado-is-great-an...
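Checking the quoted drop against the speeds in that article:

    one_dimm = 5600   # DDR5-5600 with one DIMM per channel
    two_dimm = 3600   # DDR5-3600 with both slots populated

    drop = (one_dimm - two_dimm) / one_dimm
    print(f"{drop:.0%}")  # 36%, roughly the 35% drop mentioned above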
(It's also because of servers being ultra-cautious again. The desktops say the same thing in the manual but then don't enforce it in the BIOS and people run two sticks per channel at the full speed all over the place.)
So they have been really optimising that IO die for latency.
NUMA is already workload sensitive, you need to benchmark your exact workload to know if it’s worth enabling or not, and this change is probably going to make it even less worthwhile. Sounds like you will need a workload that really pushes total memory bandwidth to make NUMA worthwhile.
It says 16 cores per die with up to 16 Zen 5 dies per chip. For Zen 5 it's 8 cores per die and 16 dies per chip, giving a total of 128 cores.
For Zen 5c it's 16 cores per die and 12 dies per chip, giving a total of 192 cores.
Weirdly it's correct on the right side of the image.
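The corrected arithmetic:

    zen5_cores = 8 * 16    # 8 cores per Zen 5 die, 16 dies per chip
    zen5c_cores = 16 * 12  # 16 cores per Zen 5c die, 12 dies per chip
    print(zen5_cores, zen5c_cores)  # 128 192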