Ok, there's the first part, the Garlic bus, which gives the GPU its own access to the DRAM request controller, instead of going through the CPU's memory controller.
Since the GPU is mostly going to miss, it's great that it's not wasting energy trying to go to the CPU's cache. But it means that if you do want to share memory, you now need a whole other access path for the GPU to read from the CPU memory, even though it's literally the same RAM (but maybe different cache). So, add a new Onion link that lets the GPU go through the crossbar and get handled by the memory controller. And this one is slower.
Infinity Fabric seems conceptually so much easier for keeping things in sync. But the work to snoop the bus and maintain coherency has to be a pretty massive effort.
It's a very different thing, but I wonder how AMD deals with coherency (or not?) across the 6 Memory Cache Dies (MCDs) in the 7900 XTX GPU. Having separate chips whose job is to be cache and DRAM controller must need at least some understanding of who has what memory. That has to be wild.
One other comment, on:
> modern games struggle or won’t launch at all on Trinity, so I’ve selected a few older workloads
I wonder how many more games would run under Linux? There's an absurd amount of work still going into the radeonsi driver. The driver just switched to the newer ACO compiler pipeline by default last December, for example. That said, Trinity (2012) uses a TeraScale 3 (gfx4) GPU from 2010. This is old! But the improvements have been ongoing, in a way commercial systems would be unlikely to ever see; there have been so many wins over such a long time. It's not a compatibility change, but getting multi-threaded driver support (2017) also comes to mind as a big leap! https://www.phoronix.com/news/RadeonSI-ACO-Default-Pre-GFX10 https://www.phoronix.com/news/RadeonSI-G3D-Threads https://www.google.com/search?q=site%3Aphoronix.com+radeonsi
I wonder how granular the breakdown/fallback modes are for running; I suspect that if there's an unsupportable feature somewhere in the graphics pipeline, the whole pipeline will usually need to fall back to CPU rendering. But perhaps there's some ability to fill in some GPU features via the CPU while running most of the pipeline on the GPU (without the latency destroying everything, perhaps using that Onion link/cacheable host memory)?
With the company facing bankruptcy, I'd imagine that a small team hacking together the different GPU and CPU interconnects was cheaper and faster than designing a whole new interconnect and coherency then implementing and testing it everywhere.
Having separate, non-coherent memory is the status quo for GPUs. Bringing the GPU onto the die means you've got to share the path to memory, but access patterns are different.
Designing for the typical case where the addresses used are distinct is totally reasonable; it's not wild at all. After that works, you can try to make shared use faster, too, but from the article, that didn't really happen in this design; the features are there, but the bandwidth isn't.
> "AMD’s BIOS and Kernel Developer’s Guide (BKDG) indicates there’s a 4-bit read pointer for a sideband signal FIFO between the GMC and DCT, so the “Garlic” link may have a queue with up to 16 entries."
Should maybe swap DCT in for MCT (memory controller)?
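For anyone wondering where the "up to 16" in that quote comes from: a 4-bit pointer can address 2^4 = 16 slots. Here's a minimal sketch of a FIFO indexed that way. The class name, the separate occupancy counter, and the write pointer are my own assumptions for illustration; the BKDG quote only mentions the 4-bit read pointer, and says nothing about how full/empty is tracked in hardware.

```python
PTR_BITS = 4                 # pointer width taken from the BKDG quote
DEPTH = 1 << PTR_BITS        # 2**4 = 16 addressable slots
PTR_MASK = DEPTH - 1         # 0b1111, used to wrap the pointers


class SidebandFifo:
    """Toy ring buffer with 4-bit pointers (hypothetical, not AMD's design)."""

    def __init__(self):
        self.slots = [None] * DEPTH
        self.rd = 0          # 4-bit read pointer
        self.wr = 0          # 4-bit write pointer (assumed to exist)
        self.count = 0       # assumed occupancy counter to tell full from empty

    def push(self, item):
        """Enqueue an item; return False if all 16 slots are occupied."""
        if self.count == DEPTH:
            return False
        self.slots[self.wr] = item
        self.wr = (self.wr + 1) & PTR_MASK   # wrap after slot 15
        self.count += 1
        return True

    def pop(self):
        """Dequeue the oldest item, or return None if the queue is empty."""
        if self.count == 0:
            return None
        item = self.slots[self.rd]
        self.rd = (self.rd + 1) & PTR_MASK   # wrap after slot 15
        self.count -= 1
        return item
```

Note that without the extra counter (or a wrap bit), equal read and write pointers would be ambiguous between full and empty, and only 15 entries would be usable. That may be why the article hedges with "up to" 16.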
bee_rider•7mo ago
hajile•7mo ago
AMD made a killer design with Athlon64 that should have taken over the entire industry and made them the largest hardware company on the planet. Instead, Intel leveraged their market position to make it economically infeasible for computer manufacturers to buy AMD chips.
AMD was out of money, which limited options. Dennard scaling had just failed, but Moore's Law was still in effect and multithreading was hyped as the future of everything. This made a big argument for lots of smaller cores, and the most area-efficient way to do this was sharing less-used resources, resulting in AMD betting big on small-core CMT.
At the same time, AMD's ATI division was under pressure to make a new, flexible GPU design (that became GCN) and was up against the cult of Nvidia (even knowingly shipping massive numbers of defective chips and then having a worse GPU than GCN still wasn't enough for Nvidia to lose market dominance).
The interconnect was a lower-priority redesign, so they slapped a bandaid on it and pushed the redesign down the road.
ahartmetz•7mo ago
Today, Intel is still selling more CPUs than AMD in most market segments even though they are usually worse.
Zardoz84•7mo ago
From a proud ex-user of an FX8370E
Tuna-Fish•7mo ago
The reason for AMD's resurgence right now is not that they have more cores, but that they have better cores. If they had even faster cores, and fewer of them per die, they'd be selling even better.
ahartmetz•7mo ago
My most important workload - compiling C++ - is atypically parallel, but even there, single-core is important, too.
reginald78•7mo ago
ahartmetz•7mo ago
ahartmetz•7mo ago
https://www.osnews.com/story/135785/bulldozer-amds-crash-mod...
mlinhares•7mo ago
Tuna-Fish•7mo ago
adgjlsfhk1•7mo ago
bcrl•7mo ago
AMD's Bulldozer was their attempt at a P4 style core: increase clocks at all costs. Again, aiming for ridiculous clock rates without considering the power cost was a mistake. However, some of the design techniques AMD came up with to hit higher clock speeds live on in today's Ryzen designs (just as ideas from the P4 live on in today's Intel CPUs).
Tabula made the same design mistake in their FPGA fabric: the thinking was that registers are cheap and memory blocks can be run at ridiculous clock speeds in order to share, multiplex and reuse transistors across LUTs. Great in theory if it wasn't for the ridiculous cost of power and the complexity of the software to make it work.
The power wall is real. Not every hardware design team makes the right choices early in a design when estimating power use and constraining the design space appropriately. The difference is that the tools used to estimate power consumption of a hardware design today are far better than they were 20 years ago as a direct consequence of these (and other) failures.
toast0•7mo ago
[1] https://chipsandcheese.com/p/bulldozer-amds-crash-modernizat...