Meta, OpenAI, Crusoe, and xAI recently announced large purchases of MI300 chips for inference.
MI400, which will be available next year, also looks to be at least on par with Nvidia's roadmap.
Most importantly, models are maturing, and this means less custom optimization is required.
I suspect we will be (or already are?) at a point where 95%+ of GPUs are used for inference, not training.
They've been playing catch-up since "the bad old days," when they had to let a bunch of people go to avoid going under, but it looks like they're getting back up to speed. Now it's just a matter of giving all those new engineers a few years to get their software world in order.
The point is, someone might join AMD because they believe in the mission, not just for the paycheck. I followed that with: “It isn’t always about the money,” which is consistent with my original comment.
The real subtext is something I care deeply about: Nvidia is a monopoly. If AI is truly a transformative technology, we can't rely on a single company for all the hardware and software. Viable alternatives are essential. I believe in this vision so strongly that I started a company to give developers access to enterprise-grade AMD compute, back when no one was taking AMD seriously in AI. (Cue the HN troll saying that nobody still does.)
If the stock goes up while they’re there, great, that’s a bonus.
Every time I tried this previously, it failed with cryptic errors. So, judging from this very small test, it has gotten way better recently.
*I did have problems enabling the WMMA extensions, though, so it's not perfect yet.
BTW, this kind of dev experience really does matter. I'm sure it was possible to get working previously, but I didn't have the level of interest to make it work, even if it was somewhat trivial. Being able to compile out of the box makes a big difference. And AFAIK this new version is the first to properly support WSL2, which means I don't have to dual boot just to try it. It's a big improvement.
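To put "compiles out of the box" in concrete terms, this is roughly the level of smoke test I mean. A minimal sketch, assuming a stock ROCm install with hipcc on the PATH; the axpy kernel and names are my own example, not anything AMD ships:

    // hello_hip.cpp -- build with: hipcc hello_hip.cpp -o hello_hip
    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <vector>

    __global__ void axpy(float a, const float* x, float* y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1024;
        std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
        float *dx, *dy;
        hipMalloc(&dx, n * sizeof(float));
        hipMalloc(&dy, n * sizeof(float));
        hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

        // One thread per element, 256 threads per block.
        axpy<<<(n + 255) / 256, 256>>>(2.0f, dx, dy, n);

        hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
        printf("y[0] = %.1f (expect 4.0)\n", hy[0]);
        hipFree(dx); hipFree(dy);
        return 0;
    }

If that builds and prints 4.0 on the first try, the toolchain is doing its job.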
For example, to this day installing MSVC doesn't make a sane default compiler available in a terminal: you have to open their shortcut that sets up the environment variables, and you just have to know that this is how MSVC works. Is this a user problem, or Microsoft failing to follow the same conventions every other toolchain installer follows?
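Concretely, the dance looks something like this (the path assumes a default VS 2022 Community install; adjust for your edition):

    // hello.cpp -- from a fresh terminal, `cl hello.cpp` fails because
    // cl.exe isn't on PATH. You first have to run the vcvars script
    // that the "Native Tools Command Prompt" shortcut runs for you:
    //
    //   "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
    //   cl /EHsc hello.cpp
    //
    #include <iostream>
    int main() { std::cout << "hello from MSVC\n"; }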
Supercharging the Local Data Share (LDS) that's shared by threads in a workgroup is really cool to hear about: the size goes from 64KB to 160KB, writes into LDS go from 32B max to 128B (increasing throughput), and there are transposes to help get the data into the right shape for its next use.
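For anyone who hasn't written GPU code: LDS is what CUDA calls shared memory, and the tiled transpose below is the textbook pattern those changes speed up. A minimal HIP sketch of my own, not AMD's code; the tile size and padding trick are the standard recipe, not anything specific to the new hardware:

    #include <hip/hip_runtime.h>

    constexpr int TILE = 32;

    // Transpose an n x n matrix by staging 32x32 tiles through LDS, so
    // both the global read and the global write stay coalesced.
    __global__ void transposeTiled(const float* in, float* out, int n) {
        // +1 column of padding avoids LDS bank conflicts on the strided access.
        __shared__ float tile[TILE][TILE + 1];

        int x = blockIdx.x * TILE + threadIdx.x;
        int y = blockIdx.y * TILE + threadIdx.y;
        if (x < n && y < n)
            tile[threadIdx.y][threadIdx.x] = in[y * n + x];
        __syncthreads();

        // Swap the block indices so the write goes to the transposed tile.
        x = blockIdx.y * TILE + threadIdx.x;
        y = blockIdx.x * TILE + threadIdx.y;
        if (x < n && y < n)
            out[y * n + x] = tile[threadIdx.x][threadIdx.y];
    }

Wider LDS writes and hardware transposes attack exactly the two costs in that kernel: the staging traffic and the shuffle through LDS.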
Really, really curious to see what the UDNA unified next-gen architecture looks like, if they really stick to merging the compute (CDNA) and Radeon (RDNA) lines as promised. If consumers end up getting multi-die compute solutions, that would be neat and also intimidatingly hard (lots of energy spent keeping bits in sync across dies for coherency). Ever since the flagship Navi 4X ended up being cancelled way back, I've been wondering. I sort of expect this won't scale as nicely as Epyc being a bunch of Ryzen dies. https://wccftech.com/amd-enthusiast-radeon-rx-8000-gpus-alle...
BEWARE: I was running fully patched Ubuntu 24.04 LTS and I needed to upgrade to Ubuntu 24.10 and then Ubuntu 25.04 before the drivers worked. Painful.
bee_rider•7mo ago
But does AMD just own the whole HPC stack at this point? (Or would they, if the software were there?)
At least the individual nodes. What's their equivalent to InfiniBand?
phonon•7mo ago
https://www.tomshardware.com/networking/amd-deploys-its-firs...
https://semianalysis.com/2025/06/11/the-new-ai-networks-ultr...
OneDeuxTriSeiGo•7mo ago
https://ultraethernet.org/
wmf•7mo ago
Now that Nvidia is removing FP64, I assume AMD will have 100% of the HPC market until Fujitsu Monaka comes out.
latchkey•7mo ago
Externally, it's 8x 400G NICs, which is the limit of PCIe 5.0 anyway.
We had a guy training SOTA models on 9 of our MI300X boxes just fine. Networking wasn't the slow bit.
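Back-of-envelope, with my numbers rather than anything official: PCIe 5.0 runs 32GT/s per lane, so an x16 slot nets roughly 63GB/s ≈ 500Gb/s each way after encoding overhead. A single 400G NIC nearly saturates one slot, and 8 of them come to ~3.2Tb/s of aggregate scale-out bandwidth per box.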