Their MI300s already beat them, with MI400s coming soon.
Know any LLMs that are implemented in CUDA?
Show me one single CUDA kernel in Llama's source code.
(and that's a really easy one, if one knows a bit about it)
It is the same PyTorch whether it runs on an AMD or an NVIDIA GPU.
The exact same PyTorch, actually.
Are you trying to suggest that the machine code that runs on the GPU is what's different?
If you knew a bit more, you would know that this is the case even between different generations of GPUs from the same vendor, which makes that argument completely absurd.
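To make the "same PyTorch" point concrete, here is a minimal sketch (my own illustration, not code from any of the projects discussed): a stock script that runs unchanged on either vendor's GPU, because ROCm builds of PyTorch expose the familiar torch.cuda API with HIP underneath.

    # Minimal sketch: device-agnostic PyTorch. ROCm builds report a HIP version string
    # in torch.version.hip but still answer to the torch.cuda API.
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    if torch.cuda.is_available():
        print("GPU backend:", "ROCm/HIP" if torch.version.hip else "CUDA")

    x = torch.randn(1024, 1024, device=device)
    y = x @ x.T  # dispatched to rocBLAS or cuBLAS depending on the build
    print(y.sum().item())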
And here is pretty damning evidence that you're full of shit: https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/g...
The ggml-hip backend references the ggml-cuda kernels. The "software is the same" (as in, it is CUDA) and yet AMD is still behind.
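To illustrate why that reuse is even possible, here is a deliberately toy sketch (not the real hipify tooling, and not llama.cpp's actual build setup): HIP can consume CUDA kernel sources after a largely mechanical rename of the host-side runtime API, while device-side code mostly carries over as-is.

    # Toy illustration of the hipify idea: string-level renaming of CUDA runtime calls
    # to their HIP equivalents. Real tools do this far more carefully, but the principle
    # is why a HIP backend can point at CUDA kernel sources.
    CUDA_TO_HIP = {
        "cudaMalloc": "hipMalloc",
        "cudaMemcpy": "hipMemcpy",
        "cudaFree": "hipFree",
        "cudaStream_t": "hipStream_t",
    }

    def toy_hipify(cuda_source: str) -> str:
        for cuda_name, hip_name in CUDA_TO_HIP.items():
            cuda_source = cuda_source.replace(cuda_name, hip_name)
        return cuda_source

    print(toy_hipify("cudaStream_t s; cudaMalloc(&buf, n); cudaMemcpy(buf, src, n, cudaMemcpyHostToDevice);"))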
Note: my previous testing was on a single (8x) MI300X node; currently I'm testing on just a single MI300X GPU, so it's not quite apples-to-apples. Multi-GPU/multi-node training is still a question mark; this is just a single data point.
OTOH, by emphasizing datacenter hardware, they can cover a relatively small portfolio and maximize access to it via cloud providers.
As much as I'd love to see an entry-level MI350-A workstation, that's not something that will likely happen.
And as of 2025, there is a Python CUDA JIT DSL as well.
Also, since the CUDA SDK works on any consumer laptop with NVIDIA hardware (even if not the very latest version), anyone can slowly get into CUDA, even if their hardware isn't that great.
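As a rough illustration of writing GPU kernels from Python, here is a sketch using Numba's numba.cuda JIT as a stand-in; the 2025 DSL mentioned above may have a different API, so treat this as the general flavor rather than that specific tool.

    # Sketch: a tiny CUDA kernel written and JIT-compiled from Python via Numba.
    import numpy as np
    from numba import cuda

    @cuda.jit
    def axpy(a, x, y, out):
        i = cuda.grid(1)  # global thread index
        if i < out.size:
            out[i] = a * x[i] + y[i]

    n = 1 << 20
    x = np.random.rand(n).astype(np.float32)
    y = np.random.rand(n).astype(np.float32)
    d_x, d_y = cuda.to_device(x), cuda.to_device(y)
    d_out = cuda.device_array_like(d_x)

    threads = 256
    blocks = (n + threads - 1) // threads
    axpy[blocks, threads](2.0, d_x, d_y, d_out)
    print(d_out.copy_to_host()[:4])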
They should also be dropping free AMD GPUs off helicopters, as Nvidia did a decade or so ago, in order to build up an academic userbase. Academia is getting totally squeezed by industry when it comes to AI compute. We're mostly running on hardware that's 2 or 3 generations out of date. If AMD came with a well supported GPU that cost half what an A100 sells for, voila you'd have cohort after cohort of grad students training models on AMD and then taking that know-how into industry.
For example OTOY OctaneRender, one of the key renderers used in Hollywood.
They have improved since that article, by a decent amount from my understanding. But by now, it isn't enough to have "a backend". The historical efforts have spoiled that narrative so badly that it won't be enough to just have a pytorch-rocm pypi package; some of that flak is unfair, though not completely unsubstantiated. But frankly they need to deliver better software, across all their offerings, for multiple successive generations before the bad optics around their software stack start to fade. Their competitors have already moved on to their next-gen architecture since that article was written.
You are correct that people don't really invoke CUDA APIs much, but that's partially because those APIs actually work and deliver good performance, so things can actually be built on top of them.
Which SDKs do they offer that can do neural network inference and/or training? I'm just asking because I looked into this a while ago and felt a bit overwhelmed by the number of options. It feels like AMD is trying many things at the same time, and I’m not sure where they’re going with all of it.
Plus their consumer card support is questionable, to say the least. I really wish it were a viable alternative, but swapping to CUDA really saved me some headaches and a ton of time.
Having to run MIOpen benchmarks for HIP can take forever.
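For anyone hitting that, a hedged sketch of the usual mitigation: MIOpen exposes the MIOPEN_FIND_MODE environment variable to trade exhaustive kernel search for startup time (accepted values vary by ROCm release, so check the MIOpen docs for your version).

    # Sketch: limit MIOpen's first-run kernel search on ROCm. The variable must be set
    # before MIOpen loads, so set it before importing torch.
    import os
    os.environ.setdefault("MIOPEN_FIND_MODE", "FAST")  # assumed value; verify per ROCm version

    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"  # "cuda" also covers ROCm builds
    conv = nn.Conv2d(64, 128, kernel_size=3, padding=1).to(device)
    x = torch.randn(8, 64, 56, 56, device=device)
    print(conv(x).shape)  # on AMD, the first conv call is what triggers MIOpen's search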
NVIDIA has a moat for smaller systems, but that is not true for clusters.
As long as you have a team to work with the hardware you have, performance beats mindshare.
Nvidia of course has a shitload more money, and they've been doing this for longer, but that's just life.
> smaller systems
El Capitan is estimated to cost around $700 million or something with like 50k deployed MI300 GPUs. xAI's Colossus cluster alone is estimated to be north of $2 billion with over 100k GPUs, and that's one of ~dozens of deployed clusters Nvidia has developed in the past 5 years. AI is a vastly bigger market in every dimension, from profits to deployments.
halJordan•7mo ago
AMD deserves exactly zero of the credulity this writer heaps onto them. They just spent four months not supporting their RDNA4 lineup in ROCm after launch; AMD is apparently only capable of day-120 support. None of the benchmarks disambiguated where the performance is coming from. They are 100% lying on some level, representing their fp4 performance against fp8/16.
pclmulqdq•7mo ago
caycep•7mo ago
fooblaster•7mo ago
fc417fc802•7mo ago
zombiwoof•7mo ago
tormeh•7mo ago
viewtransform•7mo ago
"25 complimentary GPU hours (approximately $50 US of credit for a single MI300X GPU instance), available for 10 days. If you need additional hours, we've made it easy to request additional credits."
archerx•7mo ago
zombiwoof•7mo ago
archerx•7mo ago
booder1•7mo ago
They should care about the availability of their hardware so large customers don't have to find and fix their bugs. Let consumers do that...
echelon•7mo ago
Makes it a little hard to develop for without consumer GPU support...
stingraycharles•7mo ago
cma•7mo ago
7speter•7mo ago
Heck, I’ve been able to work through the early chapters of the FastAI book using a lowly Quadro P1000.
wmf•7mo ago
selectodude•7mo ago
jiggawatts•7mo ago
stingraycharles•7mo ago
This is why it’s so important AMD gets their act together quickly, as the benefits of these kind of things are measured in years, not months.
danielheath•7mo ago
moffkalast•7mo ago
lhl•7mo ago
Flash Attention: academia, 2y behind for AMD support
bitsandbytes: academia, 2y behind for AMD support
Marlin: academia, no AMD support
FlashInfer: academia/startup, no AMD support
ThunderKittens: academia, no AMD support
DeepGEMM, DeepEP, FlashMLA: ofc, nothing from China supports AMD
Without the long tail AMD will continue to always be in a position where they have to scramble to try to add second tier support years later themselves, while Nvidia continues to get all the latest and greatest for free.
This is just off the top of my head on the LLM side, where I'm focused, btw. Whenever I look at image/video it's even more grim.
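To make the dependency problem concrete, here is a sketch of the pattern downstream LLM code typically follows (assuming the upstream flash_attn package; on ROCm you usually end up on the fallback path):

    # Sketch of the "CUDA-first kernel package or bust" pattern: try the optimized
    # library, silently fall back to a generic path when it isn't available.
    import torch
    import torch.nn.functional as F

    try:
        from flash_attn import flash_attn_func  # CUDA-first wheel; often absent on ROCm
        HAS_FLASH = True
    except ImportError:
        HAS_FLASH = False

    def attention(q, k, v, causal=True):
        # q, k, v: (batch, heads, seq_len, head_dim)
        if HAS_FLASH and q.is_cuda and q.dtype in (torch.float16, torch.bfloat16):
            # flash_attn expects (batch, seq_len, heads, head_dim)
            out = flash_attn_func(q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2),
                                  causal=causal)
            return out.transpose(1, 2)
        return F.scaled_dot_product_attention(q, k, v, is_causal=causal)  # portable fallback

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    q = k = v = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)
    print(attention(q, k, v).shape)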
jimmySixDOF•7mo ago
pjmlp•7mo ago
littlestymaar•7mo ago
I mean, if they want to stay at a fraction of the market value and profit of their direct competitor, good for them.
dummydummy1234•7mo ago
It's Nvidia, AMD, and maybe Intel.
shmerl•7mo ago
Different architectures was probably a big reason for the above issue.
fooker•7mo ago
pjmlp•7mo ago
jchw•7mo ago
It's baffling that AMD is the same company that makes both Ryzen and Radeon, but the year to date for Radeon has been very good, aside from the official ROCm support for RDNA4 taking far too long. I wouldn't get overly optimistic; even if AMD finally commits hard to ROCm and Radeon, it doesn't mean they'll be able to compete effectively against NVIDIA. But the consumer showing so far hasn't been bad, with the 9070 XT and FSR4, so I'm cautiously optimistic they've decided to start missing opportunities to miss opportunities. Let's see how long these promises last... Maybe longer than a Threadripper socket, if we're lucky :)
[1]: https://www.phoronix.com/news/AMD-ROCm-H2-2025
roenxi•7mo ago
I dunno; I suppose they can execute on server parts. But regardless, a good plan here is to let someone else go first and report back.
jchw•7mo ago
zombiwoof•7mo ago
AMD is a marketing company now
ethbr1•7mo ago
You mean Ryan Smith of late AnandTech fame?
https://www.anandtech.com/author/85/