Taking on CUDA with ROCm: 'One Step After Another'

https://www.eetimes.com/taking-on-cuda-with-rocm-one-step-after-another/

35•mindcrime•2h ago

Comments

blovescoffee•1h ago

Naive question, could agents help speed up building code for ROCm parity with CUDA? Outside of code, what are the bottlenecks for reaching parity?

jiggawatts•1h ago

Lack of focus from AMD management. See the sibling comment: https://news.ycombinator.com/item?id=47745611

They just don't care enough to compete.

WorldPeas•1h ago

To be honest, outside of fullstack and basic MCU stuff, these agents aren't very good. Whenever a sufficiently interesting new model comes out, I test it on a couple of problems in Android app development and OS porting for novel CPU targets, and we still haven't gotten there yet. I'd be happy to see the day when that's possible, however.

superkuh•1h ago

AMD hasn't signaled in behavior or words that they're going to actually support ROCm on $specificdevice for more than 4-5 years after release. Sometimes it's as little as the high 3.x years for shrinks like the consumer AMD RX 580. And often the ROCm support for consumer devices isn't out until a year after release, further cutting into that window.

Meanwhile nvidia just dropped CUDA/driver support for 1xxx series cards from their most recent drivers this year.

For me ROCm's mayfly lifetime is a dealbreaker.

canpan•1h ago

I was thinking to get 2x r9700 for a home workstation (mostly inference). It is much cheaper than a similar nvidia build. But still not sure if good value or more trouble.

chao-•1h ago

Talking to friends who have fought more homelab battles than I ever will, my sense is that (1) AMD has done a better job with RDNA4 than the past generations, and (2) it seems very workload-dependent whether AMD consumer gear is "good value", "more trouble", or both at the same time.

Edit: I misread the "2x r9700" as "2 rx9700" which differs from the topic of this comment (about RDNA4 consumer SKUs). I'll keep my comment up, but anyone looking to get Radeon PRO cards can (should?) disregard.

KennyBlanken•42m ago

Given RDNA3 was a pathetic joke, it wouldn't be hard for them to do a better job.

cyberax•1h ago

I have this setup, with 2x 32GB cards. It's perfect for my needs, and cheaper than anything comparable from NV.

stephlow•57m ago

I own a single R9700 for the same reason you mentioned, looking into getting a second one. Was a lot of fiddling to get working on arch but RDNA4 and ROCm have come a long way. Every once in a while arch package updates break things but that’s not exclusive to ROCm.

LLMs run great on it, it’s happily running gemma4 31b at the moment and I’m quite impressed. For the amount of VRAM you get it’s hard to beat, apart from the Intel cards maybe. But the driver support doesn’t seem to be that great there either.

Had some trouble with running comfyui, but it’s not my main use case, so I have not spent a lot of time figuring that out yet.

canpan•5m ago

Thanks for the answer. Brings my hope up. Looking in my local shops, I can get 3 cards for the price of one 5090.

May I ask, what kind of tok/s you are getting with the r9700? I assume you got it fully in vram?

hotstickyballs•1h ago

Driver support eats directly into driver development

lrvick•1h ago

ROCm is open source and TheRock is community maintained, and in a minute the first Linux distro will have native in-tree builds. It will be supported for the foreseeable future due to AMD's open development approach.

It is Nvidia that has the track record of closed drivers and insisting on doing all software dev without community improvements to expected results.

KennyBlanken•51m ago

> expected results

The de facto GPU compute platform? With the best feature set?

lrvick•43m ago

And the worst privacy, transparency, and FOSS integration due to their insistence on a heavily proprietary stack.

Also pretty hard to beat a Strix Halo right now in TPS for the money and power consumption.

Even that aside there exist plenty like me that demand high freedom and transparency and will pay double for it if we have to.

KennyBlanken•32m ago

> And the worst privacy, transparency, and FOSS integration due to their insistence on a heavily proprietary stack.

The market doesn't care about any of that. The consumer market doesn't care, and the commercial market definitely does not. The consumer market wants the most Fortnite frames per second per dollar. The commercial market cares about how much compute they can do per watt, per slot.

> there exist plenty like me that demand high freedom and transparency and will pay double for it if we have to.

The four percent share of the datacenter market and five percent of the desktop GPU market say (very strongly) otherwise.

I have a 100% AMD system in front of me so I'm hardly an NVIDIA fanboy, but you thinking you represent the market is pretty nuts.

lrvick•20m ago

I did not claim to represent the market as a whole, but I feel I likely represent a significant enough segment of it that AMD is going to be just fine.

I think local power efficient LLMs are going to make those datacenter numbers less relevant in the long run.

mindcrime•1h ago

Last year, AMD ran a GitHub poll for ROCm complaints and received more than 1,000 responses. Many were around supporting older hardware, which is today supported either by AMD or by the community, and one year on, all 1,000 complaints have been addressed, Elangovan said. AMD has a team going through GitHub complaints, but Elangovan continues to encourage developers to reach out on X where he’s always happy to listen.

Seems like they're making some effort in that direction at least. If you have specific concerns, maybe try hitting up Anush Elangovan on Twitter?

SwellJoe•32m ago

Is it really that short? This support matrix shows ROCm 7.2.1 supporting quite old generations of GPUs, going back at least five or six years. I consider longevity important, too, but if they're actively supporting stuff released in 2020 (CDNA), I can't fault them too much. With open drivers on Linux, where all the real AI work is happening, I feel like this is a better longevity story than nvidia...where you're dependent on nvidia for kernel drivers in addition to CUDA.

https://rocm.docs.amd.com/en/latest/compatibility/compatibil...

shmerl•1h ago

Side question, but why not advance something like Rust GPU instead as a general approach to GPU programming? https://github.com/Rust-GPU/rust-gpu/

From all the existing examples, it really looks the most interesting.

I.e. what I'm surprised about is lack of backing for it from someone like AMD. It doesn't have to immediately replace ROCm, but AMD would benefit from it advancing and replacing the likes of CUDA.

MobiusHorizons•1h ago

From the readme:

> Note: This project is still heavily in development and is at an early stage.

> Compiling and running simple shaders works, and a significant portion of the core library also compiles.

> However, many things aren't implemented yet. That means that while being technically usable, this project is not yet production-ready.

Also, projects like Rust GPU are built on top of projects like CUDA and ROCm; they aren’t alternatives, they are abstractions over top.

shmerl•14m ago

I think Rust GPU is built on top of Vulkan + SPIR-V as their main foundation, not on top of CUDA or ROCm.

What I meant more is the language of writing GPU programs themselves, not necessarily the machinery right below it. Vulkan is good to advance for that.

I.e. CUDA and ROCm focus on a C++ dialect as the GPU language. Rust GPU does that with Rust and also relies on Vulkan without tying it to any specific GPU type.

HarHarVeryFunny•1h ago

If you don't want/need to program at lowest level possible, then Pytorch seems the obvious option for AMD support, or maybe Mojo. The Triton compiler would be another option for kernel writing.

shmerl•17m ago

I don't think that's something that can be pitched as a CUDA alternative. Just different level.

lrvick•1h ago

Just spent the last week or so porting TheRock to stagex in an effort to get ROCm built with a native musl/mimalloc toolchain and get it deterministic for high security/privacy workloads that cannot trust binaries only built with a single compiler.

It has been a bit of a nightmare and had to package like 30+ deps and their heavily customized LLVM, but got the runtime to build this morning finally.

Things are looking bright for high security workloads on AMD hardware due to them working fully in the open however much of a mess it may be.

jauntywundrkind•34m ago

https://github.com/ROCm/TheRock/issues/3477 makes me quite sad for a variety of reasons. It shouldn't be like this. This work should be usable.

nixosbestos•32m ago

You know what fixes this?

I cannot get over how much of the software world is (1) fine with this from the user side, just suffering individually all the while knowing everyone else is suffering the exact same way, and (2) fine with shipping basically unusable software and hoping users suffer through this shit.

lrvick•16m ago

Oh I fully abandoned TheRock in my stagex ROCm build stack. It is not worth salvaging, but it was an incredibly useful reference for me to rewrite it.

alecco•1h ago

Apple got it right with unified memory with wide bus. That's why Mac Minis are flying for local models. But they are 10x less powerful in AI TOPS. And you can't upgrade the memory.

I really wish AMD and Intel boards get replaced by competent people. They could do it in very short time. Both have integrated GPUs with main memory. AMD and Intel have (or at least used to have) serious know-how in data buses and interconnects, respectively. But I don't see any of that happening.

ROCm? It can't even support decent Attention. It lacks a lot of features and NVIDIA is adding more each year. Soon they will reach escape velocity and nobody will catch them for a decade. smh

caycep•1h ago

Granted, I feel like NVIDIA GPU pricing is such that Mac minis will be way less than 10x cheaper if not already, so one might still get ahead purchasing a bulk order of Mac minis....

KennyBlanken•54m ago

A 5090 will cost you about the same amount of money as a Mac Studio M3 Ultra with eight times the RAM.

It's pretty insane how overpriced NVIDIA hardware is.

corndoge•14m ago

But the 5090 can run Crysis

LoganDark•6m ago

Yes but the 5090 can run games.

Running games on my loaded M4 Max is worse than on my 3090 despite the over-four-year generational gap.

Like, Pacific Drive will reach maybe 30fps at less than 1080p whereas the 3090 will run it better even in 4K.

That could just be CrossOver's issue with Unreal Engine games, but "just play different games" is not a solution I like.

bsder•28m ago

> I really wish AMD and Intel boards get replaced by competent people.

Intel? Agreed. But AMD is making money hand over fist with enterprise AI stuff.

Right now, any effort that AMD or NVIDIA expend on the consumer sector is a waste of money that they could be spending making 10x more at the enterprise level on AI.

p1esk•19m ago

Someone from AMD posted this a few minutes ago, then deleted it:

"Anush's success is due to opting out of internal bureaucracy than anything else. most Claude use at AMD goes through internal infrastructure that can take hundreds of seconds per response due to throttling. Anush got us an exemption to use Anthropic directly. he is also exempt from normal policies on open source and so I can directly contribute to projects to add AMD support. He's an effective leader and has turned ROCm into a internal startup based in California. Definitely worth joining the team even if you've heard bad things about AMD as a whole."

This kind of bullshit is why I don't want to join AMD, even if this particular team is temporarily exempt from it.

hurricanepootis•8m ago

I've been using ROCm on my Radeon RX 6800 and my Ryzen AI 7 350 systems. I've only used it for GPU-accelerated rendering in Cycles, but I am glad that AMD has an option that isn't OpenCL now.

roenxi•7m ago

> Challenger AMD’s ability to take data center GPU share from market leader Nvidia will certainly depend on the success or failure of its AI software stack, ROCm.

I don't think this is true. CUDA is a huge advantage for Nvidia but as far as I can tell it is more a set of R&D libraries than anything else, so all the Hot New Stuff keeps being Nvidia first and only (to start with) as the library ecosystem for the hotness doesn't exist yet. Then eventually new libraries are created that are CUDA-independent and AMD turns out to make pretty good graphics cards.

I wouldn't be surprised if ROCm withered on the vine and AMD still did fine.
